linux-yocto

mirror of git://git.yoctoproject.org/linux-yocto.git synced 2025-08-21 16:31:14 +02:00

Author	SHA1	Message	Date
Ian Rogers	32559b99e0	perf test: Add set of perf record LBR tests Adds coverage for LBR operations and LBR callgraph. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Changbin Du <changbin.du@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240808054644.1286065-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-08 17:30:38 -03:00
Ian Rogers	599c19397b	perf callchain: Fix stitch LBR memory leaks The 'struct callchain_cursor_node' has a 'struct map_symbol' whose maps and map members are reference counted. Ensure these values use a _get routine to increment the reference counts and use map_symbol__exit() to release the reference counts. Do similar for 'struct thread's prev_lbr_cursor, but save the size of the prev_lbr_cursor array so that it may be iterated. Ensure that when stitch_nodes are placed on the free list the map_symbols are exited. Fix resolve_lbr_callchain_sample() by replacing list_replace_init() to list_splice_init(), so the whole list is moved and nodes aren't leaked. A reproduction of the memory leaks is possible with a leak sanitizer build in the perf report command of: ``` $ perf record -e cycles --call-graph lbr perf test -w thloop $ perf report --stitch-lbr ``` Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Fixes: `ff165628d7` ("perf callchain: Stitch LBR call stack") Signed-off-by: Ian Rogers <irogers@google.com> [ Basic tests after applying the patch, repeating the example above ] Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Changbin Du <changbin.du@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240808054644.1286065-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-08 17:30:27 -03:00
Veronika Molnarova	37e2a19c98	perf test pmu: Set uninitialized PMU alias to null Commit `3e0bf9fde2` ("perf pmu: Restore full PMU name wildcard support") adds a test case "PMU cmdline match" that covers PMU name wildcard support provided by function perf_pmu__match(). The test works with a wide range of supported combinations of PMU name matching but omits the case that if the perf_pmu__match() cannot match the PMU name to the wildcard, it tries to match its alias. However, this variable is not set up, causing the test case to fail when run with subprocesses or to segfault if run as a single process. ./perf test -vv 9 9: Sysfs PMU tests : 9.1: Parsing with PMU format directory : Ok 9.2: Parsing with PMU event : Ok 9.3: PMU event names : Ok 9.4: PMU name combining : Ok 9.5: PMU name comparison : Ok 9.6: PMU cmdline match : FAILED! ./perf test -F 9 9.1: Parsing with PMU format directory : Ok 9.2: Parsing with PMU event : Ok 9.3: PMU event names : Ok 9.4: PMU name combining : Ok 9.5: PMU name comparison : Ok Segmentation fault (core dumped) Initialize the PMU alias to null for all tests of perf_pmu__match() as this functionality is not being tested and the alias matching works exactly the same as the matching of the PMU name. ./perf test -F 9 9.1: Parsing with PMU format directory : Ok 9.2: Parsing with PMU event : Ok 9.3: PMU event names : Ok 9.4: PMU name combining : Ok 9.5: PMU name comparison : Ok 9.6: PMU cmdline match : Ok Fixes: `3e0bf9fde2` ("perf pmu: Restore full PMU name wildcard support") Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Radostin Stoyanov <rstoyano@redhat.com> Link: https://lore.kernel.org/r/20240808103749.9356-1-vmolnaro@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-08 11:01:54 -03:00
Arnaldo Carvalho de Melo	2df5484bbf	perf tests ftrace: Add pattern check for time, count In 'perf ftrace profile sleep 0.1' we know that we'll have an specific kernel function that will take a bit more than 0.1 seconds and will take place just one time, so we can add a check for that so that we validate more than just the presence of some functions in the profile. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/lkml/ZrTBo7KACZeuCyLj@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-08 09:59:40 -03:00
Namhyung Kim	ed5bb548cc	perf test: Add a new shell test for perf ftrace $ sudo ./perf test ftrace -vv 86: perf ftrace tests: --- start --- test child forked, pid 1772223 perf ftrace list test syscalls for sleep: __x64_sys_nanosleep __ia32_sys_nanosleep __x64_sys_clock_nanosleep __ia32_sys_clock_nanosleep perf ftrace list test [Success] perf ftrace trace test # tracer: function_graph # # CPU DURATION FUNCTION CALLS # \| \| \| \| \| \| \| 0) \| __x64_sys_clock_nanosleep() { 0) \| common_nsleep() { 0) \| hrtimer_nanosleep() { 0) \| do_nanosleep() { perf ftrace trace test [Success] perf ftrace latency test target function: __x64_sys_clock_nanosleep # DURATION \| COUNT \| GRAPH \| 32 - 64 ms \| 1 \| ############################################## \| perf ftrace latency test [Success] perf ftrace profile test # Total (us) Avg (us) Max (us) Count Function 100136.400 100136.400 100136.400 1 __x64_sys_clock_nanosleep 100135.200 100135.200 100135.200 1 common_nsleep 100134.700 100134.700 100134.700 1 hrtimer_nanosleep 100133.700 100133.700 100133.700 1 do_nanosleep 100130.600 100130.600 100130.600 1 schedule 166.868 55.623 80.299 3 scheduler_tick 5.926 5.926 5.926 1 native_smp_send_reschedule 301.941 301.941 301.941 1 __x64_sys_execve 295.786 295.786 295.786 1 do_execveat_common.isra.0 71.397 35.699 46.403 2 bprm_execve 2.519 1.260 1.547 2 sched_mm_cid_before_execve 1.098 0.549 0.686 2 sched_mm_cid_after_execve perf ftrace profile test [Success] ---- end(0) ---- 86: perf ftrace tests : Ok Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20240808044954.1775333-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-08 09:41:35 -03:00
Namhyung Kim	90d78e7b8e	perf annotate-data: Show typedef names properly The die_get_typename() would resolve typedef and get to the original type. But sometimes the original type is a struct without name and it makes the output confusing and hard to read. This is a diff of perf report -s type before and after the change. New types such as atomic{,64}_t and sigset_t appeared and the portion of unnamed struct was reduced. Also u32, u64 and size_t were splitted from the base types. --- b 2024-08-01 17:02:34.307809952 -0700 +++ a 2024-08-07 14:17:05.245853999 -0700 - 2.40% long unsigned int + 2.26% long unsigned int - 1.56% unsigned int + 1.27% unsigned int - 0.98% struct - 0.79% long long unsigned int + 0.58% long long unsigned int + 0.36% struct + 0.27% atomic64_t + 0.22% u32 + 0.21% u64 + 0.19% atomic_t + 0.13% size_t - 0.08% struct seqcount_spinlock + 0.08% seqcount_spinlock_t + 0.08% sigset_t + 0.08% __poll_t Let's use the typedef name directly and the resolved to get the size of the type. Committer testing: root@x1:~# diff -u before after \| head -30 --- before 2024-08-08 09:35:13.917325041 -0300 +++ after 2024-08-08 09:37:35.312257905 -0300 @@ -10,25 +10,27 @@ # ........ ......... # 79.40% (unknown) - 2.28% union 1.96% (stack operation) - 1.24% struct + 1.87% pthread_mutex_t 0.99% u32[] - 0.92% unsigned int 0.77% struct task_struct + 0.75% U32 0.75% struct pcpu_hot 0.63% struct qspinlock + 0.61% atomic_t 0.59% struct list_head - 0.58% int 0.53% struct cfs_rq 0.51% BYTE* - 0.48% unsigned char + 0.48% BYTE 0.48% long unsigned int 0.46% struct rq 0.41% struct worker 0.41% struct memcg_vmstats_percpu + 0.41% pthread_cond_t 0.37% _Bool + 0.36% int root@x1:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240807223129.1738004-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-08 09:36:52 -03:00
Namhyung Kim	037f1b67e8	perf annotate: Cache debuginfo for data type profiling In find_data_type(), it creates and deletes a debug info whenver it tries to find data type for a sample. This is inefficient and it most likely accesses the same binary again and again. Let's add a single entry cache the debug info structure for the last DSO. Depending on sample data, it usually gives me 2~3x (and sometimes more) speed ups. Note that this will introduce a little difference in the output due to the order of checking stack operations. It used to check the stack ops before checking the availability of debug info but I moved it after the symbol check. So it'll report stack operations in DSOs without debug info as unknown. But I think it's ok and better to have the checking near the caching logic. Committer testing: root@x1:~# perf mem record -a sleep 5s root@x1:~# perf evlist cpu_atom/mem-loads,ldlat=30/P cpu_atom/mem-stores/P dummy:u root@x1:~# diff -u before after --- before 2024-08-08 09:33:53.880780784 -0300 +++ after 2024-08-08 09:35:13.917325041 -0300 @@ -81,8 +81,8 @@ # Overhead Data Type # ........ ......... # - 55.43% (unknown) - 11.61% (stack operation) + 55.56% (unknown) + 11.48% (stack operation) 4.93% struct pcpu_hot 3.26% unsigned int 2.48% struct Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240805234648.1453689-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-08 09:34:43 -03:00
Ian Rogers	b2f70c99ed	perf hist: Fix reference counting of branch_info iter_finish_branch_entry() doesn't put the branch_info from/to map elements creating memory leaks. This can be seen with: ``` $ perf record -e cycles -b perf test -w noploop $ perf report -D ... Direct leak of 984344 byte(s) in 123043 object(s) allocated from: #0 0x7fb2654f3bd7 in malloc libsanitizer/asan/asan_malloc_linux.cpp:69 #1 0x564d3400d10b in map__get util/map.h:186 #2 0x564d3400d10b in ip__resolve_ams util/machine.c:1981 #3 0x564d34014d81 in sample__resolve_bstack util/machine.c:2151 #4 0x564d34094790 in iter_prepare_branch_entry util/hist.c:898 #5 0x564d34098fa4 in hist_entry_iter__add util/hist.c:1238 #6 0x564d33d1f0c7 in process_sample_event tools/perf/builtin-report.c:334 #7 0x564d34031eb7 in perf_session__deliver_event util/session.c:1655 #8 0x564d3403ba52 in do_flush util/ordered-events.c:245 #9 0x564d3403ba52 in __ordered_events__flush util/ordered-events.c:324 #10 0x564d3402d32e in perf_session__process_user_event util/session.c:1708 #11 0x564d34032480 in perf_session__process_event util/session.c:1877 #12 0x564d340336ad in reader__read_event util/session.c:2399 #13 0x564d34033fdc in reader__process_events util/session.c:2448 #14 0x564d34033fdc in __perf_session__process_events util/session.c:2495 #15 0x564d34033fdc in perf_session__process_events util/session.c:2661 #16 0x564d33d27113 in __cmd_report tools/perf/builtin-report.c:1065 #17 0x564d33d27113 in cmd_report tools/perf/builtin-report.c:1805 #18 0x564d33e0ccb7 in run_builtin tools/perf/perf.c:350 #19 0x564d33e0d45e in handle_internal_command tools/perf/perf.c:403 #20 0x564d33cdd827 in run_argv tools/perf/perf.c:447 #21 0x564d33cdd827 in main tools/perf/perf.c:561 ... ``` Clearing up the map_symbols properly creates maps reference count issues so resolve those. Resolving this issue doesn't improve peak heap consumption for the test above. Committer testing: $ sudo dnf install libasan $ make -k CORESIGHT=1 EXTRA_CFLAGS="-fsanitize=address" CC=clang O=/tmp/build/$(basename $PWD)/ -C tools/perf install-bin Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Yanteng Si <siyanteng@loongson.cn> Link: https://lore.kernel.org/r/20240807065136.1039977-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-08 09:32:02 -03:00
Namhyung Kim	845295f400	tools/include: Sync filesystem headers with the kernel sources To pick up changes from: `0f9ca80fa4` fs: Add initial atomic write support info to statx `f9af549d1f` fs: export mount options via statmount() `0a3deb1185` fs: Allow listmount() in foreign mount namespace `09b31295f8` fs: export the mount ns id via statmount `d04bccd8c1` listmount: allow listing in reverse order `bfc69fd05e` fs/procfs: add build ID fetching to PROCMAP_QUERY API `ed5d583a88` fs/procfs: implement efficient VMA querying API for /proc/<pid>/maps This should be used to beautify FS syscall arguments and it addresses these tools/perf build warnings: Warning: Kernel ABI header differences: diff -u tools/include/uapi/linux/stat.h include/uapi/linux/stat.h diff -u tools/perf/trace/beauty/include/uapi/linux/fs.h include/uapi/linux/fs.h diff -u tools/perf/trace/beauty/include/uapi/linux/mount.h include/uapi/linux/mount.h diff -u tools/perf/trace/beauty/include/uapi/linux/stat.h include/uapi/linux/stat.h Please see tools/include/uapi/README for details (it's in the first patch of this series). Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-08-07 10:59:07 -07:00
Namhyung Kim	ed86525f1f	tools/include: Sync network socket headers with the kernel sources To pick up changes from: `d25a92ccae` net/smc: Introduce IPPROTO_SMC `060f4ba6e4` io_uring/net: move charging socket out of zc io_uring `bb6aaf7366` net: Split a __sys_listen helper for io_uring `dc2e779794` net: Split a __sys_bind helper for io_uring This should be used to beautify socket syscall arguments and it addresses these tools/perf build warnings: Warning: Kernel ABI header differences: diff -u tools/include/uapi/linux/in.h include/uapi/linux/in.h diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h Please see tools/include/uapi/README for details (it's in the first patch of this series). Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-08-07 10:59:07 -07:00
Namhyung Kim	568901e709	tools/include: Sync uapi/asm-generic/unistd.h with the kernel sources And arch syscall tables to pick up changes from: `b1e31c134a` powerpc: restore some missing spu syscalls `d3882564a7` syscalls: fix compat_sys_io_pgetevents_time64 usage `54233a4254` uretprobe: change syscall number, again `63ded11097` uprobe: Change uretprobe syscall scope and number `9142be9e64` x86/syscall: Mark exit[_group] syscall handlers __noreturn `9aae1baa1c` x86, arm: Add missing license tag to syscall tables files `5c28424e9a` syscalls: Fix to add sys_uretprobe to syscall.tbl `190fec72df` uprobe: Wire up uretprobe system call This should be used to beautify syscall arguments and it addresses these tools/perf build warnings: Warning: Kernel ABI header differences: diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl Please see tools/include/uapi/README for details (it's in the first patch of this series). Cc: Arnd Bergmann <arnd@arndb.de> Cc: linux-arch@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-08-07 10:58:51 -07:00
Namhyung Kim	b973500676	tools/include: Sync uapi/sound/asound.h with the kernel sources To pick up changes from: `f05c1ffc27` ALSA: pcm: reinvent the stream synchronization ID API This should be used to beautify sound syscall arguments and it addresses these tools/perf build warnings: Warning: Kernel ABI header differences: diff -u tools/perf/trace/beauty/include/uapi/sound/asound.h include/uapi/sound/asound.h Please see tools/include/uapi/README for details (it's in the first patch of this series). Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.com> Cc: linux-sound@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-08-06 14:36:02 -07:00
Arnaldo Carvalho de Melo	37ce8a562a	Merge remote-tracking branch 'torvalds/master' into perf-tools-next To pick a patch that albeit being for tools/perf/ directory went thru a different tree and ended up breaking some recent tests introduced in the perf-tools-next tree to validate duplicate events in the JSON performance event files. Link: https://lore.kernel.org/lkml/ZrIqDMg7cBVhstYU@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-06 14:01:06 -03:00
Ian Rogers	4bd380390f	perf jevents.py: Ensure event names aren't duplicated Duplicate event names break invariants in 'perf list'. Assert that an event name isn't duplicated so that broken JSON won't build. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Charles Ci-Jyun Wu <dminus@andestech.com> Cc: Eric Lin <eric.lin@sifive.com> Cc: Greentime Hu <greentime.hu@sifive.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Inochi Amaoto <inochiama@outlook.com> Cc: James Clark <james.clark@linaro.org> Cc: Ji Sheng Teoh <jisheng.teoh@starfivetech.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Locus Wei-Han Chen <locus84@andestech.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Vincent Chen <vincent.chen@sifive.com> Cc: Will Deacon <will@kernel.org> Cc: Xu Yang <xu.yang_2@nxp.com> Link: https://lore.kernel.org/r/20240805194424.597244-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-06 10:37:12 -03:00
Ian Rogers	c4f74bb61a	perf pmu-events: Remove duplicated ampereone event OP_SPEC is repeated twice in the file which will break invariants in 'perf list' as discussed in this thread: https://lore.kernel.org/linux-perf-users/20240719081651.24853-1-eric.lin@sifive.com/ Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Charles Ci-Jyun Wu <dminus@andestech.com> Cc: Eric Lin <eric.lin@sifive.com> Cc: Greentime Hu <greentime.hu@sifive.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Inochi Amaoto <inochiama@outlook.com> Cc: James Clark <james.clark@linaro.org> Cc: Ji Sheng Teoh <jisheng.teoh@starfivetech.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Locus Wei-Han Chen <locus84@andestech.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Vincent Chen <vincent.chen@sifive.com> Cc: Will Deacon <will@kernel.org> Cc: Xu Yang <xu.yang_2@nxp.com> Link: https://lore.kernel.org/r/20240805194424.597244-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-06 10:35:27 -03:00
Ian Rogers	b79f9a437a	perf pmu-events: Change dependencies for empty-pmu-events.c test Switch from $? (all the prerequisites that are newer than the target) to $^ (all the prerequisites) as touching jevents.py will mean that empty-pmu-events.c won't be passed to the diff command breaking the build. Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Charles Ci-Jyun Wu <dminus@andestech.com> Cc: Eric Lin <eric.lin@sifive.com> Cc: Greentime Hu <greentime.hu@sifive.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Inochi Amaoto <inochiama@outlook.com> Cc: James Clark <james.clark@linaro.org> Cc: Ji Sheng Teoh <jisheng.teoh@starfivetech.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Locus Wei-Han Chen <locus84@andestech.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Vincent Chen <vincent.chen@sifive.com> Cc: Will Deacon <will@kernel.org> Cc: Xu Yang <xu.yang_2@nxp.com> Link: https://lore.kernel.org/r/20240805194424.597244-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-06 10:35:27 -03:00
Ian Rogers	2576b20abd	perf test: Add build test for JEVENTS_ARCH=all Building with JEVENTS_ARCH=all builds all CPU types and allows things like assertions to check the validity of the input JSON. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Charles Ci-Jyun Wu <dminus@andestech.com> Cc: Eric Lin <eric.lin@sifive.com> Cc: Greentime Hu <greentime.hu@sifive.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Inochi Amaoto <inochiama@outlook.com> Cc: James Clark <james.clark@linaro.org> Cc: Ji Sheng Teoh <jisheng.teoh@starfivetech.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Locus Wei-Han Chen <locus84@andestech.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Vincent Chen <vincent.chen@sifive.com> Cc: Will Deacon <will@kernel.org> Cc: Xu Yang <xu.yang_2@nxp.com> Link: https://lore.kernel.org/r/20240805194424.597244-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-06 10:35:27 -03:00
Namhyung Kim	ce533c9bc6	perf annotate: Add --skip-empty option Like in 'perf report', we want to hide empty events in the 'perf annotate' output. This is consistent when the option is set in perf report. For example, the following command would use 3 events including dummy. $ perf mem record -a -- perf test -w noploop $ perf evlist cpu/mem-loads,ldlat=30/P cpu/mem-stores/P dummy:u Just using perf annotate with --group will show the all 3 events. $ perf annotate --group --stdio \| head Percent \| Source code & Disassembly of ... -------------------------------------------------------------- : 0 0xe060 <_dl_relocate_object>: 0.00 0.00 0.00 : e060: pushq %rbp 0.00 0.00 0.00 : e061: movq %rsp, %rbp 0.00 0.00 0.00 : e064: pushq %r15 0.00 0.00 0.00 : e066: movq %rdi, %r15 0.00 0.00 0.00 : e069: pushq %r14 0.00 0.00 0.00 : e06b: pushq %r13 0.00 0.00 0.00 : e06d: movl %edx, %r13d Now with --skip-empty, it'll hide the last dummy event. $ perf annotate --group --stdio --skip-empty \| head Percent \| Source code & Disassembly of ... ------------------------------------------------------ : 0 0xe060 <_dl_relocate_object>: 0.00 0.00 : e060: pushq %rbp 0.00 0.00 : e061: movq %rsp, %rbp 0.00 0.00 : e064: pushq %r15 0.00 0.00 : e066: movq %rdi, %r15 0.00 0.00 : e069: pushq %r14 0.00 0.00 : e06b: pushq %r13 0.00 0.00 : e06d: movl %edx, %r13d Committer testing: root@x1:~# perf evlist cpu_atom/mem-loads,ldlat=30/P cpu_atom/mem-stores/P dummy:u root@x1:~# Before: root@x1:~# perf annotate --group --stdio2 do_lookup_x \| head -25 Samples: 20 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 769079, [percent: local period] do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2 Percent 0x9900 <do_lookup_x>: pushq %rbp movq %rsp,%rbp pushq %r15 pushq %r14 pushq %r13 pushq %r12 pushq %rbx subq $0x88,%rsp movq %rdi,-0x50(%rbp) movl 8(%r9),%edi movq 0x10(%rbp),%r12 movq 0x28(%rbp),%r10 movq %rdx,-0x70(%rbp) movq %rcx,-0x58(%rbp) movq %rdi,%r11 0.00 5.73 0.00 movq %r8,-0x68(%rbp) movq (%r9),%r8 movl %esi,%eax 8.30 0.00 0.00 movl 0x30(%rbp),%r9d movl %esi,%r15d shrl $6, %eax movq %r8,%r13 root@x1:~# After: root@x1:~# perf annotate --group --skip-empty --stdio2 do_lookup_x \| head -25 Samples: 20 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 769079, [percent: local period] do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2 Percent 0x9900 <do_lookup_x>: pushq %rbp movq %rsp,%rbp pushq %r15 pushq %r14 pushq %r13 pushq %r12 pushq %rbx subq $0x88,%rsp movq %rdi,-0x50(%rbp) movl 8(%r9),%edi movq 0x10(%rbp),%r12 movq 0x28(%rbp),%r10 movq %rdx,-0x70(%rbp) movq %rcx,-0x58(%rbp) movq %rdi,%r11 0.00 5.73 movq %r8,-0x68(%rbp) movq (%r9),%r8 movl %esi,%eax 8.30 0.00 movl 0x30(%rbp),%r9d movl %esi,%r15d shrl $6, %eax movq %r8,%r13 root@x1:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240803211332.1107222-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-05 16:14:01 -03:00
Namhyung Kim	bb588e3829	perf annotate: Set al->data_nr using the notes->src->nr_events This is a preparation to support skipping empty events. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240803211332.1107222-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-05 16:13:18 -03:00
Namhyung Kim	b00e4d0d93	perf annotate: Use annotation__pcnt_width() consistently The annotation__pcnt_width() calculates the screen width for the overhead (percent) area considering event groups properly. Use this function consistently so that we can make sure it has similar output in different modes. But there's a difference in stdio and tui output: stdio uses 8 and tui uses 7 for a percent. Let's use 8 and adjust the print width in __annotation_line__write() properly. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240803211332.1107222-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-05 16:11:42 -03:00
Namhyung Kim	cb1e8bfc79	perf annotate: Set notes->src->nr_events early We want to use it in different places so make sure it sets properly in symbol__annotate() before creating the disasm lines. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240803211332.1107222-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-05 16:11:03 -03:00
Namhyung Kim	2dc02c2641	perf annotate: Use al->data_nr if possible The data_nr keeps the number of entries in al->data[] so it should use it when it iterates the array. The notes->src->nr_events should have the same number but it'd be natural to use al->data_nr. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240803211332.1107222-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-05 16:07:02 -03:00
Namhyung Kim	13159a139d	perf mem: Update documentation for new options Add a common options section and move some items to the section. Also add description of new options to report options. Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/lkml/20240802180913.1023886-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-05 11:40:20 -03:00
Linus Torvalds	948752d2e0	RISC-V Fixes for 6.11-rc2 * A fix to avoid dropping some of the internal pseudo-extensions, which breaks envcfg dependency parsing. The kernel entry address is now aligned in purgatory, which avoids a misaligned load that can lead to crash on systems that don't support misaligned accesses early in boot. * The FW_SFENCE_VMA_RECEIVED perf event was duplicated in a handful of perf JSON configurations, one of them been updated to FW_SFENCE_VMA_ASID_SENT. * The starfive cache driver is now restricted to 64-bit systems, as it isn't 32-bit clean. * A fix for to avoid aliasing legacy-mode perf counters with software perf counters. * VM_FAULT_SIGSEGV is now handled in the page fault code. * A fix for stalls during CPU hotplug due to IPIs being disabled. * A fix for memblock bounds checking. This manifests as a crash on systems with discontinuous memory maps that have regions that don't fit in the linear map. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmas/qwTHHBhbG1lckBk YWJiZWx0LmNvbQAKCRAuExnzX7sYiWp7EACDcorcihBG8uSsX//GKJPjkiGIbZkT MIMN3yqIzJuSftxpvgVxpyq2MFKYy7BK/75sK+4VoQpoCJEtdxbdh0JUqck/Nrgj Kn0hxWy7RO6Rp9ggf9dTdca64Tdxh32Eegpum3E46zuhYQBMcNze4z4NsOXs6ems 254ww8+v7V5R7FGsxm1PG4Hs3soxZ9FPdWE69ndxmjr9N5FFkchk5YbV8AgKYtSJ sfu5Q+68zh58GVZhn0usug0fHNgVzdvwy3PIBDGD58hqIDAs9WlF80MiW3sESTIe PrJcAFBU4tHp+8h+OMaKw2xfybrZpNmqobx7dED34PJu0R4+Uvz7MUKMMPUJeB+q 7UOZokjF2Hvd5VsAeTc1PisvzVsWkWpkzJqZmdaTr2m8J4m5z7/nby+ZcXmoOlVz JiMDgrkM4KIziq++9bYbBfcxsS9dMsvNtEQAHByL/zdVfAFTvWUMUmAgg27C3K9Z QbHfbpxqQ/pEu4CsRUIx4GnkEKnWPLuGovnYboGmC3BCDwQkkV8H0tcEhJtWMKte 6h+vvKBX2POS4l8467ElmcTRv5Cfpi/dmhZrC9SHHQhNF5OiHHM2CmSEOKS1bUPj e4+k/QGmVQOAJGRRPkpD+DFMhHT/jhvbYV4kDXr/h9AKJQ2eWRGMSOMaPJ/X311N R5W1yiJilIhXuQ== =K52W -----END PGP SIGNATURE----- Merge tag 'riscv-for-linus-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - A fix to avoid dropping some of the internal pseudo-extensions, which breaks envcfg dependency parsing - The kernel entry address is now aligned in purgatory, which avoids a misaligned load that can lead to crash on systems that don't support misaligned accesses early in boot - The FW_SFENCE_VMA_RECEIVED perf event was duplicated in a handful of perf JSON configurations, one of them been updated to FW_SFENCE_VMA_ASID_SENT - The starfive cache driver is now restricted to 64-bit systems, as it isn't 32-bit clean - A fix for to avoid aliasing legacy-mode perf counters with software perf counters - VM_FAULT_SIGSEGV is now handled in the page fault code - A fix for stalls during CPU hotplug due to IPIs being disabled - A fix for memblock bounds checking. This manifests as a crash on systems with discontinuous memory maps that have regions that don't fit in the linear map tag 'riscv-for-linus-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fix linear mapping checks for non-contiguous memory regions RISC-V: Enable the IPI before workqueue_online_cpu() riscv/mm: Add handling for VM_FAULT_SIGSEGV in mm_fault_error() perf: riscv: Fix selecting counters in legacy mode cache: StarFive: Require a 64-bit system perf arch events: Fix duplicate RISC-V SBI firmware event name riscv/purgatory: align riscv_kernel_entry riscv: cpufeature: Do not drop Linux-internal extensions	2024-08-02 09:33:35 -07:00
Namhyung Kim	7320ad9725	perf mem: Add -T/--data-type option to report subcommand This is just a shortcut to have 'type' in the sort key and use more compact output format like below. $ perf mem report -T ... # # Overhead Samples Memory access Snoop TLB access Data Type # ........ ............ ....................................... ............ ...................... ......... # 14.84% 22 L1 hit None L1 or L2 hit (unknown) 7.68% 8 LFB/MAB hit None L1 or L2 hit (unknown) 7.17% 3 RAM hit Hit L2 miss (unknown) 6.29% 12 L1 hit None L1 or L2 hit (stack operation) 4.85% 5 RAM hit Hit L1 or L2 hit (unknown) 3.97% 5 LFB/MAB hit None L1 or L2 hit struct psi_group_cpu 3.18% 3 LFB/MAB hit None L1 or L2 hit (stack operation) 2.58% 3 L1 hit None L1 or L2 hit unsigned int 2.36% 2 L1 hit None L1 or L2 hit struct 2.31% 2 L1 hit None L1 or L2 hit struct psi_group_cpu ... Users also can use their own sort keys and -T option makes sure it has the 'type' sort key at the end. $ perf mem report -T -s mem Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240731235505.710436-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-01 18:55:55 -03:00
Namhyung Kim	2d99a99133	perf mem: Add -s/--sort option So that users can set the sort key manually as they want. $ perf mem report -s Error: switch `s' requires a value Usage: perf mem report [<options>] -s, --sort <key[,key2...]> sort by key(s): overhead overhead_sys overhead_us overhead_guest_sys overhead_guest_us overhead_children sample period weight1 weight2 weight3 ins_lat retire_lat p_stage_cyc pid comm dso symbol parent cpu socket srcline srcfile local_weight weight transaction trace symbol_size dso_size cgroup cgroup_id ipc_null time code_page_size local_ins_lat ins_lat local_p_stage_cyc p_stage_cyc addr local_retire_lat retire_lat simd type typeoff symoff symbol_daddr dso_daddr locked tlb mem snoop dcacheline symbol_iaddr phys_daddr data_page_size blocked Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240731235505.710436-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-01 18:55:55 -03:00
Namhyung Kim	871893d748	perf tools: Add mode argument to sort_help() Some sort keys are meaningful only in a specific mode - like branch stack and memory (data-src). Add the mode to skip unnecessary ones. This will be used for 'perf mem report' later. While at it, change the prefix for the -F/--fields option to remove the duplicate part. Before: $ perf report -F Error: switch `F' requires a value Usage: perf report [<options>] -F, --fields <key[,keys...]> output field(s): overhead period sample overhead overhead_sys overhead_us overhead_guest_sys overhead_guest_us overhead_children sample period weight1 weight2 weight3 ins_lat retire_lat ... After: $ perf report -F Error: switch `F' requires a value Usage: perf report [<options>] -F, --fields <key[,keys...]> output field(s): overhead overhead_sys overhead_us overhead_guest_sys overhead_guest_us overhead_children sample period weight1 weight2 weight3 ins_lat retire_lat ... Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240731235505.710436-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-01 18:55:55 -03:00
Namhyung Kim	35b38a71c9	perf mem: Rework command option handling Split the common option and ones for record or report. Otherwise -U in the record option cannot be used because it clashes with in the common (or report) option. Also rename report_events() to __cmd_report() to follow the convention and to be sync with the record part. Also set the flag PARSE_OPT_STOP_AT_NON_OPTION for the common option so that it can show the help message in the subcommand like below: $ perf mem record -h Usage: perf mem record [<options>] [<command>] or: perf mem record [<options>] -- <command> [<options>] -C, --cpu <cpu> list of cpus to profile -e, --event <event> event selector. use 'perf mem record -e list' to list available events -f, --force don't complain, do it -K, --all-kernel collect only kernel level data -p, --phys-data Record/Report sample physical addresses -t, --type <type> memory operations(load,store) Default load,store -U, --all-user collect only user level data -v, --verbose be more verbose (show counter open errors, etc) --data-page-size Record/Report sample data address page size --ldlat <n> mem-loads latency Cc: Aditya Gupta <adityag@linux.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240731235505.710436-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-01 18:55:55 -03:00
Namhyung Kim	3da209bb11	perf mem: Free the allocated sort string, fixing a leak The get_sort_order() returns either a new string (from strdup) or NULL but it never gets freed. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Fixes: `2e7f545096` ("perf mem: Factor out a function to generate sort order") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240731235505.710436-3-namhyung@kernel.org [ Added Fixes tag ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-01 18:55:55 -03:00
Namhyung Kim	96465e0179	perf hist: Correct hist_entry->mem_info refcounts The 'struct mem_info' is created by iter_prepare_mem_entry() at the beginning and destroyed by iter_finish_mem_entry() at the end. So if it's used in a new hist_entry, it should be cloned. Simplify (hopefully) the logic by adding some helper functions and by not holding the refcount in the temporary entry. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240731235505.710436-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-01 18:55:55 -03:00
Ian Rogers	7c5dd51bbb	perf python: Remove PYTHON_PERF ifdefs When perf code was compiled one way for the binary and another for the python module, the PYTHON_PERF ifdef was used to remove some code from the python module. Since switching to building the perf code as a series of libraries, with the same libraries being used for the python module, the ifdefs became unused as PYTHON_PERF is never defined. As such remove the ifdefs. Fixes: `9dabf40034` ("perf python: Switch module to linking libraries from building source") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240731230005.12295-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-01 18:55:55 -03:00
Ian Rogers	0fe881f10c	perf jevents: Autogenerate empty-pmu-events.c empty-pmu-events.c exists so that builds may occur without python being installed on a system. Manually updating empty-pmu-events.c to be in sync with jevents.py is a pain, let's use jevents.py to generate empty-pmu-events.c. 1) change jevents.py so that an arch and model of none cause generation of a pmu-events.c without any json. Add a SPDX and autogenerated warning to the start of the file. 2) change Build so that if a generated pmu-events.c for arch none and model none doesn't match empty-pmu-events.c the build fails with a cat of the differences. Update Makefile.perf to clean up the files used for this. 3) update empty-pmu-events.c to match the output of jevents.py with arch and mode of none. Committer notes: The firtst paragraph is confusing, so I asked and Ian further clarified: --- The requirement for python hasn't changed. Case 1: no python or NO_JEVENTS=1 Build happens using empty-pmu-events.c that is checked in, no python is required. Case 2: python pmu-events.c is created by jevents.py (requiring python) and then built. This change adds a step where the empty-pmu-events.c is created using jevents.py and that file is diffed against the checked in version. This stops the checked in empty-pmu-events.c diverging if changes are made to jevents.py. If the diff causes the build to fail then you just copy the diff empty-pmu-events.c over the checked in one. --- Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Sang <oliver.sang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Philip Li <philip.li@intel.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Xu Yang <xu.yang_2@nxp.com> Link: https://lore.kernel.org/r/20240730191744.3097329-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-01 18:55:55 -03:00
Arnaldo Carvalho de Melo	ea59b70a84	perf bpf: Move BPF disassembly routines to separate file to avoid clash with capstone bpf headers There is a clash of the libbpf and capstone libraries, that ends up with: In file included from /usr/include/capstone/capstone.h:325, from util/disasm.c:1513: /usr/include/capstone/bpf.h:94:14: error: ‘bpf_insn’ defined as wrong kind of tag 94 \| typedef enum bpf_insn { So far we're just trying to avoid this by not having both headers included in the same .c or .h file, do it one more time by moving the BPF diassembly routines from util/disasm.c to util/disasm_bpf.c. This is only being hit when building with BUILD_NONDISTRO=1, i.e. building with binutils-devel, that isn't the in the default build due to a licencing clash. We need to reimplement what is now isolated in util/disasm_bpf.c using some other library to have BPF annotation feature that now only is available with BUILD_NONDISTRO=1. Fixes: `6d17edc113` ("perf annotate: Use libcapstone to disassemble") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZqpUSKPxMwaQKORr@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-01 18:54:19 -03:00
Namhyung Kim	9cb3549b73	perf test: Update sample filtering test Now it can run the BPF filtering test with normal user if the BPF objects are pinned by 'sudo perf record --setup-filter pin'. Let's update the test case to verify the behavior. It'll skip the test if the filter check is failed from a normal user, but it shows a message how to set up the filters. First, run the test as a normal user and it fails. $ perf test -vv filtering 95: perf record sample filtering (by BPF) tests: --- start --- test child forked, pid 425677 Checking BPF-filter privilege try 'sudo perf record --setup-filter pin' first. <<<--- here bpf-filter test [Skipped permission] ---- end(-2) ---- 95: perf record sample filtering (by BPF) tests : Skip According to the message, run the perf record command to pin the BPF objects. $ sudo perf record --setup-filter pin And re-run the test as a normal user. $ perf test -vv filtering 95: perf record sample filtering (by BPF) tests: --- start --- test child forked, pid 424486 Checking BPF-filter privilege Basic bpf-filter test Basic bpf-filter test [Success] Failing bpf-filter test Error: task-clock event does not have PERF_SAMPLE_CPU Failing bpf-filter test [Success] Group bpf-filter test Error: task-clock event does not have PERF_SAMPLE_CPU Error: task-clock event does not have PERF_SAMPLE_CODE_PAGE_SIZE Group bpf-filter test [Success] ---- end(0) ---- 95: perf record sample filtering (by BPF) tests : Ok Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20240703223035.2024586-9-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-01 12:11:33 -03:00
Namhyung Kim	3dee4b83a6	perf record: Add --setup-filter option To allow BPF filters for unprivileged users it needs to pin the BPF objects to BPF-fs first. Let's add a new option to pin and unpin the objects easily. I'm not sure 'perf record' is a right place to do this but I don't have a better idea right now. $ sudo perf record --setup-filter pin The above command would pin BPF program and maps for the filter when the system has BPF-fs (usually at /sys/fs/bpf/). To unpin the objects, users can run the following command (as root). $ sudo perf record --setup-filter unpin Committer testing: root@number:~# perf record --setup-filter pin root@number:~# ls -la /sys/fs/bpf/perf_filter/ total 0 drwxr-xr-x. 2 root root 0 Jul 31 10:43 . drwxr-xr-t. 3 root root 0 Jul 31 10:43 .. -rw-rw-rw-. 1 root root 0 Jul 31 10:43 dropped -rw-rw-rw-. 1 root root 0 Jul 31 10:43 filters -rwxrwxrwx. 1 root root 0 Jul 31 10:43 perf_sample_filter -rw-rw-rw-. 1 root root 0 Jul 31 10:43 pid_hash -rw-------. 1 root root 0 Jul 31 10:43 sample_f_rodata root@number:~# ls -la /sys/fs/bpf/perf_filter/perf_sample_filter -rwxrwxrwx. 1 root root 0 Jul 31 10:43 /sys/fs/bpf/perf_filter/perf_sample_filter root@number:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20240703223035.2024586-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-01 12:11:33 -03:00
Namhyung Kim	73bf63a475	perf record: Fix a potential error handling issue The evlist is allocated at the beginning of cmd_record(). Also free-ing thread masks should be paired with record__init_thread_masks() which is called right before __cmd_record(). Let's change the order of these functions to release the resources correctly in case of errors. This is maybe fine as the process exits, but it might be a problem if it manages some system-wide resources that live longer than the process. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20240703223035.2024586-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-01 12:11:33 -03:00
Namhyung Kim	1ec6fd34e0	perf bpf-filter: Support separate lost counts for each filter As the BPF filter is shared between other processes, it should have its own counter for each invocation. Add a new array map (lost_count) to save the count using the same index as the filter. It should clear the count before running the filter. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20240703223035.2024586-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-01 12:11:33 -03:00
Namhyung Kim	0715f65e94	perf bpf-filter: Support pin/unpin BPF object And use the pinned objects for unprivileged users to profile their own tasks. The BPF objects need to be pinned in the BPF-fs by root first and it'll be handled in the later patch. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20240703223035.2024586-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-01 12:11:33 -03:00
Namhyung Kim	eb1693b115	perf bpf-filter: Split per-task filter use case If the target is a list of tasks, it can use a shared hash map for filter expressions. The key of the filter map is an integer index like in an array. A separate pid_hash map is added to get the index for the filter map using the tgid. For system-wide mode including per-cpu or per-user targets are handled by the single entry map like before. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20240703223035.2024586-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-01 12:11:33 -03:00
Namhyung Kim	966854e72f	perf bpf-filter: Pass 'target' to perf_bpf_filter__prepare() This is needed to prepare target-specific actions in the later patch. We want to reuse the pinned BPF program and map for regular users to profile their own processes. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20240703223035.2024586-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-01 12:11:33 -03:00
Namhyung Kim	edb08cdd10	perf bpf-filter: Make filters map a single entry hashmap And the value is now an array. This is to support multiple filter entries in the map later. No functional changes intended. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20240703223035.2024586-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-01 12:11:33 -03:00
Ian Rogers	0f2c0400b5	perf jevents: Use name for special find value (PMU_EVENTS__NOT_FOUND) -1000 was used as a special value added in Commit `3d5045492a` ("perf pmu-events: Add pmu_events_table__find_event()") to show that 1 table lacked a PMU/event but that didn't terminate the search in other tables. Add a new constant PMU_EVENTS__NOT_FOUND for this value and use it. Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Sang <oliver.sang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Philip Li <philip.li@intel.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Xu Yang <xu.yang_2@nxp.com> Link: https://lore.kernel.org/r/20240730191744.3097329-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-01 12:11:33 -03:00
Tiezhu Yang	b48543c451	perf list: Give clues if failed to open tracing events directory When executing the command "perf list", I met "Error: failed to open tracing events directory" twice, the first reason is that there is no "/sys/kernel/tracing/events" directory due to it does not enable the kernel tracing infrastructure with CONFIG_FTRACE, the second reason is that there is no root privileges. Add the error string to tell the users what happened and what should to do, and also call put_tracing_file() to free events_path a little later to avoid messy code in the error message. At the same time, just remove the redundant "/" of the file path in the function get_tracing_file(), otherwise it shows something like "/sys/kernel/tracing//events". Before: $ ./perf list Error: failed to open tracing events directory After: (1) Without CONFIG_FTRACE $ ./perf list Error: failed to open tracing events directory /sys/kernel/tracing/events: No such file or directory (2) With CONFIG_FTRACE but no root privileges $ ./perf list Error: failed to open tracing events directory /sys/kernel/tracing/events: Permission denied Committer testing: Redirect stdout to null to quickly test the patch: Before: $ perf list > /dev/null Error: failed to open tracing events directory $ After: $ perf list > /dev/null Error: failed to open tracing events directory /sys/kernel/tracing/events: Permission denied $ Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/20240730062301.23244-3-yangtiezhu@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-01 12:11:33 -03:00
Tiezhu Yang	839b1832e6	perf tools: Fix wrong message when running "make JOBS=1" There is only one job when running "make JOBS=1", it should print "sequential build" rather than "parallel build". Before: $ cd tools/perf && make JOBS=1 BUILD: Doing 'make -j1' parallel build After: $ cd tools/perf && make JOBS=1 BUILD: Doing 'make -j1' sequential build Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/20240730062301.23244-2-yangtiezhu@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-01 12:11:33 -03:00
Leo Yan	1635bdca4b	perf arm-spe: Support multiple Arm SPE events As the flag 'auxtrace' has been set for Arm SPE events, now it is ready to use evsel__is_aux_event() to check if an event is AUX trace event or not. Use this function to replace the old checking for only the first Arm SPE event. Signed-off-by: Leo Yan <leo.yan@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: <coresight@lists.linaro.org> Cc: John Garry <john.g.garry@oracle.com> Cc: <linux-perf-users@vger.kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-01 12:11:32 -03:00
Leo Yan	ccd6fcda25	perf arm-spe: Extract evsel setting up The evsel for Arm SPE PMU needs to be set up. Extract the setting up into a function arm_spe_setup_evsel(). Signed-off-by: Leo Yan <leo.yan@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: <coresight@lists.linaro.org> Cc: John Garry <john.g.garry@oracle.com> Cc: <linux-perf-users@vger.kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-08-01 12:11:32 -03:00
Eric Lin	63ba5b0fb4	perf arch events: Fix duplicate RISC-V SBI firmware event name Currently, the RISC-V firmware JSON file has duplicate event name "FW_SFENCE_VMA_RECEIVED". According to the RISC-V SBI PMU extension[1], the event name should be "FW_SFENCE_VMA_ASID_SENT". Before this patch: $ perf list firmware: fw_access_load [Load access trap event. Unit: cpu] fw_access_store [Store access trap event. Unit: cpu] .... fw_set_timer [Set timer event. Unit: cpu] fw_sfence_vma_asid_received [Received SFENCE.VMA with ASID request from other HART event. Unit: cpu] fw_sfence_vma_received [Sent SFENCE.VMA with ASID request to other HART event. Unit: cpu] After this patch: $ perf list firmware: fw_access_load [Load access trap event. Unit: cpu] fw_access_store [Store access trap event. Unit: cpu] ..... fw_set_timer [Set timer event. Unit: cpu] fw_sfence_vma_asid_received [Received SFENCE.VMA with ASID request from other HART event. Unit: cpu] fw_sfence_vma_asid_sent [Sent SFENCE.VMA with ASID request to other HART event. Unit: cpu] fw_sfence_vma_received [Received SFENCE.VMA request from other HART event. Unit: cpu] Link: https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-pmu.adoc#event-firmware-events-type-15 [1] Fixes: `8f0dcb4e73` ("perf arch events: riscv sbi firmware std event files") Fixes: `c4f769d409` ("perf vendor events riscv: add Sifive U74 JSON file") Fixes: `acbf6de674` ("perf vendor events riscv: Add StarFive Dubhe-80 JSON file") Fixes: `7340c6df49` ("perf vendor events riscv: add T-HEAD C9xx JSON file") Fixes: `f5102e31c2` ("riscv: andes: Support specifying symbolic firmware and hardware raw event") Signed-off-by: Eric Lin <eric.lin@sifive.com> Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Nikita Shubin <n.shubin@yadro.com> Reviewed-by: Inochi Amaoto <inochiama@outlook.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240719115018.27356-1-eric.lin@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2024-08-01 07:14:45 -07:00
Weilin Wang	4ed0f392e7	perf test: make metric validation test return early when there is no metric supported on the test system Add a check to return the metric validation test early when perf list metric does not output any metric. This would happen when NO_JEVENTS=1 is set or in a system that there is no metric supported. Signed-off-by: Weilin Wang <weilin.wang@intel.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Link: https://lore.kernel.org/lkml/20240522204254.1841420-1-weilin.wang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:58:18 -03:00
Namhyung Kim	74ae366c37	perf ftrace profile: Add -s/--sort option The -s/--sort option is to sort the output by given column. $ sudo perf ftrace profile -s max sync \| head # Total (us) Avg (us) Max (us) Count Function 6301.811 6301.811 6301.811 1 __do_sys_sync 6301.328 6301.328 6301.328 1 ksys_sync 5320.300 1773.433 2858.819 3 iterate_supers 2755.875 17.012 2610.633 162 sync_fs_one_sb 2728.351 682.088 2610.413 4 ext4_sync_fs [ext4] 2603.654 2603.654 2603.654 1 jbd2_log_wait_commit [jbd2] 4750.615 593.827 2597.427 8 schedule 2164.986 26.728 2115.673 81 sync_inodes_one_sb 2143.842 26.467 2115.438 81 sync_inodes_sb Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Changbin Du <changbin.du@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/lkml/20240729004127.238611-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:58:18 -03:00
Namhyung Kim	0f223813ed	perf ftrace: Add 'profile' command The 'perf ftrace profile' command is to get function execution profiles using function-graph tracer so that users can see the total, average, max execution time as well as the number of invocations easily. The following is a profile for the perf_event_open syscall. $ sudo perf ftrace profile -G __x64_sys_perf_event_open -- \ perf stat -e cycles -C1 true 2> /dev/null \| head # Total (us) Avg (us) Max (us) Count Function 65.611 65.611 65.611 1 __x64_sys_perf_event_open 30.527 30.527 30.527 1 anon_inode_getfile 30.260 30.260 30.260 1 __anon_inode_getfile 29.700 29.700 29.700 1 alloc_file_pseudo 17.578 17.578 17.578 1 d_alloc_pseudo 17.382 17.382 17.382 1 __d_alloc 16.738 16.738 16.738 1 kmem_cache_alloc_lru 15.686 15.686 15.686 1 perf_event_alloc 14.012 7.006 11.264 2 obj_cgroup_charge # Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Changbin Du <changbin.du@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/lkml/20240729004127.238611-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:58:18 -03:00
Namhyung Kim	608585f43f	perf ftrace: Factor out check_ftrace_capable() The check is a common part of the ftrace commands, let's move it out. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Changbin Du <changbin.du@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/lkml/20240729004127.238611-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:58:18 -03:00
Namhyung Kim	c77800894b	perf ftrace: Add 'tail' option to --graph-opts The 'graph-tail' option is to print function name as a comment at the end. This is useful when a large function is mixed with other functions (possibly from different CPUs). For example, $ sudo perf ftrace -- perf stat true ... 1) \| get_unused_fd_flags() { 1) \| alloc_fd() { 1) 0.178 us \| _raw_spin_lock(); 1) 0.187 us \| expand_files(); 1) 0.169 us \| _raw_spin_unlock(); 1) 1.211 us \| } 1) 1.503 us \| } $ sudo perf ftrace --graph-opts tail -- perf stat true ... 1) \| get_unused_fd_flags() { 1) \| alloc_fd() { 1) 0.099 us \| _raw_spin_lock(); 1) 0.083 us \| expand_files(); 1) 0.081 us \| _raw_spin_unlock(); 1) 0.601 us \| } /* alloc_fd / 1) 0.751 us \| } / get_unused_fd_flags */ Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Changbin Du <changbin.du@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/lkml/20240729004127.238611-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:58:18 -03:00
Dr. David Alan Gilbert	156e8dcfec	perf test pmu: Remove unused test_pmus Commit `aa1551f299` ("perf test pmu: Refactor format test and exposed test APIs") added the 'test_pmus' list, but didn't use it. (It seems to put them on the other_pmus list?) Remove it. Fixes: `aa1551f299` ("perf test pmu: Refactor format test and exposed test APIs") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Cc: Ian Rogers <irogers@google.com> Cc: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/lkml/20240727175919.1041468-1-linux@treblig.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:58:18 -03:00
Adrian Hunter	feab89bf99	perf tools: Enable evsel__is_aux_event() to work for S390_CPUMSF evsel__is_aux_event() identifies AUX area tracing selected events. S390_CPUMSF uses a raw event type (PERF_TYPE_RAW - refer s390_cpumsf_evsel_is_auxtrace()) not a PMU type value that could be checked in evsel__is_aux_event(). However it sets needs_auxtrace_mmap (refer auxtrace_record__init()), so check that first. Currently, the features that use evsel__is_aux_event() are used only by Intel PT, but that may change in the future. Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240715160712.127117-7-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:58:18 -03:00
Adrian Hunter	c91928a8d5	perf tools: Enable evsel__is_aux_event() to work for ARM/ARM64 Set pmu->auxtrace on ARM/ARM64 AUX area PMUs. evsel__is_aux_event() needs the setting to identify AUX area tracing selected events. Currently, the features that use evsel__is_aux_event() are used only by Intel PT, but that may change in the future. Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240715160712.127117-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:58:18 -03:00
James Clark	ae8e4f4048	perf scripts python cs-etm: Restore first sample log in verbose mode The linked commit moved the early return on the first sample to before the verbose log, so move the log earlier too. Now the first sample is also logged and not skipped. Fixes: `2d98dbb4c9` ("perf scripts python arm-cs-trace-disasm.py: Do not ignore disam first sample") Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: coresight@lists.linaro.org Cc: gankulkarni@os.amperecomputing.com Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ruidong Tian <tianruidong@linux.alibaba.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20240723132858.12747-1-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:58:18 -03:00
James Clark	4194744602	perf cs-etm: Output 0 instead of 0xdeadbeef when exception packets are flushed Normally exception packets don't directly output a branch sample, but if they're the last record in a buffer then they will. Because they don't have addresses set we'll see the placeholder value CS_ETM_INVAL_ADDR (0xdeadbeef) in the output. Since commit `6035b6804b` ("perf cs-etm: Support dummy address value for CS_ETM_TRACE_ON packet") we've used 0 as an externally visible "not set" address value. For consistency reasons and to not make exceptions look like an error, change them to use 0 too. This is particularly visible when doing userspace only tracing because trace is disabled when jumping to the kernel, causing the flush and then forcing the last exception packet to be emitted as a branch. With kernel trace included, there is no flush so exception packets don't generate samples until the next range packet and they'll pick up the correct address. Before: $ perf record -e cs_etm//u -- stress -i 1 -t 1 $ perf script -F comm,ip,addr,flags stress syscall ffffb7eedbc0 => deadbeefdeadbeef stress syscall ffffb7f14a14 => deadbeefdeadbeef stress syscall ffffb7eedbc0 => deadbeefdeadbeef After: stress syscall ffffb7eedbc0 => 0 stress syscall ffffb7f14a14 => 0 stress syscall ffffb7eedbc0 => 0 Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: gankulkarni@os.amperecomputing.com Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240722152756.59453-2-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:58:18 -03:00
Chen Ni	496cae1b33	perf inject: Convert comma to semicolon Replace a comma between expression statements by a semicolon. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240716075347.969041-1-nichen@iscas.ac.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:58:18 -03:00
Chen Ni	e60fc19eab	perf daemon: Convert comma to semicolon Replace a comma between expression statements by a semicolon. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240716074340.968909-1-nichen@iscas.ac.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:58:18 -03:00
Chen Ni	050f2a03aa	perf annotate: Convert comma to semicolon Replace a comma between expression statements by a semicolon. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240716073405.968801-1-nichen@iscas.ac.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:58:18 -03:00
Kajol Jain	42d37fc0c8	perf vendor events power10: Update JSON/events Update JSON/events for power10 platform with additional events. Also move PM_VECTOR_LD_CMPL event from others.json to frontend.json file. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Tested-by: Disha Goel <disgoel@linux.ibm.com> Cc: Akanksha J N <akanksha@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: hbathini@linux.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20240723052154.96202-1-kjain@linux.ibm.com [ Remove alternative to ' char that made the build break in some distros with a unicode parsing python error ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:53:17 -03:00
Athira Rajeev	2c9db7475e	perf annotate: Set instruction name to be used with insn-stat when using raw instruction Since the "ins.name" is not set while using raw instruction, 'perf annotate' with insn-stat gives wrong data: Result from "./perf annotate --data-type --insn-stat": Annotate Instruction stats total 615, ok 419 (68.1%), bad 196 (31.9%) Name : Good Bad ----------------------------------------------------------- : 419 196 This patch sets "dl->ins.name" in arch specific function "check_ppc_insn" while initialising "struct disasm_line". Also update "ins_find" function to pass "struct disasm_line" as a parameter so as to set its name field in arch specific call. With the patch changes: Annotate Instruction stats total 609, ok 446 (73.2%), bad 163 (26.8%) Name/opcode : Good Bad ----------------------------------------------------------- 58 : 323 80 32 : 49 43 34 : 33 11 OP_31_XOP_LDX : 8 20 40 : 23 0 OP_31_XOP_LWARX : 5 1 OP_31_XOP_LWZX : 2 3 OP_31_XOP_LDARX : 3 0 33 : 0 2 OP_31_XOP_LBZX : 0 1 OP_31_XOP_LWAX : 0 1 OP_31_XOP_LHZX : 0 1 Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Kajol Jain <kjain@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Akanksha J N <akanksha@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/lkml/20240718084358.72242-16-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:12:59 -03:00
Athira Rajeev	c5d60de181	perf annotate: Add support to use libcapstone in powerpc Now perf uses the capstone library to disassemble the instructions in x86. capstone is used (if available) for perf annotate to speed up. Currently it only supports x86 architecture. This patch includes changes to enable this in powerpc. For now, only for data type sort keys, this method is used and only binary code (raw instruction) is read. This is because powerpc approach to understand instructions and reg fields uses raw instruction. The "cs_disasm" is currently not enabled. While attempting to do cs_disasm, observation is that some of the instructions were not identified (ex: extswsli, maddld) and it had to fallback to use objdump. Hence enabling "cs_disasm" is added in comment section as a TODO for powerpc. Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Kajol Jain <kjain@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Akanksha J N <akanksha@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/lkml/20240718084358.72242-15-atrajeev@linux.vnet.ibm.com [ Use dso__nsinfo(dso) as required to match EXTRA_CFLAGS=-DREFCNT_CHECKING=1 build expectations ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:12:59 -03:00
Athira Rajeev	f1e9347c85	perf annotate: Use capstone_init and remove open_capstone_handle from disasm.c capstone_init is made availbale for all archs to use and updated to enable support for CS_ARCH_PPC as well. Patch removes open_capstone_handle and uses capstone_init in all the places. Committer notes: Avoid including capstone/capstone.h from print_insn.h to not break the build in builtin-script.c due to the namespace clash with libbpf: /usr/include/capstone/bpf.h:94:14: error: 'bpf_insn' defined as wrong kind of tag Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Kajol Jain <kjain@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Akanksha J N <akanksha@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/lkml/20240718084358.72242-14-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:12:59 -03:00
Athira Rajeev	1fe86bc245	perf annotate: Make capstone_init non-static so that it can be used during symbol disassemble symbol__disassemble_capstone in util/disasm.c calls function open_capstone_handle to open/init the capstone. We already have a capstone_init function in "util/print_insn.c". But capstone_init is defined as a static function in util/print_insn.c. Change this and also add the function in print_insn.h The open_capstone_handle checks the disassembler_style option from annotation_options to decide whether to set CS_OPT_SYNTAX_ATT. Add that logic in capstone_init also and by default set it to true. Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Kajol Jain <kjain@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Akanksha J N <akanksha@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/lkml/20240718084358.72242-13-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:12:59 -03:00
Athira Rajeev	88444952bd	perf annotate: Update instruction tracking for powerpc Add instruction tracking function "update_insn_state_powerpc" for powerpc. Example sequence in powerpc: ld r10,264(r3) mr r31,r3 <<after some sequence> ld r9,312(r31) Consider ithe sample is pointing to: "ld r9,312(r31)". Here the memory reference is hit at "312(r31)" where 312 is the offset and r31 is the source register. Previous instruction sequence shows that register state of r3 is moved to r31. So to identify the data type for r31 access, the previous instruction ("mr") needs to be tracked and the state type entry has to be updated. Current instruction tracking support in perf tools infrastructure is specific to x86. Patch adds this support for powerpc as well. Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Kajol Jain <kjain@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Akanksha J N <akanksha@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/lkml/20240718084358.72242-12-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:12:59 -03:00
Athira Rajeev	539bfea3e0	perf annotate: Add more instructions for instruction tracking Add few more instructions and use opcode as search key to find if it is supported by the architecture. The added ones are: addi, addic, addic., addis, subfic and mulli Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Kajol Jain <kjain@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Akanksha J N <akanksha@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/lkml/20240718084358.72242-11-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:12:59 -03:00
Athira Rajeev	cd0b6f67c4	perf annotate: Add some of the arithmetic instructions to support instruction tracking in powerpc Data-type profiling has the concept of instruction tracking. Example sequence in powerpc: ld r10,264(r3) mr r31,r3 <<after some sequence> ld r9,312(r31) or differently lwz r10,264(r3) add r31, r3, RB lwz r9, 0(r31) If a sample is hit at "lwz r9, 0(r31)", data type of r31 depends on previous instruction sequence here. So to track the previous instructions, patch adds changes to identify some of the arithmetic instructions which are having opcode as 31. Since memory instructions also has cases with opcode 31, use the bits 22:30 to filter the arithmetic instructions here. Also there are instructions with just two operands like "addme", "addze". This patch adds new instructions ops "arithmetic_ops" to handle this Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Kajol Jain <kjain@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Akanksha J N <akanksha@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/lkml/20240718084358.72242-10-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:12:59 -03:00
Athira Rajeev	ace7d681d8	perf annotate: Add support to identify memory instructions of opcode 31 in powerpc There are memory instructions in powerpc with opcode as 31. Example: "ldx RT,RA,RB" , Its X form is as below: ______________________________________ \| 31 \| RT \| RA \| RB \| 21 \|/\| -------------------------------------- 0 6 11 16 21 30 31 The opcode for "ldx" is 31. There are other instructions also with opcode 31 which are memory insn like ldux, stbx, lwzx, lhaux But all instructions with opcode 31 are not memory. Example is add instruction: "add RT,RA,RB" The value in bit 21-30 [ 21 for ldx ] is different for these instructions. Patch uses this value to assign instruction ops for these cases. The naming convention and value to identify these are picked from defines in "arch/powerpc/include/asm/ppc-opcode.h" Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Kajol Jain <kjain@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Akanksha J N <akanksha@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/lkml/20240718084358.72242-9-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:12:59 -03:00
Athira Rajeev	1acdad6818	perf annotate: Add parse function for memory instructions in powerpc Use the raw instruction code and macros to identify memory instructions, extract register fields and also offset. The implementation addresses the D-form, X-form, DS-form instructions. Two main functions are added. New parse function "load_store__parse" as instruction ops parser for memory instructions. Unlike other parsers (like mov__parse), this one fills in the "multi_regs" field for source/target and new added "mem_ref" field. No other fields are set because, here there is no need to parse the disassembled code and arch specific macros will take care of extracting offset and regs which is easier and will be precise. In powerpc, all instructions with a primary opcode from 32 to 63 are memory instructions. Update "ins__find" function to have "raw_insn" also as a parameter. Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Kajol Jain <kjain@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Akanksha J N <akanksha@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/lkml/20240718084358.72242-8-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:12:59 -03:00
Athira Rajeev	1b4406d2a8	perf annotate: Update parameters for reg extract functions to use raw instruction on powerpc Use the raw instruction code and macros to identify memory instructions, extract register fields and also offset. The implementation addresses the D-form, X-form, DS-form instructions. Adds "mem_ref" field to check whether source/target has memory reference. Add function "get_powerpc_regs" which will set these fields: reg1, reg2, offset depending of where it is source or target ops. Update "parse" callback for "struct ins_ops" to also pass "struct disasm_line" as argument. This is needed in parse functions where opcode is used to determine whether to set multi_regs and other fields Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Kajol Jain <kjain@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Akanksha J N <akanksha@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/lkml/20240718084358.72242-7-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:12:59 -03:00
Athira Rajeev	0b971e6bf1	perf annotate: Add support to capture and parse raw instruction in powerpc using dso__data_read_offset utility Add support to capture and parse raw instruction in powerpc. Currently, the perf tool infrastructure uses two ways to disassemble and understand the instruction. One is objdump and other option is via libcapstone. Currently, the perf tool infrastructure uses "--no-show-raw-insn" option with "objdump" while disassemble. Example from powerpc with this option for an instruction address is: Snippet from: objdump --start-address=<address> --stop-address=<address> -d --no-show-raw-insn -C <vmlinux> c0000000010224b4: lwz r10,0(r9) This line "lwz r10,0(r9)" is parsed to extract instruction name, registers names and offset. Also to find whether there is a memory reference in the operands, "memory_ref_char" field of objdump is used. For x86, "(" is used as memory_ref_char to tackle instructions of the form "mov (%rax), %rcx". In case of powerpc, not all instructions using "(" are the only memory instructions. Example, above instruction can also be of extended form (X form) "lwzx r10,0,r19". Inorder to easy identify the instruction category and extract the source/target registers, patch adds support to use raw instruction for powerpc. Approach used is to read the raw instruction directly from the DSO file using "dso__data_read_offset" utility which is already implemented in perf infrastructure in "util/dso.c". Example: 38 01 81 e8 ld r4,312(r1) Here "38 01 81 e8" is the raw instruction representation. In powerpc, this translates to instruction form: "ld RT,DS(RA)" and binary code as: \| 58 \| RT \| RA \| DS \| \| ------------------------------------- 0 6 11 16 30 31 Function "symbol__disassemble_dso" is updated to read raw instruction directly from DSO using dso__data_read_offset utility. In case of above example, this captures: line: 38 01 81 e8 The above works well when 'perf report' is invoked with only sort keys for data type ie type and typeoff. Because there is no instruction level annotation needed if only data type information is requested for. For annotating sample, along with type and typeoff sort key, "sym" sort key is also needed. And by default invoking just "perf report" uses sort key "sym" that displays the symbol information. With approach changes in powerpc which first reads DSO for raw instruction, "perf annotate" and "perf report" + a key breaks since it doesn't do the instruction level disassembly. Snippet of result from 'perf report': Samples: 1K of event 'mem-loads', 4000 Hz, Event count (approx.): 937238 do_work /usr/bin/pmlogger [Percent: local period] Percent│ ea230010 │ 3a550010 │ 3a600000 │ 38f60001 │ 39490008 │ 42400438 51.44 │ 81290008 │ 7d485378 Here, raw instruction is displayed in the output instead of human readable annotated form. One way to get the appropriate data is to specify "--objdump path", by which code annotation will be done. But the default behaviour will be changed. To fix this breakage, check if "sym" sort key is set. If so fallback and use the libcapstone/objdump way of disassmbling the sample. With the changes and "perf report" Samples: 1K of event 'mem-loads', 4000 Hz, Event count (approx.): 937238 do_work /usr/bin/pmlogger [Percent: local period] Percent│ ld r17,16(r3) │ addi r18,r21,16 │ li r19,0 │ 8b0: rldicl r10,r10,63,33 │ addi r10,r10,1 │ mtctr r10 │ ↓ b 8e4 │ 8c0: addi r7,r22,1 │ addi r10,r9,8 │ ↓ bdz d00 51.44 │ lwz r9,8(r9) │ mr r8,r10 │ cmpw r20,r9 Committer notes: Just add the extern for 'sort_order' in disasm.c so that we don't end up breaking the build due to this type colision with capstone and libbpf: In file included from /usr/include/capstone/capstone.h:325, from /git/perf-6.10.0/tools/perf/util/print_insn.h:23, from builtin-script.c:38: /usr/include/capstone/bpf.h:94:14: error: 'bpf_insn' defined as wrong kind of tag 94 \| typedef enum bpf_insn { I reported this to the bpf mailing list, see one of the links below. Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Kajol Jain <kjain@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Akanksha J N <akanksha@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/lkml/20240718084358.72242-6-atrajeev@linux.vnet.ibm.com Link: https://lore.kernel.org/bpf/ZqOltPk9VQGgJZAA@x1/T/#u Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:12:59 -03:00
Athira Rajeev	06dd4c5a56	perf annotate: Add disasm_line__parse() to parse raw instruction for powerpc Currently, the perf tool infrastructure uses the disasm_line__parse function to parse disassembled line. Example snippet from objdump: objdump --start-address=<address> --stop-address=<address> -d --no-show-raw-insn -C <vmlinux> c0000000010224b4: lwz r10,0(r9) This line "lwz r10,0(r9)" is parsed to extract instruction name, registers names and offset. In powerpc, the approach for data type profiling uses raw instruction instead of result from objdump to identify the instruction category and extract the source/target registers. Example: 38 01 81 e8 ld r4,312(r1) Here "38 01 81 e8" is the raw instruction representation. Add function "disasm_line__parse_powerpc" to handle parsing of raw instruction. Also update "struct disasm_line" to save the binary code/ With the change, function captures: line -> "38 01 81 e8 ld r4,312(r1)" raw instruction "38 01 81 e8" Raw instruction is used later to extract the reg/offset fields. Macros are added to extract opcode and register fields. "struct disasm_line" is updated to carry union of "bytes" and "raw_insn" of 32 bit to carry raw code (raw). Function "disasm_line__parse_powerpc fills the raw instruction hex value and can use macros to get opcode. There is no changes in existing code paths, which parses the disassembled code. The size of raw instruction depends on architecture. In case of powerpc, the parsing the disasm line needs to handle cases for reading binary code directly from DSO as well as parsing the objdump result. Hence adding the logic into separate function instead of updating "disasm_line__parse". The architecture using the instruction name and present approach is not altered. Since this approach targets powerpc, the macro implementation is added for powerpc as of now. Since the disasm_line__parse is used in other cases (perf annotate) and not only data tye profiling, the powerpc callback includes changes to work with binary code as well as mnemonic representation. Also in case if the DSO read fails and libcapstone is not supported, the approach fallback to use objdump as option. Hence as option, patch has changes to ensure objdump option also works well. Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Kajol Jain <kjain@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Akanksha J N <akanksha@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/lkml/20240718084358.72242-5-atrajeev@linux.vnet.ibm.com [ Add check for strndup() result ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:12:59 -03:00
Athira Rajeev	b1d8d968a7	perf annotate: Update TYPE_STATE_MAX_REGS to include max of regs in powerpc TYPE_STATE_MAX_REGS is arch-dependent. Currently this is defined to be 16. While checking if reg is valid using has_reg_type, max value is checked using TYPE_STATE_MAX_REGS value. Define this conditionally for powerpc. Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Kajol Jain <kjain@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Akanksha J N <akanksha@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/lkml/20240718084358.72242-4-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:12:59 -03:00
Athira Rajeev	782959ac24	perf annotate: Add "update_insn_state" callback function to handle arch specific instruction tracking Add "update_insn_state" callback to "struct arch" to handle instruction tracking. Currently updating instruction state is handled by static function "update_insn_state_x86" which is defined in "annotate-data.c". Make this as a callback for specific arch and move to archs specific file "arch/x86/annotate/instructions.c" . This will help to add helper function for other platforms in file: "arch/<platform>/annotate/instructions.c" and make changes/updates easier. Define callback "update_insn_state" as part of "struct arch", also make some of the debug functions non-static so that it can be referenced from other places. Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Kajol Jain <kjain@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Akanksha J N <akanksha@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/lkml/20240718084358.72242-3-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:12:59 -03:00
Athira Rajeev	1d303deedb	perf annotate: Move the data structures related to register type to header file Data type profiling uses instruction tracking by checking each instruction and updating the register type state in some data structures. This is useful to find the data type in cases when the register state gets transferred from one reg to another. Example, in x86, "mov" instruction and in powerpc, "mr" instruction. Currently these structures are defined in annotate-data.c and instruction tracking is implemented only for x86. Move these data structures to "annotate-data.h" header file so that other arch implementations can use it in arch specific files as well. Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Kajol Jain <kjain@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Akanksha J N <akanksha@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/lkml/20240718084358.72242-2-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:12:59 -03:00
Ian Rogers	e293f4b1e5	perf test: Avoid python leak sanitizer test failures Leak sanitizer will report memory leaks from python and the leak sanitizer output causes tests to fail. For example: ``` $ perf test 98 -v 98: perf script tests: --- start --- test child forked, pid 1272962 DB test [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.046 MB /tmp/perf-test-script.x0EktdCel8/perf.data (8 samples) ] call_path_table((1, 0, 0, 0) call_path_table((2, 1, 0, 140339508617447) call_path_table((3, 2, 2, 0) call_path_table((4, 3, 3, 0) call_path_table((5, 4, 4, 0) call_path_table((6, 5, 5, 0) call_path_table((7, 6, 6, 0) call_path_table((8, 7, 7, 0) call_path_table((9, 8, 8, 0) call_path_table((10, 9, 9, 0) call_path_table((11, 10, 10, 0) call_path_table((12, 11, 11, 0) call_path_table((13, 12, 1, 0) sample_table((1, 1, 1, 1, 1, 1, 1, 8, -2058824120, 588306954119000, -1, 0, 0, 0, 0, 1, 0, 0, 128933429281, 0, 0, 13, 0, 0, 0, -1, -1)) sample_table((2, 1, 1, 1, 1, 1, 1, 8, -2058824120, 588306954137053, -1, 0, 0, 0, 0, 1, 0, 0, 128933429281, 0, 0, 13, 0, 0, 0, -1, -1)) sample_table((3, 1, 1, 1, 1, 1, 1, 8, -2058824120, 588306954140089, -1, 0, 0, 0, 0, 9, 0, 0, 128933429281, 0, 0, 13, 0, 0, 0, -1, -1)) sample_table((4, 1, 1, 1, 1, 1, 1, 8, -2058824120, 588306954142376, -1, 0, 0, 0, 0, 155, 0, 0, 128933429281, 0, 0, 13, 0, 0, 0, -1, -1)) sample_table((5, 1, 1, 1, 1, 1, 1, 8, -2058824120, 588306954144045, -1, 0, 0, 0, 0, 2493, 0, 0, 128933429281, 0, 0, 13, 0, 0, 0, -1, -1)) sample_table((6, 1, 1, 1, 1, 1, 12, 77, -2046828595, 588306954145722, -1, 0, 0, 0, 0, 47555, 0, 0, 128933429281, 0, 0, 13, 0, 0, 0, -1, -1)) call_path_table((14, 9, 14, 0) call_path_table((15, 14, 15, 0) call_path_table((16, 15, 0, -1040969624) call_path_table((17, 16, 16, 0) call_path_table((18, 17, 17, 0) call_path_table((19, 18, 18, 0) call_path_table((20, 19, 19, 0) call_path_table((21, 20, 13, 0) sample_table((7, 1, 1, 1, 2, 1, 13, 46, -2053700898, 588306954157436, -1, 0, 0, 0, 0, 964078, 0, 0, 128933429281, 0, 0, 21, 0, 0, 0, -1, -1)) call_path_table((22, 1, 21, 0) call_path_table((23, 22, 22, 0) call_path_table((24, 23, 23, 0) call_path_table((25, 24, 24, 0) call_path_table((26, 25, 25, 0) call_path_table((27, 26, 26, 0) call_path_table((28, 27, 27, 0) call_path_table((29, 28, 28, 0) call_path_table((30, 29, 29, 0) call_path_table((31, 30, 30, 0) call_path_table((32, 31, 31, 0) call_path_table((33, 32, 32, 0) call_path_table((34, 33, 33, 0) call_path_table((35, 34, 20, 0) sample_table((8, 1, 1, 1, 2, 1, 20, 49, -2046878127, 588306954378624, -1, 0, 0, 0, 0, 2534317, 0, 0, 128933429281, 0, 0, 35, 0, 0, 0, -1, -1)) ================================================================= ==1272975==ERROR: LeakSanitizer: detected memory leaks Direct leak of 13628 byte(s) in 6 object(s) allocated from: #0 0x56354f60c092 in malloc (/tmp/perf/perf+0x29c092) #1 0x7ff25c7d02e7 in _PyObject_Malloc /build/python3.11/../Objects/obmalloc.c:2003:11 #2 0x7ff25c7d02e7 in _PyObject_Malloc /build/python3.11/../Objects/obmalloc.c:1996:1 SUMMARY: AddressSanitizer: 13628 byte(s) leaked in 6 allocation(s). --- Cleaning up --- ---- end(-1) ---- 98: perf script tests : FAILED! ``` Disable leak sanitizer when running specific perf+python tests to avoid this. This causes the tests to pass when run with leak sanitizer. Reviewed-by: Aditya Gupta <adityag@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:12:59 -03:00
Arnaldo Carvalho de Melo	c3d747134c	perf trace: Remove arg_fmt->is_enum, we can get that from the BTF type This is to pave the way for other BTF types, i.e. we try to find BTF type then use things like btf_is_enum(btf_type) that we cached to find the right strtoul and scnprintf routines. For now only enum is supported, all the other types simple return zero for scnprintf which makes it have the same behaviour as when BTF isn't available, i.e. fallback to no pretty printing. Ditto for strtoul. root@x1:~# perf test -v enum 124: perf trace enum augmentation tests : Ok root@x1:~# perf test -v enum 124: perf trace enum augmentation tests : Ok root@x1:~# perf test -v enum 124: perf trace enum augmentation tests : Ok root@x1:~# perf test -v enum 124: perf trace enum augmentation tests : Ok root@x1:~# perf test -v enum 124: perf trace enum augmentation tests : Ok root@x1:~# Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Howard Chu <howardchu95@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240624181345.124764-9-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:12:59 -03:00
Arnaldo Carvalho de Melo	62284329b1	perf trace: Introduce trace__btf_scnprintf() To have a central place that will look at the BTF type and call the right scnprintf routine or return zero, meaning BTF pretty printing isn't available or not implemented for a specific type. Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Howard Chu <howardchu95@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240624181345.124764-8-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:12:59 -03:00
Howard Chu	d66763fed3	perf test trace_btf_enum: Add regression test for the BTF augmentation of enums in 'perf trace' Trace landlock_add_rule syscall to see if the output is desirable. Trace the non-syscall tracepoint 'timer:hrtimer_init' and 'timer:hrtimer_start', see if the 'mode' argument is augmented, the 'mode' enum argument has the prefix of 'HRTIMER_MODE_' in its name. Committer testing: root@x1:~# perf test enum 124: perf trace enum augmentation tests : Ok root@x1:~# perf test -v enum 124: perf trace enum augmentation tests : Ok root@x1:~# perf trace -e landlock_add_rule perf test -v enum 0.000 ( 0.010 ms): perf/749827 landlock_add_rule(ruleset_fd: 11, rule_type: LANDLOCK_RULE_PATH_BENEATH, rule_attr: 0x7ffd324171d4, flags: 45) = -1 EINVAL (Invalid argument) 0.012 ( 0.002 ms): perf/749827 landlock_add_rule(ruleset_fd: 11, rule_type: LANDLOCK_RULE_NET_PORT, rule_attr: 0x7ffd324171e0, flags: 45) = -1 EINVAL (Invalid argument) 457.821 ( 0.007 ms): perf/749830 landlock_add_rule(ruleset_fd: 11, rule_type: LANDLOCK_RULE_PATH_BENEATH, rule_attr: 0x7ffd4acd31e4, flags: 45) = -1 EINVAL (Invalid argument) 457.832 ( 0.003 ms): perf/749830 landlock_add_rule(ruleset_fd: 11, rule_type: LANDLOCK_RULE_NET_PORT, rule_attr: 0x7ffd4acd31f0, flags: 45) = -1 EINVAL (Invalid argument) 124: perf trace enum augmentation tests : Ok root@x1:~# Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240619082042.4173621-6-howardchu95@gmail.com Link: https://lore.kernel.org/r/20240624181345.124764-7-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:12:58 -03:00
Howard Chu	3656e566cf	perf test: Add landlock workload We'll use it to add a regression test for the BTF augmentation of enum arguments for tracepoints in 'perf trace': root@x1:~# perf trace -e landlock_add_rule perf test -w landlock 0.000 ( 0.009 ms): perf/747160 landlock_add_rule(ruleset_fd: 11, rule_type: LANDLOCK_RULE_PATH_BENEATH, rule_attr: 0x7ffd8e258594, flags: 45) = -1 EINVAL (Invalid argument) 0.011 ( 0.002 ms): perf/747160 landlock_add_rule(ruleset_fd: 11, rule_type: LANDLOCK_RULE_NET_PORT, rule_attr: 0x7ffd8e2585a0, flags: 45) = -1 EINVAL (Invalid argument) root@x1:~# Committer notes: It was agreed on the discussion (see Link below) to shorten then name of the workload from 'landlock_add_rule' to 'landlock', and I moved it to a separate patch. Also, to address a build failure from Namhyung, I stopped loading linux/landlock.h and instead added the used defines, enums and types to make this build in older systems. All we want is to emit the syscall and intercept it. Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/CAH0uvohaypdTV6Z7O5QSK+va_qnhZ6BP6oSJ89s1c1E0CjgxDA@mail.gmail.com Link: https://lore.kernel.org/r/20240624181345.124764-1-howardchu95@gmail.com Link: https://lore.kernel.org/r/20240624181345.124764-6-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 16:12:46 -03:00
Howard Chu	9558658886	perf trace: Filter enum arguments with enum names Before: perf $ ./perf trace -e timer:hrtimer_start --filter='mode!=HRTIMER_MODE_ABS_PINNED_HARD' --max-events=1 No resolver (strtoul) for "mode" in "timer:hrtimer_start", can't set filter "(mode!=HRTIMER_MODE_ABS_PINNED_HARD) && (common_pid != 281988)" After: perf $ ./perf trace -e timer:hrtimer_start --filter='mode!=HRTIMER_MODE_ABS_PINNED_HARD' --max-events=1 0.000 :0/0 timer:hrtimer_start(hrtimer: 0xffff9498a6ca5f18, function: 0xffffffffa77a5be0, expires: 12351248764875, softexpires: 12351248764875, mode: HRTIMER_MODE_ABS) && and \|\|: perf $ ./perf trace -e timer:hrtimer_start --filter='mode != HRTIMER_MODE_ABS_PINNED_HARD && mode != HRTIMER_MODE_ABS' --max-events=1 0.000 Hyprland/534 timer:hrtimer_start(hrtimer: 0xffff9497801a84d0, function: 0xffffffffc04cdbe0, expires: 12639434638458, softexpires: 12639433638458, mode: HRTIMER_MODE_REL) perf $ ./perf trace -e timer:hrtimer_start --filter='mode == HRTIMER_MODE_REL \|\| mode == HRTIMER_MODE_PINNED' --max-events=1 0.000 ldlck-test/60639 timer:hrtimer_start(hrtimer: 0xffffb16404ee7bf8, function: 0xffffffffa7790420, expires: 12772614418016, softexpires: 12772614368016, mode: HRTIMER_MODE_REL) Switching it up, using both enum name and integer value(--filter='mode == HRTIMER_MODE_ABS_PINNED_HARD \|\| mode == 0'): perf $ ./perf trace -e timer:hrtimer_start --filter='mode == HRTIMER_MODE_ABS_PINNED_HARD \|\| mode == 0' --max-events=3 0.000 :0/0 timer:hrtimer_start(hrtimer: 0xffff9498a6ca5f18, function: 0xffffffffa77a5be0, expires: 12601748739825, softexpires: 12601748739825, mode: HRTIMER_MODE_ABS_PINNED_HARD) 0.036 :0/0 timer:hrtimer_start(hrtimer: 0xffff9498a6ca5f18, function: 0xffffffffa77a5be0, expires: 12518758748124, softexpires: 12518758748124, mode: HRTIMER_MODE_ABS_PINNED_HARD) 0.172 tmux: server/41881 timer:hrtimer_start(hrtimer: 0xffffb164081e7838, function: 0xffffffffa7790420, expires: 12518768255836, softexpires: 12518768205836, mode: HRTIMER_MODE_ABS) P.S. perf $ pahole hrtimer_mode enum hrtimer_mode { HRTIMER_MODE_ABS = 0, HRTIMER_MODE_REL = 1, HRTIMER_MODE_PINNED = 2, HRTIMER_MODE_SOFT = 4, HRTIMER_MODE_HARD = 8, HRTIMER_MODE_ABS_PINNED = 2, HRTIMER_MODE_REL_PINNED = 3, HRTIMER_MODE_ABS_SOFT = 4, HRTIMER_MODE_REL_SOFT = 5, HRTIMER_MODE_ABS_PINNED_SOFT = 6, HRTIMER_MODE_REL_PINNED_SOFT = 7, HRTIMER_MODE_ABS_HARD = 8, HRTIMER_MODE_REL_HARD = 9, HRTIMER_MODE_ABS_PINNED_HARD = 10, HRTIMER_MODE_REL_PINNED_HARD = 11, }; Committer testing: root@x1:~# perf trace -e timer:hrtimer_start --filter='mode != HRTIMER_MODE_ABS' --max-events=2 0.000 :0/0 timer:hrtimer_start(hrtimer: 0xffff8d4eff2a5050, function: 0xffffffff9e22ddd0, expires: 241502326000000, softexpires: 241502326000000, mode: HRTIMER_MODE_ABS_PINNED_HARD) 18446744073709.488 :0/0 timer:hrtimer_start(hrtimer: 0xffff8d4eff425050, function: 0xffffffff9e22ddd0, expires: 241501814000000, softexpires: 241501814000000, mode: HRTIMER_MODE_ABS_PINNED_HARD) root@x1:~# perf trace -e timer:hrtimer_start --filter='mode != HRTIMER_MODE_ABS && mode != HRTIMER_MODE_ABS_PINNED_HARD' --max-events=2 0.000 podman/510644 timer:hrtimer_start(hrtimer: 0xffffa2024f5f7dd0, function: 0xffffffff9e2170c0, expires: 241530497418194, softexpires: 241530497368194, mode: HRTIMER_MODE_REL) 40.251 gnome-shell/2484 timer:hrtimer_start(hrtimer: 0xffff8d48bda17650, function: 0xffffffffc0661550, expires: 241550528619247, softexpires: 241550527619247, mode: HRTIMER_MODE_REL) root@x1:~# perf trace -v -e timer:hrtimer_start --filter='mode != HRTIMER_MODE_ABS && mode != HRTIMER_MODE_ABS_PINNED_HARD && mode != HRTIMER_MODE_REL' --max-events=2 Using CPUID GenuineIntel-6-BA-3 vmlinux BTF loaded <SNIP> 0 0xa 0x1 New filter for timer:hrtimer_start: (mode != 0 && mode != 0xa && mode != 0x1) && (common_pid != 524049 && common_pid != 4041) mmap size 528384B ^Croot@x1:~# Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/ZnCcliuecJABD5FN@x1 Link: https://lore.kernel.org/r/20240624181345.124764-5-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 10:01:36 -03:00
Howard Chu	607bbdb49c	perf trace: Augment non-syscall tracepoints with enum arguments with BTF Before: perf $ ./perf trace -e timer:hrtimer_start --max-events=1 0.000 :0/0 timer:hrtimer_start(hrtimer: 0xffff974466c25f18, function: 0xffffffff89da5be0, expires: 377432432256753, softexpires: 377432432256753, mode: 10) After: perf $ ./perf trace -e timer:hrtimer_start --max-events=1 0.000 :0/0 timer:hrtimer_start(hrtimer: 0xffff9498a6ca5f18, function: 0xffffffffa77a5be0, expires: 4382442895089, softexpires: 4382442895089, mode: HRTIMER_MODE_ABS_PINNED_HARD) in which HRTIMER_MODE_ABS_PINNED_HARD is: perf $ pahole hrtimer_mode enum hrtimer_mode { HRTIMER_MODE_ABS = 0, HRTIMER_MODE_REL = 1, HRTIMER_MODE_PINNED = 2, HRTIMER_MODE_SOFT = 4, HRTIMER_MODE_HARD = 8, HRTIMER_MODE_ABS_PINNED = 2, HRTIMER_MODE_REL_PINNED = 3, HRTIMER_MODE_ABS_SOFT = 4, HRTIMER_MODE_REL_SOFT = 5, HRTIMER_MODE_ABS_PINNED_SOFT = 6, HRTIMER_MODE_REL_PINNED_SOFT = 7, HRTIMER_MODE_ABS_HARD = 8, HRTIMER_MODE_REL_HARD = 9, HRTIMER_MODE_ABS_PINNED_HARD = 10, HRTIMER_MODE_REL_PINNED_HARD = 11, }; Can also be tested by ./perf trace -e pagemap:mm_lru_insertion,timer:hrtimer_start,timer:hrtimer_init,skb:kfree_skb --max-events=10 (Chose these 4 events because they happen quite frequently.) However some enum arguments may not be contained in vmlinux BTF. To see what enum arguments are supported, use: vmlinux_dir $ bpftool btf dump file /sys/kernel/btf/vmlinux > vmlinux vmlinux_dir $ while read l; do grep "ENUM '$l'" vmlinux; done < <(grep field:enum /sys/kernel/tracing/events///format \| awk '{print $3}' \| sort \| uniq) \| awk '{print $3}' \| sed "s/'$.$'/\1/g" dev_pm_qos_req_type error_detector hrtimer_mode i2c_slave_event ieee80211_bss_type lru_list migrate_mode nl80211_auth_type nl80211_band nl80211_iftype numa_vmaskip_reason pm_qos_req_action pwm_polarity skb_drop_reason thermal_trip_type xen_lazy_mode xen_mc_extend_args xen_mc_flush_reason zone_type And what tracepoints have these enum types as their arguments: vmlinux_dir $ while read l; do grep "ENUM '$l'" vmlinux; done < <(grep field:enum /sys/kernel/tracing/events///format \| awk '{print $3}' \| sort \| uniq) \| awk '{print $3}' \| sed "s/'$.$'/\1/g" > good_enums vmlinux_dir $ cat good_enums dev_pm_qos_req_type error_detector hrtimer_mode i2c_slave_event ieee80211_bss_type lru_list migrate_mode nl80211_auth_type nl80211_band nl80211_iftype numa_vmaskip_reason pm_qos_req_action pwm_polarity skb_drop_reason thermal_trip_type xen_lazy_mode xen_mc_extend_args xen_mc_flush_reason zone_type vmlinux_dir $ grep -f good_enums -l /sys/kernel/tracing/events///format /sys/kernel/tracing/events/cfg80211/cfg80211_chandef_dfs_required/format /sys/kernel/tracing/events/cfg80211/cfg80211_ch_switch_notify/format /sys/kernel/tracing/events/cfg80211/cfg80211_ch_switch_started_notify/format /sys/kernel/tracing/events/cfg80211/cfg80211_get_bss/format /sys/kernel/tracing/events/cfg80211/cfg80211_ibss_joined/format /sys/kernel/tracing/events/cfg80211/cfg80211_inform_bss_frame/format /sys/kernel/tracing/events/cfg80211/cfg80211_radar_event/format /sys/kernel/tracing/events/cfg80211/cfg80211_ready_on_channel_expired/format /sys/kernel/tracing/events/cfg80211/cfg80211_ready_on_channel/format /sys/kernel/tracing/events/cfg80211/cfg80211_reg_can_beacon/format /sys/kernel/tracing/events/cfg80211/cfg80211_return_bss/format /sys/kernel/tracing/events/cfg80211/cfg80211_tx_mgmt_expired/format /sys/kernel/tracing/events/cfg80211/rdev_add_virtual_intf/format /sys/kernel/tracing/events/cfg80211/rdev_auth/format /sys/kernel/tracing/events/cfg80211/rdev_change_virtual_intf/format /sys/kernel/tracing/events/cfg80211/rdev_channel_switch/format /sys/kernel/tracing/events/cfg80211/rdev_connect/format /sys/kernel/tracing/events/cfg80211/rdev_inform_bss/format /sys/kernel/tracing/events/cfg80211/rdev_libertas_set_mesh_channel/format /sys/kernel/tracing/events/cfg80211/rdev_mgmt_tx/format /sys/kernel/tracing/events/cfg80211/rdev_remain_on_channel/format /sys/kernel/tracing/events/cfg80211/rdev_return_chandef/format /sys/kernel/tracing/events/cfg80211/rdev_return_int_survey_info/format /sys/kernel/tracing/events/cfg80211/rdev_set_ap_chanwidth/format /sys/kernel/tracing/events/cfg80211/rdev_set_monitor_channel/format /sys/kernel/tracing/events/cfg80211/rdev_set_radar_background/format /sys/kernel/tracing/events/cfg80211/rdev_start_ap/format /sys/kernel/tracing/events/cfg80211/rdev_start_radar_detection/format /sys/kernel/tracing/events/cfg80211/rdev_tdls_channel_switch/format /sys/kernel/tracing/events/compaction/mm_compaction_defer_compaction/format /sys/kernel/tracing/events/compaction/mm_compaction_deferred/format /sys/kernel/tracing/events/compaction/mm_compaction_defer_reset/format /sys/kernel/tracing/events/compaction/mm_compaction_finished/format /sys/kernel/tracing/events/compaction/mm_compaction_kcompactd_wake/format /sys/kernel/tracing/events/compaction/mm_compaction_suitable/format /sys/kernel/tracing/events/compaction/mm_compaction_wakeup_kcompactd/format /sys/kernel/tracing/events/error_report/error_report_end/format /sys/kernel/tracing/events/i2c_slave/i2c_slave/format /sys/kernel/tracing/events/migrate/mm_migrate_pages/format /sys/kernel/tracing/events/migrate/mm_migrate_pages_start/format /sys/kernel/tracing/events/pagemap/mm_lru_insertion/format /sys/kernel/tracing/events/power/dev_pm_qos_add_request/format /sys/kernel/tracing/events/power/dev_pm_qos_remove_request/format /sys/kernel/tracing/events/power/dev_pm_qos_update_request/format /sys/kernel/tracing/events/power/pm_qos_update_flags/format /sys/kernel/tracing/events/power/pm_qos_update_target/format /sys/kernel/tracing/events/pwm/pwm_apply/format /sys/kernel/tracing/events/pwm/pwm_get/format /sys/kernel/tracing/events/sched/sched_skip_vma_numa/format /sys/kernel/tracing/events/skb/kfree_skb/format /sys/kernel/tracing/events/thermal/thermal_zone_trip/format /sys/kernel/tracing/events/timer/hrtimer_init/format /sys/kernel/tracing/events/timer/hrtimer_start/format /sys/kernel/tracing/events/xen/xen_mc_batch/format /sys/kernel/tracing/events/xen/xen_mc_extend_args/format /sys/kernel/tracing/events/xen/xen_mc_flush_reason/format /sys/kernel/tracing/events/xen/xen_mc_issue/format Committer testing: root@x1:~# perf trace -e timer:hrtimer_start --max-events=2 0.000 :0/0 timer:hrtimer_start(hrtimer: 0xffff8d4eff225050, function: 0xffffffff9e22ddd0, expires: 241152380000000, softexpires: 241152380000000, mode: HRTIMER_MODE_ABS) 0.028 :0/0 timer:hrtimer_start(hrtimer: 0xffff8d4eff225050, function: 0xffffffff9e22ddd0, expires: 241153654000000, softexpires: 241153654000000, mode: HRTIMER_MODE_ABS_PINNED_HARD) root@x1:~# Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Reviewed-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/20240615032743.112750-1-howardchu95@gmail.com Link: https://lore.kernel.org/r/20240624181345.124764-4-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 10:01:36 -03:00
Howard Chu	45a0c928e7	perf trace: BTF-based enum pretty printing for syscall args In this patch, BTF is used to turn enum value to the corresponding name. There is only one system call that uses enum value as its argument, that is `landlock_add_rule()`. The vmlinux btf is loaded lazily, when user decided to trace the `landlock_add_rule` syscall. But if one decide to run `perf trace` without any arguments, the behaviour is to trace `landlock_add_rule`, so vmlinux btf will be loaded by default. The laziest behaviour is to load vmlinux btf when a `landlock_add_rule` syscall hits. But I think you could lose some samples when loading vmlinux btf at run time, for it can delay the handling of other samples. I might need your precious opinions on this... before: ``` perf $ ./perf trace -e landlock_add_rule 0.000 ( 0.008 ms): ldlck-test/438194 landlock_add_rule(rule_type: 2) = -1 EBADFD (File descriptor in bad state) 0.010 ( 0.001 ms): ldlck-test/438194 landlock_add_rule(rule_type: 1) = -1 EBADFD (File descriptor in bad state) ``` after: ``` perf $ ./perf trace -e landlock_add_rule 0.000 ( 0.029 ms): ldlck-test/438194 landlock_add_rule(rule_type: LANDLOCK_RULE_NET_PORT) = -1 EBADFD (File descriptor in bad state) 0.036 ( 0.004 ms): ldlck-test/438194 landlock_add_rule(rule_type: LANDLOCK_RULE_PATH_BENEATH) = -1 EBADFD (File descriptor in bad state) ``` Committer notes: Made it build with NO_LIBBPF=1, simplified btf_enum_fprintf(), see [1] for the discussion. Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/20240613022757.3589783-1-howardchu95@gmail.com Link: https://lore.kernel.org/lkml/ZnXAhFflUl_LV1QY@x1 # [1] Link: https://lore.kernel.org/r/20240624181345.124764-3-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-07-31 10:01:35 -03:00
Linus Torvalds	e254e0c5ba	Another perf tools fixes for v6.11 Some more fixes about the build and a random crash: * Fix cross-build by setting pkg-config env according to the arch * Fix static build for missing library dependencies * Fix Segfault when callchain has no symbols Signed-off-by: Namhyung Kim <namhyung@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZqls2wAKCRCMstVUGiXM g12aAQCovAOO6jC5GrzCS8KlBoHXplyGL1rscI2uZfOttotBnQEA875Ov6FNL9Ux u9oxHZOpN/U4Xnyeoj9c43ibae/urAQ= =LTu1 -----END PGP SIGNATURE----- Merge tag 'perf-tools-fixes-for-v6.11-2024-07-30' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Namhyung Kim: "Some more build fixes and a random crash fix: - Fix cross-build by setting pkg-config env according to the arch - Fix static build for missing library dependencies - Fix Segfault when callchain has no symbols" * tag 'perf-tools-fixes-for-v6.11-2024-07-30' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf docs: Document cross compilation perf: build: Link lib 'zstd' for static build perf: build: Link lib 'lzma' for static build perf: build: Only link libebl.a for old libdw perf: build: Set Python configuration for cross compilation perf: build: Setup PKG_CONFIG_LIBDIR for cross compilation perf tool: fix dereferencing NULL al->maps	2024-07-30 19:22:41 -07:00
Leo Yan	d27087c76e	perf docs: Document cross compilation Records the commands for cross compilation with two methods. The first method relies on Multiarch. The second approach is to explicitly specify the PKG_CONFIG variables, which is widely used in build system (like Buildroot, Yocto, etc). Co-developed-by: James Clark <james.clark@arm.com> Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Leo Yan <leo.yan@arm.com> Tested-by: Ian Rogers <irogers@google.com> Cc: amadio@gentoo.org Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20240717082211.524826-7-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-26 11:15:55 -07:00
Leo Yan	f42596c738	perf: build: Link lib 'zstd' for static build When build static perf, Makefile reports the error: Makefile.config:480: No libdw DWARF unwind found, Please install elfutils-devel/libdw-dev >= 0.158 and/or set LIBDW_DIR The libdw has been installed on the system, but the build system fails to build the feature detecting binary 'test-libdw-dwarf-unwind'. The failure is caused by missing to link the lib 'zstd'. Link lib 'zstd' for the static build, in the end, the dwarf feature can be enabled in the static perf. Signed-off-by: Leo Yan <leo.yan@arm.com> Tested-by: Ian Rogers <irogers@google.com> Cc: amadio@gentoo.org Cc: James Clark <james.clark@linaro.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20240717082211.524826-6-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-26 11:15:47 -07:00
Leo Yan	536661da6e	perf: build: Only link libebl.a for old libdw Since libdw version 0.177, elfutils has merged libebl.a into libdw (see the commit "libebl: Don't install libebl.a, libebl.h and remove backends from spec." in the elfutils repository). As a result, libebl.a does not exist on Debian Bullseye and newer releases, causing static perf builds to fail on these distributions. This commit checks the libdw version and only links libebl.a if it detects that the libdw version is older than 0.177. Signed-off-by: Leo Yan <leo.yan@arm.com> Tested-by: Ian Rogers <irogers@google.com> Cc: amadio@gentoo.org Cc: James Clark <james.clark@linaro.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20240717082211.524826-4-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-26 11:15:25 -07:00
Leo Yan	cffe29d3b5	perf: build: Set Python configuration for cross compilation Python configuration has dedicated folders for different architectures. For example, Python 3.11 has two folders as shown below, one for Arm64 and another for x86_64: /usr/lib/python3.11/config-3.11-aarch64-linux-gnu/ /usr/lib/python3.11/config-3.11-x86_64-linux-gnu/ This commit updates the Python configuration path based on the compiler's machine type, guiding the compiler to find the correct path for Python libraries. It also renames the generated .so file name to match the machine name. Signed-off-by: Leo Yan <leo.yan@arm.com> Tested-by: Ian Rogers <irogers@google.com> Cc: amadio@gentoo.org Cc: James Clark <james.clark@linaro.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20240717082211.524826-3-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-26 11:15:09 -07:00
Leo Yan	440cf77625	perf: build: Setup PKG_CONFIG_LIBDIR for cross compilation On recent Linux distros like Ubuntu Noble and Debian Bookworm, the 'pkg-config-aarch64-linux-gnu' package is missing. As a result, the aarch64-linux-gnu-pkg-config command is not available, which causes build failures. When a build passes the environment variables PKG_CONFIG_LIBDIR or PKG_CONFIG_PATH, like a user uses make command or a build system (like Yocto, Buildroot, etc) prepares the variables and passes to the Perf's Makefile, the commit keeps these variables for package configuration. Otherwise, this commit sets the PKG_CONFIG_LIBDIR variable to use the Multiarch libs for the cross compilation. Signed-off-by: Leo Yan <leo.yan@arm.com> Tested-by: Ian Rogers <irogers@google.com> Cc: amadio@gentoo.org Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20240717082211.524826-2-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-26 11:14:56 -07:00
Casey Chen	4c17736689	perf tool: fix dereferencing NULL al->maps With `0dd5041c9a` ("perf addr_location: Add init/exit/copy functions"), when cpumode is 3 (macro PERF_RECORD_MISC_HYPERVISOR), thread__find_map() could return with al->maps being NULL. The path below could add a callchain_cursor_node with NULL ms.maps. add_callchain_ip() thread__find_symbol(.., &al) thread__find_map(.., &al) // al->maps becomes NULL ms.maps = maps__get(al.maps) callchain_cursor_append(..., &ms, ...) node->ms.maps = maps__get(ms->maps) Then the path below would dereference NULL maps and get segfault. fill_callchain_info() maps__machine(node->ms.maps); Fix it by checking if maps is NULL in fill_callchain_info(). Fixes: `0dd5041c9a` ("perf addr_location: Add init/exit/copy functions") Signed-off-by: Casey Chen <cachen@purestorage.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: yzhong@purestorage.com Link: https://lore.kernel.org/r/20240722211548.61455-1-cachen@purestorage.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-26 11:12:16 -07:00
Linus Torvalds	786c8248db	perf tools fixes for v6.11 Two fixes about building perf and other tools: * Fix breakage in tracing tools due to pkg-config for libtrace{event,fs} * Fix build of perf when libunwind is used Signed-off-by: Namhyung Kim <namhyung@kernel.org> -----BEGIN PGP SIGNATURE----- iHQEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZqA1SAAKCRCMstVUGiXM g3TfAQDXLi+XcSDE/u5JcDN3H6+bXvavDn2k8Gsd6vWZQc5LEQD3X1E+GbtWTQsE ruk5ZT3voy8qBPgmrUg72NJwmRxYAQ== =0RKR -----END PGP SIGNATURE----- Merge tag 'perf-tools-fixes-for-v6.11-2024-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Namhyung Kim: "Two fixes for building perf and other tools: - Fix breakage in tracing tools due to pkg-config for libtrace{event,fs} - Fix build of perf when libunwind is used" * tag 'perf-tools-fixes-for-v6.11-2024-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf dso: Fix build when libunwind is enabled tools/latency: Use pkg-config in lib_setup of Makefile.config tools/rtla: Use pkg-config in lib_setup of Makefile.config tools/verification: Use pkg-config in lib_setup of Makefile.config tools: Make pkg-config dependency checks usable by other tools perf build: Warn if libtracefs is not found	2024-07-23 18:15:51 -07:00
Linus Torvalds	ca83c61cb3	Kbuild updates for v6.11 - Remove tristate choice support from Kconfig - Stop using the PROVIDE() directive in the linker script - Reduce the number of links for the combination of CONFIG_DEBUG_INFO_BTF and CONFIG_KALLSYMS - Enable the warning for symbol reference to .exit.* sections by default - Fix warnings in RPM package builds - Improve scripts/make_fit.py to generate a FIT image with separate base DTB and overlays - Improve choice value calculation in Kconfig - Fix conditional prompt behavior in choice in Kconfig - Remove support for the uncommon EMAIL environment variable in Debian package builds - Remove support for the uncommon "name <email>" form for the DEBEMAIL environment variable - Raise the minimum supported GNU Make version to 4.0 - Remove stale code for the absolute kallsyms - Move header files commonly used for host programs to scripts/include/ - Introduce the pacman-pkg target to generate a pacman package used in Arch Linux - Clean up Kconfig -----BEGIN PGP SIGNATURE----- iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmagBLUVHG1hc2FoaXJv eUBrZXJuZWwub3JnAAoJED2LAQed4NsGmoUQAJ8pnURs0g+Rcyk6bdY/qtXBYkS+ nXpIK1ssFgRRgAQdeszYtvBqLFzb0wRCSie87G1AriD/JkVVTjCCY1For1y+vs0u a7HfxitHhZpPyZW/T+WMQ3LViNccpkx+DFAcoRH8xOY/XPEJKVUby332jOIXMuyg +NKIELQJVsLhcDofTUGb5VfIQektw219n5c4jKjXdNk4ZtE24xCRM5X528ZebwWJ RZhMvJ968PyIH1IRXvNt6dsKBxoGIwPP8IO6yW9hzHaNsBqt7MGSChSel7r1VKpk iwCNApJvEiVBe5wvTSVOVro7/8p/AZ70CQAqnMJV+dNnRqtGqW7NvL6XAjZRJgJJ Uxe5NSrXgQd3FtqfcbXLetBgp9zGVt328nHm1HXHR5rFsvoOiTvO7hHPbhA+OoWJ fs+jHzEXdAMRgsNrczPWU5Svq6MgGe4v8HBf0m8N1Uy65t/O+z9ti2QAw7kIFlbu /VSFNjw4CHmNxGhnH0khCMsy85FwVIt9Ux+2d6IEc0gP8S1Qa1HgHGAoVI4U51eS 9dxEPVJNPOugaIVHheuS3wimEO6wzaJcQHn4IXaasMA7P6Yo4G/jiGoy4cb9qPTM Hb+GaOltUy7vDoG4D2LSym8zR8rdKwbIf/5psdZrq/IWVKq5p+p7KWs3aOykSoM7 o6Hb532Ioalhm8je =BYu7 -----END PGP SIGNATURE----- Merge tag 'kbuild-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Remove tristate choice support from Kconfig - Stop using the PROVIDE() directive in the linker script - Reduce the number of links for the combination of CONFIG_KALLSYMS and CONFIG_DEBUG_INFO_BTF - Enable the warning for symbol reference to .exit.* sections by default - Fix warnings in RPM package builds - Improve scripts/make_fit.py to generate a FIT image with separate base DTB and overlays - Improve choice value calculation in Kconfig - Fix conditional prompt behavior in choice in Kconfig - Remove support for the uncommon EMAIL environment variable in Debian package builds - Remove support for the uncommon "name <email>" form for the DEBEMAIL environment variable - Raise the minimum supported GNU Make version to 4.0 - Remove stale code for the absolute kallsyms - Move header files commonly used for host programs to scripts/include/ - Introduce the pacman-pkg target to generate a pacman package used in Arch Linux - Clean up Kconfig * tag 'kbuild-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (65 commits) kbuild: doc: gcc to CC change kallsyms: change sym_entry::percpu_absolute to bool type kallsyms: unify seq and start_pos fields of struct sym_entry kallsyms: add more original symbol type/name in comment lines kallsyms: use \t instead of a tab in printf() kallsyms: avoid repeated calculation of array size for markers kbuild: add script and target to generate pacman package modpost: use generic macros for hash table implementation kbuild: move some helper headers from scripts/kconfig/ to scripts/include/ Makefile: add comment to discourage tools/* addition for kernel builds kbuild: clean up scripts/remove-stale-files kconfig: recursive checks drop file/lineno kbuild: rpm-pkg: introduce a simple changelog section for kernel.spec kallsyms: get rid of code for absolute kallsyms kbuild: Create INSTALL_PATH directory if it does not exist kbuild: Abort make on install failures kconfig: remove 'e1' and 'e2' macros from expression deduplication kconfig: remove SYMBOL_CHOICEVAL flag kconfig: add const qualifiers to several function arguments kconfig: call expr_eliminate_yn() at least once in expr_eliminate_dups() ...	2024-07-23 14:32:21 -07:00
Linus Torvalds	2c9b351240	ARM: * Initial infrastructure for shadow stage-2 MMUs, as part of nested virtualization enablement * Support for userspace changes to the guest CTR_EL0 value, enabling (in part) migration of VMs between heterogenous hardware * Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of the protocol * FPSIMD/SVE support for nested, including merged trap configuration and exception routing * New command-line parameter to control the WFx trap behavior under KVM * Introduce kCFI hardening in the EL2 hypervisor * Fixes + cleanups for handling presence/absence of FEAT_TCRX * Miscellaneous fixes + documentation updates LoongArch: * Add paravirt steal time support. * Add support for KVM_DIRTY_LOG_INITIALLY_SET. * Add perf kvm-stat support for loongarch. RISC-V: * Redirect AMO load/store access fault traps to guest * perf kvm stat support * Use guest files for IMSIC virtualization, when available ONE_REG support for the Zimop, Zcmop, Zca, Zcf, Zcd, Zcb and Zawrs ISA extensions is coming through the RISC-V tree. s390: * Assortment of tiny fixes which are not time critical x86: * Fixes for Xen emulation. * Add a global struct to consolidate tracking of host values, e.g. EFER * Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC bus frequency, because TDX. * Print the name of the APICv/AVIC inhibits in the relevant tracepoint. * Clean up KVM's handling of vendor specific emulation to consistently act on "compatible with Intel/AMD", versus checking for a specific vendor. * Drop MTRR virtualization, and instead always honor guest PAT on CPUs that support self-snoop. * Update to the newfangled Intel CPU FMS infrastructure. * Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads '0' and writes from userspace are ignored. * Misc cleanups x86 - MMU: * Small cleanups, renames and refactoring extracted from the upcoming Intel TDX support. * Don't allocate kvm_mmu_page.shadowed_translation for shadow pages that can't hold leafs SPTEs. * Unconditionally drop mmu_lock when allocating TDP MMU page tables for eager page splitting, to avoid stalling vCPUs when splitting huge pages. * Bug the VM instead of simply warning if KVM tries to split a SPTE that is non-present or not-huge. KVM is guaranteed to end up in a broken state because the callers fully expect a valid SPTE, it's all but dangerous to let more MMU changes happen afterwards. x86 - AMD: * Make per-CPU save_area allocations NUMA-aware. * Force sev_es_host_save_area() to be inlined to avoid calling into an instrumentable function from noinstr code. * Base support for running SEV-SNP guests. API-wise, this includes a new KVM_X86_SNP_VM type, encrypting/measure the initial image into guest memory, and finalizing it before launching it. Internally, there are some gmem/mmu hooks needed to prepare gmem-allocated pages before mapping them into guest private memory ranges. This includes basic support for attestation guest requests, enough to say that KVM supports the GHCB 2.0 specification. There is no support yet for loading into the firmware those signing keys to be used for attestation requests, and therefore no need yet for the host to provide certificate data for those keys. To support fetching certificate data from userspace, a new KVM exit type will be needed to handle fetching the certificate from userspace. An attempt to define a new KVM_EXIT_COCO/KVM_EXIT_COCO_REQ_CERTS exit type to handle this was introduced in v1 of this patchset, but is still being discussed by community, so for now this patchset only implements a stub version of SNP Extended Guest Requests that does not provide certificate data. x86 - Intel: * Remove an unnecessary EPT TLB flush when enabling hardware. * Fix a series of bugs that cause KVM to fail to detect nested pending posted interrupts as valid wake eents for a vCPU executing HLT in L2 (with HLT-exiting disable by L1). * KVM: x86: Suppress MMIO that is triggered during task switch emulation Explicitly suppress userspace emulated MMIO exits that are triggered when emulating a task switch as KVM doesn't support userspace MMIO during complex (multi-step) emulation. Silently ignoring the exit request can result in the WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace for some other reason prior to purging mmio_needed. See commit `0dc902267c` ("KVM: x86: Suppress pending MMIO write exits if emulator detects exception") for more details on KVM's limitations with respect to emulated MMIO during complex emulator flows. Generic: * Rename the AS_UNMOVABLE flag that was introduced for KVM to AS_INACCESSIBLE, because the special casing needed by these pages is not due to just unmovability (and in fact they are only unmovable because the CPU cannot access them). * New ioctl to populate the KVM page tables in advance, which is useful to mitigate KVM page faults during guest boot or after live migration. The code will also be used by TDX, but (probably) not through the ioctl. * Enable halt poll shrinking by default, as Intel found it to be a clear win. * Setup empty IRQ routing when creating a VM to avoid having to synchronize SRCU when creating a split IRQCHIP on x86. * Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag that arch code can use for hooking both sched_in() and sched_out(). * Take the vCPU @id as an "unsigned long" instead of "u32" to avoid truncating a bogus value from userspace, e.g. to help userspace detect bugs. * Mark a vCPU as preempted if and only if it's scheduled out while in the KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest memory when retrieving guest state during live migration blackout. Selftests: * Remove dead code in the memslot modification stress test. * Treat "branch instructions retired" as supported on all AMD Family 17h+ CPUs. * Print the guest pseudo-RNG seed only when it changes, to avoid spamming the log for tests that create lots of VMs. * Make the PMU counters test less flaky when counting LLC cache misses by doing CLFLUSH{OPT} in every loop iteration. -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmaZQB0UHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroNkZwf/bv2jiENaLFNGPe/VqTKMQ6PHQLMG +sNHx6fJPP35gTM8Jqf0/7/ummZXcSuC1mWrzYbecZm7Oeg3vwNXHZ4LquwwX6Dv 8dKcUzLbWDAC4WA3SKhi8C8RV2v6E7ohy69NtAJmFWTc7H95dtIQm6cduV2osTC3 OEuHe1i8d9umk6couL9Qhm8hk3i9v2KgCsrfyNrQgLtS3hu7q6yOTR8nT0iH6sJR KE5A8prBQgLmF34CuvYDw4Hu6E4j+0QmIqodovg2884W1gZQ9LmcVqYPaRZGsG8S iDdbkualLKwiR1TpRr3HJGKWSFdc7RblbsnHRvHIZgFsMQiimh4HrBSCyQ== =zepX -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "ARM: - Initial infrastructure for shadow stage-2 MMUs, as part of nested virtualization enablement - Support for userspace changes to the guest CTR_EL0 value, enabling (in part) migration of VMs between heterogenous hardware - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of the protocol - FPSIMD/SVE support for nested, including merged trap configuration and exception routing - New command-line parameter to control the WFx trap behavior under KVM - Introduce kCFI hardening in the EL2 hypervisor - Fixes + cleanups for handling presence/absence of FEAT_TCRX - Miscellaneous fixes + documentation updates LoongArch: - Add paravirt steal time support - Add support for KVM_DIRTY_LOG_INITIALLY_SET - Add perf kvm-stat support for loongarch RISC-V: - Redirect AMO load/store access fault traps to guest - perf kvm stat support - Use guest files for IMSIC virtualization, when available s390: - Assortment of tiny fixes which are not time critical x86: - Fixes for Xen emulation - Add a global struct to consolidate tracking of host values, e.g. EFER - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC bus frequency, because TDX - Print the name of the APICv/AVIC inhibits in the relevant tracepoint - Clean up KVM's handling of vendor specific emulation to consistently act on "compatible with Intel/AMD", versus checking for a specific vendor - Drop MTRR virtualization, and instead always honor guest PAT on CPUs that support self-snoop - Update to the newfangled Intel CPU FMS infrastructure - Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads '0' and writes from userspace are ignored - Misc cleanups x86 - MMU: - Small cleanups, renames and refactoring extracted from the upcoming Intel TDX support - Don't allocate kvm_mmu_page.shadowed_translation for shadow pages that can't hold leafs SPTEs - Unconditionally drop mmu_lock when allocating TDP MMU page tables for eager page splitting, to avoid stalling vCPUs when splitting huge pages - Bug the VM instead of simply warning if KVM tries to split a SPTE that is non-present or not-huge. KVM is guaranteed to end up in a broken state because the callers fully expect a valid SPTE, it's all but dangerous to let more MMU changes happen afterwards x86 - AMD: - Make per-CPU save_area allocations NUMA-aware - Force sev_es_host_save_area() to be inlined to avoid calling into an instrumentable function from noinstr code - Base support for running SEV-SNP guests. API-wise, this includes a new KVM_X86_SNP_VM type, encrypting/measure the initial image into guest memory, and finalizing it before launching it. Internally, there are some gmem/mmu hooks needed to prepare gmem-allocated pages before mapping them into guest private memory ranges This includes basic support for attestation guest requests, enough to say that KVM supports the GHCB 2.0 specification There is no support yet for loading into the firmware those signing keys to be used for attestation requests, and therefore no need yet for the host to provide certificate data for those keys. To support fetching certificate data from userspace, a new KVM exit type will be needed to handle fetching the certificate from userspace. An attempt to define a new KVM_EXIT_COCO / KVM_EXIT_COCO_REQ_CERTS exit type to handle this was introduced in v1 of this patchset, but is still being discussed by community, so for now this patchset only implements a stub version of SNP Extended Guest Requests that does not provide certificate data x86 - Intel: - Remove an unnecessary EPT TLB flush when enabling hardware - Fix a series of bugs that cause KVM to fail to detect nested pending posted interrupts as valid wake eents for a vCPU executing HLT in L2 (with HLT-exiting disable by L1) - KVM: x86: Suppress MMIO that is triggered during task switch emulation Explicitly suppress userspace emulated MMIO exits that are triggered when emulating a task switch as KVM doesn't support userspace MMIO during complex (multi-step) emulation Silently ignoring the exit request can result in the WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace for some other reason prior to purging mmio_needed See commit `0dc902267c` ("KVM: x86: Suppress pending MMIO write exits if emulator detects exception") for more details on KVM's limitations with respect to emulated MMIO during complex emulator flows Generic: - Rename the AS_UNMOVABLE flag that was introduced for KVM to AS_INACCESSIBLE, because the special casing needed by these pages is not due to just unmovability (and in fact they are only unmovable because the CPU cannot access them) - New ioctl to populate the KVM page tables in advance, which is useful to mitigate KVM page faults during guest boot or after live migration. The code will also be used by TDX, but (probably) not through the ioctl - Enable halt poll shrinking by default, as Intel found it to be a clear win - Setup empty IRQ routing when creating a VM to avoid having to synchronize SRCU when creating a split IRQCHIP on x86 - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag that arch code can use for hooking both sched_in() and sched_out() - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid truncating a bogus value from userspace, e.g. to help userspace detect bugs - Mark a vCPU as preempted if and only if it's scheduled out while in the KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest memory when retrieving guest state during live migration blackout Selftests: - Remove dead code in the memslot modification stress test - Treat "branch instructions retired" as supported on all AMD Family 17h+ CPUs - Print the guest pseudo-RNG seed only when it changes, to avoid spamming the log for tests that create lots of VMs - Make the PMU counters test less flaky when counting LLC cache misses by doing CLFLUSH{OPT} in every loop iteration" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) crypto: ccp: Add the SNP_VLEK_LOAD command KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops KVM: x86: Replace static_call_cond() with static_call() KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event x86/sev: Move sev_guest.h into common SEV header KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event KVM: x86: Suppress MMIO that is triggered during task switch emulation KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory() KVM: x86/mmu: Make kvm_mmu_do_page_fault() return mapped level KVM: x86/mmu: Account pf_{fixed,emulate,spurious} in callers of "do page fault" KVM: x86/mmu: Bump pf_taken stat only in the "real" page fault handler KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory KVM: Document KVM_PRE_FAULT_MEMORY ioctl mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE perf kvm: Add kvm-stat for loongarch64 LoongArch: KVM: Add PV steal time support in guest side ...	2024-07-20 12:41:03 -07:00
Jann Horn	64e166099b	kallsyms: get rid of code for absolute kallsyms Commit `cf8e865810` ("arch: Remove Itanium (IA-64) architecture") removed the last use of the absolute kallsyms. Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/all/20240221202655.2423854-1-jannh@google.com/ [masahiroy@kernel.org: rebase the code and reword the commit description] Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2024-07-20 16:33:21 +09:00
James Clark	92717bc077	perf dso: Fix build when libunwind is enabled Now that symsrc_filename is always accessed through an accessor, we also need a free() function for it to avoid the following compilation error: util/unwind-libunwind-local.c:416:12: error: lvalue required as unary ‘&’ operand 416 \| zfree(&dso__symsrc_filename(dso)); Fixes: `1553419c3c` ("perf dso: Fix address sanitizer build") Signed-off-by: James Clark <james.clark@linaro.org> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Leo Yan <leo.yan@arm.com> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Cc: Yunseong Kim <yskelg@gmail.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20240715094715.3914813-1-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-17 13:17:57 -07:00
Guilherme Amadio	8f61e98ad5	tools: Make pkg-config dependency checks usable by other tools Other tools, in tools/verification and tools/tracing, make use of libtraceevent and libtracefs as dependencies. This allows setting up the feature check flags for them as well. Signed-off-by: Guilherme Amadio <amadio@gentoo.org> Tested-by: Thorsten Leemhuis <linux@leemhuis.info> Tested-by: Leo Yan <leo.yan@arm.com> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20240717174739.186988-3-amadio@gentoo.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-17 13:14:35 -07:00
Guilherme Amadio	37ac347f87	perf build: Warn if libtracefs is not found Signed-off-by: Guilherme Amadio <amadio@gentoo.org> Tested-by: Thorsten Leemhuis <linux@leemhuis.info> Tested-by: Leo Yan <leo.yan@arm.com> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20240717174739.186988-2-amadio@gentoo.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-17 13:13:59 -07:00
Howard Chu	7a2fb5619c	perf trace: Fix iteration of syscall ids in syscalltbl->entries This is a bug found when implementing pretty-printing for the landlock_add_rule system call, I decided to send this patch separately because this is a serious bug that should be fixed fast. I wrote a test program to do landlock_add_rule syscall in a loop, yet perf trace -e landlock_add_rule freezes, giving no output. This bug is introduced by the false understanding of the variable "key" below: ``` for (key = 0; key < trace->sctbl->syscalls.nr_entries; ++key) { struct syscall sc = trace__syscall_info(trace, NULL, key); ... } ``` The code above seems right at the beginning, but when looking at syscalltbl.c, I found these lines: ``` for (i = 0; i <= syscalltbl_native_max_id; ++i) if (syscalltbl_native[i]) ++nr_entries; entries = tbl->syscalls.entries = malloc(sizeof(struct syscall) nr_entries); ... for (i = 0, j = 0; i <= syscalltbl_native_max_id; ++i) { if (syscalltbl_native[i]) { entries[j].name = syscalltbl_native[i]; entries[j].id = i; ++j; } } ``` meaning the key is merely an index to traverse the syscall table, instead of the actual syscall id for this particular syscall. So if one uses key to do trace__syscall_info(trace, NULL, key), because key only goes up to trace->sctbl->syscalls.nr_entries, for example, on my X86_64 machine, this number is 373, it will end up neglecting all the rest of the syscall, in my case, everything after `rseq`, because the traversal will stop at 373, and `rseq` is the last syscall whose id is lower than 373 in tools/perf/arch/x86/include/generated/asm/syscalls_64.c: ``` ... [334] = "rseq", [424] = "pidfd_send_signal", ... ``` The reason why the key is scrambled but perf trace works well is that key is used in trace__syscall_info(trace, NULL, key) to do trace->syscalls.table[id], this makes sure that the struct syscall returned actually has an id the same value as key, making the later bpf_prog matching all correct. After fixing this bug, I can do perf trace on 38 more syscalls, and because more syscalls are visible, we get 8 more syscalls that can be augmented. before: perf $ perf trace -vv --max-events=1 \|& grep Reusing Reusing "open" BPF sys_enter augmenter for "stat" Reusing "open" BPF sys_enter augmenter for "lstat" Reusing "open" BPF sys_enter augmenter for "access" Reusing "connect" BPF sys_enter augmenter for "accept" Reusing "sendto" BPF sys_enter augmenter for "recvfrom" Reusing "connect" BPF sys_enter augmenter for "bind" Reusing "connect" BPF sys_enter augmenter for "getsockname" Reusing "connect" BPF sys_enter augmenter for "getpeername" Reusing "open" BPF sys_enter augmenter for "execve" Reusing "open" BPF sys_enter augmenter for "truncate" Reusing "open" BPF sys_enter augmenter for "chdir" Reusing "open" BPF sys_enter augmenter for "mkdir" Reusing "open" BPF sys_enter augmenter for "rmdir" Reusing "open" BPF sys_enter augmenter for "creat" Reusing "open" BPF sys_enter augmenter for "link" Reusing "open" BPF sys_enter augmenter for "unlink" Reusing "open" BPF sys_enter augmenter for "symlink" Reusing "open" BPF sys_enter augmenter for "readlink" Reusing "open" BPF sys_enter augmenter for "chmod" Reusing "open" BPF sys_enter augmenter for "chown" Reusing "open" BPF sys_enter augmenter for "lchown" Reusing "open" BPF sys_enter augmenter for "mknod" Reusing "open" BPF sys_enter augmenter for "statfs" Reusing "open" BPF sys_enter augmenter for "pivot_root" Reusing "open" BPF sys_enter augmenter for "chroot" Reusing "open" BPF sys_enter augmenter for "acct" Reusing "open" BPF sys_enter augmenter for "swapon" Reusing "open" BPF sys_enter augmenter for "swapoff" Reusing "open" BPF sys_enter augmenter for "delete_module" Reusing "open" BPF sys_enter augmenter for "setxattr" Reusing "open" BPF sys_enter augmenter for "lsetxattr" Reusing "openat" BPF sys_enter augmenter for "fsetxattr" Reusing "open" BPF sys_enter augmenter for "getxattr" Reusing "open" BPF sys_enter augmenter for "lgetxattr" Reusing "openat" BPF sys_enter augmenter for "fgetxattr" Reusing "open" BPF sys_enter augmenter for "listxattr" Reusing "open" BPF sys_enter augmenter for "llistxattr" Reusing "open" BPF sys_enter augmenter for "removexattr" Reusing "open" BPF sys_enter augmenter for "lremovexattr" Reusing "fsetxattr" BPF sys_enter augmenter for "fremovexattr" Reusing "open" BPF sys_enter augmenter for "mq_open" Reusing "open" BPF sys_enter augmenter for "mq_unlink" Reusing "fsetxattr" BPF sys_enter augmenter for "add_key" Reusing "fremovexattr" BPF sys_enter augmenter for "request_key" Reusing "fremovexattr" BPF sys_enter augmenter for "inotify_add_watch" Reusing "fremovexattr" BPF sys_enter augmenter for "mkdirat" Reusing "fremovexattr" BPF sys_enter augmenter for "mknodat" Reusing "fremovexattr" BPF sys_enter augmenter for "fchownat" Reusing "fremovexattr" BPF sys_enter augmenter for "futimesat" Reusing "fremovexattr" BPF sys_enter augmenter for "newfstatat" Reusing "fremovexattr" BPF sys_enter augmenter for "unlinkat" Reusing "fremovexattr" BPF sys_enter augmenter for "linkat" Reusing "open" BPF sys_enter augmenter for "symlinkat" Reusing "fremovexattr" BPF sys_enter augmenter for "readlinkat" Reusing "fremovexattr" BPF sys_enter augmenter for "fchmodat" Reusing "fremovexattr" BPF sys_enter augmenter for "faccessat" Reusing "fremovexattr" BPF sys_enter augmenter for "utimensat" Reusing "connect" BPF sys_enter augmenter for "accept4" Reusing "fremovexattr" BPF sys_enter augmenter for "name_to_handle_at" Reusing "fremovexattr" BPF sys_enter augmenter for "renameat2" Reusing "open" BPF sys_enter augmenter for "memfd_create" Reusing "fremovexattr" BPF sys_enter augmenter for "execveat" Reusing "fremovexattr" BPF sys_enter augmenter for "statx" after perf $ perf trace -vv --max-events=1 \|& grep Reusing Reusing "open" BPF sys_enter augmenter for "stat" Reusing "open" BPF sys_enter augmenter for "lstat" Reusing "open" BPF sys_enter augmenter for "access" Reusing "connect" BPF sys_enter augmenter for "accept" Reusing "sendto" BPF sys_enter augmenter for "recvfrom" Reusing "connect" BPF sys_enter augmenter for "bind" Reusing "connect" BPF sys_enter augmenter for "getsockname" Reusing "connect" BPF sys_enter augmenter for "getpeername" Reusing "open" BPF sys_enter augmenter for "execve" Reusing "open" BPF sys_enter augmenter for "truncate" Reusing "open" BPF sys_enter augmenter for "chdir" Reusing "open" BPF sys_enter augmenter for "mkdir" Reusing "open" BPF sys_enter augmenter for "rmdir" Reusing "open" BPF sys_enter augmenter for "creat" Reusing "open" BPF sys_enter augmenter for "link" Reusing "open" BPF sys_enter augmenter for "unlink" Reusing "open" BPF sys_enter augmenter for "symlink" Reusing "open" BPF sys_enter augmenter for "readlink" Reusing "open" BPF sys_enter augmenter for "chmod" Reusing "open" BPF sys_enter augmenter for "chown" Reusing "open" BPF sys_enter augmenter for "lchown" Reusing "open" BPF sys_enter augmenter for "mknod" Reusing "open" BPF sys_enter augmenter for "statfs" Reusing "open" BPF sys_enter augmenter for "pivot_root" Reusing "open" BPF sys_enter augmenter for "chroot" Reusing "open" BPF sys_enter augmenter for "acct" Reusing "open" BPF sys_enter augmenter for "swapon" Reusing "open" BPF sys_enter augmenter for "swapoff" Reusing "open" BPF sys_enter augmenter for "delete_module" Reusing "open" BPF sys_enter augmenter for "setxattr" Reusing "open" BPF sys_enter augmenter for "lsetxattr" Reusing "openat" BPF sys_enter augmenter for "fsetxattr" Reusing "open" BPF sys_enter augmenter for "getxattr" Reusing "open" BPF sys_enter augmenter for "lgetxattr" Reusing "openat" BPF sys_enter augmenter for "fgetxattr" Reusing "open" BPF sys_enter augmenter for "listxattr" Reusing "open" BPF sys_enter augmenter for "llistxattr" Reusing "open" BPF sys_enter augmenter for "removexattr" Reusing "open" BPF sys_enter augmenter for "lremovexattr" Reusing "fsetxattr" BPF sys_enter augmenter for "fremovexattr" Reusing "open" BPF sys_enter augmenter for "mq_open" Reusing "open" BPF sys_enter augmenter for "mq_unlink" Reusing "fsetxattr" BPF sys_enter augmenter for "add_key" Reusing "fremovexattr" BPF sys_enter augmenter for "request_key" Reusing "fremovexattr" BPF sys_enter augmenter for "inotify_add_watch" Reusing "fremovexattr" BPF sys_enter augmenter for "mkdirat" Reusing "fremovexattr" BPF sys_enter augmenter for "mknodat" Reusing "fremovexattr" BPF sys_enter augmenter for "fchownat" Reusing "fremovexattr" BPF sys_enter augmenter for "futimesat" Reusing "fremovexattr" BPF sys_enter augmenter for "newfstatat" Reusing "fremovexattr" BPF sys_enter augmenter for "unlinkat" Reusing "fremovexattr" BPF sys_enter augmenter for "linkat" Reusing "open" BPF sys_enter augmenter for "symlinkat" Reusing "fremovexattr" BPF sys_enter augmenter for "readlinkat" Reusing "fremovexattr" BPF sys_enter augmenter for "fchmodat" Reusing "fremovexattr" BPF sys_enter augmenter for "faccessat" Reusing "fremovexattr" BPF sys_enter augmenter for "utimensat" Reusing "connect" BPF sys_enter augmenter for "accept4" Reusing "fremovexattr" BPF sys_enter augmenter for "name_to_handle_at" Reusing "fremovexattr" BPF sys_enter augmenter for "renameat2" Reusing "open" BPF sys_enter augmenter for "memfd_create" Reusing "fremovexattr" BPF sys_enter augmenter for "execveat" Reusing "fremovexattr" BPF sys_enter augmenter for "statx" TL;DR: These are the new syscalls that can be augmented Reusing "openat" BPF sys_enter augmenter for "open_tree" Reusing "openat" BPF sys_enter augmenter for "openat2" Reusing "openat" BPF sys_enter augmenter for "mount_setattr" Reusing "openat" BPF sys_enter augmenter for "move_mount" Reusing "open" BPF sys_enter augmenter for "fsopen" Reusing "openat" BPF sys_enter augmenter for "fspick" Reusing "openat" BPF sys_enter augmenter for "faccessat2" Reusing "openat" BPF sys_enter augmenter for "fchmodat2" as for the perf trace output: before perf $ perf trace -e faccessat2 --max-events=1 [no output] after perf $ ./perf trace -e faccessat2 --max-events=1 0.000 ( 0.037 ms): waybar/958 faccessat2(dfd: 40, filename: "uevent") = 0 P.S. The reason why this bug was not found in the past five years is probably because it only happens to the newer syscalls whose id is greater, for instance, faccessat2 of id 439, which not a lot of people care about when using perf trace. [Arnaldo]: notes That and the fact that the BPF code was hidden before having to use -e, that got changed kinda recently when we switched to using BPF skels for augmenting syscalls in 'perf trace': ⬢[acme@toolbox perf-tools-next]$ git log --oneline tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c `a9f4c6c999` perf trace: Collect sys_nanosleep first argument `29d16de26d` perf augmented_raw_syscalls.bpf: Move 'struct timespec64' to vmlinux.h `5069211e2f` perf trace: Use the right bpf_probe_read(_str) variant for reading user data `33b725ce7b` perf trace: Avoid compile error wrt redefining bool `7d9642311b` perf bpf augmented_raw_syscalls: Add an assert to make sure sizeof(augmented_arg->value) is a power of two. `262b54b6c9` perf bpf augmented_raw_syscalls: Add an assert to make sure sizeof(saddr) is a power of two. `1836480429` perf bpf_skel augmented_raw_syscalls: Cap the socklen parameter using &= sizeof(saddr) `cd2cece61a` perf trace: Tidy comments related to BPF + syscall augmentation `5e6da6be30` perf trace: Migrate BPF augmentation to use a skeleton ⬢[acme@toolbox perf-tools-next]$ ⬢[acme@toolbox perf-tools-next]$ git show --oneline --pretty=reference `5e6da6be30` \| head -1 `5e6da6be30` (perf trace: Migrate BPF augmentation to use a skeleton, 2023-08-10) ⬢[acme@toolbox perf-tools-next]$ I.e. from August, 2023. One had as well to ask for BUILD_BPF_SKEL=1, which now is default if all it needs is available on the system. I simplified the code to not expose the 'struct syscall' outside of tools/perf/util/syscalltbl.c, instead providing a function to go from the index to the syscall id: int syscalltbl__id_at_idx(struct syscalltbl *tbl, int idx); Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/lkml/ZmhlAxbVcAKoPTg8@x1 Link: https://lore.kernel.org/r/20240705132059.853205-2-howardchu95@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-12 09:49:02 -07:00
Ian Rogers	1553419c3c	perf dso: Fix address sanitizer build Various files had been missed from having accessor functions added for the sake of dso reference count checking. Add the function calls and missing dso accessor functions. Fixes: `ee756ef749` ("perf dso: Add reference count checking and accessor functions") Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Yunseong Kim <yskelg@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240704011745.1021288-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-12 09:38:41 -07:00
Leo Yan	14b0fffa25	perf mem: Warn if memory events are not supported on all CPUs It is possible that memory events are not supported on all CPUs. Prints a warning by dumping the enabled CPU maps in this case. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: John Garry <john.g.garry@oracle.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Link: https://lore.kernel.org/r/20240706152035.86983-3-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-12 09:38:40 -07:00
Leo Yan	e6b4da6759	perf arm-spe: Support multiple Arm SPE PMUs A platform can have more than one Arm SPE PMU. For example, a system with multiple clusters may have each cluster enabled with its own Arm SPE instance. In such case, the PMU devices will be named 'arm_spe_0', 'arm_spe_1', and so on. Currently, the tool only supports 'arm_spe_0'. This commit extends support to multiple Arm SPE PMUs by detecting the substring 'arm_spe_'. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: John Garry <john.g.garry@oracle.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Link: https://lore.kernel.org/r/20240706152035.86983-2-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-12 09:38:40 -07:00
Haoze Xie	759ce73cf7	perf build x86: Fix SC2034 error in syscalltbl.sh Change the unused var in 'arch/x86/entry/syscalls/syscalltbl.sh' to '_' when reading from '$sorted_table'. This change allows the script to pass tests of ShellCheck before and after version 0.7.2 at the same time. When building in arch x86, syscalltbl.sh got a ShellCheck warning, which makes compilation error: In arch/x86/entry/syscalls/syscalltbl.sh line 27: while read nr _abi name entry _compat; do ^-^ SC2034: abi appears unused. Verify use (or export if used externally). ^----^ SC2034: compat appears unused. Verify use (or export if used externally). The script reads unused param abi and compat. It uses format '_xxx' to indicate dummy vars, which won't work properly when ShellCheck <= 0.7.2. According to SC2034, the more general way of writing is to use directly '_' to indicate discarding vars. 'entry' is also replaced by '_' because it just happens to be defined in emit function, otherwise it will lead to some misunderstandings. Link: https://www.shellcheck.net/wiki/SC2034 Signed-off-by: Haoze Xie <royenheart@gmail.com> Signed-off-by: Yuan Tan <tanyuan@tinylab.org> Link: https://lore.kernel.org/r/2143cab4cd8468c88860f4e5e382d0e6b4d89ac9.1720372178.git.royenheart@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-12 09:38:40 -07:00
Haoze Xie	6353abd32c	perf record: Fix memset out-of-range error Modified the object of 'memset' from '&lost.lost' to '&lost' in record__read_lost_samples. This allows 'memset' to access memory properly without causing out-of-bounds problems. The problems got from builtin-record.c are: In file included from /usr/include/string.h:495, from util/parse-events.h:13, from builtin-record.c:14: In function 'memset', inlined from 'record__read_lost_samples' at builtin-record.c:1958:6, inlined from '__cmd_record.constprop' at builtin-record.c:2817:2: /usr/include/x86_64-linux-gnu/bits/string_fortified.h:71:10: error: '__builtin_memset' offset [17, 64] from the object at 'lost' is out of the bounds of referenced subobject 'lost' with type 'struct perf_record_lost_samples' at offset 0 [-Werror=array-bounds] 71\|return __builtin___memset_chk (__dest,__ch,__len,__bos0 (__dest)); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The error arised when performing a memset operation on the 'lost' variable, the bytes of 'sizeof(lost)' exceeds that of '&lost.lost', which are 64 and 16. Fixes: `6c1785cd75` ("perf record: Ensure space for lost samples") Signed-off-by: Haoze Xie <royenheart@gmail.com> Signed-off-by: Yuan Tan <tanyuan@tinylab.org> Link: https://lore.kernel.org/r/11e12f171b846577cac698cd3999db3d7f6c4d03.1720372317.git.royenheart@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-12 09:38:40 -07:00
Madadi Vineeth Reddy	306f921e87	perf sched map: Add --fuzzy-name option for fuzzy matching in task names The --fuzzy-name option can be used if fuzzy name matching is required. For example, "taskname" can be matched to any string that contains "taskname" as its substring. Sample output for --task-name wdav --fuzzy-name ============= . A0 . . . . - . 131040.641346 secs A0 => wdavdaemon:62509 . A0 B0 . . . - . 131040.641378 secs B0 => wdavdaemon:62274 . - B0 . . . - . 131040.641379 secs C0 . B0 . . . . . 131040.641572 secs C0 => wdavdaemon:62283 C0 . B0 . D0 . . . 131040.641572 secs D0 => wdavdaemon:62277 C0 . B0 . D0 . E0 . 131040.641578 secs E0 => wdavdaemon:62270 *- . B0 . D0 . E0 . 131040.641581 secs Suggested-by: Chen Yu <yu.c.chen@intel.com> Reviewed-and-tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Link: https://lore.kernel.org/r/20240707182716.22054-4-vineethr@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-12 09:38:40 -07:00
Madadi Vineeth Reddy	9cc0afed6f	perf sched map: Add support for multiple task names using CSV To track the scheduling patterns of multiple tasks simultaneously, multiple task names can be specified using a comma separator without any whitespace. Sample output for --task-name perf,wdavdaemon ============= . A0 . . . . - . 131040.641346 secs A0 => wdavdaemon:62509 . A0 B0 . . . - . 131040.641378 secs B0 => wdavdaemon:62274 . - B0 . . . - . 131040.641379 secs C0 . B0 . . . . . 131040.641572 secs C0 => wdavdaemon:62283 ... . - . . . . . . 131041.395649 secs . . . . . . . X2 131041.403969 secs X2 => perf:70211 . . . . . . . *- 131041.404006 secs Suggested-by: Namhyung Kim <namhyung@kernel.org> Reviewed-and-tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Cc: Chen Yu <yu.c.chen@intel.com> Link: https://lore.kernel.org/r/20240707182716.22054-3-vineethr@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-12 09:38:40 -07:00
Madadi Vineeth Reddy	3116d60910	perf sched map: Add task-name option to filter the output map By default, perf sched map prints sched-in events for all the tasks which may not be required all the time as it prints lot of symbols and rows to the terminal. With --task-name option, one could specify the specific task name for which the map has to be shown. This would help in analyzing the CPU usage patterns easier for that specific task. Since multiple PID's might have the same task name, using task-name filter would be more useful for debugging. For other tasks, instead of printing the symbol, '-' is printed and the same '.' is used to represent idle. '-' is used instead of symbol for other tasks because it helps in clear visualization of task of interest and secondly the symbol itself doesn't mean anything because the sched-in of that symbol will not be printed(first sched-in contains pid and the corresponding symbol). When using the --task-name option, the sched-out time is represented by a '-'. Since not all task sched-in events are printed, the sched-out time of the relevant task might be lost. This representation ensures that the sched-out time of the interested task is not overlooked. 6.10.0-rc1 ========== A0 131040.639793 secs A0 => migration/0:19 . 131040.639801 secs . => swapper:0 . B0 131040.639830 secs B0 => migration/1:24 . . 131040.639836 secs . . C0 131040.640108 secs C0 => migration/2:30 . . . 131040.640163 secs . . . D0 131040.640386 secs D0 => migration/3:36 . . . . 131040.640395 secs 6.10.0-rc1 + patch (--task-name wdavdaemon) ============= . A0 . . . . - . 131040.641346 secs A0 => wdavdaemon:62509 . A0 B0 . . . - . 131040.641378 secs B0 => wdavdaemon:62274 - - B0 . . . - . 131040.641379 secs C0 . B0 . . . . . 131040.641572 secs C0 => wdavdaemon:62283 C0 . B0 . D0 . . . 131040.641572 secs D0 => wdavdaemon:62277 C0 . B0 . D0 . E0 . 131040.641578 secs E0 => wdavdaemon:62270 - . B0 . D0 . E0 . 131040.641581 secs . . B0 . D0 . *- . 131040.641583 secs Reviewed-and-tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Cc: Chen Yu <yu.c.chen@intel.com> Link: https://lore.kernel.org/r/20240707182716.22054-2-vineethr@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-12 09:38:40 -07:00
Paolo Bonzini	c8b8b8190a	LoongArch KVM changes for v6.11 1. Add ParaVirt steal time support. 2. Add some VM migration enhancement. 3. Add perf kvm-stat support for loongarch. -----BEGIN PGP SIGNATURE----- iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmaOS6UWHGNoZW5odWFj YWlAa2VybmVsLm9yZwAKCRAChivD8uImehejD/9pACGe3h3krXLcFVWXOFIu5Hpc 5kQLP0lSPJ/o5Xs8t/oPLrnDX70z90wXI1LOmltc7h32MSwFa2l8COQh+sN5eJBQ PNyt7u7bMipp0yJS4Gl3LQQ5vklcGOSpQc/gbeXnVx8J/tz+Mo9YGGLIXVRXRM6W Ri8D2VVFiwzQQYeTpPo1u1Ob8C6mA4KOppwvhscMTM3vj4NMbsinBzRnR0lG0Tdw meFhxDPly1Ksxsbnj9UGO6UnEY0A2SLONs6MiO4y4DtoqoDlw/lbqFJuYo4vvbx1 pxtjyirD/PX/wjslQFWUOuU0hMfAodera+JupZ5BZWfcG8FltA4DQfDsm/U9RjK/ 7gGNnr8Xk2/tp6+4AVV+HU2iTgRvq+mXCL72zSy2Y4r7ElBAANDfk4n+Zn/PWisn U9wwV8Ue7tVB15BRpRsg77NzBidiCFEe/6flWYiX2y24ke71gwDJBGUy8hMdKt6t 4Cq8atsU0MvDAzfYMsK9JjskJp4UFq6wb1tXbbuADM4TDhnzlK6s6h3vM+pFlh/f my7fDH8/2qsCWhBDM4pmsJskVp+I1GOk/80RjTQISwx7iHktJWvxNYTaisK2fvD5 Qs1IUWfNFbDX0Lr0QpN6j6X4rZkghR4R6XoFkd4nkicwi+UHVn3oK9GSqv24QJn9 7+Ev3dfRTUYLd6mC4Q== =DpIK -----END PGP SIGNATURE----- Merge tag 'loongarch-kvm-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD LoongArch KVM changes for v6.11 1. Add ParaVirt steal time support. 2. Add some VM migration enhancement. 3. Add perf kvm-stat support for loongarch.	2024-07-12 11:24:12 -04:00
Guilherme Amadio	1d302f626c	perf build: Conditionally add feature check flags for libtrace{event,fs} This avoids reported warnings when the packages are not installed. [namhyung]: Removed the dummy assignment and unnecessary ifeq checks. Fixes: `0f0e1f4456` ("perf build: Use pkg-config for feature check for libtrace{event,fs}") Signed-off-by: Guilherme Amadio <amadio@gentoo.org> Cc: Leo Yan <leo.yan@arm.com> Cc: Thorsten Leemhuis <linux@leemhuis.info> Link: https://lore.kernel.org/r/20240628203432.3273625-1-amadio@gentoo.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-11 15:08:49 -07:00
Bibo Mao	492ac37fa3	perf kvm: Add kvm-stat for loongarch64 Add support for 'perf kvm stat' on loongarch64 platform, now only kvm exit event is supported. Here is example output about "perf kvm --host stat report" command Event name Samples Sample% Time (ns) Time% Mean Time (ns) Mem Store 83969 51.00% 625697070 8.00% 7451 Mem Read 37641 22.00% 112485730 1.00% 2988 Interrupt 15542 9.00% 20620190 0.00% 1326 IOCSR 15207 9.00% 94296190 1.00% 6200 Hypercall 4873 2.00% 12265280 0.00% 2516 Idle 3713 2.00% 6322055860 87.00% 1702681 FPU 1819 1.00% 2750300 0.00% 1511 Inst Fetch 502 0.00% 1341740 0.00% 2672 Mem Modify 324 0.00% 602240 0.00% 1858 CPUCFG 55 0.00% 77610 0.00% 1411 CSR 12 0.00% 19690 0.00% 1640 LASX 3 0.00% 4870 0.00% 1623 LSX 2 0.00% 2100 0.00% 1050 Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2024-07-10 16:50:27 +08:00
Nicolas Schier	608c3b1e61	perf install: Don't propagate subdir to Documentation submake Explicitly reset 'subdir' variable when descending to tools/perf/Documentation. Similar to commit `f89fb55714` ("perf build: Don't propagate subdir to submakes for install_headers", 2023-01-02), calling the 'tools/perf_install' target via top-levels Makefile results in repeated subdir components when attempting to call the perf documentation installation rules: $ make tools/perf_install NO_LIBTRACEEVENT=1 JOBS=1 [...] /bin/sh: 1: cd: can't cd to /data/linux/kbuild/tools/perf/tools/perf/ ../../scripts/Makefile.include:17: * output directory "/data/linux/kbuild/tools/perf/tools/perf/" does not exist. Stop. make[5]: * [Makefile.perf:1096: try-install-man] Error 2 make[4]: * [Makefile.perf:264: sub-make] Error 2 make[3]: * [Makefile:113: install] Error 2 make[2]: *** [Makefile:131: perf_install] Error 2 Resetting 'subdir' fixes the call from top-level Makefile. Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Nicolas Schier <n.schier@avm.de> Acked-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Tested-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://lore.kernel.org/r/20240523-make-tools-perf-install-v1-1-3903499e637f@avm.de Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-03 17:11:38 -07:00
Xu Yang	3710578d2d	perf vendor events arm64:: Add i.MX95 DDR Performance Monitor metrics Add JSON metrics for i.MX95 DDR Performance Monitor. Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Cc: festevam@gmail.com Cc: conor+dt@kernel.org Cc: robh+dt@kernel.org Cc: shawnguo@kernel.org Cc: will@kernel.org Cc: krzysztof.kozlowski+dt@linaro.org Cc: mike.leach@linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: imx@lists.linux.dev Cc: kernel@pengutronix.de Cc: s.hauer@pengutronix.de Cc: devicetree@vger.kernel.org Link: https://lore.kernel.org/r/20240529080358.703784-8-xu.yang_2@nxp.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-03 16:46:05 -07:00
Xu Yang	2697b79a46	perf vendor events arm64:: Add i.MX93 DDR Performance Monitor metrics Add JSON metrics for i.MX93 DDR Performance Monitor. Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Cc: festevam@gmail.com Cc: conor+dt@kernel.org Cc: robh+dt@kernel.org Cc: shawnguo@kernel.org Cc: will@kernel.org Cc: krzysztof.kozlowski+dt@linaro.org Cc: mike.leach@linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: imx@lists.linux.dev Cc: john.g.garry@oracle.com Cc: kernel@pengutronix.de Cc: s.hauer@pengutronix.de Cc: devicetree@vger.kernel.org Link: https://lore.kernel.org/r/20240529080358.703784-7-xu.yang_2@nxp.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-03 16:46:05 -07:00
Ian Rogers	1059fb5291	perf dsos: When adding a dso into sorted dsos maintain the sort order dsos__add would add at the end of the dso array possibly requiring a later find to re-sort the array. Patterns of find then add were becoming O(nlog n) due to the sorts. Change the add routine to be O(n) rather than O(1) but to maintain the sorted-ness of the dsos array so that later finds don't need the O(nlog n) sort. Fixes: `3f4ac23a99` ("perf dsos: Switch backing storage to array from rbtree/list") Reported-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Steinar Gunderson <sesse@google.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Matt Fleming <matt@readmodwrite.com> Link: https://lore.kernel.org/r/20240703172117.810918-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-03 15:02:53 -07:00
Ian Rogers	feaaa8be0b	perf comm str: Avoid sort during insert The array is sorted, so just move the elements and insert in order. Fixes: `13ca628716` ("perf comm: Add reference count checking to 'struct comm_str'") Reported-by: Matt Fleming <matt@readmodwrite.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Matt Fleming <matt@readmodwrite.com> Cc: Steinar Gunderson <sesse@google.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20240703172117.810918-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-03 14:59:15 -07:00
Abhishek Dubey	2eae307ec5	perf report: Calling available function for stats printing For printing dump_trace, just use existing stats_print() function. Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com> Link: https://lore.kernel.org/r/20240628183224.452055-1-adubey@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-02 16:01:23 -07:00
Adrian Hunter	b40934ae32	perf intel-pt: Fix exclude_guest setting In the past, the exclude_guest setting has had no effect on Intel PT tracing, but that may not be the case in the future. Set the flag correctly based upon whether KVM is using Intel PT "Host/Guest" mode, which is determined by the kvm_intel module parameter pt_mode: pt_mode=0 System-wide mode : host and guest output to host buffer pt_mode=1 Host/Guest mode : host/guest output to host/guest buffers respectively Fixes: `6e86bfdc4a` ("perf intel-pt: Support decoding of guest kernel") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20240625104532.11990-3-adrian.hunter@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-02 15:24:07 -07:00
Adrian Hunter	36b4cd990a	perf intel-pt: Fix aux_watermark calculation for 64-bit size aux_watermark is a u32. For a 64-bit size, cap the aux_watermark calculation at UINT_MAX instead of truncating it to 32-bits. Fixes: `874fc35cdd` ("perf intel-pt: Use aux_watermark") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20240625104532.11990-2-adrian.hunter@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-02 15:24:01 -07:00
Namhyung Kim	74ad3cb08b	Merge remote-tracking branch 'perf-tools' into perf-tools-next Merge fixes and updates in v6.10 into perf-tools-next to resolve changes in synthesizing the LOST_SAMPLES records and build fixes. Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-07-02 11:51:32 -07:00
Madadi Vineeth Reddy	a7cacaa088	perf sched replay: Fix -r/--repeat command line option for infinity Currently, the -r/--repeat option accepts values from 0 and complains for -1. The help section specifies: -r, --repeat <n> repeat the workload replay N times (-1: infinite) The -r -1 option raises an error because replay_repeat is defined as an unsigned int. In the current implementation, the workload is repeated n times when -r <n> is used, except when n is 0. When -r is set to 0, the workload is also repeated once. This happens because when -r=0, the run_one_test function is not called. (Note that mutex unlocking, which is essential for child threads spawned to emulate the workload, happens in run_one_test.) However, mutex unlocking is still performed in the destroy_tasks function. Thus, -r=0 results in the workload running once coincidentally. To clarify and maintain the existing logic for -r >= 1 (which runs the workload the specified number of times) and to fix the issue with infinite runs, make -r=0 perform an infinite run. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20240628071821.15264-1-vineethr@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-06-28 12:55:46 -07:00
Yang Li	5484fd2767	perf: pmus: Remove unneeded semicolon ./tools/perf/util/pmu.c:1776:49-50: Unneeded semicolon Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9443 Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20240628053049.44521-1-yang.lee@linux.alibaba.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-06-28 12:55:29 -07:00
Namhyung Kim	b195701e9f	perf stat: Use field separator in the metric header It didn't use the passed field separator (using -x option) when it prints the metric headers and always put "," between the fields. Before: $ sudo ./perf stat -a -x : --per-core -M tma_core_bound --metric-only true core,cpus,% tma_core_bound: <<<--- here: "core,cpus," but ":" expected S0-D0-C0:2:10.5: S0-D0-C1:2:14.8: S0-D0-C2:2:9.9: S0-D0-C3:2:13.2: After: $ sudo ./perf stat -a -x : --per-core -M tma_core_bound --metric-only true core:cpus:% tma_core_bound: S0-D0-C0:2:10.5: S0-D0-C1:2:15.0: S0-D0-C2:2:16.5: S0-D0-C3:2:12.5: Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20240628000604.1296808-2-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-06-28 10:58:09 -07:00
Namhyung Kim	caa463bb79	perf stat: Fix a segfault with --per-cluster --metric-only The new --per-cluster option was added recently but it forgot to update the aggr_header fields which are used for --metric-only option. And it resulted in a segfault due to NULL string in fputs(). Fixes: `cbc917a1b0` ("perf stat: Support per-cluster aggregation") Reviewed-by: Yicong Yang <yangyicong@hisilicon.com> Tested-by: Yicong Yang <yangyicong@hisilicon.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20240628000604.1296808-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2024-06-28 10:56:03 -07:00
James Clark	7afbf90ea2	perf pmu: Don't de-duplicate core PMUs Arm PMUs have a suffix, either a single decimal (armv8_pmuv3_0) or 3 hex digits which (armv8_cortex_a53) which Perf assumes are both strippable suffixes for the purposes of deduplication. S390 "cpum_cf" is a similarly suffixed core PMU but is only two characters so is not treated as strippable because the rules are a minimum of 3 hex characters or 1 decimal character. There are two paths involved in listing PMU events: * HW/cache event printing assumes core PMUs don't have suffixes so doesn't try to strip. * Sysfs PMU events share the printing function with uncore PMUs which strips. This results in slightly inconsistent Perf list behavior if a core PMU has a suffix: # perf list ... armv8_pmuv3_0/branch-load-misses/ armv8_pmuv3/l3d_cache_wb/ [Kernel PMU event] ... Fix it by partially reverting back to the old list behavior where stripping was only done for uncore PMUs. For example commit `8d9f5146f5` ("perf pmus: Sort pmus by name then suffix") mentions that only PMUs starting 'uncore_' are considered to have a potential suffix. This change doesn't go back that far, but does only strip PMUs that are !is_core. This keeps the desirable behavior where the many possibly duplicated uncore PMUs aren't repeated, but it doesn't break listing for core PMUs. Searching for a PMU continues to use the new stripped comparison functions, meaning that it's still possible to request an event by specifying the common part of a PMU name, or even open events on multiple similarly named PMUs. For example: # perf stat -e armv8_cortex/inst_retired/ 5777173628 armv8_cortex_a53/inst_retired/ (99.93%) 7469626951 armv8_cortex_a57/inst_retired/ (49.88%) Fixes: `3241d46f5f` ("perf pmus: Sort/merge/aggregate PMUs like mrvl_ddr_pmu") Suggested-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: robin.murphy@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240626145448.896746-3-james.clark@arm.com	2024-06-27 20:28:12 -07:00
James Clark	3e0bf9fde2	perf pmu: Restore full PMU name wildcard support Commit `b2b9d3a3f0` ("perf pmu: Support wildcards on pmu name in dynamic pmu events") gives the following example for wildcarding a subset of PMUs: E.g., in a system with the following dynamic pmus: mypmu_0 mypmu_1 mypmu_2 mypmu_4 perf stat -e mypmu_[01]/<config>/ Since commit `f91fa2ae63` ("perf pmu: Refactor perf_pmu__match()"), only "*" has been supported, removing the ability to subset PMUs, even though parse-events.l still supports ? and [] characters. Fix it by using fnmatch() when any glob character is detected and add a test which covers that and other scenarios of perf_pmu__match_ignoring_suffix(). Fixes: `f91fa2ae63` ("perf pmu: Refactor perf_pmu__match()") Signed-off-by: James Clark <james.clark@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: robin.murphy@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240626145448.896746-2-james.clark@arm.com	2024-06-27 20:28:01 -07:00
Namhyung Kim	4553c431e7	perf report: Display pregress bar on redirected pipe data It's possible to save pipe output of perf record into a file. $ perf record -o- ... > pipe.data And you can use the data same as the normal perf data. $ perf report -i pipe.data In that case, perf tools will treat the input as a pipe, but it can get the total size of the input. This means it can show the progress bar unlike the normal pipe input (which doesn't know the total size in advance). While at it, fix the string in __perf_session__process_dir_events(). Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240627181916.1202110-1-namhyung@kernel.org	2024-06-27 19:58:07 -07:00
Veronika Molnarova	e8b86f0311	perf test stat_bpf_counter.sh: Stabilize the test results The test has been failing for some time when two separate runs of perf benchmarks are recorded for cycles events and their counts are compared, while once the recording was done with option --bpf-counters and once without it. It is expected that the count of the samples should be within a certain range, firstly the difference was set to be within 10%, which was then later raised to 20%. However, the test case keeps failing on certain architectures as recording the provided benchmark can produce completely different counts based on the current load of the system. Sampling two separate runs on intel-eaglestream-spr-13 of "perf stat --no-big-num -e cycles -- perf bench sched messaging -g 1 -l 100 -t": Performance counter stats for 'perf bench sched messaging -g 1 -l 100 -t': 396782898 cycles 0.010051983 seconds time elapsed 0.008664000 seconds user 0.097058000 seconds sys Performance counter stats for 'perf bench sched messaging -g 1 -l 100 -t': 1431133032 cycles 0.021803714 seconds time elapsed 0.023377000 seconds user 0.349918000 seconds sys , which is ranging from 400mil to 1400mil samples. Instead of recording the cycles use instructions event, which provides more stable values. At the same time change the tested workload to one of the provided testing workloads by perf that is not based on a scheduler, which can provide another dependency on the current load. Sampling instructions event with the new workload provide much more stable results on intel-eaglestream-spr-13 of "perf stat --no-big-num -e instructions -- perf test -w brstack": Performance counter stats for 'perf test -w brstack': 64584494 instructions 0.009173945 seconds time elapsed 0.007262000 seconds user 0.002071000 seconds sys Performance counter stats for 'perf test -w brstack': 64672669 instructions 0.008888135 seconds time elapsed 0.005018000 seconds user 0.004018000 seconds sys Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: mpetlan@redhat.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240625092001.10909-1-vmolnaro@redhat.com	2024-06-26 11:10:58 -07:00
Ian Rogers	e4b19e2cc3	perf python: Clean up build dependencies The python build now depends on libraries and doesn't use python-ext-sources except for the util/python.c dependency. Switch to just directly depending on that file and util/setup.py. This allows the removal of python-ext-sources. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Nick Terrell <terrelln@fb.com> Cc: Gary Guo <gary@garyguo.net> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrei Vagin <avagin@google.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Guo Ren <guoren@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: John Garry <john.g.garry@oracle.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240625214117.953777-9-irogers@google.com	2024-06-26 11:10:25 -07:00
Ian Rogers	9dabf40034	perf python: Switch module to linking libraries from building source setup.py was building most perf sources causing setup.py to mimic the Makefile logic as well as flex/bison code to be stubbed out, due to complexity building. By using libraries fewer functions are stubbed out, the build is faster and the Makefile logic is reused which should simplify updating. The libraries are passed through LDFLAGS to avoid complexity in python. Force the -fPIC flag for libbpf.a to ensure it is suitable for linking into the perf python module. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Nick Terrell <terrelln@fb.com> Cc: Gary Guo <gary@garyguo.net> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrei Vagin <avagin@google.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Guo Ren <guoren@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: John Garry <john.g.garry@oracle.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240625214117.953777-8-irogers@google.com	2024-06-26 11:08:00 -07:00
Ian Rogers	e467705a9f	perf util: Make util its own library Make the util directory into its own library. This is done to avoid compiling code twice, once for the perf tool and once for the perf python module. For convenience: arch/common.c scripts/perl/Perf-Trace-Util/Context.c scripts/python/Perf-Trace-Util/Context.c are made part of this library. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Nick Terrell <terrelln@fb.com> Cc: Gary Guo <gary@garyguo.net> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrei Vagin <avagin@google.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Guo Ren <guoren@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: John Garry <john.g.garry@oracle.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240625214117.953777-7-irogers@google.com	2024-06-26 11:07:42 -07:00
Ian Rogers	21cc3bc00a	perf bench: Make bench its own library Make the benchmark code into a library so it may be linked against things like the python module to avoid compiling code twice. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Nick Terrell <terrelln@fb.com> Cc: Gary Guo <gary@garyguo.net> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrei Vagin <avagin@google.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Guo Ren <guoren@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: John Garry <john.g.garry@oracle.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240625214117.953777-6-irogers@google.com	2024-06-26 11:07:28 -07:00
Ian Rogers	1dad99af1a	perf test: Make tests its own library Make the tests code its own library. This is done to avoid compiling code twice, once for the perf tool and once for the perf python module. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Nick Terrell <terrelln@fb.com> Cc: Gary Guo <gary@garyguo.net> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrei Vagin <avagin@google.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Guo Ren <guoren@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: John Garry <john.g.garry@oracle.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240625214117.953777-5-irogers@google.com	2024-06-26 11:07:11 -07:00
Ian Rogers	49f4ac4b94	perf pmu-events: Make pmu-events a library Make pmu-events into a library so it may be linked against things like the python module and not built from source. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Nick Terrell <terrelln@fb.com> Cc: Gary Guo <gary@garyguo.net> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrei Vagin <avagin@google.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Guo Ren <guoren@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: John Garry <john.g.garry@oracle.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240625214117.953777-4-irogers@google.com	2024-06-26 11:06:54 -07:00
Ian Rogers	39f3ce5cab	perf ui: Make ui its own library Make the ui code its own library. This is done to avoid compiling code twice, once for the perf tool and once for the perf python module. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Nick Terrell <terrelln@fb.com> Cc: Gary Guo <gary@garyguo.net> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrei Vagin <avagin@google.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Guo Ren <guoren@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: John Garry <john.g.garry@oracle.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240625214117.953777-3-irogers@google.com	2024-06-26 11:06:34 -07:00
Ian Rogers	7f240209ba	perf build: Add '*.a' to clean targets Fix some excessively long lines by deploying '\'. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Nick Terrell <terrelln@fb.com> Cc: Gary Guo <gary@garyguo.net> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrei Vagin <avagin@google.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Guo Ren <guoren@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: John Garry <john.g.garry@oracle.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240625214117.953777-2-irogers@google.com	2024-06-26 11:06:05 -07:00
Shenlin Liang	da7b1b525e	perf kvm/riscv: Port perf kvm stat to RISC-V 'perf kvm stat report/record' generates a statistical analysis of KVM events and can be used to analyze guest exit reasons. "report" reports statistical analysis of guest exit events. To record kvm events on the host: # perf kvm stat record -a To report kvm VM EXIT events: # perf kvm stat report --event=vmexit Signed-off-by: Shenlin Liang <liangshenlin@eswincomputing.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Tested-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240422080833.8745-3-liangshenlin@eswincomputing.com Signed-off-by: Anup Patel <anup@brainfault.org>	2024-06-26 18:37:38 +05:30
Namhyung Kim	c7a5592e8e	perf mem: Fix a segfault with NULL event->name Guilherme reported a crash in perf mem record. It's because the perf_mem_event->name was NULL on his machine. It should just return a NULL string when it has no format string in the name. The backtrace at the crash is below: Program received signal SIGSEGV, Segmentation fault. __strchrnul_avx2 () at ../sysdeps/x86_64/multiarch/strchr-avx2.S:67 67 vmovdqu (%rdi), %ymm2 (gdb) bt #0 __strchrnul_avx2 () at ../sysdeps/x86_64/multiarch/strchr-avx2.S:67 #1 0x00007ffff6c982de in __find_specmb (format=0x0) at printf-parse.h:82 #2 __printf_buffer (buf=buf@entry=0x7fffffffc760, format=format@entry=0x0, ap=ap@entry=0x7fffffffc880, mode_flags=mode_flags@entry=0) at vfprintf-internal.c:649 #3 0x00007ffff6cb7840 in __vsnprintf_internal (string=<optimized out>, maxlen=<optimized out>, format=0x0, args=0x7fffffffc880, mode_flags=mode_flags@entry=0) at vsnprintf.c:96 #4 0x00007ffff6cb787f in ___vsnprintf (string=<optimized out>, maxlen=<optimized out>, format=<optimized out>, args=<optimized out>) at vsnprintf.c:103 #5 0x00005555557b9391 in scnprintf (buf=0x555555fe9320 <mem_loads_name> "", size=100, fmt=0x0) at ../lib/vsprintf.c:21 #6 0x00005555557b74c3 in perf_pmu__mem_events_name (i=0, pmu=0x555556832180) at util/mem-events.c:106 #7 0x00005555557b7ab9 in perf_mem_events__record_args (rec_argv=0x55555684c000, argv_nr=0x7fffffffca20) at util/mem-events.c:252 #8 0x00005555555e370d in __cmd_record (argc=3, argv=0x7fffffffd760, mem=0x7fffffffcd80) at builtin-mem.c:156 #9 0x00005555555e49c4 in cmd_mem (argc=4, argv=0x7fffffffd760) at builtin-mem.c:514 #10 0x000055555569716c in run_builtin (p=0x555555fcde80 <commands+672>, argc=8, argv=0x7fffffffd760) at perf.c:349 #11 0x0000555555697402 in handle_internal_command (argc=8, argv=0x7fffffffd760) at perf.c:402 #12 0x0000555555697560 in run_argv (argcp=0x7fffffffd59c, argv=0x7fffffffd590) at perf.c:446 #13 0x00005555556978a6 in main (argc=8, argv=0x7fffffffd760) at perf.c:562 Reported-by: Guilherme Amadio <amadio@cern.ch> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Closes: https://lore.kernel.org/linux-perf-users/Zlns_o_IE5L28168@cern.ch Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240621170528.608772-5-namhyung@kernel.org	2024-06-25 11:06:20 -07:00
Namhyung Kim	0eb739d87f	perf tools: Fix a compiler warning of NULL pointer A compiler warning on the second argument of bsearch() should not be NULL, but there's a case we might pass it. Let's return early if we don't have any DSOs to search in __dsos__find_by_longname_id(). util/dsos.c:184:8: runtime error: null pointer passed as argument 2, which is declared to never be null Reported-by: kernel test robot <oliver.sang@intel.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Closes: https://lore.kernel.org/oe-lkp/202406180932.84be448c-oliver.sang@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240621170528.608772-4-namhyung@kernel.org	2024-06-25 11:06:20 -07:00
Namhyung Kim	e988a5b53e	perf symbol: Simplify kernel module checking In dso__load(), it checks if the dso is a kernel module by looking the symtab type. Actually dso has 'is_kmod' field to check that easily and dso__set_module_info() set the symtab type and the is_kmod bit. So it should have the same result to check the is_kmod bit. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240621170528.608772-3-namhyung@kernel.org	2024-06-25 11:06:20 -07:00
Namhyung Kim	cb39d05e67	perf report: Fix condition in sort__sym_cmp() It's expected that both hist entries are in the same hists when comparing two. But the current code in the function checks one without dso sort key and other with the key. This would make the condition true in any case. I guess the intention of the original commit was to add '!' for the right side too. But as it should be the same, let's just remove it. Fixes: `69849fc5d2` ("perf hists: Move sort__has_dso into struct perf_hpp_list") Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240621170528.608772-2-namhyung@kernel.org	2024-06-25 11:06:20 -07:00
Junhao He	dd9a426ead	perf pmus: Fixes always false when compare duplicates aliases In the previous loop, all the members in the aliases[j-1] have been freed and set to NULL. But in this loop, the function pmu_alias_is_duplicate() compares the aliases[j] with the aliases[j-1] that has already been disposed, so the function will always return false and duplicate aliases will never be discarded. If we find duplicate aliases, it skips the zfree aliases[j], which is accompanied by a memory leak. We can use the next aliases[j+1] to theck for duplicate aliases to fixes the aliases NULL pointer dereference, then goto zfree code snippet to release it. After patch testing: $ perf list --unit=hisi_sicl,cpa pmu uncore cpa: cpa_p0_rd_dat_32b [Number of read ops transmitted by the P0 port which size is 32 bytes. Unit: hisi_sicl,cpa] cpa_p0_rd_dat_64b [Number of read ops transmitted by the P0 port which size is 64 bytes. Unit: hisi_sicl,cpa] Fixes: `c3245d2093` ("perf pmu: Abstract alias/event struct") Signed-off-by: Junhao He <hejunhao3@huawei.com> Cc: ravi.bangoria@amd.com Cc: james.clark@arm.com Cc: prime.zeng@hisilicon.com Cc: cuigaosheng1@huawei.com Cc: jonathan.cameron@huawei.com Cc: linuxarm@huawei.com Cc: yangyicong@huawei.com Cc: robh@kernel.org Cc: renyu.zj@linux.alibaba.com Cc: kjain@linux.ibm.com Cc: john.g.garry@oracle.com Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240614094318.11607-1-hejunhao3@huawei.com	2024-06-25 11:06:20 -07:00
Yunseong Kim	83da316a3b	perf unwind-libunwind: Add malloc() failure handling Add malloc() failure handling in unread_unwind_spec_debug_frame(). This make caller find_proc_info() works well when the allocation failure. Signed-off-by: Yunseong Kim <yskelg@gmail.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Austin Kim <austindh.kim@gmail.com> Cc: shjy180909@gmail.com Cc: Ze Gao <zegao2021@gmail.com> Cc: Leo Yan <leo.yan@linux.dev> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240619204211.6438-2-yskelg@gmail.com	2024-06-25 11:06:20 -07:00
Yunseong Kim	e9ffa312ff	util: constant -1 with expression of type char This patch resolve following warning. tools/perf/util/evsel.c:1620:9: error: result of comparison of constant -1 with expression of type 'char' is always false -Werror,-Wtautological-constant-out-of-range-compare 1620 \| if (c == -1) \| ~ ^ ~~ Signed-off-by: Yunseong Kim <yskelg@gmail.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Austin Kim <austindh.kim@gmail.com> Cc: shjy180909@gmail.com Cc: Ze Gao <zegao2021@gmail.com> Cc: Leo Yan <leo.yan@linux.dev> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240619203428.6330-2-yskelg@gmail.com	2024-06-25 11:06:20 -07:00
Fernand Sieber	d363c2a880	perf: Timehist account sch delay for scheduled out running When using perf timehist, sch delay is only computed for a waking task, not for a pre empted task. This patches changes sch delay to account for both. This makes sense as testing scheduling policy need to consider the effect of scheduling delay globally, not only for waking tasks. Example of `perf timehist` report before the patch for `stress` task competing with each other. First column is wait time, second column sch delay, third column runtime. 1.492060 [0000] s stress[81] 1.999 0.000 2.000 R next: stress[83] 1.494060 [0000] s stress[83] 2.000 0.000 2.000 R next: stress[81] 1.496060 [0000] s stress[81] 2.000 0.000 2.000 R next: stress[83] 1.498060 [0000] s stress[83] 2.000 0.000 1.999 R next: stress[81] After the patch, it looks like this (note that all wait time is not zero anymore): 1.492060 [0000] s stress[81] 1.999 1.999 2.000 R next: stress[83] 1.494060 [0000] s stress[83] 2.000 2.000 2.000 R next: stress[81] 1.496060 [0000] s stress[81] 2.000 2.000 2.000 R next: stress[83] 1.498060 [0000] s stress[83] 2.000 2.000 1.999 R next: stress[81] Signed-off-by: Fernand Sieber <sieberf@amazon.com> Reviewed-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240618090339.87482-1-sieberf@amazon.com	2024-06-25 11:06:20 -07:00
Adrian Hunter	fcd094e52b	perf tests: Add APX and other new instructions to x86 instruction decoder test Add samples of APX and other new instructions to the 'x86 instruction decoder - new instructions' test. Note the test is only available if the perf tool has been built with EXTRA_TESTS=1. Example: $ make EXTRA_TESTS=1 -C tools/perf $ tools/perf/perf test -F -v 'new ins' \|& grep -i 'jmpabs\\|popp\\|pushp' Decoded ok: d5 00 a1 ef cd ab 90 78 56 34 12 jmpabs $0x1234567890abcdef Decoded ok: d5 08 53 pushp %rbx Decoded ok: d5 18 50 pushp %r16 Decoded ok: d5 19 57 pushp %r31 Decoded ok: d5 19 5f popp %r31 Decoded ok: d5 18 58 popp %r16 Decoded ok: d5 08 5b popp %rbx Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Nikolay Borisov <nik.borisov@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240502105853.5338-11-adrian.hunter@intel.com	2024-06-25 11:06:19 -07:00
Adrian Hunter	a44abd2c4c	perf intel pt: Add new JMPABS instruction to the Intel PT instruction decoder JMPABS is 64-bit absolute direct jump instruction, encoded with a mandatory REX2 prefix. JMPABS is designed to be used in the procedure linkage table (PLT) to replace indirect jumps, because it has better performance. In that case the jump target will be amended at run time. To enable Intel PT to follow the code, a TIP packet is always emitted when JMPABS is traced under Intel PT. Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture Specification for details. Decode JMPABS as an indirect jump, because it has an associated TIP packet the same as an indirect jump and the control flow should follow the TIP packet payload, and not assume it is the same as the on-file object code JMPABS target address. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Nikolay Borisov <nik.borisov@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240502105853.5338-10-adrian.hunter@intel.com	2024-06-25 11:06:19 -07:00
Chaitanya S Prakash	abc0f0c444	perf test: Check output of the probe ... --funcs command Test "perf probe of function from different CU" only checks if the perf command has failed and doesn't test the --funcs output. In the issue reported in the previous commit, the garbage output of the --funcs command was being ignored by the test when it could have been caught. The script first makes use of --funcs option with the perf probe command to check if the function "foo" exists in the testfile before adding a probe to it in the next command. The output of probe...--funcs command is redirected to stdout, therefore, add '\| grep "foo"' to validate the result. Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: anshuman.khandual@arm.com Cc: james.clark@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240601125946.1741414-11-ChaitanyaS.Prakash@arm.com	2024-06-25 11:06:19 -07:00
Athira Rajeev	7d49ced808	tools/perf: Fix parallel-perf python script to replace new python syntax ":=" usage perf test "perf script tests" fails as below in systems with python 3.6 File "/home/athira/linux/tools/perf/tests/shell/../../scripts/python/parallel-perf.py", line 442 if line := p.stdout.readline(): ^ SyntaxError: invalid syntax --- Cleaning up --- ---- end(-1) ---- 92: perf script tests: FAILED! This happens because ":=" is a new syntax that assigns values to variables as part of a larger expression. This is introduced from python 3.8 and hence fails in setup with python 3.6 Address this by splitting the large expression and check the value in two steps: Previous line: if line := p.stdout.readline(): Current change: line = p.stdout.readline() if line: With patch ./perf test "perf script tests" 93: perf script tests: Ok Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: akanksha@linux.ibm.com Cc: kjain@linux.ibm.com Cc: maddy@linux.ibm.com Cc: disgoel@linux.vnet.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240623064850.83720-3-atrajeev@linux.vnet.ibm.com	2024-06-25 11:06:19 -07:00
Athira Rajeev	b9241f150a	tools/perf: Use is_perf_pid_map_name helper function to check dso's of pattern /tmp/perf-%d.map commit `80d496be89` ("perf report: Add support for profiling JIT generated code") added support for profiling JIT generated code. This patch handles dso's of form "/tmp/perf-$PID.map". Some of the references doesn't check exactly for same pattern. some uses "if (!strncmp(dso_name, "/tmp/perf-", 10))". Fix this by using helper function perf_pid_map_tid and is_perf_pid_map_name which looks for proper pattern of form: "/tmp/perf-$PID.map" for these checks. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: akanksha@linux.ibm.com Cc: kjain@linux.ibm.com Cc: maddy@linux.ibm.com Cc: disgoel@linux.vnet.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240623064850.83720-2-atrajeev@linux.vnet.ibm.com	2024-06-25 11:01:34 -07:00
Athira Rajeev	b0979f008f	tools/perf: Fix the string match for "/tmp/perf-$PID.map" files in dso__load Perf test for perf probe of function from different CU fails as below: ./perf test -vv "test perf probe of function from different CU" 116: test perf probe of function from different CU: --- start --- test child forked, pid 2679 Failed to find symbol foo in /tmp/perf-uprobe-different-cu-sh.Msa7iy89bx/testfile Error: Failed to add events. --- Cleaning up --- "foo" does not hit any event. Error: Failed to delete events. ---- end(-1) ---- 116: test perf probe of function from different CU : FAILED! The test does below to probe function "foo" : # gcc -g -Og -flto -c /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-foo.c -o /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-foo.o # gcc -g -Og -c /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-main.c -o /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-main.o # gcc -g -Og -o /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-foo.o /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-main.o # ./perf probe -x /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile foo Failed to find symbol foo in /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile Error: Failed to add events. Perf probe fails to find symbol foo in the executable placed in /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7 Simple reproduce: # mktemp -d /tmp/perf-checkXXXXXXXXXX /tmp/perf-checkcWpuLRQI8j # gcc -g -o test test.c # cp test /tmp/perf-checkcWpuLRQI8j/ # nm /tmp/perf-checkcWpuLRQI8j/test \| grep foo 00000000100006bc T foo # ./perf probe -x /tmp/perf-checkcWpuLRQI8j/test foo Failed to find symbol foo in /tmp/perf-checkcWpuLRQI8j/test Error: Failed to add events. But it works with any files like /tmp/perf/test. Only for patterns with "/tmp/perf-", this fails. Further debugging, commit `80d496be89` ("perf report: Add support for profiling JIT generated code") added support for profiling JIT generated code. This patch handles dso's of form "/tmp/perf-$PID.map" . The check used "if (strncmp(self->name, "/tmp/perf-", 10) == 0)" to match "/tmp/perf-$PID.map". With this commit, any dso in /tmp/perf- folder will be considered separately for processing (not only JIT created map files ). Fix this by changing the string pattern to check for "/tmp/perf-%d.map". Add a helper function is_perf_pid_map_name to do this check. In "struct dso", dso->long_name holds the long name of the dso file. Since the /tmp/perf-$PID.map check uses the complete name, use dso___long_name for the string name. With the fix, # ./perf test "test perf probe of function from different CU" 117: test perf probe of function from different CU : Ok Fixes: `56cbeacf14` ("perf probe: Add test for regression introduced by switch to die_get_decl_file()") Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: akanksha@linux.ibm.com Cc: kjain@linux.ibm.com Cc: maddy@linux.ibm.com Cc: disgoel@linux.vnet.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240623064850.83720-1-atrajeev@linux.vnet.ibm.com	2024-06-25 11:00:43 -07:00
James Clark	ff16aeb9b8	perf test: Make test_arm_callgraph_fp.sh more robust The 2 second sleep can cause the test to fail on very slow network file systems because Perf ends up being killed before it finishes starting up. Fix it by making the leafloop workload end after a fixed time like the other workloads so there is no need to kill it after 2 seconds. Also remove the 1 second start sampling delay because it is similarly fragile. Instead, search through all samples for a matching one, rather than just checking the first sample and hoping it's in the right place. Fixes: `cd6382d827` ("perf test arm64: Test unwinding using fame-pointer (fp) mode") Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: German Gomez <german.gomez@arm.com> Cc: Spoorthy S <spoorts2@in.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240612140316.3006660-1-james.clark@arm.com	2024-06-24 14:42:59 -07:00
Guilherme Amadio	366e17409f	perf build: Ensure libtraceevent and libtracefs versions have 3 components When either of these have a shorter version, like 1.8, the expression that computes the version has a syntax error that can be seen in the output of make: expr: syntax error: missing argument after + Link: https://bugs.gentoo.org/917559 Reported-by: Peter Volkov <peter.volkov@gmail.com> Signed-off-by: Guilherme Amadio <amadio@gentoo.org> Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240606153625.2255470-3-amadio@gentoo.org	2024-06-21 16:19:12 -07:00
Guilherme Amadio	0f0e1f4456	perf build: Use pkg-config for feature check for libtrace{event,fs} Needed to add required include directories for the feature detection to succeed. The header tracefs.h is installed either into the include directory /usr/include/tracefs/tracefs.h when using the Makefile, or into /usr/include/libtracefs/tracefs.h when using meson to build libtracefs. The header tracefs.h uses #include <event-parse.h> from libtraceevent, so pkg-config needs to pick the correct include directory for libtracefs and add the one for libtraceevent to succeed. Note that in `baa2ca59ec` the variable LIBTRACEEVENT_DIR was introduced, and now the method to compile against non-standard locations requires PKG_CONFIG_PATH to be set instead, which works for both libtraceevent and libtracefs. Signed-off-by: Guilherme Amadio <amadio@gentoo.org> Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240606153625.2255470-2-amadio@gentoo.org	2024-06-21 16:19:03 -07:00
Ian Rogers	5518063fcb	perf arm: Workaround ARM PMUs cpu maps having offline cpus When PMUs have a cpu map in the 'cpus' or 'cpumask' file, perf will try to open events on those CPUs. ARM doesn't remove offline CPUs meaning taking a CPU offline will cause perf commands to fail unless a CPU map is passed on the command line. More context in: https://lore.kernel.org/lkml/20240603092812.46616-1-yangyicong@huawei.com/ Reported-by: Yicong Yang <yangyicong@huawei.com> Closes: https://lore.kernel.org/lkml/20240603092812.46616-2-yangyicong@huawei.com/ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Yicong Yang <yangyicong@hisilicon.com> Tested-by: Leo Yan <leo.yan@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240607065343.695369-1-irogers@google.com	2024-06-21 15:50:29 -07:00
Kan Liang	3612ca8e29	perf stat: Fix the hard-coded metrics calculation on the hybrid The hard-coded metrics is wrongly calculated on the hybrid machine. $ perf stat -e cycles,instructions -a sleep 1 Performance counter stats for 'system wide': 18,205,487 cpu_atom/cycles/ 9,733,603 cpu_core/cycles/ 9,423,111 cpu_atom/instructions/ # 0.52 insn per cycle 4,268,965 cpu_core/instructions/ # 0.23 insn per cycle The insn per cycle for cpu_core should be 4,268,965 / 9,733,603 = 0.44. When finding the metric events, the find_stat() doesn't take the PMU type into account. The cpu_atom/cycles/ is wrongly used to calculate the IPC of the cpu_core. In the hard-coded metrics, the events from a different PMU are only SW_CPU_CLOCK and SW_TASK_CLOCK. They both have the stat type, STAT_NSECS. Except the SW CLOCK events, check the PMU type as well. Fixes: `0a57b91080` ("perf stat: Use counts rather than saved_value") Reported-by: Khalil, Amiri <amiri.khalil@intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240606180316.4122904-1-kan.liang@linux.intel.com	2024-06-21 09:43:09 -07:00
Ian Rogers	788c516052	perf vendor events: Add westmereex counter information Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: `475892a969` and later patches. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-38-irogers@google.com	2024-06-20 16:56:57 -07:00
Ian Rogers	dc5f18a102	perf vendor events: Add westmereep-sp counter information Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: `475892a969` and later patches. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-37-irogers@google.com	2024-06-20 16:56:50 -07:00
Ian Rogers	22123c26de	perf vendor events: Add westmereep-dp counter information Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: `475892a969` and later patches. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-36-irogers@google.com	2024-06-20 16:56:44 -07:00
Ian Rogers	321e0ffa1a	perf vendor events: Add/update tigerlake events/metrics Update events from v1.15 to v1.16. Update TMA metrics from v4.7 to v4.8. Bring in the event updates v1.16: `43f3b8d6f8` The TMA 4.8 information was added in: `59194d4d90` Add counter information. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-35-irogers@google.com	2024-06-20 16:56:37 -07:00
Ian Rogers	7c79eb5cc2	perf vendor events: Add snowridgex counter information Update/remove events as per v1.23: `9debd874e1` Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: `475892a969` and later patches. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-34-irogers@google.com	2024-06-20 16:56:30 -07:00
Ian Rogers	4c10b96f49	perf vendor events: Add/update skylakex events/metrics Update events from v1.33 to v1.35. Update TMA metrics from v4.7 to v4.8. Bring in the event updates v1.35: `c99b60c147` The TMA 4.8 information was added in: `59194d4d90` Add counter information. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ Adds the event SW_PREFETCH_ACCESS.ANY. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-33-irogers@google.com	2024-06-20 16:56:23 -07:00
Ian Rogers	e2641db83f	perf vendor events: Add/update skylake events/metrics Update events from v58 to v59. Update TMA metrics from v4.7 to v4.8. Bring in the event updates v59: `5d36f1835b` The TMA 4.8 information was added in: `59194d4d90` Add counter information. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ Adds the event SW_PREFETCH_ACCESS.ANY. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-32-irogers@google.com	2024-06-20 16:56:17 -07:00
Ian Rogers	caccae3ce7	perf vendor events: Add silvermont counter information Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: `475892a969` and later patches. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-31-irogers@google.com	2024-06-20 16:56:10 -07:00
Ian Rogers	951bf72ace	perf vendor events: Add/update sierraforest events/metrics Update events from v1.02 to v1.04. Add TMA metrics v4.8. Bring in the event updates v1.04: `0a9546cdf6` v1.03: `c7dd26ce67` The TMA 4.8 information was added in: `59194d4d90` Add counter information. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ New events are: FP_INST_RETIRED.128B_DP, FP_INST_RETIRED.128B_SP, FP_INST_RETIRED.256B_DP, FP_INST_RETIRED.32B_SP, FP_INST_RETIRED.64B_DP, OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM, OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD, OCR.DEMAND_RFO.L3_HIT.SNOOP_HITM, OCR.STREAMING_WR.ANY_RESPONSE, UNC_CHA_TOR_INSERTS.IO_ITOMCACHENEAR_LOCAL, UNC_CHA_TOR_INSERTS.IO_ITOMCACHENEAR_REMOTE, UNC_CHA_TOR_INSERTS.IO_ITOM_LOCAL, UNC_CHA_TOR_INSERTS.IO_ITOM_REMOTE, UNC_CHA_TOR_INSERTS.IO_MISS, UNC_CHA_TOR_INSERTS.IO_MISS_ITOM, UNC_CHA_TOR_INSERTS.IO_MISS_ITOMCACHENEAR, UNC_CHA_TOR_INSERTS.IO_PCIRDCUR_LOCAL, UNC_CHA_TOR_INSERTS.IO_PCIRDCUR_REMOTE, UNC_CXLCM_RxC_PACK_BUF_INSERTS.MEM_DATA, UNC_CXLDP_TxC_AGF_INSERTS.M2S_DATA. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-30-irogers@google.com	2024-06-20 16:56:02 -07:00
Ian Rogers	5ecf682e61	perf vendor events: Add/update sapphirerapids events/metrics Update events from v1.20 to v1.23. Update TMA metrics from v4.7 to v4.8. Bring in the event updates v1.23: `6ace93281c` v1.22: `356eba05c0` The TMA 4.8 information was added in: `59194d4d90` Add counter information. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ New events are: EXE_ACTIVITY.2_3_PORTS_UTIL, ICACHE_DATA.STALL_PERIODS, L2_TRANS.L2_WB, MEM_TRANS_RETIRED.LOAD_LATENCY_GT_1024, OFFCORE_REQUESTS.DEMAND_CODE_RD, OFFCORE_REQUESTS.DEMAND_RFO, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_CODE_RD, OFFCORE_REQUESTS_OUTSTANDING.DEMAND_CODE_RD, RS.EMPTY_RESOURCE, SW_PREFETCH_ACCESS.ANY, UOPS_ISSUED.CYCLES. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-29-irogers@google.com	2024-06-20 16:55:55 -07:00
Ian Rogers	01cb5e3d98	perf vendor events: Update sandybridge metrics add event counter information Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: `475892a969` and later patches. The TMA 4.8 information was updated in: `59194d4d90` Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-28-irogers@google.com	2024-06-20 16:55:45 -07:00
Ian Rogers	bf0dd1f47f	perf vendor events: Add/update rocketlake events/metrics Update events from v1.02 to v1.03. Update TMA metrics from v4.7 to v4.8. Bring in the event updates v1.03: `a7c75ffd56` The TMA 4.8 information was added in: `59194d4d90` Add counter information. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ Adds the event SW_PREFETCH_ACCESS.ANY. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-27-irogers@google.com	2024-06-20 16:55:38 -07:00
Ian Rogers	d697772252	perf vendor events: Add nehalemex counter information Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: `475892a969` and later patches. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-26-irogers@google.com	2024-06-20 16:55:32 -07:00
Ian Rogers	af557589c4	perf vendor events: Add nehalemep counter information Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: `475892a969` and later patches. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-25-irogers@google.com	2024-06-20 16:55:25 -07:00
Ian Rogers	3323532ae5	perf vendor events: Update meteorlake events and add counter information Update events from v1.08 to v1.10. Bring in the event updates v1.10: `3bee3dc150` v1.09: `01c8c99f17` Add counter information. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ New events are: EXE_ACTIVITY.2_3_PORTS_UTIL, FP_INST_RETIRED.128B_DP, FP_INST_RETIRED.128B_SP, FP_INST_RETIRED.256B_DP, FP_INST_RETIRED.32B_SP, FP_INST_RETIRED.64B_DP, FP_VINT_UOPS_EXECUTED.STD, L2_LINES_OUT.USELESS_HWPF, L2_RQSTS.SWPF_HIT, L2_RQSTS.SWPF_MISS, LOAD_HIT_PREFETCH.SWPF, MACHINE_CLEARS.ANY, MACHINE_CLEARS.MRN_NUKE, MISC_RETIRED.LBR_INSERTS, SW_PREFETCH_ACCESS.ANY. The metrics aren't updated as they require retirement latency support that is added in this series: https://lore.kernel.org/lkml/20240613033631.199800-1-weilin.wang@intel.com/ Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-24-irogers@google.com	2024-06-20 16:55:17 -07:00
Ian Rogers	82eff6ee67	perf vendor events: Add lunarlake counter information Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: `475892a969` and later patches. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-23-irogers@google.com	2024-06-20 16:55:09 -07:00
Ian Rogers	025cce253b	perf vendor events: Add knightslanding counter information Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: `475892a969` and later patches. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-22-irogers@google.com	2024-06-20 16:54:59 -07:00
Ian Rogers	8791622572	perf vendor events: Update jaketown metrics add event counter information Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: `475892a969` and later patches. The TMA 4.8 information was updated in: `59194d4d90` Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-21-irogers@google.com	2024-06-20 16:54:47 -07:00
Ian Rogers	3235704cbd	perf vendor events: Update ivytown metrics add event counter information Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: `475892a969` and later patches. The TMA 4.8 information was updated in: `59194d4d90` Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-20-irogers@google.com	2024-06-20 16:54:40 -07:00
Ian Rogers	238a2117cc	perf vendor events: Update ivybridge metrics add event counter information Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: `475892a969` and later patches. The TMA 4.8 information was updated in: `59194d4d90` Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-19-irogers@google.com	2024-06-20 16:54:31 -07:00
Ian Rogers	fab88961e2	perf vendor events: Add/update icelakex events/metrics Update events from v1.24 to v1.26. Add TMA metrics v4.8. Bring in the event updates v1.26: `c607c739e0` v1.25: `42d9967690` The TMA 4.8 information was added in: `59194d4d90` Adds the event SW_PREFETCH_ACCESS.ANY. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-18-irogers@google.com	2024-06-20 16:54:24 -07:00
Ian Rogers	91b5989212	perf vendor events: Add/update icelake events/metrics Update events from v1.21 to v1.22. Add TMA metrics v4.8. Bring in the event updates v1.22: `e5640646e9` The TMA 4.8 information was added in: `59194d4d90` Adds the event SW_PREFETCH_ACCESS.ANY. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-17-irogers@google.com	2024-06-20 16:54:16 -07:00
Ian Rogers	11c2302c9e	perf vendor events: Update haswellx metrics add event counter information Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: `475892a969` and later patches. The TMA 4.8 information was updated in: `59194d4d90` Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-16-irogers@google.com	2024-06-20 16:54:10 -07:00
Ian Rogers	b59307d0ed	perf vendor events: Add haswell counter information Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: `475892a969` and later patches. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-15-irogers@google.com	2024-06-20 16:54:03 -07:00
Ian Rogers	917f63ad75	perf vendor events: Update graniterapids events and add counter information Update events from v1.01 to v1.02. Bring in the event updates v1.02: `0ff9f681bd` Add counter information. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ There are over 1000 new events. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-14-irogers@google.com	2024-06-20 16:53:57 -07:00
Ian Rogers	39c1471e3e	perf vendor events: Update/add grandridge events/metrics Update events from v1.02 to v1.03. Add TMA metrics v4.8. Bring in the event updates v1.03: `5ec7a252d0` The TMA 4.8 information was added in: `59194d4d90` New events are: FP_INST_RETIRED.128B_DP, FP_INST_RETIRED.128B_SP, FP_INST_RETIRED.256B_DP, FP_INST_RETIRED.32B_SP, FP_INST_RETIRED.64B_DP, OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM, OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD, OCR.DEMAND_RFO.L3_HIT.SNOOP_HITM, OCR.STREAMING_WR.ANY_RESPONSE. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-13-irogers@google.com	2024-06-20 16:53:49 -07:00
Ian Rogers	75e71be128	perf vendor events: Add goldmontplus counter information Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: `475892a969` and later patches. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-12-irogers@google.com	2024-06-20 16:53:41 -07:00
Ian Rogers	faa3591640	perf vendor events: Add goldmont counter information Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: `475892a969` and later patches. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-11-irogers@google.com	2024-06-20 16:53:31 -07:00
Ian Rogers	40ccd6aa3e	perf vendor events: Add/update emeraldrapids events/metrics Update events from v1.06 to v1.09. Add TMA metrics v4.8. Bring in the event updates v1.09: `3fd5892bb4` v1.08: `54525c4508` The TMA 4.8 information was added in: `59194d4d90` New events are: EXE_ACTIVITY.2_3_PORTS_UTIL, ICACHE_DATA.STALL_PERIODS, L2_TRANS.L2_WB, MEM_TRANS_RETIRED.LOAD_LATENCY_GT_1024, OFFCORE_REQUESTS.DEMAND_CODE_RD, OFFCORE_REQUESTS.DEMAND_RFO, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_CODE_RD, OFFCORE_REQUESTS_OUTSTANDING.DEMAND_CODE_RD, RS.EMPTY_RESOURCE, SW_PREFETCH_ACCESS.ANY, UNC_IIO_BANDWIDTH_OUT.PART[0-7]_FREERUN, UOPS_ISSUED.CYCLES. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-10-irogers@google.com	2024-06-20 16:53:22 -07:00
Ian Rogers	1e56e9191f	perf vendor events: Update elkhartlake events Update events from v1.04 to v1.05. Bring in event updates from: `fb91e1851c` The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: `475892a969` and later patches. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-9-irogers@google.com	2024-06-20 16:53:15 -07:00
Ian Rogers	4cc4994244	perf vendor events: Update cascadelakex events/metrics Update events from v1.21 to v1.22. Bring in the event updates v1.22 `013877729c` The TMA 4.8 information was updated in: `59194d4d90` New events are: SW_PREFETCH_ACCESS.ANY Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-8-irogers@google.com	2024-06-20 16:53:06 -07:00
Ian Rogers	87835d9f85	perf vendor events: Update broadwellx metrics add event counter information Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: `475892a969` and later patches. The TMA 4.8 information was updated in: `59194d4d90` Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-7-irogers@google.com	2024-06-20 16:52:49 -07:00
Ian Rogers	6a8ec0b65e	perf vendor events: Update broadwellde metrics add event counter information Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: `475892a969` and later patches. The TMA 4.8 information was updated in: `59194d4d90` Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-6-irogers@google.com	2024-06-20 16:52:41 -07:00
Ian Rogers	39b8bd1635	perf vendor events: Update broadwell metrics add event counter information Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: `475892a969` and later patches. The TMA 4.8 information was updated in: `59194d4d90` Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-5-irogers@google.com	2024-06-20 16:52:34 -07:00
Ian Rogers	19121e877c	perf vendor events: Add bonnell counter information Add counter information necessary for optimizing event grouping the perf tool. The most recent RFC patch set using this information: https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/ The information was added in: `475892a969` and later patches. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-4-irogers@google.com	2024-06-20 16:52:24 -07:00
Ian Rogers	72da747ddd	perf vendor events: Update alderlaken events/metrics Update events from v1.24 to v1.27. Update e-core TMA metrics to v3.6. Bring in the event updates v1.27: `ea4f309a04` v1.26: `0052e68d24` The e-core TMA 3.6 information was updated in: `d9c2faa70b` New events are: MEM_UOPS_RETIRED.LOCK_LOADS, SERIALIZATION.C01_MS_SCB, UOPS_ISSUED.ANY. Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-3-irogers@google.com	2024-06-20 16:52:15 -07:00
Ian Rogers	17d4b1922c	perf vendor events: Update alderlake events/metrics Update events from v1.24 to v1.27. Update p-core TMA metrics from v4.7 to v4.8, and the e-core TMA metrics to v3.6. Bring in the event updates v1.27: `ea4f309a04` v1.26: `0052e68d24` The p-core TMA 4.8 information was updated in: `59194d4d90` And e-core in: `d9c2faa70b` New events are: EXE_ACTIVITY.2_3_PORTS_UTIL, ICACHE_DATA.STALL_PERIODS, L2_TRANS.L2_WB, MEM_TRANS_RETIRED.LOAD_LATENCY_GT_1024, MEM_UOPS_RETIRED.LOCK_LOADS, OFFCORE_REQUESTS.DEMAND_CODE_RD, OFFCORE_REQUESTS.DEMAND_RFO, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_CODE_RD, OFFCORE_REQUESTS_OUTSTANDING.DEMAND_CODE_RD, RS.EMPTY_RESOURCE, SERIALIZATION.C01_MS_SCB, SW_PREFETCH_ACCESS.ANY, UOPS_ISSUED.ANY, UOPS_ISSUED.CYCLES Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-2-irogers@google.com	2024-06-20 16:52:00 -07:00
Ravi Bangoria	b739759c4e	perf doc: Add AMD IBS usage document Add a perf man page document that describes how to exploit AMD IBS with Linux perf. Brief intro about IBS and simple one-liner examples will help naive users to get started. This is not meant to be an exhaustive IBS guide. User should refer latest AMD64 Architecture Programmer's Manual for detailed description of IBS. Usage: $ man perf-amd-ibs Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: ananth.narayan@amd.com Cc: sandipan.das@amd.com Cc: santosh.shukla@amd.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620054104.815-1-ravi.bangoria@amd.com	2024-06-20 16:51:55 -07:00
Athira Rajeev	90d32e9201	tools/perf: Handle perftool-testsuite_probe testcases fail when kernel debuginfo is not present Running "perftool-testsuite_probe" fails as below: ./perf test -v "perftool-testsuite_probe" 83: perftool-testsuite_probe : FAILED There are three fails: 1. Regexp not found: "\sprobe:inode_permission(?:_\d+)?\s+$on inode_permission(?:[:\+][0-9A-Fa-f]+)?@.+$" -- [ FAIL ] -- perf_probe :: test_adding_kernel :: listing added probe :: perf probe -l (output regexp parsing) 2. Regexp not found: "probe:vfs_mknod" Regexp not found: "probe:vfs_create" Regexp not found: "probe:vfs_rmdir" Regexp not found: "probe:vfs_link" Regexp not found: "probe:vfs_write" -- [ FAIL ] -- perf_probe :: test_adding_kernel :: wildcard adding support (command exitcode + output regexp parsing) 3. Regexp not found: "Failed to find" Regexp not found: "somenonexistingrandomstuffwhichisalsoprettylongorevenlongertoexceed64" Regexp not found: "in this function\|at this address" Line did not match any pattern: "The /boot/vmlinux file has no debug information." Line did not match any pattern: "Rebuild with CONFIG_DEBUG_INFO=y, or install an appropriate debuginfo package." These three tests depends on kernel debug info. 1. Fail 1 expects file name along with probe which needs debuginfo 2. Fail 2 : perf probe -nf --max-probes=512 -a 'vfs_ $params' Debuginfo-analysis is not supported. Error: Failed to add events. 3. Fail 3 : perf probe 'vfs_read somenonexistingrandomstuffwhichisalsoprettylongorevenlongertoexceed64' Debuginfo-analysis is not supported. Error: Failed to add events. There is already helper function skip_if_no_debuginfo in lib/probe_vfs_getname.sh which does perf probe and returns "2" if debug info is not present. Use the skip_if_no_debuginfo function and skip only the three tests which needs debuginfo based on the result. With the patch: 83: perftool-testsuite_probe: --- start --- test child forked, pid 3927 -- [ PASS ] -- perf_probe :: test_adding_kernel :: adding probe inode_permission :: -- [ PASS ] -- perf_probe :: test_adding_kernel :: adding probe inode_permission :: -a -- [ PASS ] -- perf_probe :: test_adding_kernel :: adding probe inode_permission :: --add -- [ PASS ] -- perf_probe :: test_adding_kernel :: listing added probe :: perf list Regexp not found: "\s*probe:inode_permission(?:_\d+)?\s+$on inode_permission(?:[:\+][0-9A-Fa-f]+)?@.+$" -- [ SKIP ] -- perf_probe :: test_adding_kernel :: 2 2 Skipped due to missing debuginfo :: testcase skipped -- [ PASS ] -- perf_probe :: test_adding_kernel :: using added probe -- [ PASS ] -- perf_probe :: test_adding_kernel :: deleting added probe -- [ PASS ] -- perf_probe :: test_adding_kernel :: listing removed probe (should NOT be listed) -- [ PASS ] -- perf_probe :: test_adding_kernel :: dry run :: adding probe -- [ PASS ] -- perf_probe :: test_adding_kernel :: force-adding probes :: first probe adding -- [ PASS ] -- perf_probe :: test_adding_kernel :: force-adding probes :: second probe adding (without force) -- [ PASS ] -- perf_probe :: test_adding_kernel :: force-adding probes :: second probe adding (with force) -- [ PASS ] -- perf_probe :: test_adding_kernel :: using doubled probe -- [ PASS ] -- perf_probe :: test_adding_kernel :: removing multiple probes Regexp not found: "probe:vfs_mknod" Regexp not found: "probe:vfs_create" Regexp not found: "probe:vfs_rmdir" Regexp not found: "probe:vfs_link" Regexp not found: "probe:vfs_write" -- [ SKIP ] -- perf_probe :: test_adding_kernel :: 2 2 Skipped due to missing debuginfo :: testcase skipped Regexp not found: "Failed to find" Regexp not found: "somenonexistingrandomstuffwhichisalsoprettylongorevenlongertoexceed64" Regexp not found: "in this function\|at this address" Line did not match any pattern: "The /boot/vmlinux file has no debug information." Line did not match any pattern: "Rebuild with CONFIG_DEBUG_INFO=y, or install an appropriate debuginfo package." -- [ SKIP ] -- perf_probe :: test_adding_kernel :: 2 2 Skipped due to missing debuginfo :: testcase skipped -- [ PASS ] -- perf_probe :: test_adding_kernel :: function with retval :: add -- [ PASS ] -- perf_probe :: test_adding_kernel :: function with retval :: record -- [ PASS ] -- perf_probe :: test_adding_kernel :: function argument probing :: script ## [ PASS ] ## perf_probe :: test_adding_kernel SUMMARY ---- end(0) ---- 83: perftool-testsuite_probe : Ok Only the three specific tests are skipped and remaining ran successfully. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: akanksha@linux.ibm.com Cc: kjain@linux.ibm.com Cc: maddy@linux.ibm.com Cc: disgoel@linux.vnet.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240617122121.7484-1-atrajeev@linux.vnet.ibm.com	2024-06-20 10:05:04 -07:00
Namhyung Kim	eae7044b67	perf hist: Honor symbol_conf.skip_empty So that it can skip events with no sample according to the config value. This can omit the dummy event in the output of perf report --group. An example output: $ sudo perf mem record -a sleep 1 $ sudo perf report --group Before) # # Samples: 232 of events 'cpu/mem-loads,ldlat=30/P, cpu/mem-stores/P, dummy:u' # Event count (approx.): 3089861 # # Overhead Command Shared Object Symbol # ........................ ........... ................. ..................................... # 9.29% 0.00% 0.00% swapper [kernel.kallsyms] [k] update_blocked_averages 5.26% 0.15% 0.00% swapper [kernel.kallsyms] [k] __update_load_avg_se 4.15% 0.00% 0.00% perf-exec [kernel.kallsyms] [k] slab_update_freelist.isra.0 3.87% 0.00% 0.00% perf-exec [kernel.kallsyms] [k] memcg_slab_post_alloc_hook 3.79% 0.17% 0.00% swapper [kernel.kallsyms] [k] enqueue_task_fair 3.63% 0.00% 0.00% sleep [kernel.kallsyms] [k] next_uptodate_page 2.86% 0.00% 0.00% swapper [kernel.kallsyms] [k] __update_load_avg_cfs_rq 2.78% 0.00% 0.00% swapper [kernel.kallsyms] [k] __schedule 2.34% 0.00% 0.00% swapper [kernel.kallsyms] [k] intel_idle 2.32% 0.97% 0.00% swapper [kernel.kallsyms] [k] psi_group_change After) # # Samples: 232 of events 'cpu/mem-loads,ldlat=30/P, cpu/mem-stores/P' # Event count (approx.): 3089861 # # Overhead Command Shared Object Symbol # ................ ........... ................. ..................................... # 9.29% 0.00% swapper [kernel.kallsyms] [k] update_blocked_averages 5.26% 0.15% swapper [kernel.kallsyms] [k] __update_load_avg_se 4.15% 0.00% perf-exec [kernel.kallsyms] [k] slab_update_freelist.isra.0 3.87% 0.00% perf-exec [kernel.kallsyms] [k] memcg_slab_post_alloc_hook 3.79% 0.17% swapper [kernel.kallsyms] [k] enqueue_task_fair 3.63% 0.00% sleep [kernel.kallsyms] [k] next_uptodate_page 2.86% 0.00% swapper [kernel.kallsyms] [k] __update_load_avg_cfs_rq 2.78% 0.00% swapper [kernel.kallsyms] [k] __schedule 2.34% 0.00% swapper [kernel.kallsyms] [k] intel_idle 2.32% 0.97% swapper [kernel.kallsyms] [k] psi_group_change Now it doesn't have a column for the dummy event. Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240607202918.2357459-5-namhyung@kernel.org	2024-06-15 21:04:04 -07:00
Namhyung Kim	411ee13598	perf hist: Add symbol_conf.skip_empty Add the skip_empty flag to symbol_conf and set the value from the report command to preserve the existing behavior. This makes the code simpler and will be needed other code which is hard to add a new argument. Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240607202918.2357459-4-namhyung@kernel.org	2024-06-15 21:04:04 -07:00
Namhyung Kim	8f6071a3dc	perf hist: Simplify __hpp_fmt() using hpp_fmt_data The struct hpp_fmt_data is to keep the values for each group members so it doesn't need to check the event index in the group. Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240607202918.2357459-3-namhyung@kernel.org	2024-06-15 21:04:04 -07:00
Namhyung Kim	cc2621cecd	perf hist: Factor out __hpp__fmt_print() Split the logic to print the histogram values according to the format string. This was used in 3 different places so it's better to move out the logic into a function. Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240607202918.2357459-2-namhyung@kernel.org	2024-06-15 21:04:04 -07:00
Fernand Sieber	231295a186	perf: sched map skips redundant lines with cpu filters perf sched map supports cpu filter. However, even with cpu filters active, any context switch currently corresponds to a separate line. As result, context switches on irrelevant cpus result to redundant lines, which makes the output particlularly difficult to read on wide architectures. Fix it by skipping printing for irrelevant CPUs. Example snippet of output before fix: B0 1.461147 secs B0 B0 B0 G0 1.517139 secs After fix: B0 1.461147 secs G0 1.517139 secs Signed-off-by: Fernand Sieber <sieberf@amazon.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Reviewed-and-tested-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240614073517.94974-1-sieberf@amazon.com	2024-06-15 21:03:50 -07:00
Ian Rogers	65b37df8c6	perf test pmu: Warn don't fail for legacy mixed case event names PowerPC has mixed case events matching legacy hardware cache events. Warn but don't fail in this case. Event parsing will still work in this case by matching the legacy case. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Kajol Jain <kjain@linux.ibm.com> Cc: James Clark <james.clark@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240612124027.2712643-1-irogers@google.com	2024-06-13 21:27:49 -07:00
Athira Rajeev	245b0edf48	tools/perf: Fix timing issue with parallel threads in perf bench wake-up-parallel perf bench futex fails as below and hangs intermittently when attempted to run on on a powerpc system: ./perf bench futex wake-parallel Running 'futex/wake-parallel' benchmark: Run summary [PID 88588]: blocking on 640 threads (at [private] futex 0x10464b8c), 640 threads waking up 1 at a time. [Run 1]: Avg per-thread latency (waking 1/640 threads) in 0.1309 ms (+-53.27%) [Run 2]: Avg per-thread latency (waking 1/640 threads) in 0.0120 ms (+-31.16%) [Run 3]: Avg per-thread latency (waking 1/640 threads) in 0.1474 ms (+-92.47%) [Run 4]: Avg per-thread latency (waking 1/640 threads) in 0.2883 ms (+-67.75%) [Run 5]: Avg per-thread latency (waking 1/640 threads) in 0.4108 ms (+-39.60%) [Run 6]: Avg per-thread latency (waking 1/640 threads) in 0.7843 ms (+-78.98%) perf: couldn't wakeup all tasks (0/1) perf: couldn't wakeup all tasks (0/1) perf: couldn't wakeup all tasks (0/1) perf: couldn't wakeup all tasks (0/1) perf: couldn't wakeup all tasks (0/1) perf: couldn't wakeup all tasks (0/1) In the system, where perf bench wake-up-parallel is has system configuration of 640 cpus. After debugging, this turned out to be a timing issue. The benchmark creates threads equal to number of cpus and issues a futex_wait. Then it does a usleep for .1 second before initiating futex_wake. In system configuration with more threads, the usleep time is not enough. Patch changes the usleep from 100000 to 200000 With the patch, ran multiple iterations and there were no issues further seen Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Disha Goel <disgoel@linux.ibm.com> Cc: akanksha@linux.ibm.com Cc: kjain@linux.ibm.com Cc: maddy@linux.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240607044354.82225-3-atrajeev@linux.vnet.ibm.com	2024-06-13 21:27:49 -07:00
Athira Rajeev	3638e44542	tools/perf: Fix perf bench epoll to enable the run when some CPU's are offline Perf bench epoll fails as below when attempted to run on on a powerpc system: ./perf bench epoll wait Running 'epoll/wait' benchmark: Run summary [PID 627653]: 79 threads monitoring on 64 file-descriptors for 8 secs. perf: pthread_create: No such file or directory In the setup where this perf bench was ran, difference was that partition had 640 CPU's, but not all CPUs were online. 80 CPUs were online. While creating threads and using epoll_wait , code sets the affinity using cpumask. The cpumask size used is 80 which is picked from "nrcpus = perf_cpu_map__nr(cpu)". Here the benchmark reports fail while setting affinity for cpu number which is greater than 80 or higher, because it attempts to set a bit position which is not allocated on the cpumask. Fix this by changing the size of cpumask to number of possible cpus and not the number of online cpus. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Disha Goel <disgoel@linux.ibm.com> Cc: akanksha@linux.ibm.com Cc: kjain@linux.ibm.com Cc: maddy@linux.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240607044354.82225-2-atrajeev@linux.vnet.ibm.com	2024-06-13 21:27:26 -07:00
Athira Rajeev	1833735867	tools/perf: Fix perf bench futex to enable the run when some CPU's are offline Perf bench futex fails as below when attempted to run on on a powerpc system: ./perf bench futex all Running futex/hash benchmark... Run summary [PID 626307]: 80 threads, each operating on 1024 [private] futexes for 10 secs. perf: pthread_create: No such file or directory In the setup where this perf bench was ran, difference was that partition had 640 CPU's, but not all CPUs were online. 80 CPUs were online. While blocking the threads with futex_wait, code sets the affinity using cpumask. The cpumask size used is 80 which is picked from "nrcpus = perf_cpu_map__nr(cpu)". Here the benchmark reports fail while setting affinity for cpu number which is greater than 80 or higher, because it attempts to set a bit position which is not allocated on the cpumask. Fix this by changing the size of cpumask to number of possible cpus and not the number of online cpus. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Disha Goel <disgoel@linux.ibm.com> Cc: akanksha@linux.ibm.com Cc: kjain@linux.ibm.com Cc: maddy@linux.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240607044354.82225-1-atrajeev@linux.vnet.ibm.com	2024-06-13 21:26:58 -07:00
Ian Rogers	6c1785cd75	perf record: Ensure space for lost samples Previous allocation didn't account for sample ID written after the lost samples event. Switch from malloc/free to a stack allocation. Reported-by: Milian Wolff <milian.wolff@kdab.com> Closes: https://lore.kernel.org/linux-perf-users/23879991.0LEYPuXRzz@milian-workstation/ Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240611050626.1223155-1-irogers@google.com	2024-06-13 20:45:31 -07:00
Ian Rogers	6828d6929b	perf evsel: Refactor tool events Tool events unnecessarily open a dummy perf event which is useless even with `perf record` which will still open a dummy event. Change the behavior of tool events so: - duration_time - call `rdclock` on open and then report the count as a delta since the start in evsel__read_counter. This moves code out of builtin-stat making it more general purpose. - user_time/system_time - open the fd as either `/proc/pid/stat` or `/proc/stat` for cases like system wide. evsel__read_counter will read the appropriate field out of the procfs file. These values were previously supplied by wait4, if the procfs read fails then the wait4 values are used, assuming the process/thread terminated. By reading user_time and system_time this way, interval mode, per PID and per CPU can be supported although there are restrictions given what the files provide (e.g. per PID can't be combined with per CPU). Opening any of the tool events for `perf record` is changed to return invalid. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Weilin Wang <weilin.wang@intel.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linux.dev> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240503232849.17752-1-irogers@google.com	2024-06-10 16:45:10 -07:00
Thomas Richter	658a8805cb	perf test: Speed up test case 70 annotate basic tests On some s390 linux machine (mostly older models) and with debug packages installed, the test case 'perf annotate basic tests' runs for some longer time. Speed up the test and save the output of command perf annotate in a temporary file. This is used to perform pattern matching via grep command. This saves on invocation of perf annotate which runs for some time. Output before: # time bash -x tests/shell/annotate.sh >/dev/null 2>&1; echo EXIT CODE $? real 4m35.543s user 3m19.442s sys 1m14.322s EXIT CODE 0 # Output after: # time bash -x tests/shell/annotate.sh >/dev/null 2>&1; echo EXIT CODE $? real 2m2.881s user 1m30.980s sys 0m30.684s EXIT CODE 0 # Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: gor@linux.ibm.com Cc: hca@linux.ibm.com Cc: sumanthk@linux.ibm.com Cc: svens@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240607054352.2774936-1-tmricht@linux.ibm.com	2024-06-07 13:06:44 -07:00
Ian Rogers	f5803651b4	perf stat: Choose the most disaggregate command line option When multiple aggregation options are passed to perf stat the behavior isn't clear. Consider "perf stat -A --per-socket .." and "perf stat --per-socket -A ..", the first won't aggregate at all while the second will do per-socket aggregation, even though the same options were passed. Rather than set an enum value, gather the options in a struct and process them from most to least aggregate. This ensures the least aggregate option always applies, so no aggregation if "-A" is passed. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240605063828.195700-2-irogers@google.com	2024-06-07 13:00:03 -07:00
Ian Rogers	0dddd91ab6	perf stat: Make options local Reduce the scope of stat_options to cmd_stat, and pass as an argument to __cmd_record. This is done to make more localized changes to the options in later patches. A side-effect of the change is to reduce the size of a stripped PIE perf binary by 5952 bytes. The savings come mainly in the dynamic relocation section. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240605063828.195700-1-irogers@google.com	2024-06-07 12:59:53 -07:00
Ian Rogers	d2307fd4f9	perf maps: Add/use a sorted insert for fixup overlap and insert Data may have lots of overlapping mmaps. The regular insert adds at the end and relies on a later sort. For data with overlapping mappings the sort will happen during a subsequent maps__find or __maps__fixup_overlap_and_insert, there's never a period where the inserted maps buffer up and a single sort happens. To avoid back to back sorts, maintain the sort order when fixing up and inserting. Previously the first_ending_after search was O(log n) where n is the size of maps, and the insert was O(1) but because of the continuous sorting was becoming O(n*log(n)). With maintaining sort order, the insert now becomes O(n) for a memmove. For a perf report on a perf.data file containing overlapping mappings the time numbers are: Before: real 0m5.894s user 0m5.650s sys 0m0.231s After: real 0m0.675s user 0m0.454s sys 0m0.196s Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: Steinar H . Gunderson <sesse@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240521165109.708593-4-irogers@google.com	2024-06-06 23:31:30 -07:00
Ian Rogers	aeefb04393	perf maps: Reduce sorting for overlapping mappings When an 'after' map is generated the 'new' map must be before it so terminate iterating and don't resort. If the entry 'pos' is entirely overlapped by the 'new' mapping then don't remove and insert the mapping, just replace - again to remove sorting. For a perf report on a perf.data file containing overlapping mappings the time numbers are: Before: real 0m9.856s user 0m9.637s sys 0m0.204s After: real 0m5.894s user 0m5.650s sys 0m0.231s Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: Steinar H . Gunderson <sesse@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240521165109.708593-3-irogers@google.com	2024-06-06 23:31:15 -07:00
Ian Rogers	0b90dfda22	perf maps: Fix use after free in __maps__fixup_overlap_and_insert In the case 'before' and 'after' are broken out from pos, maps_by_address may be changed by __maps__insert, as such it needs re-reading. Don't ignore the return value from __maps_insert. Fixes: `659ad3492b` ("perf maps: Switch from rbtree to lazily sorted array for addresses") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: Steinar H . Gunderson <sesse@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240521165109.708593-2-irogers@google.com	2024-06-06 23:30:58 -07:00
Lucas Stach	a9700511fd	perf script: netdev-times: add location parameter to consume_skb `dd1b527831` ("net: add location to trace_consume_skb()") added a new parameter to the consume_skb tracepoint. Adapt the script to match. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: kernel@pengutronix.de Cc: patchwork-lst@pengutronix.de Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240605144442.1985270-1-l.stach@pengutronix.de	2024-06-06 23:29:23 -07:00
Clément Le Goffic	9aa61d8ecb	perf: parse-events: Fix compilation error while defining DEBUG_PARSER Compiling perf tool with 'DEBUG_PARSER=1' leads to errors: $> make -C tools/perf PARSER_DEBUG=1 NO_LIBTRACEEVENT=1 ... CC util/expr-flex.o CC util/expr.o util/parse-events.c:33:12: error: redundant redeclaration of ‘parse_events_debug’ [-Werror=redundant-decls] 33 \| extern int parse_events_debug; \| ^~~~~~~~~~~~~~~~~~ In file included from util/parse-events.c:18: util/parse-events-bison.h:43:12: note: previous declaration of ‘parse_events_debug’ with type ‘int’ 43 \| extern int parse_events_debug; \| ^~~~~~~~~~~~~~~~~~ util/expr.c:27:12: error: redundant redeclaration of ‘expr_debug’ [-Werror=redundant-decls] 27 \| extern int expr_debug; \| ^~~~~~~~~~ In file included from util/expr.c:11: util/expr-bison.h:43:12: note: previous declaration of ‘expr_debug’ with type ‘int’ 43 \| extern int expr_debug; \| ^~~~~~~~~~ cc-1: all warnings being treated as errors Remove extern declaration from the parse-envents.c file as there is a conflict with the ones generated using bison and yacc tools from the file parse-events.[ly]. Signed-off-by: Clément Le Goffic <clement.legoffic@foss.st.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240605140453.614862-1-clement.legoffic@foss.st.com	2024-06-06 00:19:36 -07:00
Namhyung Kim	ca9680821d	perf bpf: Fix handling of minimal vmlinux.h file when interrupting the build Ingo reported that he was seeing these when hitting Control+C during a perf tools build: Makefile.perf:1149: *** Missing bpftool input for generating vmlinux.h. Stop. The failure happens when you don't have vmlinux.h or vmlinux with BTF. ifeq ($(VMLINUX_H),) ifeq ($(VMLINUX_BTF),) $(error Missing bpftool input for generating vmlinux.h) endif endif VMLINUX_BTF can be empty if you didn't build a kernel or it doesn't have a BTF section and the current kernel also has no BTF. This is totally ok. But VMLINUX_H should be set to the minimal version in the source tree (unless you overwrite it manually) when you don't pass GEN_VMLINUX_H=1 (which requires VMLINUX_BTF should not be empty). The problem is that it's defined in Makefile.config which is not included for `make clean`. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Link: http://lore.kernel.org/lkml/CAM9d7ch5HTr+k+_GpbMrX0HUo5BZ11byh1xq0Two7B7RQACuNw@mail.gmail.com Link: http://lore.kernel.org/lkml/ZjssGrj+abyC6mYP@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-06-05 11:33:00 -03:00
Arnaldo Carvalho de Melo	5b3cde1988	Revert "perf record: Reduce memory for recording PERF_RECORD_LOST_SAMPLES event" This reverts commit `7d1405c71d`. This causes segfaults in some cases, as reported by Milian: ``` sudo /usr/bin/perf record -z --call-graph dwarf -e cycles -e raw_syscalls:sys_enter ls ... [ perf record: Woken up 3 times to write data ] malloc(): invalid next size (unsorted) Aborted ``` Backtrace with GDB + debuginfod: ``` malloc(): invalid next size (unsorted) Thread 1 "perf" received signal SIGABRT, Aborted. __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44 Downloading source file /usr/src/debug/glibc/glibc/nptl/pthread_kill.c 44 return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0; (gdb) bt #0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44 #1 0x00007ffff6ea8eb3 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78 #2 0x00007ffff6e50a30 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/ raise.c:26 #3 0x00007ffff6e384c3 in __GI_abort () at abort.c:79 #4 0x00007ffff6e39354 in __libc_message_impl (fmt=fmt@entry=0x7ffff6fc22ea "%s\n") at ../sysdeps/posix/libc_fatal.c:132 #5 0x00007ffff6eb3085 in malloc_printerr (str=str@entry=0x7ffff6fc5850 "malloc(): invalid next size (unsorted)") at malloc.c:5772 #6 0x00007ffff6eb657c in _int_malloc (av=av@entry=0x7ffff6ff6ac0 <main_arena>, bytes=bytes@entry=368) at malloc.c:4081 #7 0x00007ffff6eb877e in __libc_calloc (n=<optimized out>, elem_size=<optimized out>) at malloc.c:3754 #8 0x000055555569bdb6 in perf_session.do_write_header () #9 0x00005555555a373a in __cmd_record.constprop.0 () #10 0x00005555555a6846 in cmd_record () #11 0x000055555564db7f in run_builtin () #12 0x000055555558ed77 in main () ``` Valgrind memcheck: ``` ==45136== Invalid write of size 8 ==45136== at 0x2B38A5: perf_event__synthesize_id_sample (in /usr/bin/perf) ==45136== by 0x157069: __cmd_record.constprop.0 (in /usr/bin/perf) ==45136== by 0x15A845: cmd_record (in /usr/bin/perf) ==45136== by 0x201B7E: run_builtin (in /usr/bin/perf) ==45136== by 0x142D76: main (in /usr/bin/perf) ==45136== Address 0x6a866a8 is 0 bytes after a block of size 40 alloc'd ==45136== at 0x4849BF3: calloc (vg_replace_malloc.c:1675) ==45136== by 0x3574AB: zalloc (in /usr/bin/perf) ==45136== by 0x1570E0: __cmd_record.constprop.0 (in /usr/bin/perf) ==45136== by 0x15A845: cmd_record (in /usr/bin/perf) ==45136== by 0x201B7E: run_builtin (in /usr/bin/perf) ==45136== by 0x142D76: main (in /usr/bin/perf) ==45136== ==45136== Syscall param write(buf) points to unaddressable byte(s) ==45136== at 0x575953D: __libc_write (write.c:26) ==45136== by 0x575953D: write (write.c:24) ==45136== by 0x35761F: ion (in /usr/bin/perf) ==45136== by 0x357778: writen (in /usr/bin/perf) ==45136== by 0x1548F7: record__write (in /usr/bin/perf) ==45136== by 0x15708A: __cmd_record.constprop.0 (in /usr/bin/perf) ==45136== by 0x15A845: cmd_record (in /usr/bin/perf) ==45136== by 0x201B7E: run_builtin (in /usr/bin/perf) ==45136== by 0x142D76: main (in /usr/bin/perf) ==45136== Address 0x6a866a8 is 0 bytes after a block of size 40 alloc'd ==45136== at 0x4849BF3: calloc (vg_replace_malloc.c:1675) ==45136== by 0x3574AB: zalloc (in /usr/bin/perf) ==45136== by 0x1570E0: __cmd_record.constprop.0 (in /usr/bin/perf) ==45136== by 0x15A845: cmd_record (in /usr/bin/perf) ==45136== by 0x201B7E: run_builtin (in /usr/bin/perf) ==45136== by 0x142D76: main (in /usr/bin/perf) ==45136== ----- Closes: https://lore.kernel.org/linux-perf-users/23879991.0LEYPuXRzz@milian-workstation/ Reported-by: Milian Wolff <milian.wolff@kdab.com> Tested-by: Milian Wolff <milian.wolff@kdab.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: stable@kernel.org # 6.8+ Link: https://lore.kernel.org/lkml/Zl9ksOlHJHnKM70p@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-06-05 11:12:36 -03:00
Dr. David Alan Gilbert	0770ceaff2	perf hisi-ptt: remove unused struct 'hisi_ptt_queue' 'hisi_ptt_queue' has been unused since the original commit `5e91e57e68` ("perf auxtrace arm64: Add support for parsing HiSilicon PCIe Trace packet"). Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: yangyicong@hisilicon.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240602000709.213116-1-linux@treblig.org	2024-06-04 18:17:17 -07:00
Dr. David Alan Gilbert	f7abc0cfa8	perf genelf: remove unused struct 'options' 'options' has been unused since commit `fa7f7e7354` ("perf jit: Move test functionality in to a test"). Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240602000505.213032-1-linux@treblig.org	2024-06-03 22:07:52 -07:00
Nick Forrington	f7d4485fce	perf lock info: Display both map and thread by default Change "perf lock info" argument handling to: Display both map and thread info (rather than an error) when neither are specified. Display both map and thread info (rather than just thread info) when both are requested. Signed-off-by: Nick Forrington <nick.forrington@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240513091413.738537-2-nick.forrington@arm.com	2024-06-03 22:01:00 -07:00
Ian Rogers	af75201634	perf top: Allow filters on events Allow filters to be added to perf top events. One use is to workaround issues with: ``` $ perf top --uid="$(id -u)" ``` which tries to scan /proc find processes belonging to the uid and can fail in such a pid terminates between the scan and the perf_event_open reporting: ``` Error: The sys_perf_event_open() syscall returned with 3 (No such process) for event (cycles:P). /bin/dmesg \| grep -i perf may provide additional information. ``` A similar filter: ``` $ perf top -e cycles:P --filter "uid == $(id -u)" ``` doesn't fail this way. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: bpf@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240524205227.244375-4-irogers@google.com	2024-05-30 10:05:57 -07:00
Ian Rogers	d92aa899fe	perf bpf filter: Add uid and gid terms Allow the BPF filter to use the uid and gid terms determined by the bpf_get_current_uid_gid BPF helper. For example, the following will record the cpu-clock event system wide discarding samples that don't belong to the current user. $ perf record -e cpu-clock --filter "uid == $(id -u)" -a sleep 0.1 Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: bpf@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240524205227.244375-3-irogers@google.com	2024-05-30 10:05:57 -07:00
Ian Rogers	63b9cbd794	perf bpf filter: Give terms their own enum Give the term types their own enum so that additional terms can be added that don't correspond to a PERF_SAMPLE_xx flag. The term values are numerically ascending rather than bit field positions, this means they need translating to a PERF_SAMPLE_xx bit field in certain places using a shift. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: bpf@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240524205227.244375-2-irogers@google.com	2024-05-30 10:05:57 -07:00
Changbin Du	f975c13d2a	perf trace beauty: Always show mmap prot even though PROT_NONE PROT_NONE is also useful information, so do not omit the mmap prot even though it is 0. syscall_arg__scnprintf_mmap_prot() could print PROT_NONE for prot 0. Before: PROT_NONE is not shown. $ sudo perf trace -e syscalls:sys_enter_mmap --filter prot==0 -- ls 0.000 ls/2979231 syscalls:sys_enter_mmap(len: 4220888, flags: PRIVATE\|ANONYMOUS) After: PROT_NONE is displayed. $ sudo perf trace -e syscalls:sys_enter_mmap --filter prot==0 -- ls 0.000 ls/2975708 syscalls:sys_enter_mmap(len: 4220888, prot: NONE, flags: PRIVATE\|ANONYMOUS) Signed-off-by: Changbin Du <changbin.du@huawei.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240522033542.1359421-3-changbin.du@huawei.com	2024-05-29 22:48:23 -07:00
Changbin Du	92968dcc03	perf trace beauty: Always show param if show_zero is set For some parameters, it is best to also display them when they are 0, e.g. flags. Here we only check the show_zero property and let arg printer handle special cases. Signed-off-by: Changbin Du <changbin.du@huawei.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240522033542.1359421-2-changbin.du@huawei.com	2024-05-29 22:48:05 -07:00
Ian Rogers	a93c83eca4	perf docs: Fix typos Assorted typo fixes. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: Changbin Du <changbin.du@huawei.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240521223555.858859-1-irogers@google.com	2024-05-28 22:52:28 -07:00
Breno Leitao	265b71153e	perf list: Fix the --no-desc option Currently, the --no-desc option in perf list isn't functioning as intended. This issue arises from the overwriting of struct option->desc with the opposite value of struct option->long_desc. Consequently, whatever parse_options() returns at struct option->desc gets overridden later, rendering the --desc or --no-desc arguments ineffective. To resolve this, set ->desc as true by default and allow parse_options() to adjust it accordingly. This adjustment will fix the --no-desc option while preserving the functionality of the other parameters. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Ian Rogers <irogers@google.com> Cc: leit@meta.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240517141427.1905691-1-leitao@debian.org	2024-05-28 11:29:49 -07:00
Ian Rogers	cbd446b4db	perf arm-spe: Unaligned pointer work around Use get_unaligned_leXX instead of leXX_to_cpu to handle unaligned pointers. Such pointers occur with libFuzzer testing. A similar change for intel-pt was done in: https://lore.kernel.org/r/20231005190451.175568-6-adrian.hunter@intel.com Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240514052402.3031871-1-irogers@google.com	2024-05-28 11:29:49 -07:00
Ian Rogers	678be1ca30	perf tests: Add some pmu core functionality tests Test behavior of PMU names and comparisons wrt suffixes using Intel uncore_cha, marvell mrvl_ddr_pmu and S390's cpum_cf as examples. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Bharat Bhushan <bbhushan2@marvell.com> Cc: Bhaskara Budiredla <bbudiredla@marvell.com> Cc: Tuan Phan <tuanphan@os.amperecomputing.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240515060114.3268149-3-irogers@google.com	2024-05-28 11:29:49 -07:00
Ian Rogers	3241d46f5f	perf pmus: Sort/merge/aggregate PMUs like mrvl_ddr_pmu The mrvl_ddr_pmu is uncore and has a hexadecimal address suffix while the previous PMU sorting/merging code assumes uncore PMU names start with uncore_ and have a decimal suffix. Because of the previous assumption it isn't possible to wildcard the mrvl_ddr_pmu. Modify pmu_name_len_no_suffix but also remove the suffix number out argument, this is because we don't know if a suffix number of say 100 is in hexadecimal or decimal. As the only use of the suffix number is in comparisons, it is safe there to compare the values as hexadecimal. Modify perf_pmu__match_ignoring_suffix so that hexadecimal suffixes are ignored. Only allow hexadecimal suffixes to be greater than length 2 (ie 3 or more) so that S390's cpum_cf PMU doesn't lose its suffix. Change the return type of pmu_name_len_no_suffix to size_t to workaround GCC incorrectly determining the result could be negative. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Bharat Bhushan <bbhushan2@marvell.com> Cc: Bhaskara Budiredla <bbudiredla@marvell.com> Cc: Tuan Phan <tuanphan@os.amperecomputing.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240515060114.3268149-2-irogers@google.com	2024-05-28 11:29:49 -07:00
Arnaldo Carvalho de Melo	da42b5229b	tools headers: Update the syscall tables and unistd.h, mostly to support the new 'mseal' syscall But also to wire up shadow stacks on 32-bit x86, picking up those changes from these csets: `ff388fe5c4` ("mseal: wire up mseal syscall") `2883f01ec3` ("x86/shstk: Enable shadow stacks for x32") This makes 'perf trace' support it, now its possible, for instance to do: # perf trace -e mseal --max-stack=16 Here is an example with the 'sendmmsg' syscall: root@x1:~# perf trace -e sendmmsg --max-stack 16 --max-events=1 0.000 ( 0.062 ms): dbus-broker/1012 sendmmsg(fd: 150, mmsg: 0x7ffef57cca50, vlen: 1, flags: DONTWAIT\|NOSIGNAL) = 1 syscall_exit_to_user_mode_prepare ([kernel.kallsyms]) syscall_exit_to_user_mode_prepare ([kernel.kallsyms]) syscall_exit_to_user_mode ([kernel.kallsyms]) do_syscall_64 ([kernel.kallsyms]) entry_SYSCALL_64 ([kernel.kallsyms]) [0x117ce7] (/usr/lib64/libc.so.6 (deleted)) root@x1:~# To do a system wide tracing of the new 'mseal' syscall with a backtrace of at most 16 entries. This addresses these perf tools build warnings: Warning: Kernel ABI header differences: diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H J Lu <hjl.tools@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZlXlo4TNcba4wnVZ@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-28 11:10:00 -03:00
Arnaldo Carvalho de Melo	001821b0e7	perf trace beauty: Update the arch/x86/include/asm/irq_vectors.h copy with the kernel sources to pick POSTED_MSI_NOTIFICATION To pick up the change in: `f5a3562ec9` ("x86/irq: Reserve a per CPU IDT vector for posted MSIs") That picks up this new vector: $ cp arch/x86/include/asm/irq_vectors.h tools/perf/trace/beauty/arch/x86/include/asm/irq_vectors.h $ tools/perf/trace/beauty/tracepoints/x86_irq_vectors.sh > after $ diff -u before after --- before 2024-05-27 12:50:47.708863932 -0300 +++ after 2024-05-27 12:51:15.335113123 -0300 @@ -1,6 +1,7 @@ static const char x86_irq_vectors[] = { [0x02] = "NMI", [0x80] = "IA32_SYSCALL", + [0xeb] = "POSTED_MSI_NOTIFICATION", [0xec] = "LOCAL_TIMER", [0xed] = "HYPERV_STIMER0", [0xee] = "HYPERV_REENLIGHTENMENT", $ Now those will be known when pretty printing the irq_vectors: tracepoints. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/ZlS34M0x30EFVhbg@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-27 13:42:18 -03:00
Arnaldo Carvalho de Melo	a3eed53bee	perf beauty: Update copy of linux/socket.h with the kernel sources To pick up the fixes in: `0645fbe760` ("net: have do_accept() take a struct proto_accept_arg argument") That just changes a function prototype, not touching things used by the perf scrape scripts such as: $ tools/perf/trace/beauty/sockaddr.sh \| head -5 static const char *socket_families[] = { [0] = "UNSPEC", [1] = "LOCAL", [2] = "INET", [3] = "AX25", $ This addresses this perf tools build warning: Warning: Kernel ABI header differences: diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZlSrceExgjrUiDb5@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-27 12:49:18 -03:00
Arnaldo Carvalho de Melo	1437a9f06f	tools headers UAPI: Sync fcntl.h with the kernel sources to pick F_DUPFD_QUERY There is no scrape script yet for those, but the warning pointed out we need to update the array with the F_LINUX_SPECIFIC_BASE entries, do it. Now 'perf trace' can decode that cmd and also use it in filter, as in: root@number:~# perf trace -e syscalls:*enter_fcntl --filter 'cmd != SETFL && cmd != GETFL' 0.000 sssd_kcm/303828 syscalls:sys_enter_fcntl(fd: 13</var/lib/sss/secrets/secrets.ldb>, cmd: SETLK, arg: 0x7fffdc6a8a50) 0.013 sssd_kcm/303828 syscalls:sys_enter_fcntl(fd: 13</var/lib/sss/secrets/secrets.ldb>, cmd: SETLKW, arg: 0x7fffdc6a8aa0) 0.090 sssd_kcm/303828 syscalls:sys_enter_fcntl(fd: 13</var/lib/sss/secrets/secrets.ldb>, cmd: SETLKW, arg: 0x7fffdc6a88e0) ^Croot@number:~# This picks up the changes in: `c62b758bae` ("fcntl: add F_DUPFD_QUERY fcntl()") Addressing this perf tools build warning: Warning: Kernel ABI header differences: diff -u tools/perf/trace/beauty/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZlSqNQH9mFw2bmjq@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-27 12:44:09 -03:00
Arnaldo Carvalho de Melo	0efc88e444	tools headers UAPI: Sync linux/prctl.h with the kernel sources To pick the changes in: `628d701f2d` ("powerpc/dexcr: Add DEXCR prctl interface") `6b9391b581` ("riscv: Include riscv_set_icache_flush_ctx prctl") That adds some PowerPC and a RISC-V specific prctl options: $ tools/perf/trace/beauty/prctl_option.sh > before $ cp include/uapi/linux/prctl.h tools/perf/trace/beauty/include/uapi/linux/prctl.h $ tools/perf/trace/beauty/prctl_option.sh > after $ diff -u before after --- before 2024-05-27 12:14:21.358032781 -0300 +++ after 2024-05-27 12:14:32.364530185 -0300 @@ -65,6 +65,9 @@ [68] = "GET_MEMORY_MERGE", [69] = "RISCV_V_SET_CONTROL", [70] = "RISCV_V_GET_CONTROL", + [71] = "RISCV_SET_ICACHE_FLUSH_CTX", + [72] = "PPC_GET_DEXCR", + [73] = "PPC_SET_DEXCR", }; static const char *prctl_set_mm_options[] = { [1] = "START_CODE", $ That now will be used to decode the syscall option and also to compose filters, for instance: [root@five ~]# perf trace -e syscalls:sys_enter_prctl --filter option==SET_NAME 0.000 Isolated Servi/3474327 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23f13b7aee) 0.032 DOM Worker/3474327 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23deb25670) 7.920 :3474328/3474328 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24fbb10) 7.935 StreamT~s #374/3474328 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24fb970) 8.400 Isolated Servi/3474329 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24bab10) 8.418 StreamT~s #374/3474329 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24ba970) ^C[root@five ~]# This addresses this perf build warning: Warning: Kernel ABI header differences: diff -u tools/include/uapi/linux/prctl.h include/uapi/linux/prctl.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Charlie Jenkins <charlie@rivosinc.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/lkml/ZlSklGWp--v_Ije7@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-27 12:20:02 -03:00
Arnaldo Carvalho de Melo	e5c7bd4e5c	tools include UAPI: Sync linux/stat.h with the kernel sources To get the changes in: `2a82bb0294` ("statx: stx_subvol") To pick up this change and support it: $ tools/perf/trace/beauty/statx_mask.sh > before $ cp include/uapi/linux/stat.h tools/perf/trace/beauty/include/uapi/linux/stat.h $ tools/perf/trace/beauty/statx_mask.sh > after $ diff -u before after --- before 2024-05-22 13:39:49.742470571 -0300 +++ after 2024-05-22 13:39:59.157883101 -0300 @@ -14,4 +14,5 @@ [ilog2(0x00001000) + 1] = "MNT_ID", [ilog2(0x00002000) + 1] = "DIOALIGN", [ilog2(0x00004000) + 1] = "MNT_ID_UNIQUE", + [ilog2(0x00008000) + 1] = "SUBVOL", }; $ Now we'll see it like we see these: # perf trace -e statx 0.000 ( 0.015 ms): systemd-userwo/3982299 statx(dfd: 6, filename: ".", mask: TYPE\|INO\|MNT_ID, buffer: 0x7ffd8945e850) = 0 <SNIP> 180.559 ( 0.007 ms): (ostnamed)/3982957 statx(dfd: 4, filename: "sys", flags: SYMLINK_NOFOLLOW\|NO_AUTOMOUNT\|STATX_DONT_SYNC, mask: TYPE, buffer: 0x7fff13161190) = 0 180.918 ( 0.011 ms): (ostnamed)/3982957 statx(dfd: CWD, filename: "/run/systemd/mount-rootfs/sys/kernel/security", flags: SYMLINK_NOFOLLOW\|NO_AUTOMOUNT\|STATX_DONT_SYNC, mask: MNT_ID, buffer: 0x7fff13161120) = 0 180.956 ( 0.010 ms): (ostnamed)/3982957 statx(dfd: CWD, filename: "/run/systemd/mount-rootfs/sys/fs/cgroup", flags: SYMLINK_NOFOLLOW\|NO_AUTOMOUNT\|STATX_DONT_SYNC, mask: MNT_ID, buffer: 0x7fff13161120) = 0 <SNIP> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/Zk5nO9yT0oPezUoo@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-27 12:13:51 -03:00
Arnaldo Carvalho de Melo	4f1b067359	Revert "perf parse-events: Prefer sysfs/JSON hardware events over legacy" This reverts commit `617824a7f0`. This made a simple 'perf record -e cycles:pp make -j199' stop working on the Ampere ARM64 system Linus uses to test ARM64 kernels, as discussed at length in the threads in the Link tags below. The fix provided by Ian wasn't acceptable and work to fix this will take time we don't have at this point, so lets revert this and work on it on the next devel cycle. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com> Cc: Ethan Adams <j.ethan.adams@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tycho Andersen <tycho@tycho.pizza> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/lkml/CAHk-=wi5Ri=yR2jBVk-4HzTzpoAWOgstr1LEvg_-OXtJvXXJOA@mail.gmail.com Link: https://lore.kernel.org/lkml/CAHk-=wiWvtFyedDNpoV7a8Fq_FpbB+F5KmWK2xPY3QoYseOf_A@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-26 08:41:34 -03:00
Linus Torvalds	29c73fc794	perf tools fixes and improvements for v6.10: - Add Kan Liang to MAINTAINERS as a perf tools reviewer. - Add support for using the 'capstone' disassembler library in various tools, such as 'perf script' and 'perf annotate'. This is an alternative for the use of the 'xed' and 'objdump' disassemblers. - Data-type profiling improvements: Resolve types for a->b->c by backtracking the assignments until it finds DWARF info for one of those members Support for global variables, keeping a cache to speed up lookups. Handle the 'call' instruction, dealing with effects on registers and handling its return when tracking register data types. Handle x86's segment based addressing like %gs:0x28, to support things like per CPU variables, the stack canary, etc. Data-type profiling got big speedups when using capstone for disassembling. The objdump outoput parsing method is left as a fallback when capstone fails or isn't available. There are patches posted for 6.11 that to use a LLVM disassembler. Support event group display in the TUI when annotating types with --data-type, for instance to show memory load and store events for the data type fields. Optimize the 'perf annotate' data structures, reducing memory usage. Add a initial 'perf test' for 'perf annotate', checking that a target symbol appears on the output, specifying objdump via the command line, etc. - Integrate the shellcheck utility with the build of perf to allow catching shell problems early in areas such as 'perf test', 'perf trace' scrape scripts, etc. - Add 'uretprobe' variant in the 'perf bench uprobe' tool. - Add script to run instances of 'perf script' in parallel. - Allow parsing tracepoint names that start with digits, such as 9p/9p_client_req, etc. Make sure 'perf test' tests it even on systems where those tracepoints aren't available. Vendor Events: - Update Intel JSON files for Cascade Lake X, Emerald Rapids, Grand Ridge, Ice Lake X, Lunar Lake, Meteor Lake, Sapphire Rapids, Sierra Forest, Sky Lake X, Sky Lake and Snow Ridge X. Remove info metrics erroneously in TopdownL1. - Add AMD's Zen 5 core and uncore events and metrics. Those come from the "Performance Monitor Counters for AMD Family 1Ah Model 00h- 0Fh Processors" document, with events that capture information on op dispatch, execution and retirement, branch prediction, L1 and L2 cache activity, TLB activity, etc. - Mark L1D_CACHE_INVAL impacted by errata for ARM64's AmpereOne/AmpereOneX. Miscellaneous: - Sync header copies with the kernel sources. - Move some header copies used only for generating translation string tables for ioctl cmds and other syscall integer arguments to a new directory under tools/perf/beauty/, to separate from copies in tools/include/ that are used to build the tools. - Introduce scrape script for several syscall 'flags'/'mask' arguments. - Improve cpumap utilization, fixing up pairing of refcounts, using the right iterators (perf_cpu_map__for_each_cpu), etc. - Give more details about raw event encodings in 'perf list', show tracepoint encoding in the detailed output. - Refactor the DSOs handling code, reducing memory usage. - Document the BPF event modifier and add a 'perf test' for it. - Improve the event parser, better error messages and add further 'perf test's for it. - Add reference count checking to 'struct comm_str' and 'struct mem_info'. - Make ARM64's 'perf test' entries for the Neoverse N1 more robust. - Tweak the ARM64's Coresight 'perf test's. - Improve ARM64's CoreSight ETM version detection and error reporting. - Fix handling of symbols when using kcore. - Fix PAI (Processor Activity Instrumentation) counter names for s390 virtual machines in 'perf report'. - Fix -g/--call-graph option failure in 'perf sched timehist'. - Add LIBTRACEEVENT_DIR build option to allow building with libtraceevent installed in non-standard directories, such as when doing cross builds. - Various 'perf test' and 'perf bench' fixes. - Improve 'perf probe' error message for long C++ probe names. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZkzdjgAKCRCyPKLppCJ+ J8ZYAP46rcbOuDWol6tjD9FDXd+spkWc40bnqeSnOR+TWlmJXwEA87XU4+3LAh6p HQxKXehJRh90I90yn954mK2NuN+58Q0= =sie+ -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Arnaldo Carvalho de Melo: "General: - Integrate the shellcheck utility with the build of perf to allow catching shell problems early in areas such as 'perf test', 'perf trace' scrape scripts, etc - Add 'uretprobe' variant in the 'perf bench uprobe' tool - Add script to run instances of 'perf script' in parallel - Allow parsing tracepoint names that start with digits, such as 9p/9p_client_req, etc. Make sure 'perf test' tests it even on systems where those tracepoints aren't available - Add Kan Liang to MAINTAINERS as a perf tools reviewer - Add support for using the 'capstone' disassembler library in various tools, such as 'perf script' and 'perf annotate'. This is an alternative for the use of the 'xed' and 'objdump' disassemblers Data-type profiling improvements: - Resolve types for a->b->c by backtracking the assignments until it finds DWARF info for one of those members - Support for global variables, keeping a cache to speed up lookups - Handle the 'call' instruction, dealing with effects on registers and handling its return when tracking register data types - Handle x86's segment based addressing like %gs:0x28, to support things like per CPU variables, the stack canary, etc - Data-type profiling got big speedups when using capstone for disassembling. The objdump outoput parsing method is left as a fallback when capstone fails or isn't available. There are patches posted for 6.11 that to use a LLVM disassembler - Support event group display in the TUI when annotating types with --data-type, for instance to show memory load and store events for the data type fields - Optimize the 'perf annotate' data structures, reducing memory usage - Add a initial 'perf test' for 'perf annotate', checking that a target symbol appears on the output, specifying objdump via the command line, etc Vendor Events: - Update Intel JSON files for Cascade Lake X, Emerald Rapids, Grand Ridge, Ice Lake X, Lunar Lake, Meteor Lake, Sapphire Rapids, Sierra Forest, Sky Lake X, Sky Lake and Snow Ridge X. Remove info metrics erroneously in TopdownL1 - Add AMD's Zen 5 core and uncore events and metrics. Those come from the "Performance Monitor Counters for AMD Family 1Ah Model 00h- 0Fh Processors" document, with events that capture information on op dispatch, execution and retirement, branch prediction, L1 and L2 cache activity, TLB activity, etc - Mark L1D_CACHE_INVAL impacted by errata for ARM64's AmpereOne/ AmpereOneX Miscellaneous: - Sync header copies with the kernel sources - Move some header copies used only for generating translation string tables for ioctl cmds and other syscall integer arguments to a new directory under tools/perf/beauty/, to separate from copies in tools/include/ that are used to build the tools - Introduce scrape script for several syscall 'flags'/'mask' arguments - Improve cpumap utilization, fixing up pairing of refcounts, using the right iterators (perf_cpu_map__for_each_cpu), etc - Give more details about raw event encodings in 'perf list', show tracepoint encoding in the detailed output - Refactor the DSOs handling code, reducing memory usage - Document the BPF event modifier and add a 'perf test' for it - Improve the event parser, better error messages and add further 'perf test's for it - Add reference count checking to 'struct comm_str' and 'struct mem_info' - Make ARM64's 'perf test' entries for the Neoverse N1 more robust - Tweak the ARM64's Coresight 'perf test's - Improve ARM64's CoreSight ETM version detection and error reporting - Fix handling of symbols when using kcore - Fix PAI (Processor Activity Instrumentation) counter names for s390 virtual machines in 'perf report' - Fix -g/--call-graph option failure in 'perf sched timehist' - Add LIBTRACEEVENT_DIR build option to allow building with libtraceevent installed in non-standard directories, such as when doing cross builds - Various 'perf test' and 'perf bench' fixes - Improve 'perf probe' error message for long C++ probe names" * tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (260 commits) tools lib subcmd: Show parent options in help perf pmu: Count sys and cpuid JSON events separately perf stat: Don't display metric header for non-leader uncore events perf annotate-data: Ensure the number of type histograms perf annotate: Fix segfault on sample histogram perf daemon: Fix file leak in daemon_session__control libsubcmd: Fix parse-options memory leak perf lock: Avoid memory leaks from strdup() perf sched: Rename 'switches' column header to 'count' and add usage description, options for latency perf tools: Ignore deleted cgroups perf parse: Allow tracepoint names to start with digits perf parse-events: Add new 'fake_tp' parameter for tests perf parse-events: pass parse_state to add_tracepoint perf symbols: Fix ownership of string in dso__load_vmlinux() perf symbols: Update kcore map before merging in remaining symbols perf maps: Re-use __maps__free_maps_by_name() perf symbols: Remove map from list before updating addresses perf tracepoint: Don't scan all tracepoints to test if one exists perf dwarf-aux: Fix build with HAVE_DWARF_CFI_SUPPORT perf thread: Fixes to thread__new() related to initializing comm ...	2024-05-21 15:45:14 -07:00
Ian Rogers	d9c5f5f94c	perf pmu: Count sys and cpuid JSON events separately Sys events are eagerly loaded as each event has a compat option that may mean the event is or isn't associated with the PMU. These shouldn't be counted as loaded_json_events as that is used for JSON events matching the CPUID that may or may not have been loaded. The mismatch causes issues on ARM64 that uses sys events. Fixes: `e6ff1eed35` ("perf pmu: Lazily add JSON events") Closes: https://lore.kernel.org/lkml/20240510024729.1075732-1-justin.he@arm.com/ Reported-by: Jia He <justin.he@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240511003601.2666907-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-11 13:03:13 -03:00
Ian Rogers	193a9e3020	perf stat: Don't display metric header for non-leader uncore events On an Intel tigerlake laptop a metric like: { "BriefDescription": "Test", "MetricExpr": "imc_free_running@data_read@ + imc_free_running@data_write@", "MetricGroup": "Test", "MetricName": "Test", "ScaleUnit": "6.103515625e-5MiB" }, Will have 4 events: uncore_imc_free_running_0/data_read/ uncore_imc_free_running_0/data_write/ uncore_imc_free_running_1/data_read/ uncore_imc_free_running_1/data_write/ If aggregration is disabled with metric-only 2 column headers are needed: $ perf stat -M test --metric-only -A -a sleep 1 Performance counter stats for 'system wide': MiB Test MiB Test CPU0 1821.0 1820.5 But when not, the counts aggregated in the metric leader and only 1 column should be shown: $ perf stat -M test --metric-only -a sleep 1 Performance counter stats for 'system wide': MiB Test 5909.4 1.001258915 seconds time elapsed Achieve this by skipping events that aren't metric leaders when printing column headers and aggregation isn't disabled. The bug is long standing, the fixes tag is set to a refactor as that is as far back as is reasonable to backport. Fixes: `088519f318` ("perf stat: Move the display functions to stat-display.c") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kaige Ye <ye@kaige.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240510051309.2452468-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-11 13:03:13 -03:00
Namhyung Kim	2af1280b19	perf annotate-data: Ensure the number of type histograms Arnaldo reported that there is a case where nr_histograms and histograms don't agree each other. It ended up in a segfault trying to access a NULL histograms array. Let's make sure to update the nr_histograms when the histograms array is changed. Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240510210452.2449944-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-11 13:03:13 -03:00
Namhyung Kim	9ef30265a4	perf annotate: Fix segfault on sample histogram A symbol can have no samples, then accessing the annotated_source->samples hashmap will result in a segfault. Fixes: `a3f7768bcf` ("perf annotate: Fix memory leak in annotated_source") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240510210452.2449944-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-11 13:03:13 -03:00
Samasth Norway Ananda	0954160346	perf daemon: Fix file leak in daemon_session__control The open() function returns -1 on error. The 'control' and 'ack' file descriptors are both initialized with open() and further validated with 'if' statement. 'if (!control)' would evaluate to 'true' if returned value on error were '0' but it is actually '-1'. Fixes: `edcaa47958` ("perf daemon: Add 'ping' command") Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240510003424.2016914-1-samasth.norway.ananda@oracle.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-10 11:28:11 -03:00
Ian Rogers	5ecab78539	perf lock: Avoid memory leaks from strdup() Leak sanitizer complains about the strdup-ed arguments not being freed and given cmd_record doesn't modify the given strings, remove the strdups. Original discussion in this patch: https://lore.kernel.org/lkml/20240430184156.1824083-1-irogers@google.com/ Suggested-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20240509053123.1918093-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-10 11:15:13 -03:00
Madadi Vineeth Reddy	6fe61cb4ae	perf sched: Rename 'switches' column header to 'count' and add usage description, options for latency Rename 'Switches' to 'Count' and document metrics shown for perf sched latency output. Also add options possible with perf sched latency. Initially, after seeing the output of 'perf sched latency', the term 'Switches' seemed like it's the number of context switches-in for a particular task, but upon going through the code, it was observed that it's actually keeping track of number of times a delay was calculated so that it is used in calculation of the average delay. Actually, the switches here is a subset of number of context switches-in because there are some cases where the count is not incremented in switch-in handler 'add_sched_in_event'. For example when a task is switched-in while it's state is not ready to run(!= THREAD_WAIT_CPU). commit `d9340c1db3` ("perf sched: Display time in milliseconds, reorganize output") changed it from the original count to switches. So, renamed switches to count to make things a bit more clearer and added the metrics description of latency in the document. Reviewed-by: Aditya Gupta <adityag@linux.ibm.com> Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240328090005.8321-1-vineethr@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-10 11:10:03 -03:00
Namhyung Kim	e2eeef290c	perf tools: Ignore deleted cgroups On large systems, cgroups can be created and deleted often. That means there's a race between perf tools and cgroups when it gets the cgroup name and opens the cgroup. I got a report that 'perf stat' with many cgroups failed quite often due to the missing cgroups on such a large machine. I think we can ignore such cgroups when expanding events and use id 0 if it fails to read the cgroup id. IIUC 0 is not a vaild cgroup id so it won't update event counts for the failed cgroups. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240509182235.2319599-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-10 10:52:46 -03:00
Dominique Martinet	5ceb57990b	perf parse: Allow tracepoint names to start with digits Tracepoints can start with digits, although we don't have many of these: $ rg -g '.h' '\bTRACE_EVENT\([0-9]' net/mac802154/trace.h 53:TRACE_EVENT(802154_drv_return_int, ... net/ieee802154/trace.h 66:TRACE_EVENT(802154_rdev_add_virtual_intf, ... include/trace/events/9p.h 124:TRACE_EVENT(9p_client_req, ... Just allow names to start with digits too so e.g. "perf trace -e '9p:'" works Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240510-perf_digit-v4-3-db1553f3233b@codewreck.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-10 10:50:34 -03:00
Dominique Martinet	a2a6604e1c	perf parse-events: Add new 'fake_tp' parameter for tests The next commit will allow tracepoints starting with digits, but most systems do not have any available by default so tests should skip the actual "check if it exists in /sys/kernel/debug/tracing" step. In order to do that, add a new boolean flag specifying if we should actually "format" the probe or not. Originally-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240510-perf_digit-v4-2-db1553f3233b@codewreck.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-10 10:49:26 -03:00
Dominique Martinet	11a4296485	perf parse-events: pass parse_state to add_tracepoint The next patch will add another flag to parse_state that we will want to pass to evsel__newtp_idx(), so pass the whole parse_state all the way down instead of giving only the index Originally-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240510-perf_digit-v4-1-db1553f3233b@codewreck.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-10 10:49:09 -03:00
James Clark	25626e19ae	perf symbols: Fix ownership of string in dso__load_vmlinux() The linked commit updated dso__load_vmlinux() to call dso__set_long_name() before loading the symbols. Loading the symbols may not succeed but dso__set_long_name() takes ownership of the string. The two callers of this function free the string themselves on failure cases, resulting in the following error: $ perf record -- ls $ perf report free(): double free detected in tcache 2 Fix it by always taking ownership of the string, even on failure. This means the string is either freed at the very first early exit condition, or later when the dso is deleted or the long name is replaced. Now no special return value is needed to signify that the caller needs to free the string. Fixes: `e59fea47f8` ("perf symbols: Fix DSO kernel load and symbol process to correctly map DSO to its long_name, type and adjust_symbols") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240507141210.195939-5-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-09 18:48:46 -03:00
James Clark	f30232b20f	perf symbols: Update kcore map before merging in remaining symbols When loading kcore, the main vmlinux map is updated in the same loop that merges the remaining maps. If a map that overlaps is merged in before kcore, the list can become unsortable when the main map addresses are updated. This will later trigger the check_invariants() assert: $ perf record $ perf report util/maps.c:96: check_invariants: Assertion `map__end(prev) <= map__start(map) \|\| map__start(prev) == map__start(map)' failed. Aborted Fix it by moving the main map update prior to the loop so that maps__merge_in() can split it if necessary. Fixes: `659ad3492b` ("perf maps: Switch from rbtree to lazily sorted array for addresses") Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240507141210.195939-4-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-09 18:48:32 -03:00
James Clark	fd81f52e31	perf maps: Re-use __maps__free_maps_by_name() maps__merge_in() hard codes the steps to free the maps_by_name list. It seems to not map__put() each element before freeing, and it sets maps_by_name_sorted to true after freeing, which may be harmless but is inconsistent with maps__init() and other functions. maps__maps_by_name_addr() is also quite hard to read because we already have maps__maps_by_name() and maps__maps_by_address(), but the function is only used in that place so delete it. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240507141210.195939-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-09 18:48:19 -03:00
James Clark	9fe410a7ef	perf symbols: Remove map from list before updating addresses Make the order of operations remove, update, add. Updating addresses before the map is removed causes the ordering check to fail when the map is removed. This can be reproduced when running Perf on an Arm system with a static kernel and Perf uses kcore rather than other sources: $ perf record -- ls $ perf report util/maps.c:96: check_invariants: Assertion `map__end(prev) <= map__start(map) \|\| map__start(prev) == map__start(map)' failed Fixes: `659ad3492b` ("perf maps: Switch from rbtree to lazily sorted array for addresses") Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240507141210.195939-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-09 18:48:00 -03:00
Ian Rogers	d790ead8a6	perf tracepoint: Don't scan all tracepoints to test if one exists In is_valid_tracepoint, rather than scanning "/sys/kernel/tracing/events//" skipping any path where "/sys/kernel/tracing/events///id" doesn't exist, and then testing if ":" matches the tracepoint name, just use the given tracepoint name replace the ':' with '/' and see if the id file exists. This turns a nested directory search into a single file available test. Rather than return 1 for valid and 0 for invalid, return true and false. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240509153245.1990426-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-09 18:46:43 -03:00
James Clark	c9d492378f	perf dwarf-aux: Fix build with HAVE_DWARF_CFI_SUPPORT check_allowed_ops() is used from both HAVE_DWARF_GETLOCATIONS_SUPPORT and HAVE_DWARF_CFI_SUPPORT sections, so move it into the right place so that it's available when either are defined. This shows up when doing a static cross compile for arm64: $ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- LDFLAGS="-static" \ EXTRA_PERFLIBS="-lexpat" util/dwarf-aux.c:1723:6: error: implicit declaration of function 'check_allowed_ops' Fixes: `55442cc2f2` ("perf dwarf-aux: Check allowed DWARF Ops") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240508141458.439017-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-09 18:19:27 -03:00
Ian Rogers	3536c2575e	perf thread: Fixes to thread__new() related to initializing comm Freeing the thread on failure won't work with reference count checking, use thread__delete(). Don't allocate the comm_str, use a stack allocation instead. Fixes: `f6005cafeb` ("perf thread: Add reference count checking") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240508035301.1554434-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-09 18:15:25 -03:00
Ian Rogers	45b4f402a6	perf report: Avoid SEGV in report__setup_sample_type() In some cases evsel->name is lazily initialized in evsel__name(). If not initialized passing NULL to strstr() leads to a SEGV. Fixes: `ccb17caecf` ("perf report: Set PERF_SAMPLE_DATA_SRC bit for Arm SPE event") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240508035301.1554434-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-09 18:14:39 -03:00
Ian Rogers	de6a908384	perf comm: Fix comm_str__put() for reference count checking Searching for the entry in the array needs to avoid the intermediate pointer with reference count checking. Refactor the array removal to binary search for the entry. Change the array to hold an entry with a reference count (so the intermediate pointer can work) and remove from the array when the reference count on a comm_str falls to 1. Fixes: `13ca628716` ("perf comm: Add reference count checking to 'struct comm_str'") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240508035301.1554434-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-09 18:13:22 -03:00
Ian Rogers	90f01afb0d	perf ui browser: Avoid SEGV on title If the title is NULL then it can lead to a SEGV. Fixes: `769e6a1e15` ("perf ui browser: Don't save pointer to stack memory") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240508035301.1554434-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-09 18:12:47 -03:00
Namhyung Kim	187c219b57	perf dwarf-aux: Print array type name with "[]" It's confusing both pointers and arrays are printed as . Let's print array types with [] so that we can identify them easily. Although it's interchangable, sometimes it can cause confusion with size like in the below example. Note that it is not the same with C syntax where it goes to the variable names, but we want to have it in the type names (like in Go language). Before: mov [20] 0x68(reg5) -> reg0 type='struct page' size=0x80 (die:0x4e61d32) After: mov [20] 0x68(reg5) -> reg0 type='struct page[]' size=0x80 (die:0x4e61d32) Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240507041338.2081775-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-07 21:39:42 -03:00
Ian Rogers	d561e170bd	perf hist: Avoid 'struct hist_entry_iter' mem_info memory leak 'struct mem_info' is reference counted while 'struct branch_info' and he_cache (struct hist_entry **) are not. Break apart the priv field in 'struct hist_entry_iter' so that we can know which values are owned by the iter and do the appropriate free or put. Move hide_unresolved to marginally shrink the size of the now grown struct. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240507183545.1236093-9-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-07 18:06:44 -03:00
Ian Rogers	1a8c2e0177	perf mem-info: Add reference count checking Add reference count checking and switch 'struct mem_info' usage to use accessor functions. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240507183545.1236093-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-07 18:06:44 -03:00
Ian Rogers	ad3003a65a	perf mem-info: Move mem-info out of mem-events and symbol Move mem-info to its own header rather than having it split between mem-events and symbol. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240507183545.1236093-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-07 18:06:44 -03:00
Ian Rogers	13ca628716	perf comm: Add reference count checking to 'struct comm_str' Reference count checking of an rbtree is troublesome as each pointer should have a reference, switch to using a sorted array. Remove an indirection by embedding the reference count with the string. Use pthread_once to safely initialize the comm_strs and reader writer mutex. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240507183545.1236093-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-07 18:06:44 -03:00
Ian Rogers	a8cd4766d9	perf cpumap: Remove refcnt from 'struct cpu_aggr_map' It is assigned a value of 1 and never incremented. Remove and replace puts with delete. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240507183545.1236093-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-07 18:06:44 -03:00
Ian Rogers	557b32c343	perf block-info: Remove unused refcount block_info__get() has no callers so the refcount is only ever one. As such remove the reference counting logic and turn puts to deletes. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240507183545.1236093-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-07 18:06:44 -03:00
Ian Rogers	a3f7768bcf	perf annotate: Fix memory leak in annotated_source Freeing hash map doesn't free the entries added to the hashmap, add the missing free(). Fixes: `d3e7cad6f3` ("perf annotate: Add a hashmap for symbol histogram") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240507183545.1236093-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-07 18:06:43 -03:00
Ian Rogers	769e6a1e15	perf ui browser: Don't save pointer to stack memory ui_browser__show() is capturing the input title that is stack allocated memory in hist_browser__run(). Avoid a use after return by strdup-ing the string. Committer notes: Further explanation from Ian Rogers: My command line using tui is: $ sudo bash -c 'rm /tmp/asan.log*; export ASAN_OPTIONS="log_path=/tmp/asan.log"; /tmp/perf/perf mem record -a sleep 1; /tmp/perf/perf mem report' I then go to the perf annotate view and quit. This triggers the asan error (from the log file): ``` ==1254591==ERROR: AddressSanitizer: stack-use-after-return on address 0x7f2813331920 at pc 0x7f28180 65991 bp 0x7fff0a21c750 sp 0x7fff0a21bf10 READ of size 80 at 0x7f2813331920 thread T0 #0 0x7f2818065990 in __interceptor_strlen ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:461 #1 0x7f2817698251 in SLsmg_write_wrapped_string (/lib/x86_64-linux-gnu/libslang.so.2+0x98251) #2 0x7f28176984b9 in SLsmg_write_nstring (/lib/x86_64-linux-gnu/libslang.so.2+0x984b9) #3 0x55c94045b365 in ui_browser__write_nstring ui/browser.c:60 #4 0x55c94045c558 in __ui_browser__show_title ui/browser.c:266 #5 0x55c94045c776 in ui_browser__show ui/browser.c:288 #6 0x55c94045c06d in ui_browser__handle_resize ui/browser.c:206 #7 0x55c94047979b in do_annotate ui/browsers/hists.c:2458 #8 0x55c94047fb17 in evsel__hists_browse ui/browsers/hists.c:3412 #9 0x55c940480a0c in perf_evsel_menu__run ui/browsers/hists.c:3527 #10 0x55c940481108 in __evlist__tui_browse_hists ui/browsers/hists.c:3613 #11 0x55c9404813f7 in evlist__tui_browse_hists ui/browsers/hists.c:3661 #12 0x55c93ffa253f in report__browse_hists tools/perf/builtin-report.c:671 #13 0x55c93ffa58ca in __cmd_report tools/perf/builtin-report.c:1141 #14 0x55c93ffaf159 in cmd_report tools/perf/builtin-report.c:1805 #15 0x55c94000c05c in report_events tools/perf/builtin-mem.c:374 #16 0x55c94000d96d in cmd_mem tools/perf/builtin-mem.c:516 #17 0x55c9400e44ee in run_builtin tools/perf/perf.c:350 #18 0x55c9400e4a5a in handle_internal_command tools/perf/perf.c:403 #19 0x55c9400e4e22 in run_argv tools/perf/perf.c:447 #20 0x55c9400e53ad in main tools/perf/perf.c:561 #21 0x7f28170456c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 #22 0x7f2817045784 in __libc_start_main_impl ../csu/libc-start.c:360 #23 0x55c93ff544c0 in _start (/tmp/perf/perf+0x19a4c0) (BuildId: 84899b0e8c7d3a3eaa67b2eb35e3d8b2f8cd4c93) Address 0x7f2813331920 is located in stack of thread T0 at offset 32 in frame #0 0x55c94046e85e in hist_browser__run ui/browsers/hists.c:746 This frame has 1 object(s): [32, 192) 'title' (line 747) <== Memory access at offset 32 is inside this variable HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork ``` hist_browser__run isn't on the stack so the asan error looks legit. There's no clean init/exit on struct ui_browser so I may be trading a use-after-return for a memory leak, but that seems look a good trade anyway. Fixes: `05e8b0804e` ("perf ui browser: Stop using 'self'") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240507183545.1236093-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-07 18:05:31 -03:00
He Zhe	d9180e23fb	perf bench internals inject-build-id: Fix trap divide when collecting just one DSO 'perf bench internals inject-build-id' suffers from the following error when only one DSO is collected. # perf bench internals inject-build-id -v Collected 1 DSOs traps: internals-injec[2305] trap divide error ip:557566ba6394 sp:7ffd4de97fe0 error:0 in perf[557566b2a000+23d000] Build-id injection benchmark Iteration #1 Floating point exception This patch removes the unnecessary minus one from the divisor which also corrects the randomization range. Signed-off-by: He Zhe <zhe.he@windriver.com> Fixes: `0bf02a0d80` ("perf bench: Add build-id injection benchmark") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240507065026.2652929-1-zhe.he@windriver.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-07 12:44:02 -03:00
Arnaldo Carvalho de Melo	b78854e5c0	perf probe: Use zfree() to avoid possibly accessing dangling pointers When freeing a->b it is good practice to set a->b to NULL using zfree(&a->b) so that when we have a bug where a reference to a freed 'a' pointer is kept somewhere, we can more quickly cause a segfault if some code tries to use a->b. Convert one such case in the 'perf probe' codebase. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZjpBnkL2wO3QJa5W@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-07 12:44:02 -03:00
James Clark	ee73fe99f7	perf auxtrace: Allow number of queues to be specified Currently it's only possible to initialize with the default number of queues and then use auxtrace_queues__add_event() to grow the array. But that's problematic if you don't have a real event to pass into that function yet. The queues hold a void *priv member to store custom state, and for Coresight we want to create decoders upfront before receiving data, so add a new function that allows pre-allocating queues. One reason to do this is because we might need to store metadata (HW_ID events) that effects other queues, but never actually receive auxtrace data on that queue. Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: James Clark <james.clark@arm.com> Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steve Clevenger <scclevenger@os.amperecomputing.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240429152207.479221-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-07 12:44:02 -03:00
James Clark	0d2e3f2511	perf cs-etm: Print error for new PERF_RECORD_AUX_OUTPUT_HW_ID versions The likely fix for this is to update perf so print a helpful message. Signed-off-by: James Clark <james.clark@arm.com> Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Acked-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steve Clevenger <scclevenger@os.amperecomputing.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240429152207.479221-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-07 12:44:02 -03:00
Athira Rajeev	36e8aa90fd	perf annotate: Fix a comment about multi_regs in extract_reg_offset function Fix a comment in function which explains how multi_regs field gets set for an instruction. In the example, "mov %rsi, 8(%rbx,%rcx,4)", the comment mistakenly referred to "dst_multi_regs = 0". Correct it to use "src_multi_regs = 0" Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Akanksha J N <akanksha@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/r/20240506121906.76639-4-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-07 12:44:02 -03:00
Arnaldo Carvalho de Melo	07fde75306	perf kwork: Use zfree() to avoid possibly accessing dangling pointers When freeing a->b it is good practice to set a->b to NULL using zfree(&a->b) so that when we have a bug where a reference to a freed 'a' pointer is kept somewhere, we can more quickly cause a segfault if some code tries to use a->b. Convert one such case in the 'perf kwork' codebase. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/lkml/Zjmc5EiN6zmWZj4r@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-07 12:44:02 -03:00
Arnaldo Carvalho de Melo	54ef362e4d	perf callchain: Use zfree() to avoid possibly accessing dangling pointers When freeing a->b it is good practice to set a->b to NULL using zfree(&a->b) so that when we have a bug where a reference to a freed 'a' pointer is kept somewhere, we can more quickly cause a segfault if some code tries to use a->b. Convert one such case in the callchain code. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZjmcGobQ8E52EyjJ@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-07 12:44:02 -03:00
Arnaldo Carvalho de Melo	69fb6eab19	perf annotate: Use zfree() to avoid possibly accessing dangling pointers When freeing a->b it is good practice to set a->b to NULL using zfree(&a->b) so that when we have a bug where a reference to a freed 'a' pointer is kept somewhere, we can more quickly cause a segfault if some code tries to use a->b. This is mostly done but some new cases were introduced recently, convert them to zfree(). Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZjmbHHrjIm5YRIBv@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-07 12:43:53 -03:00
Ian Rogers	37862d6fdc	perf dso: Use container_of() to avoid a pointer in 'struct dso_data' The dso pointer in 'struct dso_data' is necessary for reference count checking to account for the dso_data forming a global list of open dso's with references to the dso. The dso pointer also allows for the indirection that reference count checking needs. Outside of reference count checking the indirection isn't needed and container_of() is more efficient and saves space. The reference count won't be increased by placing items onto the global list, matching how things were before the reference count checking change, but we assert the dso is in dsos holding it live (and that the set of open dsos is a subset of all dsos for the machine). Update the DSO data tests so that they use a dsos struct to make the invariant true. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Link: https://lore.kernel.org/r/20240506180104.485674-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-06 16:08:31 -03:00
Ian Rogers	23106e3188	perf symbol-elf: dso__load_sym_internal() reference count fixes dso__load_sym_internal() passed curr_mapp as an out argument to dso__process_kernel_symbol(). The out argument was never used so remove it to simplify the reference counting logic. Simplify reference counting issues with curr_dso by ensuring the value it points to has a +1 reference count, and then putting as necessary. This avoids some reference counting games when the dso is created making the code more obviously correct with some possible introduced overhead due to the reference counting get/puts. This, however, silences reference count checking and we can always optimize from a seemingly correct point. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Link: https://lore.kernel.org/r/20240506180104.485674-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-06 16:07:30 -03:00
Ian Rogers	ee5061f824	perf symbol-elf: Ensure dso__put() in machine__process_ksymbol_register() The dso__put() after the map creation causes a use after put in dso__set_loaded(). To ensure there is a +1 reference count on both sides of the if-else, do a dso__get() on the found map's dso. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Link: https://lore.kernel.org/r/20240506180104.485674-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-06 16:06:29 -03:00
Ian Rogers	7fdc33f842	perf map: Add missing dso__put() in map__new() A dso__put() is needed for the dsos__find() when the map is created and a buildid is sought. Fixes: `f649ed80f3` ("perf dsos: Tidy reference counting and locking") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Link: https://lore.kernel.org/r/20240506180104.485674-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-06 15:36:51 -03:00
Ian Rogers	ee756ef749	perf dso: Add reference count checking and accessor functions Add reference count checking to struct dso, this can help with implementing correct reference counting discipline. To avoid RC_CHK_ACCESS everywhere, add accessor functions for the variables in struct dso. The majority of the change is mechanical in nature and not easy to split up. Committer testing: 'perf test' up to this patch shows no regressions. But: util/symbol.c: In function ‘dso__load_bfd_symbols’: util/symbol.c:1683:9: error: too few arguments to function ‘dso__set_adjust_symbols’ 1683 \| dso__set_adjust_symbols(dso); \| ^~~~~~~~~~~~~~~~~~~~~~~ In file included from util/symbol.c:21: util/dso.h:268:20: note: declared here 268 \| static inline void dso__set_adjust_symbols(struct dso dso, bool val) \| ^~~~~~~~~~~~~~~~~~~~~~~ make[6]: [/home/acme/git/perf-tools-next/tools/build/Makefile.build:106: /tmp/tmp.ZWHbQftdN6/util/symbol.o] Error 1 MKDIR /tmp/tmp.ZWHbQftdN6/tests/workloads/ make[6]: * Waiting for unfinished jobs.... This was updated: - symbols__fixup_end(&dso->symbols, false); - symbols__fixup_duplicate(&dso->symbols); - dso->adjust_symbols = 1; + symbols__fixup_end(dso__symbols(dso), false); + symbols__fixup_duplicate(dso__symbols(dso)); + dso__set_adjust_symbols(dso); But not build tested with BUILD_NONDISTRO and libbfd devel files installed (binutils-devel on fedora). Add the missing argument: symbols__fixup_end(dso__symbols(dso), false); symbols__fixup_duplicate(dso__symbols(dso)); - dso__set_adjust_symbols(dso); + dso__set_adjust_symbols(dso, true); Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Chengen Du <chengen.du@canonical.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20240504213803.218974-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-06 15:28:49 -03:00
Ian Rogers	7a9418cf7f	perf dsos: Switch hand crafted code to bsearch() Switch to using the bsearch library function rather than having a hand written binary search. Const-ify some static functions to avoid compiler warnings. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Chengen Du <chengen.du@canonical.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20240504213803.218974-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-06 10:41:32 -03:00
Ian Rogers	7410d6008d	perf dsos: Remove __dsos__findnew_link_by_longname_id() Function was only called in dsos.c with the dso parameter as NULL. Remove the function and specialize for the dso being NULL case removing other unused functions along the way. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Chengen Du <chengen.du@canonical.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20240504213803.218974-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-06 09:33:37 -03:00
Ian Rogers	dfd48165bb	perf dsos: Remove __dsos__addnew() Function no longer used so remove. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Chengen Du <chengen.du@canonical.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20240504213803.218974-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-06 09:33:05 -03:00
Ian Rogers	3f4ac23a99	perf dsos: Switch backing storage to array from rbtree/list DSOs were held on a list for fast iteration and in an rbtree for fast finds. Switch to using a lazily sorted array where iteration is just iterating through the array and binary searches are the same complexity as searching the rbtree. The find may need to sort the array first which does increase the complexity, but add operations have lower complexity and overall the complexity should remain about the same. The set name operations on the dso just records that the array is no longer sorted, avoiding complexity in rebalancing the rbtree. Tighter locking discipline is enforced to avoid the array being resorted while long and short names or ids are changed. The array is smaller in size, replacing 6 pointers with 2, and so even with extra allocated space in the array, the array may be 50% unoccupied, the memory saving should be at least 2x. Committer testing: On a previous version of this patchset we were getting a lot of warnings about deleting a DSO still on a list, now it is ok: root@x1:~# perf probe -l root@x1:~# perf probe finish_task_switch Added new event: probe:finish_task_switch (on finish_task_switch) You can now use it in all perf tools, such as: perf record -e probe:finish_task_switch -aR sleep 1 root@x1:~# perf probe -l probe:finish_task_switch (on finish_task_switch@kernel/sched/core.c) root@x1:~# perf trace -e probe:finish_task_switch/max-stack=8/ --max-events=1 0.000 migration/0/19 probe:finish_task_switch(__probe_ip: -1894408688) finish_task_switch.isra.0 ([kernel.kallsyms]) __schedule ([kernel.kallsyms]) schedule ([kernel.kallsyms]) smpboot_thread_fn ([kernel.kallsyms]) kthread ([kernel.kallsyms]) ret_from_fork ([kernel.kallsyms]) ret_from_fork_asm ([kernel.kallsyms]) root@x1:~# root@x1:~# perf probe -d probe:* Removed event: probe:finish_task_switch root@x1:~# perf probe -l root@x1:~# I also ran the full 'perf test' suite after applying this one, no regressions. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Chengen Du <chengen.du@canonical.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20240504213803.218974-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-06 09:13:11 -03:00
Sandipan Das	77a70f8075	perf vendor events amd: Add Zen 5 mapping Add a regular expression in the map file so that appropriate JSON event files are used for AMD Zen 5 processors belonging to Family 1Ah. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Sandipan Das <sandipan.das@amd.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/862a6b683755601725f9081897a850127d085ace.1714717230.git.sandipan.das@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-04 15:10:07 -03:00
Sandipan Das	a9fe4ac7a3	perf vendor events amd: Add Zen 5 metrics Add metrics taken from Section 1.2 "Performance Measurement" of the Performance Monitor Counters for AMD Family 1Ah Model 00h-0Fh Processors document available at the link below. The recommended metrics are sourced from Table 1 "Guidance for Common Performance Statistics with Complex Event Selects". The pipeline utilization metrics are sourced from Table 2 "Guidance for Pipeline Utilization Analysis Statistics". These are useful for finding performance bottlenecks by analyzing activity at different stages of the pipeline. There are metric groups available for Level 1 and Level 2 analysis. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Sandipan Das <sandipan.das@amd.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Stephane Eranian <eranian@google.com> Link: https://bugzilla.kernel.org/attachment.cgi?id=305974 Link: https://lore.kernel.org/r/ee21ff77d89efa99997d3c2ebeeae22ddb6e7e12.1714717230.git.sandipan.das@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-04 15:10:04 -03:00
Sandipan Das	dc082ae618	perf vendor events amd: Add Zen 5 uncore events Add uncore events taken from Section 1.5 "L3 Cache Performance Monitor Counters" and Section 2 "UMC Performance Monitors" of the Performance Monitor Counters for AMD Family 1Ah Model 00h-0Fh Processors document available at the link below. This constitutes events which capture L3 cache and UMC command activity. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Sandipan Das <sandipan.das@amd.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Stephane Eranian <eranian@google.com> Link: https://bugzilla.kernel.org/attachment.cgi?id=305974 Link: https://lore.kernel.org/r/e11e8d9d1af34a0fb565fc9d1c4a05f569c39ddc.1714717230.git.sandipan.das@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-04 15:09:48 -03:00
Sandipan Das	45c072f253	perf vendor events amd: Add Zen 5 core events Add core events taken from Section 1.4 "Core Performance Monitor Counters" of the Performance Monitor Counters for AMD Family 1Ah Model 00h-0Fh Processors document available at the link below. This constitutes events which capture information on op dispatch, execution and retirement, branch prediction, L1 and L2 cache activity, TLB activity, etc. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Sandipan Das <sandipan.das@amd.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Stephane Eranian <eranian@google.com> Link: https://bugzilla.kernel.org/attachment.cgi?id=305974 Link: https://lore.kernel.org/r/668d194241bf0d42dc37f1c5af8131069a0bd82c.1714717230.git.sandipan.das@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-04 15:09:33 -03:00
Ian Rogers	8f283fb7b8	perf trace: Disable syscall augmentation with record Syscall augmentation is causing samples not to be written to the perf.data file with "perf trace record". Disabling augmentation is sub-optimal, but it beats having a totally broken perf trace record. Closes: https://lore.kernel.org/lkml/CAP-5=fV9Gd1Teak+EOcUSxe13KqSyfZyPNagK97GbLiOQRgGaw@mail.gmail.com/ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240216172357.65037-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-04 15:03:58 -03:00
Ian Rogers	7b6dd7a923	perf pmu: Assume sysfs events are always the same case Perf event names aren't case sensitive. For sysfs events the entire directory of events is read then iterated comparing names in a case insensitive way, most often to see if an event is present. Consider: $ perf stat -e inst_retired.any true The event inst_retired.any may be present in any PMU, so every PMU's sysfs events are loaded and then searched with strcasecmp to see if any match. This event is only present on the cpu PMU as a JSON event so a lot of events were loaded from sysfs unnecessarily just to prove an event didn't exist there. This change avoids loading all the events by assuming sysfs event names are always either lower or uppercase. It uses file exists and only loads the events when the desired event is present. For the example above, the number of openat calls measured by 'perf trace' on a tigerlake laptop goes from 325 down to 255. The reduction will be larger for machines with many PMUs, particularly replicated uncore PMUs. Ensure pmu_aliases_parse() is called before all uses of the aliases list, but remove some "pmu->sysfs_aliases_loaded" tests as they are now part of the function. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20240502213507.2339733-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-03 17:08:20 -03:00
Ian Rogers	6debc5aa32	perf test pmu: Test all sysfs PMU event names are the same case Being either lower or upper case means event name probes can avoid scanning the directory doing case insensitive comparisons, just the lower or upper case version of the name can be checked for existence. For the majority of PMUs event names are all lower case, upper case names are present on S390. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20240502213507.2339733-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-03 17:08:20 -03:00
Ian Rogers	18eb2ca8c1	perf test pmu: Add an eagerly loaded event test Allow events/aliases to be eagerly loaded for a PMU. Factor out the pmu_aliases_parse to allow this. Parse a test event and check it configures the attribute as expected. There is overlap with the parse-events tests, but this test is done with a PMU created in a temp directory and doesn't rely on PMUs in sysfs. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20240502213507.2339733-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-03 17:08:20 -03:00
Ian Rogers	aa1551f299	perf test pmu: Refactor format test and exposed test APIs In tests/pmu.c, make a common utility that creates a PMU in a mkdtemp directory and uses regular PMU parsing logic to load that PMU. Formats must still be eagerly loaded as by default the PMU code assumes devices are going to be in sysfs. In util/pmu.[ch], hide perf_pmu__format_parse but add the eager argument to perf_pmu__lookup called by perf_pmus__add_test_pmu. Later patches will eagerly load other non-sysfs files when eager loading is enabled. In tests/pmu.c, rather than manually constructing a list of term arguments, just use the term parsing code from a string. Add more comments and debug logging. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20240502213507.2339733-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-03 17:08:20 -03:00
Ian Rogers	97c48ea8ff	perf test pmu-events: Make it clearer that pmu-events tests JSON events Add JSON to the test name. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20240502213507.2339733-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-03 17:08:04 -03:00
Namhyung Kim	3cdd98b42d	perf maps: Remove check_invariants() from maps__lock() I found that the debug build was a slowed down a lot by the maps lock code since it checks the invariants whenever it gets the pointer to the lock. This means it checks twice the invariants before and after the access. Instead, let's move the checking code within the lock area but after any modification and remove it from the read paths. This would remove (more than) half of the maps lock overhead. The time for perf report with a huge data file (200k+ of MMAP2 events). Non-debug Before After --------- -------- -------- 2m 43s 6m 45s 4m 21s Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240429225738.1491791-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-02 16:35:47 -03:00
Jakub Kicinski	e958da0ddb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. Conflicts: include/linux/filter.h kernel/bpf/core.c `66e13b615a` ("bpf: verifier: prevent userspace memory access") `d503a04f8b` ("bpf: Add support for certain atomics in bpf_arena to x86 JIT") https://lore.kernel.org/all/20240429114939.210328b0@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-05-02 12:06:25 -07:00
James Clark	e3123079b9	perf cs-etm: Improve version detection and error reporting When the config validation functions are warning about ETMv3, they do it based on "not ETMv4". If the drivers aren't all loaded or the hardware doesn't support Coresight it will appear as "not ETMv4" and then Perf will print the error message "... not supported in ETMv3 ..." which is wrong and confusing. cs_etm_is_etmv4() is also misnamed because it also returns true for ETE because ETE has a superset of the ETMv4 metadata files. Although this was always done in the correct order so it wasn't a bug. Improve all this by making a single get version function which also handles not present as a separate case. Change the ETMv3 error message to only print when ETMv3 is detected, and add a new error message for the not present case. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Leo Yan <leo.yan@linux.dev> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240501135753.508022-4-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-02 11:34:41 -03:00
James Clark	bc5e0e1b93	perf cs-etm: Remove repeated fetches of the ETM PMU Most functions already have cs_etm_pmu, so it's a bit neater to pass it through rather than itr only to convert it again. Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240501135753.508022-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-02 11:33:46 -03:00
James Clark	cbaf2c4f93	perf cs-etm: Use struct perf_cpu as much as possible The perf_cpu struct makes some iterators simpler and avoids some mistakes with interchanging CPU IDs with indexes etc. At the moment in this file the conversion to an integer is done somewhere in the middle of the call tree. Change it to delay the conversion to an int until the leaf functions. Some of the usage patterns are duplicated, so instead of changing them all, make cs_etm_get_ro() more reusable and use that everywhere. cs_etm_get_ro() didn't return an error before, but return one now so that it can also be used where an error is needed. Continue to ignore the error where it was already ignored. Use cs_etm_pmu_path_exists() instead of cs_etm_get_ro() in cs_etm_is_etmv4() because cs_etm_get_ro() prints a warning, but path exists is sufficient for this use case. Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240501135753.508022-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-02 11:26:59 -03:00
Namhyung Kim	b7d4aacfc8	perf annotate-data: Check kind of stack variables I sometimes see ("unknown type") in the result and it was because it didn't check the type of stack variables properly during the instruction tracking. The stack can carry constant values (without type info) and if the target instruction is accessing the stack location, it resulted in the "unknown type". Maybe we could pick one of integer types for the constant, but it doesn't really mean anything useful. Let's just drop the stack slot if it doesn't have a valid type info. Here's an example how it got the unknown type. Note that 0xffffff48 = -0xb8. ----------------------------------------------------------- find data type for 0xffffff48(reg6) at ... CU for ... frame base: cfa=0 fbreg=6 scope: [2/2] (die:11cb97f) bb: [37 - 3a] var [37] reg15 type='int' size=0x4 (die:0x1180633) bb: [40 - 4b] mov [40] imm=0x1 -> reg13 var [45] reg8 type='sigset_t' size=0x8 (die:0x11a39ee) mov [45] imm=0x1 -> reg2 <--- here reg2 has a constant bb: [215 - 237] mov [218] reg2 -> -0xb8(stack) constant <--- and save it to the stack mov [225] reg13 -> -0xc4(stack) constant call [22f] find_task_by_vgpid call [22f] return -> reg0 type='struct task_struct' size=0x8 (die:0x11881e8) bb: [5c8 - 5cf] bb: [2fb - 302] mov [2fb] -0xc4(stack) -> reg13 constant bb: [13b - 14d] mov [143] 0xd50(reg3) -> reg5 type='struct task_struct*' size=0x8 (die:0xa31f3c) bb: [153 - 153] chk [153] reg6 offset=0xffffff48 ok=0 kind=0 fbreg <--- access here found by insn track: 0xffffff48(reg6) type-offset=0 type='G<EF>^K<F6><AF>U' size=0 (die:0xffffffffffffffff) Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240502060011.1838090-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-02 11:06:23 -03:00
Namhyung Kim	af89e8f2bd	perf annotate-data: Handle multi regs in find_data_type_block() The instruction tracking should be the same for the both registers. Just do it once and compare the result with multi regs as with the previous patches. Then we don't need to call find_data_type_block() separately for each reg. Let's remove the 'reg' argument from the relevant functions. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240502060011.1838090-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-05-02 11:05:10 -03:00

... 4 5 6 7 8 ...

16659 Commits