mirror of
git://git.yoctoproject.org/linux-yocto.git
synced 2025-08-22 00:42:01 +02:00
master
9276 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
![]() |
39d14c0dd6 |
Merge branch 'perf-tools' into perf-tools-next
To get some fixes in the perf test and JSON metrics into the development branch. Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
c7ba9d18ae |
perf srcline: Add missed addr2line closes
The child_process for addr2line sets in and out to -1 so that pipes
get created. It is the caller's responsibility to close the pipes,
finish_command doesn't do it. Add the missed closes.
Fixes:
|
||
![]() |
cbc917a1b0 |
perf stat: Support per-cluster aggregation
Some platforms have 'cluster' topology and CPUs in the cluster will
share resources like L3 Cache Tag (for HiSilicon Kunpeng SoC) or L2
cache (for Intel Jacobsville). Currently parsing and building cluster
topology have been supported since [1].
perf stat has already supported aggregation for other topologies like
die or socket, etc. It'll be useful to aggregate per-cluster to find
problems like L3T bandwidth contention.
This patch add support for "--per-cluster" option for per-cluster
aggregation. Also update the docs and related test. The output will
be like:
[root@localhost tmp]# perf stat -a -e LLC-load --per-cluster -- sleep 5
Performance counter stats for 'system wide':
S56-D0-CLS158 4 1,321,521,570 LLC-load
S56-D0-CLS594 4 794,211,453 LLC-load
S56-D0-CLS1030 4 41,623 LLC-load
S56-D0-CLS1466 4 41,646 LLC-load
S56-D0-CLS1902 4 16,863 LLC-load
S56-D0-CLS2338 4 15,721 LLC-load
S56-D0-CLS2774 4 22,671 LLC-load
[...]
On a legacy system without cluster or cluster support, the output will
be look like:
[root@localhost perf]# perf stat -a -e cycles --per-cluster -- sleep 1
Performance counter stats for 'system wide':
S56-D0-CLS0 64 18,011,485 cycles
S7182-D0-CLS0 64 16,548,835 cycles
Note that this patch doesn't mix the cluster information in the outputs
of --per-core to avoid breaking any tools/scripts using it.
Note that perf recently supports "--per-cache" aggregation, but it's not
the same with the cluster although cluster CPUs may share some cache
resources. For example on my machine all clusters within a die share the
same L3 cache:
$ cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list
0-31
$ cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list
0-3
[1] commit
|
||
![]() |
9a440bb2e2 |
perf tools: Remove misleading comments on map functions
When it converts sample IP to or from objdump-capable one, there's a
comment saying that kernel modules have DSO_SPACE__USER. But commit
|
||
![]() |
1eb3d924e3 |
perf thread_map: Free strlist on normal path in thread_map__new_by_tid_str()
slist needs to be freed in both error path and normal path in
thread_map__new_by_tid_str().
Fixes:
|
||
![]() |
94a830d7cc |
perf symbols: Slightly improve module file executable section mappings
Currently perf does not record module section addresses except for the .text section. In general that means perf cannot get module section mappings correct (except for .text) when loading symbols from a kernel module file. (Note using --kcore does not have this issue) Improve that situation slightly by identifying executable sections that use the same mapping as the .text section. That happens when an executable section comes directly after the .text section, both in memory and on file, something that can be determined by following the same layout rules used by the kernel, refer kernel layout_sections(). Note whether that happens is somewhat arbitrary, so this is not a final solution. Example from tracing a virtual machine process: Before: $ perf script | grep unknown CPU 0/KVM 1718 203.511270: 318341 cpu-cycles:P: ffffffffc13e8a70 [unknown] (/lib/modules/6.7.2-local/kernel/arch/x86/kvm/kvm-intel.ko) $ perf script -vvv 2>&1 >/dev/null | grep kvm.intel | grep 'noinstr.text\|ffff' Map: 0-7e0 41430 [kvm_intel].noinstr.text Map: ffffffffc13a7000-ffffffffc1421000 a0 /lib/modules/6.7.2-local/kernel/arch/x86/kvm/kvm-intel.ko After: $ perf script | grep 203.511270 CPU 0/KVM 1718 203.511270: 318341 cpu-cycles:P: ffffffffc13e8a70 vmx_vmexit+0x0 (/lib/modules/6.7.2-local/kernel/arch/x86/kvm/kvm-intel.ko) $ perf script -vvv 2>&1 >/dev/null | grep kvm.intel | grep 'noinstr.text\|ffff' Map: ffffffffc13a7000-ffffffffc1421000 a0 /lib/modules/6.7.2-local/kernel/arch/x86/kvm/kvm-intel.ko Reported-by: Like Xu <like.xu.linux@gmail.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240208085326.13432-3-adrian.hunter@intel.com |
||
![]() |
0bdfbd04c6 |
perf tools: Make it possible to see perf's kernel and module memory mappings
Dump kmaps if using 'perf --debug kmaps' or verbose > 2 (e.g. -vvv) for tools 'perf script' and 'perf report' if there is no browser. Example: $ perf --debug kmaps script 2>&1 >/dev/null | grep kvm.intel build id event received for /lib/modules/6.7.2-local/kernel/arch/x86/kvm/kvm-intel.ko: 0691d75e10e72ebbbd45a44c59f6d00a5604badf [20] Map: 0-3a3 4f5d8 [kvm_intel].modinfo Map: 0-5240 5f280 [kvm_intel]__versions Map: 0-30 64 [kvm_intel].note.Linux Map: 0-14 644c0 [kvm_intel].orc_header Map: 0-5297 43680 [kvm_intel].rodata Map: 0-5bee 3b837 [kvm_intel].text.unlikely Map: 0-7e0 41430 [kvm_intel].noinstr.text Map: 0-2080 713c0 [kvm_intel].bss Map: 0-26 705c8 [kvm_intel].data..read_mostly Map: 0-5888 6a4c0 [kvm_intel].data Map: 0-22 70220 [kvm_intel].data.once Map: 0-40 705f0 [kvm_intel].data..percpu Map: 0-1685 41d20 [kvm_intel].init.text Map: 0-4b8 6fd60 [kvm_intel].init.data Map: 0-380 70248 [kvm_intel]__dyndbg Map: 0-8 70218 [kvm_intel].exit.data Map: 0-438 4f980 [kvm_intel]__param Map: 0-5f5 4ca0f [kvm_intel].rodata.str1.1 Map: 0-3657 493b8 [kvm_intel].rodata.str1.8 Map: 0-e0 70640 [kvm_intel].data..ro_after_init Map: 0-500 70ec0 [kvm_intel].gnu.linkonce.this_module Map: ffffffffc13a7000-ffffffffc1421000 a0 /lib/modules/6.7.2-local/kernel/arch/x86/kvm/kvm-intel.ko The example above shows how the module section mappings are all wrong except for the main .text mapping at 0xffffffffc13a7000. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Like Xu <like.xu.linux@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240208085326.13432-2-adrian.hunter@intel.com |
||
![]() |
acfd65c894 |
tools: perf: Expose sample ID / stream ID to python scripts
perf script exposes the evsel_name to python scripts as part of the data passed to the sample or tracepoint handler function, and it passes the id and stream_id to the throttled/unthrottled handler functions. This makes matching throttle events and samples difficult. To make this possible, this change exposes the sample id and stream_id values to the script. Signed-off-by: Ben Gainey <ben.gainey@arm.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: will@kernel.org Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240123103137.1890779-2-ben.gainey@arm.com |
||
![]() |
fd7b8e8fb2 |
perf parse-events: Print all errors
Prior to this patch the first and the last error encountered during parsing are printed. To see other errors verbose needs enabling. Unfortunately this can drop useful errors, in particular on terms. This patch changes the errors so that instead of the first and last all errors are recorded and printed, the underlying data structure is changed to a list. Before: ``` $ perf stat -e 'slots/edge=2/' true event syntax error: 'slots/edge=2/' \___ Bad event or PMU Unable to find PMU or event on a PMU of 'slots' Initial error: event syntax error: 'slots/edge=2/' \___ Cannot find PMU `slots'. Missing kernel support? Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events ``` After: ``` $ perf stat -e 'slots/edge=2/' true event syntax error: 'slots/edge=2/' \___ Bad event or PMU Unable to find PMU or event on a PMU of 'slots' event syntax error: 'slots/edge=2/' \___ value too big for format (edge), maximum is 1 event syntax error: 'slots/edge=2/' \___ Cannot find PMU `slots'. Missing kernel support? Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events ``` Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: tchen168@asu.edu Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240131134940.593788-3-irogers@google.com |
||
![]() |
f5144ecad7 |
perf parse-events: Improve error location of terms cloned from an event
A PMU event/alias will have a set of format terms that replace it when an event is parsed. The location of the terms is their position when parsed for the event/alias either from sysfs or json. This location is of little use when an event fails to parse as the error will be given in terms of the location in the string of events parsed not the json or sysfs string. Fix this by making the cloned terms location that of the event/alias. If a cloned term from an event/alias is invalid the bad format is hard to determine from the error string. Add the name of the bad format into the error string. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: tchen168@asu.edu Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240131134940.593788-2-irogers@google.com |
||
![]() |
1c84b47f99 |
perf report: Prevent segfault with --no-parent
Prevent a perf report segfault with the (non sensical) --no-parent option Signed-off-By: Andi Kleen <ak@linux.intel.com> Acked-by: Namhyung <namhyung@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240130185552.150578-1-ak@linux.intel.com |
||
![]() |
4962aec0d6 |
perf evsel: Fix duplicate initialization of data->id in evsel__parse_sample()
data->id has been initialized at line 2362, remove duplicate initialization.
Fixes:
|
||
![]() |
20018398fc |
perf evsel: Rename get_states() to parse_task_states() and make it public
Since get_states() assumes the existence of libtraceevent, so move to where it should belong, i.e, util/trace-event-parse.c, and also rename it to parse_task_states(). Leave evsel_getstate() untouched as it fits well in the evsel category. Also make some necessary tweaks for python support, and get it verified with: perf test python. Signed-off-by: Ze Gao <zegao@tencent.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240123070210.1669843-2-zegao@tencent.com |
||
![]() |
7814fe24a6 |
perf evlist: Fix evlist__new_default() for > 1 core PMU
The 'Session topology' test currently fails with this message when evlist__new_default() opens more than one event: 32: Session topology : --- start --- templ file: /tmp/perf-test-vv5YzZ Using CPUID 0x00000000410fd070 Opening: unknown-hardware:HG ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) config 0xb00000000 disabled 1 ------------------------------------------------------------ sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 4 Opening: unknown-hardware:HG ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) config 0xa00000000 disabled 1 ------------------------------------------------------------ sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 5 non matching sample_type FAILED tests/topology.c:73 can't get session ---- end ---- Session topology: FAILED! This is because when re-opening the file and parsing the header, Perf expects that any file that has more than one event has the sample ID flag set. Perf record already sets the flag in a similar way when there is more than one event, so add the same logic to evlist__new_default(). evlist__new_default() is only currently used in tests, so I don't expect this change to have any other side effects. The other tests that use it don't save and re-open the file so don't hit this issue. The session topology test has been failing on Arm big.LITTLE platforms since commit |
||
![]() |
efe80f9c90 |
tools headers: Update the copy of x86's mem{cpy,set}_64.S used in 'perf bench'
This is to get the changes from: |
||
![]() |
7bbe8f0071 |
perf tools: Fix calloc() arguments to address error introduced in gcc-14
the definition of calloc is as follows: void *calloc(size_t nmemb, size_t size); number of members is in the first parameter and the size is in the second parameter. Fix error messages on gcc 14 20240102: error: 'calloc' sizes specified with 'sizeof' in the earlier argument and not in the later argument [-Werror=calloc-transposed-args] Committer notes: I noticed this on fedora 40 and rawhide. Signed-off-by: Sun Haiyong <sunhaiyong@loongson.cn> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240106094129.3337057-1-siyanteng@loongson.cn Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
9d95c6be48 |
perf list: Switch error message to pr_err() to respect debug settings (-v)
Using printf() can interrupt 'perf list output', use pr_err() which can respect debug settings and the debug file. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Ross Zwisler <zwisler@chromium.org> Cc: Shirisha G <shirisha@linux.ibm.com> Link: https://lore.kernel.org/r/20240124043015.1388867-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
24852ef2e2 |
perf pmu: Treat the msr pmu as software
The msr PMU is a software one, meaning msr events may be grouped
with events in a hardware context. As the msr PMU isn't marked as a
software PMU by perf_pmu__is_software, groups with the msr PMU in
are broken and the msr events placed in a different group. This
may lead to multiplexing errors where a hardware event isn't
counted while the msr event, such as tsc, is. Fix all of this by
marking the msr PMU as software, which agrees with the driver.
Before:
```
$ perf stat -e '{slots,tsc}' -a true
WARNING: events were regrouped to match PMUs
Performance counter stats for 'system wide':
1,750,335 slots
4,243,557 tsc
0.001456717 seconds time elapsed
```
After:
```
$ perf stat -e '{slots,tsc}' -a true
Performance counter stats for 'system wide':
12,526,380 slots
3,415,163 tsc
0.001488360 seconds time elapsed
```
Fixes:
|
||
![]() |
63f209b6fa |
perf evlist: Fix evlist__new_default() for > 1 core PMU
The 'Session topology' test currently fails with this message when evlist__new_default() opens more than one event: 32: Session topology : --- start --- templ file: /tmp/perf-test-vv5YzZ Using CPUID 0x00000000410fd070 Opening: unknown-hardware:HG ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) config 0xb00000000 disabled 1 ------------------------------------------------------------ sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 4 Opening: unknown-hardware:HG ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) config 0xa00000000 disabled 1 ------------------------------------------------------------ sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 5 non matching sample_type FAILED tests/topology.c:73 can't get session ---- end ---- Session topology: FAILED! This is because when re-opening the file and parsing the header, Perf expects that any file that has more than one event has the sample ID flag set. Perf record already sets the flag in a similar way when there is more than one event, so add the same logic to evlist__new_default(). evlist__new_default() is only currently used in tests, so I don't expect this change to have any other side effects. The other tests that use it don't save and re-open the file so don't hit this issue. The session topology test has been failing on Arm big.LITTLE platforms since commit |
||
![]() |
821aca20be |
perf mem: Clean up perf_pmus__num_mem_pmus()
The number of mem PMUs can be calculated by searching the perf_pmus__scan_mem(). Remove the ARCH specific perf_pmus__num_mem_pmus() Tested-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: ravi.bangoria@amd.com Cc: james.clark@arm.com Cc: will@kernel.org Cc: mike.leach@linaro.org Cc: renyu.zj@linux.alibaba.com Cc: yuhaixin.yhx@linux.alibaba.com Cc: tmricht@linux.ibm.com Cc: atrajeev@linux.vnet.ibm.com Cc: linux-arm-kernel@lists.infradead.org Cc: john.g.garry@oracle.com Link: https://lore.kernel.org/r/20240123185036.3461837-8-kan.liang@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
70f4b20d07 |
perf mem: Clean up perf_mem_events__record_args()
The current code iterates all memory PMUs. It doesn't matter if the system has only one memory PMU or multiple PMUs. The check of perf_pmus__num_mem_pmus() is not required anymore. The rec_tmp is not used in c2c and mem. Removing them as well. Suggested-by: Leo Yan <leo.yan@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: ravi.bangoria@amd.com Cc: james.clark@arm.com Cc: will@kernel.org Cc: mike.leach@linaro.org Cc: renyu.zj@linux.alibaba.com Cc: yuhaixin.yhx@linux.alibaba.com Cc: tmricht@linux.ibm.com Cc: atrajeev@linux.vnet.ibm.com Cc: linux-arm-kernel@lists.infradead.org Cc: john.g.garry@oracle.com Link: https://lore.kernel.org/r/20240123185036.3461837-7-kan.liang@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
8ea9dfb916 |
perf mem: Clean up is_mem_loads_aux_event()
The aux_event can be retrieved from the perf_pmu now. Implement a generic support. Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ravi Bangoria <ravi.bangoria@amd.com> Tested-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: james.clark@arm.com Cc: will@kernel.org Cc: mike.leach@linaro.org Cc: renyu.zj@linux.alibaba.com Cc: yuhaixin.yhx@linux.alibaba.com Cc: tmricht@linux.ibm.com Cc: atrajeev@linux.vnet.ibm.com Cc: linux-arm-kernel@lists.infradead.org Cc: john.g.garry@oracle.com Link: https://lore.kernel.org/r/20240123185036.3461837-6-kan.liang@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
db95c2ce9b |
perf mem: Clean up perf_mem_event__supported()
For some ARCHs, e.g., ARM and AMD, to get the availability of the mem-events, perf checks the existence of a specific PMU. For the other ARCHs, e.g., Intel and Power, perf has to check the existence of some specific events. The current perf only iterates the mem-events-supported PMUs. It's not required to check the existence of a specific PMU anymore. Rename sysfs_name to event_name, which stores the specific mem-events. Perf only needs to check those events for the availability of the mem-events. Rename perf_mem_event__supported to perf_pmu__mem_events_supported. Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ravi Bangoria <ravi.bangoria@amd.com> Tested-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: james.clark@arm.com Cc: will@kernel.org Cc: mike.leach@linaro.org Cc: renyu.zj@linux.alibaba.com Cc: yuhaixin.yhx@linux.alibaba.com Cc: tmricht@linux.ibm.com Cc: atrajeev@linux.vnet.ibm.com Cc: linux-arm-kernel@lists.infradead.org Cc: john.g.garry@oracle.com Link: https://lore.kernel.org/r/20240123185036.3461837-5-kan.liang@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
abbdd79b78 |
perf mem: Clean up perf_mem_events__name()
Introduce a generic perf_mem_events__name(). Remove the ARCH-specific one. The mem_load events may have a different format. Add ldlat and aux_event in the struct perf_mem_event to indicate the format and the extra aux event. Add perf_mem_events_intel_aux[] to support the extra mem_load_aux event. Rename perf_mem_events__name to perf_pmu__mem_events_name. Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ravi Bangoria <ravi.bangoria@amd.com> Tested-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: james.clark@arm.com Cc: will@kernel.org Cc: mike.leach@linaro.org Cc: renyu.zj@linux.alibaba.com Cc: yuhaixin.yhx@linux.alibaba.com Cc: tmricht@linux.ibm.com Cc: atrajeev@linux.vnet.ibm.com Cc: linux-arm-kernel@lists.infradead.org Cc: john.g.garry@oracle.com Link: https://lore.kernel.org/r/20240123185036.3461837-4-kan.liang@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
a30450e6a4 |
perf mem: Clean up perf_mem_events__ptr()
The mem_events can be retrieved from the struct perf_pmu now. An ARCH specific perf_mem_events__ptr() is not required anymore. Remove all of them. The Intel hybrid has multiple mem-events-supported PMUs. But they share the same mem_events. Other ARCHs only support one mem-events-supported PMU. In the configuration, it's good enough to only configure the mem_events for one PMU. Add perf_mem_events_find_pmu() which returns the first mem-events-supported PMU. In the perf_mem_events__init(), the perf_pmus__scan() is not required anymore. It avoids checking the sysfs for every PMU on the system. Make the perf_mem_events__record_args() more generic. Remove the perf_mem_events__print_unsupport_hybrid(). Since pmu is added as a new parameter, rename perf_mem_events__ptr() to perf_pmu__mem_events_ptr(). Several other functions also do a similar rename. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Tested-by: Ravi Bangoria <ravi.bangoria@amd.com> Tested-by: Kajol jain <kjain@linux.ibm.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: james.clark@arm.com Cc: will@kernel.org Cc: leo.yan@linaro.org Cc: mike.leach@linaro.org Cc: renyu.zj@linux.alibaba.com Cc: yuhaixin.yhx@linux.alibaba.com Cc: tmricht@linux.ibm.com Cc: atrajeev@linux.vnet.ibm.com Cc: linux-arm-kernel@lists.infradead.org Cc: john.g.garry@oracle.com Link: https://lore.kernel.org/r/20240123185036.3461837-3-kan.liang@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
bb65acdc83 |
perf mem: Add mem_events into the supported perf_pmu
With the mem_events, perf doesn't need to read sysfs for each PMU to find the mem-events-supported PMU. The patch also makes it possible to clean up the related __weak functions later. The patch is only to add the mem_events into the perf_pmu for all ARCHs. It will be used in the later cleanup patches. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Tested-by: Ravi Bangoria <ravi.bangoria@amd.com> Tested-by: Leo Yan <leo.yan@linaro.org> Tested-by: Kajol Jain <kjain@linux.ibm.com> Suggested-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: will@kernel.org Cc: mike.leach@linaro.org Cc: renyu.zj@linux.alibaba.com Cc: yuhaixin.yhx@linux.alibaba.com Cc: tmricht@linux.ibm.com Cc: atrajeev@linux.vnet.ibm.com Cc: linux-arm-kernel@lists.infradead.org Cc: john.g.garry@oracle.com Link: https://lore.kernel.org/r/20240123185036.3461837-2-kan.liang@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
df8bc77e4a |
perf util: Add evsel__taskstate() to parse the task state info instead
Now that we have the __prinf_flags() parsing routines, we add a new helper evsel__taskstate() to extract the task state info from the recorded data. Signed-off-by: Ze Gao <zegao@tencent.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20240122070859.1394479-5-zegao@tencent.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
2f29a74f1d |
perf util: Add helpers to parse task state string from libtraceevent
Perf uses a hard coded string "RSDTtXZPI" to index the sched_switch prev_state field raw bitmask value. This works well except for when the kernel changes this string, in which case this will break again. Instead we add a new way to parse task state string from tracepoint print format already recorded by perf, which eliminates the further dependencies with this hardcode and unmaintainable macro, and this is exactly what libtraceevent[1] does for now. So we borrow the print flags parsing logic from libtraceevent[1]. And in get_states(), we walk the print arguments until the __print_flags() for the target state field is found, and use that to build the states string for future parsing. [1]: https://lore.kernel.org/linux-trace-devel/20231224140732.7d41698d@rorschach.local.home/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Ze Gao <zegao@tencent.com> Link: https://lore.kernel.org/r/20240122070859.1394479-4-zegao@tencent.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
57c8f1073f |
perf data: Minor code style alignment cleanup
Minor code style alignment cleanup for perf_data__switch() and perf_data__write(). No functional change. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240119040304.3708522-4-yangjihong1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
02f9b50e04 |
perf record: Check conflict between '--timestamp-filename' option and pipe mode before recording
In pipe mode, no need to switch perf data output, therefore,
'--timestamp-filename' option should not take effect.
Check the conflict before recording and output WARNING.
In this case, the check pipe mode in perf_data__switch() can be removed.
Before:
# perf record --timestamp-filename -o- perf test -w noploop | perf report -i- --percent-limit=1
# To display the perf.data header info, please use --header/--header-only options.
#
[ perf record: Woken up 1 times to write data ]
[ perf record: Dump -.2024011812110182 ]
#
# Total Lost Samples: 0
#
# Samples: 4K of event 'cycles:P'
# Event count (approx.): 2176784359
#
# Overhead Command Shared Object Symbol
# ........ ....... .................... ......................................
#
97.83% perf perf [.] noploop
#
# (Tip: Print event counts in CSV format with: perf stat -x,)
#
After:
# perf record --timestamp-filename -o- perf test -w noploop | perf report -i- --percent-limit=1
WARNING: --timestamp-filename option is not available in pipe mode.
# To display the perf.data header info, please use --header/--header-only options.
#
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]
#
# Total Lost Samples: 0
#
# Samples: 4K of event 'cycles:P'
# Event count (approx.): 2185575421
#
# Overhead Command Shared Object Symbol
# ........ ....... ..................... .............................................
#
97.75% perf perf [.] noploop
#
# (Tip: Profiling branch (mis)predictions with: perf record -b / perf report)
#
Fixes:
|
||
![]() |
55442cc2f2 |
perf dwarf-aux: Check allowed DWARF Ops
The DWARF location expression can be fairly complex and it'd be hard to match it with the condition correctly. So let's be conservative and only allow simple expressions. For now it just checks the first operation in the list. The following operations looks ok: * DW_OP_stack_value * DW_OP_deref_size * DW_OP_deref * DW_OP_piece To refuse complex (and unsupported) location expressions, add check_allowed_ops() to compare the rest of the list. It seems earlier result contained those unsupported expressions. For example, I found some local struct variable is placed like below. <2><43d1517>: Abbrev Number: 62 (DW_TAG_variable) <43d1518> DW_AT_location : 15 byte block: 91 50 93 8 91 78 93 4 93 84 8 91 68 93 4 (DW_OP_fbreg: -48; DW_OP_piece: 8; DW_OP_fbreg: -8; DW_OP_piece: 4; DW_OP_piece: 1028; DW_OP_fbreg: -24; DW_OP_piece: 4) Another example is something like this. 0057c8be ffffffffffffffff ffffffff812109f0 (base address) 0057c8ce ffffffff812112b5 ffffffff812112c8 (DW_OP_breg3 (rbx): 0; DW_OP_constu: 18446744073709551612; DW_OP_and; DW_OP_stack_value) It should refuse them. After the change, the stat shows: Annotate data type stats: total 294, ok 158 (53.7%), bad 136 (46.3%) ----------------------------------------------------------- 30 : no_sym 32 : no_mem_ops 53 : no_var 14 : no_typeinfo 7 : bad_offset Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240117062657.985479-10-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
bc10db8eb8 |
perf annotate-data: Support stack variables
Local variables are allocated in the stack and the location list should look like base register(s) and an offset. Extend the die_find_variable_by_reg() to handle the following expressions * DW_OP_breg{0..31} * DW_OP_bregx * DW_OP_fbreg Ususally DWARF subprogram entries have frame base information and use it to locate stack variable like below: <2><43d1575>: Abbrev Number: 62 (DW_TAG_variable) <43d1576> DW_AT_location : 2 byte block: 91 7c (DW_OP_fbreg: -4) <--- here <43d1579> DW_AT_name : (indirect string, offset: 0x2c00c9): i <43d157d> DW_AT_decl_file : 1 <43d157e> DW_AT_decl_line : 78 <43d157f> DW_AT_type : <0x43d19d7> I found some differences on saving the frame base between gcc and clang. The gcc uses the CFA to get the base so it needs to check the current frame's CFI info. In this case, stack offset needs to be adjusted from the start of the CFA. <1><1bb8d>: Abbrev Number: 102 (DW_TAG_subprogram) <1bb8e> DW_AT_name : (indirect string, offset: 0x74d41): kernel_init <1bb92> DW_AT_decl_file : 2 <1bb92> DW_AT_decl_line : 1440 <1bb94> DW_AT_decl_column : 18 <1bb95> DW_AT_prototyped : 1 <1bb95> DW_AT_type : <0xcc> <1bb99> DW_AT_low_pc : 0xffffffff81bab9e0 <1bba1> DW_AT_high_pc : 0x1b2 <1bba9> DW_AT_frame_base : 1 byte block: 9c (DW_OP_call_frame_cfa) <------ here <1bbab> DW_AT_call_all_calls: 1 <1bbab> DW_AT_sibling : <0x1bf5a> While clang sets it to a register directly and it can check the register and offset in the instruction directly. <1><43d1542>: Abbrev Number: 60 (DW_TAG_subprogram) <43d1543> DW_AT_low_pc : 0xffffffff816a7c60 <43d154b> DW_AT_high_pc : 0x98 <43d154f> DW_AT_frame_base : 1 byte block: 56 (DW_OP_reg6 (rbp)) <---------- here <43d1551> DW_AT_GNU_all_call_sites: 1 <43d1551> DW_AT_name : (indirect string, offset: 0x3bce91): foo <43d1555> DW_AT_decl_file : 1 <43d1556> DW_AT_decl_line : 75 <43d1557> DW_AT_prototyped : 1 <43d1557> DW_AT_type : <0x43c7332> <43d155b> DW_AT_external : 1 Also it needs to update the offset after finding the type like global variables since the offset was from the frame base. Factor out match_var_offset() to check global and local variables in the same way. The type stats are improved too: Annotate data type stats: total 294, ok 160 (54.4%), bad 134 (45.6%) ----------------------------------------------------------- 30 : no_sym 32 : no_mem_ops 51 : no_var 14 : no_typeinfo 7 : bad_offset Reviewed-by: Ian Rogers <irogers@google.com> Cc: Stephane Eranian <eranian@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20240117062657.985479-9-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
6fed025f11 |
perf dwarf-aux: Add die_get_cfa()
The die_get_cfa() is to get frame base register and offset at the given instruction address (pc). This info will be used to locate stack variables which have location expression using DW_OP_fbreg. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Stephane Eranian <eranian@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20240117062657.985479-8-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
5f7cdde843 |
perf annotate-data: Support global variables
Global variables are accessed using PC-relative address so it needs to be handled separately. The PC-rel addressing is detected by using DWARF_REG_PC. On x86, %rip register would be used. The address can be calculated using the ip and offset in the instruction. But it should start from the next instruction so add calculate_pcrel_addr() to do it properly. But global variables defined in a different file would only have a declaration which doesn't include a location list. So it first tries to get the type info using the address, and then looks up the variable declarations using name. The name of global variables should be get from the symbol table. The declaration would have the type info. So extend find_var_type() to take both address and name for global variables. The stat is now looks like: Annotate data type stats: total 294, ok 153 (52.0%), bad 141 (48.0%) ----------------------------------------------------------- 30 : no_sym 32 : no_mem_ops 61 : no_var 10 : no_typeinfo 8 : bad_offset Reviewed-by: Ian Rogers <irogers@google.com> Cc: Stephane Eranian <eranian@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20240117062657.985479-7-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
83bfa06d08 |
perf annotate-data: Handle PC-relative addressing
Extend find_data_type_die() to find data type from PC-relative address using die_find_variable_by_addr(). Users need to pass the address for the (global) variable. The offset for the variable should be updated after finding the type because the offset in the instruction is just to calcuate the address for the variable. So it changed to pass a pointer to offset and renamed it to 'poffset'. First it searches variables in the CU DIE as it's likely that the global variables are defined in the file level. And then it iterates the scope DIEs to find a local (static) variable. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Stephane Eranian <eranian@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20240117062657.985479-6-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
7a54f1d83d |
perf annotate-data: Add stack operation pseudo type
A typical function prologue and epilogue include multiple stack operations to save and restore the current value of registers. On x86, it looks like below: push r15 push r14 push r13 push r12 ... pop r12 pop r13 pop r14 pop r15 ret As these all touches the stack memory region, chances are high that they appear in a memory profile data. But these are not used for any real purpose yet so it'd return no types. One of my profile type shows that non neglible portion of data came from the stack operations. It also seems GCC generates more stack operations than clang. Annotate Instruction stats total 264, ok 169 (64.0%), bad 95 (36.0%) Name : Good Bad ----------------------------------------------------------- movq : 49 27 movl : 24 9 popq : 0 19 <-- here cmpl : 17 2 addq : 14 1 cmpq : 12 2 cmpxchgl : 3 7 Instead of dealing them as unknown, let's create a seperate pseudo type to represent those stack operations separately. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Stephane Eranian <eranian@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20240117062657.985479-5-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
d3030191d3 |
perf annotate-data: Handle array style accesses
On x86, instructions for array access often looks like below. mov 0x1234(%rax,%rbx,8), %rcx Usually the first register holds the type information and the second one has the index. And the current code only looks up a variable for the first register. But it's possible to be in the other way around so it needs to check the second register if the first one failed. The stat changed like this. Annotate data type stats: total 294, ok 148 (50.3%), bad 146 (49.7%) ----------------------------------------------------------- 30 : no_sym 32 : no_mem_ops 66 : no_var 10 : no_typeinfo 8 : bad_offset Reviewed-by: Ian Rogers <irogers@google.com> Cc: Stephane Eranian <eranian@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20240117062657.985479-4-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
1cf4df0373 |
perf annotate-data: Handle macro fusion on x86
When a sample was come from a conditional branch without a memory operand, it could be due to a macro fusion with a previous instruction. So it needs to check the memory operand in the previous one. This improves the stat like below: Annotate data type stats: total 294, ok 147 (50.0%), bad 147 (50.0%) ----------------------------------------------------------- 30 : no_sym 32 : no_mem_ops 71 : no_var 6 : no_typeinfo 8 : bad_offset Reviewed-by: Ian Rogers <irogers@google.com> Cc: Stephane Eranian <eranian@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20240117062657.985479-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
a3397d69e4 |
perf annotate-data: Parse 'lock' prefix from llvm-objdump
For the performance reason, I prefer llvm-objdump over GNU's. But I found that llvm-objdump puts x86 lock prefix in a separate line like below. ffffffff81000695: f0 lock ffffffff81000696: ff 83 54 0b 00 00 incl 2900(%rbx) This should be parsed properly, but I just changed to find the insn with next offset for now. This improves the statistics as it can process more instructions. Annotate data type stats: total 294, ok 144 (49.0%), bad 150 (51.0%) ----------------------------------------------------------- 30 : no_sym 35 : no_mem_ops 71 : no_var 6 : no_typeinfo 8 : bad_offset Reviewed-by: Ian Rogers <irogers@google.com> Cc: Stephane Eranian <eranian@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20240117062657.985479-2-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
1e24ce402c |
perf db-export: Fix missing reference count get in call_path_from_sample()
The addr_location map and maps fields in the inner loop were missing
calls to map__get()/maps__get(). The subsequent addr_location__exit()
call in each loop puts the map/maps fields causing use-after-free
aborts.
This issue reproduces on at least arm64 and x86_64 with something
simple like `perf record -g ls` followed by `perf script -s script.py`
with the following script:
perf_db_export_mode = True
perf_db_export_calls = False
perf_db_export_callchains = True
def sample_table(*args):
print(f'sample_table({args})')
def call_path_table(*args):
print(f'call_path_table({args}')
Committer testing:
This test, just introduced by Ian Rogers, now passes, not segfaulting
anymore:
# perf test "perf script tests"
95: perf script tests : Ok
#
Fixes:
|
||
![]() |
f2567e12a0 |
perf stat: Fix hard coded LL miss units
Copy-paste error where LL cache misses are reported as l1i.
Fixes:
|
||
![]() |
9c51f8788b |
perf env: Avoid recursively taking env->bpf_progs.lock
Add variants of perf_env__insert_bpf_prog_info(), perf_env__insert_btf()
and perf_env__find_btf prefixed with __ to indicate the
env->bpf_progs.lock is assumed held.
Call these variants when the lock is held to avoid recursively taking it
and potentially having a thread deadlock with itself.
Fixes:
|
||
![]() |
58824fa008 |
perf annotate: Add --insn-stat option for debugging
This is for a debugging purpose. It'd be useful to see per-instrucion level success/failure stats. $ perf annotate --data-type --insn-stat Annotate Instruction stats total 264, ok 143 (54.2%), bad 121 (45.8%) Name : Good Bad ----------------------------------------------------------- movq : 45 31 movl : 22 11 popq : 0 19 cmpl : 16 3 addq : 8 7 cmpq : 11 3 cmpxchgl : 3 7 cmpxchgq : 8 0 incl : 3 3 movzbl : 4 2 incq : 4 2 decl : 6 0 ... Committer notes: So these are about being able to find the type for accesses from these instructions, we should improve the naming, but it is for debugging, we can improve this later: @@ -3726,6 +3759,10 @@ struct annotated_data_type *hist_entry__get_data_type(struct hist_entry *he) continue; mem_type = find_data_type(ms, ip, op_loc->reg, op_loc->offset); + if (mem_type) + istat->good++; + else + istat->bad++; Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-18-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
61a9741e9f |
perf annotate: Add --type-stat option for debugging
The --type-stat option is to be used with --data-type and to print detailed failure reasons for the data type annotation. $ perf annotate --data-type --type-stat Annotate data type stats: total 294, ok 116 (39.5%), bad 178 (60.5%) ----------------------------------------------------------- 30 : no_sym 40 : no_insn_ops 33 : no_mem_ops 63 : no_var 4 : no_typeinfo 8 : bad_offset Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-17-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
263925bf84 |
perf annotate: Add --data-type option
Support data type annotation with new --data-type option. It internally uses type sort key to collect sample histogram for the type and display every members like below. $ perf annotate --data-type ... Annotate type: 'struct cfs_rq' in [kernel.kallsyms] (13 samples): ============================================================================ samples offset size field 13 0 640 struct cfs_rq { 2 0 16 struct load_weight load { 2 0 8 unsigned long weight; 0 8 4 u32 inv_weight; }; 0 16 8 unsigned long runnable_weight; 0 24 4 unsigned int nr_running; 1 28 4 unsigned int h_nr_running; ... For simplicity it prints the number of samples per field for now. But it should be easy to show the overhead percentage instead. The number at the outer struct is a sum of the numbers of the inner members. For example, struct cfs_rq got total 13 samples, and 2 came from the load (struct load_weight) and 1 from h_nr_running. Similarly, the struct load_weight got total 2 samples and they all came from the weight field. I've added two new flags in the symbol_conf for this. The annotate_data_member is to get the members of the type. This is also needed for perf report with typeoff sort key. The annotate_data_sample is to update sample stats for each offset and used only in annotate. Currently it only support stdio output mode, TUI support can be added later. Committer testing: With the perf.data from the previous csets, a very simple, short duration one: # perf annotate --data-type Annotate type: 'struct list_head' in [kernel.kallsyms] (1 samples): ============================================================================ samples offset size field 1 0 16 struct list_head { 0 0 8 struct list_head* next; 1 8 8 struct list_head* prev; }; Annotate type: 'char' in [kernel.kallsyms] (1 samples): ============================================================================ samples offset size field 1 0 1 char ; # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-15-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
e2c1c8ff2d |
perf report: Add 'symoff' sort key
The symoff sort key is to print symbol and offset of sample. This is useful for data type profiling to show exact instruction in the function which refers the data. $ perf report -s type,sym,typeoff,symoff --hierarchy ... # Overhead Data Type / Symbol / Data Type Offset / Symbol Offset # .............. ..................................................... # 1.23% struct cfs_rq 0.84% update_blocked_averages 0.19% struct cfs_rq +336 (leaf_cfs_rq_list.next) 0.19% [k] update_blocked_averages+0x96 0.19% struct cfs_rq +0 (load.weight) 0.14% [k] update_blocked_averages+0x104 0.04% [k] update_blocked_averages+0x31c 0.17% struct cfs_rq +404 (throttle_count) 0.12% [k] update_blocked_averages+0x9d 0.05% [k] update_blocked_averages+0x1f9 0.08% struct cfs_rq +272 (propagate) 0.07% [k] update_blocked_averages+0x3d3 0.02% [k] update_blocked_averages+0x45b ... Committer testing: # perf report --stdio -s type,typeoff,symoff # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 4 of event 'cpu_atom/mem-loads,ldlat=30/P' # Event count (approx.): 7 # # Overhead Data Type Data Type Offset Symbol Offset # ........ ......... ................ ............. # 42.86% struct list_head struct list_head +8 (prev) [k] __list_del_entry_valid_or_report+0x7 28.57% (unknown) (unknown) +0 (no field) [.] _nl_intern_locale_data+0x25 14.29% char char +0 (no field) [k] strncpy_from_user+0xa5 14.29% (unknown) (unknown) +0 (no field) [.] _dl_lookup_symbol_x+0x50 # # (Tip: To change sampling frequency to 100 Hz: perf record -F 100) # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-14-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
871304a79f |
perf report: Add 'typeoff' sort key
The typeoff sort key shows the data type name, offset and the name of the field. This is useful to see which field in the struct is accessed most frequently. $ perf report -s type,typeoff --hierarchy --stdio ... # Overhead Data Type / Data Type Offset # ............ ............................ # ... 1.23% struct cfs_rq 0.19% struct cfs_rq +404 (throttle_count) 0.19% struct cfs_rq +0 (load.weight) 0.19% struct cfs_rq +336 (leaf_cfs_rq_list.next) 0.09% struct cfs_rq +272 (propagate) 0.09% struct cfs_rq +196 (removed.nr) 0.09% struct cfs_rq +80 (curr) 0.09% struct cfs_rq +544 (lt_b_children_throttled) 0.06% struct cfs_rq +320 (rq) Committer testing: Again with the perf.data from the previous csets: # perf report --stdio -s type,typeoff # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 4 of event 'cpu_atom/mem-loads,ldlat=30/P' # Event count (approx.): 7 # # Overhead Data Type Data Type Offset # ........ ......... ................ # 42.86% struct list_head struct list_head +8 (prev) 42.86% (unknown) (unknown) +0 (no field) 14.29% char char +0 (no field) # # (Tip: To see callchains in a more compact form: perf report -g folded) # # perf report --stdio -s dso,type,typeoff # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 4 of event 'cpu_atom/mem-loads,ldlat=30/P' # Event count (approx.): 7 # # Overhead Shared Object Data Type Data Type Offset # ........ .................... ......... ................ # 42.86% [kernel.kallsyms] struct list_head struct list_head +8 (prev) 28.57% libc.so.6 (unknown) (unknown) +0 (no field) 14.29% [kernel.kallsyms] char char +0 (no field) 14.29% ld-linux-x86-64.so.2 (unknown) (unknown) +0 (no field) # # (Tip: If you have debuginfo enabled, try: perf report -s sym,srcline) # # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-13-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
9bd7ddd157 |
perf annotate-data: Update sample histogram for type
The annotated_data_type__update_samples() to get histogram for data type access. It'll be called by perf annotate to show which fields in the data type are accessed frequently. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-12-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
4a111cadac |
perf annotate-data: Add member field in the data type
Add child member field if the current type is a composite type like a struct or union. The member fields are linked in the children list and do the same recursively if the child itself is a composite type. Add 'self' member to the annotated_data_type to handle the members in the same way. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-11-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
2f2c41bdd8 |
perf report: Add 'type' sort key
The 'type' sort key is to aggregate hist entries by data type they access. Add mem_type field to hist_entry struct to save the type. If hist_entry__get_data_type() returns NULL, it'd use the 'unknown_type' instance. Committer testing: Before: # perf mem record sleep 2s [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.037 MB perf.data (4 samples) ] root@number:/home/acme/Downloads# perf report --stdio -s type Error: Unknown --sort key: `type' Usage: perf report [<options>] -s, --sort <key[,key2...]> sort by key(s): overhead overhead_sys overhead_us overhead_guest_sys overhead_guest_us overhead_children sample period pid comm dso symbol parent cpu socket srcline srcfile local_weight weight transaction trace symbol_size dso_size cgroup cgroup_id ipc_null time code_page_size local_ins_lat ins_lat local_p_stage_cyc p_stage_cyc addr local_retire_lat retire_lat simd dso_from dso_to symbol_from symbol_to mispredict abort in_tx cycles srcline_from srcline_to ipc_lbr addr_from addr_to symbol_daddr dso_daddr locked tlb mem snoop dcacheline symbol_iaddr phys_daddr data_page_size blocked # After: # perf report --stdio -s type # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 4 of event 'cpu_atom/mem-loads,ldlat=30/P' # Event count (approx.): 7 # # Overhead Data Type # ........ ......... # 100.00% (unknown) # # (Tip: Print event counts in CSV format with: perf stat -x,) # # rpm -q kernel-debuginfo kernel-debuginfo-6.6.4-200.fc39.x86_64 # uname -r 6.6.4-200.fc39.x86_64 # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org> Cc: linux-trace-devel@vger.kernel.org> Link: https://lore.kernel.org/r/20231213001323.718046-9-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
67bc54bbc5 |
perf annotate: Implement hist_entry__get_data_type()
It's the function to find out the type info from the given sample data and will be called from the hist_entry sort logic when 'type' sort key is used. It first calls objdump to disassemble the instructions and figure out information about memory access at the location. Maybe we can do it better by analyzing the instruction directly, but I'll leave it for later work. The memory access is determined by checking instruction operands to have "(" and then extract register name and offset. It'll return NULL if no data type is found. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
3a0c26edc3 |
perf annotate: Add annotate_get_insn_location()
The annotate_get_insn_location() is to get the detailed information of instruction locations like registers and offset. It has source and target operands locations in an array. Each operand can have a register and an offset. The offset is meaningful when mem_ref flag is set. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
0669729eb0 |
perf annotate: Factor out evsel__get_arch()
The evsel__get_arch() is to get architecture info from the environment. It'll be used by other places later so let's factor it out. Also add arch__is() to check the arch info by name. Committer notes: "get" is usually associated with refcounting, so we better rename this at some point to a better name. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
fc044c53b9 |
perf annotate-data: Add dso->data_types tree
To aggregate accesses to the same data type, add 'data_types' tree in DSO to maintain data types and find it by name and size. It might have different data types that happen to have the same name, so it also compares the size of the type. Even if it doesn't 100% guarantee, it reduces the possibility of mis-handling of such conflicts. And I don't think it's common to have different types with the same name. Committer notes: Very few cases on the Linux kernel, but there are some different types with the same name, unsure if there is a debug mode in libbpf dedup that warns about such cases, but there are provisions in pahole for that, see: "emit: Notice type shadowing, i.e. multiple types with the same name (enum, struct, union, etc)" https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=4f332dbfd02072e4f410db7bdcda8d6e3422974b $ pahole --compile > vmlinux.h $ rm -f a ; make a cc a.c -o a $ grep __[0-9] vmlinux.h union irte__1 { struct map_info__1; struct map_info__1 { struct map_info__1 * next; /* 0 8 */ $ drivers/iommu/amd/amd_iommu_types.h 'union irte' include/linux/dmar.h 'struct irte' include/linux/device-mapper.h: union map_info { void *ptr; }; include/linux/mtd/map.h: struct map_info { const char *name; unsigned long size; resource_size_t phys; <SNIP> kernel/events/uprobes.c: struct map_info { struct map_info *next; struct mm_struct *mm; unsigned long vaddr; }; Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
b9c87f536c |
perf annotate-data: Add find_data_type() to get type from memory access
The find_data_type() is to get a data type from the memory access at the given address (IP) using a register and an offset. It requires DWARF debug info in the DSO and searches the list of variables and function parameters in the scope. In a pseudo code, it does basically the following: find_data_type(dso, ip, reg, offset) { pc = map__rip_2objdump(ip); CU = dwarf_addrdie(dso->dwarf, pc); scopes = die_get_scopes(CU, pc); for_each_scope(S, scopes) { V = die_find_variable_by_reg(S, pc, reg); if (V && V.type == pointer_type) { T = die_get_real_type(V); if (offset < T.size) return T; } } return NULL; } Committer notes: The 'size' variable in check_variable() is 64-bit, so use PRIu64 and inttypes.h to debug it. Ditto at find_data_type_die(). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
3eee606757 |
perf dwarf-regs: Add get_dwarf_regnum()
The get_dwarf_regnum() returns a DWARF register number from a register name string according to the psABI. Also add two pseudo encodings of DWARF_REG_PC which is a register that are used by PC-relative addressing and DWARF_REG_FB which is a frame base register. They need to be handled in a special way. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
60cb19b485 |
perf dwarf-aux: Factor out die_get_typename_from_type()
The die_get_typename_from_type() is to get the name of the given DIE in C-style type name. The difference from die_get_typename() is that it does not retrieve the DW_AT_type and use the given DIE directly. This will be used when users know the type DIE already. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
7887097c65 |
perf maps: Fix up overlaps during fixup_end
Maps are sometimes made overlapping, in particular kernel maps. If the end of a map overlaps the start of the next, shorten the overlapping map. This should remove potential non-determinism in maps__find, ie finding maps by address. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231207011722.1220634-23-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
631bb236aa |
perf maps: Reduce scope of map_rb_node and maps internals
Avoid exposing the implementation of maps so that the internals can be refactored. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231207011722.1220634-22-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
75858007d1 |
perf maps: Add find next entry to give entry after the given map
Use to remove map_rb_node use from machine.c. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231207011722.1220634-21-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
e77b0236cd |
perf maps: Add maps__load_first()
Avoid bpf_lock_contention_read touching the internal maps data structure by adding a helper function. As access is done directly on the map in maps, hold the read lock to stop it being removed. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231207011722.1220634-20-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
9084952704 |
perf maps: Rename clone to copy from
Rename maps__clone() to maps__copy_from() to be more intention revealing of its behavior. Pass the underlying maps rather than the thread. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231207011722.1220634-19-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
980d792721 |
perf maps: Do simple merge if given map doesn't overlap
Simplify merge in for the simple case of a non-overlapping map. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231207011722.1220634-18-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
07ef14d50c |
perf maps: Refactor maps__fixup_overlappings()
Rename to maps__fixup_overlap_and_insert() as the given mapping is always inserted. Factor out first_ending_after() as a utility function. Minor variable name changes. Switch to using debug_file() rather than passing a debug FILE*. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231207011722.1220634-17-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
ec49230cf6 |
perf debug: Expose debug file
Some dumping call backs need to be passed a FILE*. Expose debug file via an accessor API for a consistent way to do this. Catch the unlikely failure of it not being set. Switch two cases where stderr was being used instead of debug_file. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231207011722.1220634-16-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
8d5847a617 |
perf maps: Add remove maps function to remove a map based on callback
Removing maps wasn't being done under the write lock. Similar to maps__for_each_map(), iterate the entries but in this case remove the entry based on the result of the callback. If an entry was removed then maps_by_name() also needs updating, so add missed flush. In dso__load_kcore(), the test of map to save would always be false with REFCNT_CHECKING because of a missing RC_CHK_ACCESS/RC_CHK_EQUAL. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231207011722.1220634-15-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
9cce3a161e |
perf maps: Reduce scope of maps__for_each_entry()
Reduce scope of maps__for_each_entry() as maps__for_each_map() is a safer alternative holding the maps lock during iteration. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231207011722.1220634-14-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
51ab715e2b |
perf vdso: Use function to add missing maps lock
Switch machine__thread_dso_type() from loop macro maps__for_each_entry() to maps__for_each_map() function that takes a callback. The function holds the maps lock, which should be held during iteration. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231207011722.1220634-13-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
16f533ade7 |
perf unwind: Use function to add missing maps lock
Switch read_unwind_spec_eh_frame() from loop macro
maps__for_each_entry() to maps__for_each_map() function that takes a
callback. The function holds the maps lock, which should be held during
iteration.
Committer notes:
Fixed up conflict with:
|
||
![]() |
ab1c247094 |
Merge remote-tracking branch 'torvalds/master' into perf-tools-next
To pick up fixes that went thru perf-tools for v6.7 and to get in sync with upstream to check for drift in the copies of headers, etc. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
71225af17f |
perf thread: Use function to add missing maps lock
Switch thread__prepare_access from loop macro maps__for_each_entry to maps__for_each_map function that takes a callback. The function holds the maps lock, which should be held during iteration. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231207011722.1220634-11-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
228493d0a8 |
perf synthetic-events: Use function to add missing maps lock
Switch perf_event__synthesize_modules from loop macro maps__for_each_entry to maps__for_each_map function that takes a callback. The function holds the maps lock, which should be held during iteration. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231207011722.1220634-10-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
111350c67d |
perf symbol: Use function to add missing maps lock
Switch do_validate_kcore_modules from loop macro maps__for_each_entry to maps__for_each_map function that takes a callback. The function holds the maps lock, which should be held during iteration. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231207011722.1220634-9-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
300b53d5b8 |
perf probe-event: Use function to add missing maps lock
Switch kernel_get_module_map from loop macro maps__for_each_entry to maps__for_each_map function that takes a callback. The function holds the maps lock, which should be held during iteration. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231207011722.1220634-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
2dc549b1dd |
perf machine: Use function to add missing maps lock
Switch machine__map_x86_64_entry_trampolines and machine__for_each_kernel_map from loop macro maps__for_each_entry to maps__for_each_map function that takes a callback. The function holds the maps lock, which should be held during iteration. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231207011722.1220634-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
19b5bd9a59 |
perf maps: Add maps__for_each_map to iterate maps holding the lock
The macro maps__for_each_entry is error prone as it doesn't require holding the maps lock. Add a new function that iterates the maps holding the read lock. Convert maps__find_symbol_by_name() and maps__fprintf() to use callbacks, the latter being an example of where the read lock wasn't being held. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231207011722.1220634-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
5cc47ffba7 |
perf map: Improve map/unmap parameter names
The u64 values are either absolute or relative, try to hint better in the parameter names. Suggested-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231207011722.1220634-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
3e0594f9f0 |
perf top: Avoid repeated function calls to perf_cpu_map__nr().
Add a local variable to avoid repeated calls to perf_cpu_map__nr(). Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andrew Jones <ajones@ventanamicro.com> Cc: André Almeida <andrealmeid@igalia.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Yanteng Si <siyanteng@loongson.cn> Link: https://lore.kernel.org/r/20231129060211.1890454-11-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
0b4b785d1f |
perf evlist: Move event attributes to after the / when uniquefying using the PMU name
When turning an event with attributes to the format including the PMU we need to move the "event:attributes" format to "event/attributes/" so that we can copy the event displayed and use it in the command line, i.e. in 'perf top' we had: 1K cpu_atom/cycles:P/ 11K cpu_core/cycles:P/ If I try to use that on the command line: # perf top -e cpu_atom/cycles:P/ event syntax error: 'cpu_atom/cycles:P/' \___ Bad event or PMU Unable to find PMU or event on a PMU of 'cpu_atom' Initial error: event syntax error: 'cpu_atom/cycles:P/' \___ unknown term 'cycles:P' for pmu 'cpu_atom' valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,cmask,config,config1,config2,config3,name,period,freq,branch_type,time,call-graph,stack-size,no-inherit,inherit,max-stack,nr,no-overwrite,overwrite ,driver-config,percore,aux-output,aux-sample-size,metric-id,raw,legacy-cache,hardware Run 'perf list' for a list of valid events Usage: perf top [<options>] -e, --event <event> event selector. use 'perf list' to list available events # Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Hector Martin <marcan@marcan.st> Cc: Ian Rogers <irogers@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZXxyanyZgWBTOnoK@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
a61f89bf76 |
perf top: Uniform the event name for the hybrid machine
It's hard to distinguish the default cycles events among hybrid PMUs. For example, $ perf top Available samples 385 cycles:P 903 cycles:P The other tool, e.g., perf record, uniforms the event name and adds the hybrid PMU name before opening the event. So the events can be easily distinguished. Apply the same methodology for the perf top as well. The evlist__uniquify_name() will be invoked by both record and top. Move it to util/evlist.c With the patch: $ perf top Available samples 148 cpu_atom/cycles:P/ 1K cpu_core/cycles:P/ Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Hector Martin <marcan@marcan.st> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20231214144612.1092028-2-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
4fb54994b2 |
perf unwind-libunwind: Fix base address for .eh_frame
The base address of a DSO mapping should start at the start of the file.
Usually DSOs are mapped from the pgoff 0 so it doesn't matter when it
uses the start of the map address.
But generated DSOs for JIT codes doesn't start from the 0 so it should
subtract the offset to calculate the .eh_frame table offsets correctly.
Fixes:
|
||
![]() |
c966d23a35 |
perf unwind-libdw: Handle JIT-generated DSOs properly
Usually DSOs are mapped from the beginning of the file, so the base
address of the DSO can be calculated by map->start - map->pgoff.
However, JIT DSOs which are generated by `perf inject -j`, are mapped
only the code segment. This makes unwind-libdw code confusing and
rejects processing unwinds in the JIT DSOs. It should use the map
start address as base for them to fix the confusion.
Fixes:
|
||
![]() |
1af478903f |
perf genelf: Set ELF program header addresses properly
The text section starts after the ELF headers so PHDR.p_vaddr and
others should have the correct addresses.
Fixes:
|
||
![]() |
6f33e6fa29 |
perf stat: Combine the -A/--no-aggr and --no-merge options
The -A or --no-aggr option disables aggregation of core events: $ perf stat -A -e cycles,data_total -a true Performance counter stats for 'system wide': CPU0 1,287,665 cycles CPU1 1,831,681 cycles CPU2 27,345,998 cycles CPU3 1,964,799 cycles CPU4 236,174 cycles CPU5 3,302,825 cycles CPU6 9,201,446 cycles CPU7 1,403,043 cycles CPU0 110.90 MiB data_total 0.008961761 seconds time elapsed The --no-merge option disables the aggregation of uncore events: $ perf stat --no-merge -e cycles,data_total -a true Performance counter stats for 'system wide': 38,482,778 cycles 15.04 MiB data_total [uncore_imc_free_running_1] 15.00 MiB data_total [uncore_imc_free_running_0] 0.005915155 seconds time elapsed Having two options confuses users who generally don't appreciate the difference in PMUs. Keep all the options but make it so they all disable aggregation both of core and uncore events: $ perf stat -A -e cycles,data_total -a true Performance counter stats for 'system wide': CPU0 85,878 cycles CPU1 88,179 cycles CPU2 60,872 cycles CPU3 3,265,567 cycles CPU4 82,357 cycles CPU5 83,383 cycles CPU6 84,156 cycles CPU7 220,803 cycles CPU0 2.38 MiB data_total [uncore_imc_free_running_0] CPU0 2.38 MiB data_total [uncore_imc_free_running_1] 0.001397205 seconds time elapsed Update the relevant 'perf stat' man page information. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kaige Ye <ye@kaige.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231214060256.2094017-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
1bc479d665 |
perf hisi-ptt: Fix one memory leakage in hisi_ptt_process_auxtrace_event()
ASan complains a memory leakage in hisi_ptt_process_auxtrace_event()
that the data buffer is not freed. Since currently we only support the
raw dump trace mode, the data buffer is used only within this function.
So fix this by freeing the data buffer before going out.
Fixes:
|
||
![]() |
813900d19b |
perf header: Fix one memory leakage in perf_event__fprintf_event_update()
When dump the raw trace by `perf report -D` ASan reports a memory
leakage in perf_event__fprintf_event_update().
It shows that we allocated a temporary cpumap for dumping the CPUs but
doesn't release it and it's not used elsewhere. Fix this by free the
cpumap after the dumping.
Fixes:
|
||
![]() |
effe957c6b |
libperf cpumap: Replace usage of perf_cpu_map__new(NULL) with perf_cpu_map__new_online_cpus()
Passing NULL to perf_cpu_map__new() performs perf_cpu_map__new_online_cpus(), just directly call perf_cpu_map__new_online_cpus() to be more intention revealing. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andrew Jones <ajones@ventanamicro.com> Cc: André Almeida <andrealmeid@igalia.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: bpf@vger.kernel.org Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20231129060211.1890454-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
923ca62a7b |
libperf cpumap: Rename perf_cpu_map__empty() to perf_cpu_map__has_any_cpu_or_is_empty()
The name perf_cpu_map_empty is misleading as true is also returned when the map contains an "any" CPU (aka dummy) map. Rename to perf_cpu_map__has_any_cpu_or_is_empty(), later changes will (re)introduce perf_cpu_map__empty() and perf_cpu_map__has_any_cpu(). Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andrew Jones <ajones@ventanamicro.com> Cc: André Almeida <andrealmeid@igalia.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: bpf@vger.kernel.org Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20231129060211.1890454-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
48219b089d |
libperf cpumap: Rename perf_cpu_map__dummy_new() to perf_cpu_map__new_any_cpu()
Rename perf_cpu_map__dummy_new() to perf_cpu_map__new_any_cpu() to better indicate this is creating a CPU map for the perf_event_open "any" CPU case. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andrew Jones <ajones@ventanamicro.com> Cc: André Almeida <andrealmeid@igalia.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: bpf@vger.kernel.org Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20231129060211.1890454-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
327f7533cc |
perf annotate: Get rid of local annotation options
It doesn't need the option in the struct annotation which is allocated for each symbol. It can directly use the global options and save 8 bytes per symbol. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231128175441.721579-9-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
2fa21d694c |
perf annotate: Remove remaining usages of local annotation options
So that it can get rid of the unused data. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231128175441.721579-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
7f929aea21 |
perf annotate: Ensure init/exit for global options
Now it only cares about the global options so it can just handle it without the argument. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231128175441.721579-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
22197fb296 |
perf ui/browser/annotate: Use global annotation_options
Now it can use the global options and no need save local browser options separately. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231128175441.721579-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
41fd3cacd2 |
perf annotate: Use global annotation_options
Now it can directly use the global options and no need to pass it as an argument. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231128175441.721579-5-namhyung@kernel.org [ Fixup build with GTK2=1 ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
c9a21a872c |
perf top: Convert to the global annotation_options
Use the global option and drop the local copy. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231128175441.721579-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
9d03194a36 |
perf annotate: Introduce global annotation_options
The annotation options are to control the behavior of objdump and the output. It's basically used by 'perf annotate' but 'perf report' and 'perf top' can call it on TUI dynamically. But it doesn't need to have a copy of annotation options in many places. As most of the work is done in the util/annotate.c file, add a global variable and set/use it instead of having their own copies. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231128175441.721579-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
01261d8a0f |
perf thread: Add missing RC_CHK_EQUAL
Comparing pointers without RC_CHK_ACCESS means the indirect object will be compared rather than the underlying maps when REFCNT_CHECKING is enabled. Fix by adding missing RC_CHK_EQUAL. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231127220902.1315692-15-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
0f6ab6a3fb |
perf maps: Move symbol maps functions to maps.c
Move the find and certain other symbol maps__* functions to maps.c for better abstraction. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231127220902.1315692-14-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
9fa688ea34 |
perf map: Simplify map_ip/unmap_ip and make 'struct map' smaller
When mapping an IP it is either an identity mapping or a DSO relative mapping, so a single bit is required in the struct to identify this. The current code uses function pointers, adding 2 pointers per map and also pushing the size of a map beyond 1 cache line. Switch to using a byte to identify the mapping type (as well as priv and erange_warned), to avoid any masking. Change struct maps's layout to avoid holes. Before: ``` struct map { u64 start; /* 0 8 */ u64 end; /* 8 8 */ _Bool erange_warned:1; /* 16: 0 1 */ _Bool priv:1; /* 16: 1 1 */ /* XXX 6 bits hole, try to pack */ /* XXX 3 bytes hole, try to pack */ u32 prot; /* 20 4 */ u64 pgoff; /* 24 8 */ u64 reloc; /* 32 8 */ u64 (*map_ip)(const struct map *, u64); /* 40 8 */ u64 (*unmap_ip)(const struct map *, u64); /* 48 8 */ struct dso * dso; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ refcount_t refcnt; /* 64 4 */ u32 flags; /* 68 4 */ /* size: 72, cachelines: 2, members: 12 */ /* sum members: 68, holes: 1, sum holes: 3 */ /* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */ /* last cacheline: 8 bytes */ }; ``` After: ``` struct map { u64 start; /* 0 8 */ u64 end; /* 8 8 */ u64 pgoff; /* 16 8 */ u64 reloc; /* 24 8 */ struct dso * dso; /* 32 8 */ refcount_t refcnt; /* 40 4 */ u32 prot; /* 44 4 */ u32 flags; /* 48 4 */ enum mapping_type mapping_type:8; /* 52: 0 4 */ /* Bitfield combined with next fields */ _Bool erange_warned; /* 53 1 */ _Bool priv; /* 54 1 */ /* size: 56, cachelines: 1, members: 11 */ /* padding: 1 */ /* last cacheline: 56 bytes */ }; ``` Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231127220902.1315692-13-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
d0acce6828 |
perf symbols: Parse NOTE segments until the build id is found
In the ELF file, multiple NOTE segments may exist. To locate the build id, the process shall persist in parsing NOTE segments until the build id is found. Signed-off-by: Chengen Du <chengen.du@canonical.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231130135723.17562-1-chengen.du@canonical.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
eb2eac0c7b |
perf evsel: Fallback to "task-clock" when not system wide
When the "cycles" event isn't available evsel will fallback to the "cpu-clock" software event. "task-clock" is similar to "cpu-clock" but only runs when the process is running. Falling back to "cpu-clock" when not system wide leads to confusion, by falling back to "task-clock" it is hoped the confusion is less. Pass the target to determine if "task-clock" is more appropriate. Update a nearby comment and debug string for the change. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ajay Kaher <akaher@vmware.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Makhalov <amakhalov@vmware.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231121000420.368075-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
a4320085a6 |
perf mem: Fix error on hybrid related to availability of mem event in a PMU
The below error can be triggered on a hybrid machine.
$ perf mem record -t load sleep 1
event syntax error: 'breakpoint/mem-loads,ldlat=30/P'
\___ Bad event or PMU
Unable to find PMU or event on a PMU of 'breakpoint'
In the perf_mem_events__record_args(), the current perf never checks the
availability of a mem event on a given PMU. All the PMUs will be added
to the perf mem event list. Perf errors out for the unsupported PMU.
Extend perf_mem_event__supported() and take a PMU into account. Check
the mem event for each PMU before adding it to the perf mem event list.
Optimize the perf_mem_events__init() a little bit. The function is to
check whether the mem events are supported in the system. It doesn't
need to scan all PMUs. Just return with the first supported PMU is good
enough.
Fixes:
|
||
![]() |
e2b005d6ec |
perf metrics: Avoid segv if default metricgroup isn't set
A metric is default by having "Default" within its groups. The default
metricgroup name needn't be set and this can result in segv in
default_metricgroup_cmp and perf_stat__print_shadow_stats_metricgroup
that assume it has a value when there is a Default metric group. To
avoid the segv initialize the value to "".
Fixes:
|
||
![]() |
4acef67646 |
perf env: Cache the arch specific strerrno function in perf_env__arch_strerrno()
So that we don't have to go thru the series of strcmp(arch) calls for each id -> string translation. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20231201203046.486596-3-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
54373b5d53 |
perf env: Introduce perf_env__arch_strerrno()
That will cache the arch specific function translating error numbers to strings. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20231201203046.486596-2-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
5940a20a18 |
perf mmap: Lazily initialize zstd streams to save memory when not using it
Zstd streams create dictionaries that can require significant RAM, especially when there is one per-CPU. Tools like 'perf record' won't use the streams without the -z option, and so the creation of the streams is pure overhead. Switch to creating the streams on first use. Committer notes: ssize_t comes from sys/types.h, size_t from stddef.h. This worked on glibc as stdlib.h includes both, but not on musl libc. So do what 'man size_t' says and include sys/types.h and stddef.h instead of stdlib.h Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231102175735.2272696-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
d60469d7c0 |
perf dwarf-aux: Add die_find_variable_by_addr()
The die_find_variable_by_addr() is to find a variables in the given DIE using given (PC-relative) address. Global variables will have a location expression with DW_OP_addr which has an address so can simply compare it with the address. <1><143a7>: Abbrev Number: 2 (DW_TAG_variable) <143a8> DW_AT_name : loops_per_jiffy <143ac> DW_AT_type : <0x1cca> <143b0> DW_AT_external : 1 <143b0> DW_AT_decl_file : 193 <143b1> DW_AT_decl_line : 213 <143b2> DW_AT_location : 9 byte block: 3 b0 46 41 82 ff ff ff ff (DW_OP_addr: ffffffff824146b0) Note that the type-offset should be calculated from the base address of the global variable. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231110000012.3538610-33-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
08973307d2 |
perf annotate: Check if operand has multiple regs
It needs to check all possible information in an instruction. Let's add a field indicating if the operand has multiple registers. I'll be used to search type information like in an array access on x86 like: mov 0x10(%rax,%rbx,8), %rcx ------------- here Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231012035111.676789-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
70df07838f |
perf header: Fix segfault on build_mem_topology() error path
Do not increase the node count unless a node has been successfully read,
because it can lead to a segfault if an error occurs.
For example, if perf exceeds the open file limit in memory_node__read(),
which, on a test system, could be made to happen by setting the file limit
to exactly 32:
Before:
$ ulimit -n 32
$ perf mem record --all-user -- sleep 1
[ perf record: Woken up 1 times to write data ]
failed: can't open memory sysfs data
perf: Segmentation fault
Obtained 14 stack frames.
perf(sighandler_dump_stack+0x48) [0x55f4b1f59558]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f4ba1c42520]
/lib/x86_64-linux-gnu/libc.so.6(free+0x1e) [0x7f4ba1ca53fe]
perf(+0x178ff4) [0x55f4b1f48ff4]
perf(+0x179a70) [0x55f4b1f49a70]
perf(+0x17ef5d) [0x55f4b1f4ef5d]
perf(+0x85c0b) [0x55f4b1e55c0b]
perf(cmd_record+0xe1d) [0x55f4b1e5920d]
perf(cmd_mem+0xc96) [0x55f4b1e80e56]
perf(+0x130460) [0x55f4b1f00460]
perf(main+0x689) [0x55f4b1e427d9]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f4ba1c29d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f4ba1c29e40]
perf(_start+0x25) [0x55f4b1e42a25]
Segmentation fault (core dumped)
$
After:
$ ulimit -n 32
$ perf mem record --all-user -- sleep 1
[ perf record: Woken up 1 times to write data ]
failed: can't open memory sysfs data
[ perf record: Captured and wrote 0.005 MB perf.data (11 samples) ]
$
Fixes:
|
||
![]() |
8aa1e6e29a |
perf report: Remove warning on missing raw data for s390
Command
# ./perf report -i /tmp/111 -D > /dev/null
emits an error message when a sample for event CRYPTO_ALL in the
perf.data file does not contain any raw data. This is ok. Do not
trigger this warning when the sample in the perf.data files does not
contain any raw data at all. Check for availability of raw data for all
events and return if none is available.
Output before:
# ./perf report -i /tmp/111 -D > /dev/null
Invalid CRYPTO_ALL raw data encountered
Invalid CRYPTO_ALL raw data encountered
Invalid CRYPTO_ALL raw data encountered
#
Output after:
# ./perf report -i /tmp/111 -D > /dev/null
#
Fixes:
|
||
![]() |
a24d9d9dc0 |
perf parse-events: Make legacy events lower priority than sysfs/JSON
The perf tool has previously made legacy events the priority so with or without a PMU the legacy event would be opened: $ perf stat -e cpu-cycles,cpu/cpu-cycles/ true Using CPUID GenuineIntel-6-8D-1 intel_pt default config: tsc,mtc,mtc_period=3,psb_period=3,pt,branch Attempting to add event pmu 'cpu' with 'cpu-cycles,' that may result in non-fatal errors After aliases, add event pmu 'cpu' with 'cpu-cycles,' that may result in non-fatal errors Control descriptor is not initialized ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0 (PERF_COUNT_HW_CPU_CYCLES) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid 833967 cpu -1 group_fd -1 flags 0x8 = 3 ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0 (PERF_COUNT_HW_CPU_CYCLES) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest 1 ------------------------------------------------------------ ... Fixes to make hybrid/BIG.little PMUs behave correctly, ie as core PMUs capable of opening legacy events on each, removing hard coded "cpu_core" and "cpu_atom" Intel PMU names, etc. caused a behavioral difference on Apple/ARM due to latent issues in the PMU driver reported in: https://lore.kernel.org/lkml/08f1f185-e259-4014-9ca4-6411d5c1bc65@marcan.st/ As part of that report Mark Rutland <mark.rutland@arm.com> requested that legacy events not be higher in priority when a PMU is specified reversing what has until this change been perf's default behavior. With this change the above becomes: $ perf stat -e cpu-cycles,cpu/cpu-cycles/ true Using CPUID GenuineIntel-6-8D-1 Attempt to add: cpu/cpu-cycles=0/ ..after resolving event: cpu/event=0x3c/ Control descriptor is not initialized ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0 (PERF_COUNT_HW_CPU_CYCLES) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid 827628 cpu -1 group_fd -1 flags 0x8 = 3 ------------------------------------------------------------ perf_event_attr: type 4 (PERF_TYPE_RAW) size 136 config 0x3c sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest 1 ------------------------------------------------------------ ... So the second event has become a raw event as /sys/devices/cpu/events/cpu-cycles exists. A fix was necessary to config_term_pmu in parse-events.c as check_alias expansion needs to happen after config_term_pmu, and config_term_pmu may need calling a second time because of this. config_term_pmu is updated to not use the legacy event when the PMU has such a named event (either from JSON or sysfs). The bulk of this change is updating all of the parse-events test expectations so that if a sysfs/JSON event exists for a PMU the test doesn't fail - a further sign, if it were needed, that the legacy event priority was a known and tested behavior of the perf tool. Reported-by: Hector Martin <marcan@marcan.st> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Hector Martin <marcan@marcan.st> Tested-by: Marc Zyngier <maz@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231123042922.834425-1-irogers@google.com [ Initialize the 'alias_rewrote_terms' variable to false to address a clang warning ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
a4271827e6 |
perf cs-etm: Enable itrace option 'T'
Prior to Armv8.4, the feature FEAT_TRF is not supported by Arm CPUs. Consequently, the sysfs node 'ts_source' will not be set as 1 by the CoreSight ETM driver. On the other hand, the perf tool relies on the 'ts_source' node to determine whether the kernel timestamp is traced. Since the 'ts_source' is not set for Arm CPUs prior to Armv8.4, platforms in this case cannot utilize the traced timestamp as the kernel time. This patch enables the 'T' itrace option, which forcibly utilizes the traced timestamp as the kernel time. If users are aware that their working platform's Arm CoreSight shares the same counter with the kernel time, they can specify 'T' option to decode the traced timestamp as the kernel time. An usage example is: # perf record -e cs_etm// -- test_program # perf script --itrace=i10ibT # perf report --itrace=i10ibT Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20231014074513.1668000-3-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
26218331f4 |
perf auxtrace: Add 'T' itrace option for timestamp trace
An AUX trace can contain timestamp, but in some situations, the hardware trace module (e.g. Arm CoreSight) cannot decide the traced timestamp is the same source with CPU's time, thus the decoder can not use the timestamp trace for samples. This patch introduces 'T' itrace option. If users know the platforms they are working on have the same time counter with CPUs, users can use this new option to tell a decoder for using timestamp trace as kernel time. Signed-off-by: Leo Yan <leo.yan@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20231014074513.1668000-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
cd38d6b5fa |
perf script perl: Fail check on dynamic allocation
Return ENOMEM when dynamic allocation failed. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20231120112356.8652-1-zhaimingbing@cmss.chinamobile.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
b457c52607 |
perf script python: Fail check on dynamic allocation
Add PyList_New() Fail check in get_field_numeric_entry() function and dynamic allocation checking for set_regs_in_dict(), python_start_script(). Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: MichelleJin <shjy180909@gmail.com> Signed-off-by: Paran Lee <p4ranlee@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Austin Kim <austindh.kim@gmail.com> Cc: Honggyu Kim <honggyu.kp@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20231120223218.9036-1-p4ranlee@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
a29ee6aea7 |
perf build: Ensure sysreg-defs Makefile respects output dir
Currently the sysreg-defs are written out to the source tree unconditionally, ignoring the specified output directory. Correct the build rule to emit the header to the output directory. Opportunistically reorganize the rules to avoid interleaving with the set of beauty make rules. Reported-by: Ian Rogers <irogers@google.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20231121192956.919380-3-oliver.upton@linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
29b8e94dcf |
perf lock contention: Fix a build error on 32-bit
Fix a build error on 32-bit system:
util/bpf_lock_contention.c: In function 'lock_contention_get_name':
util/bpf_lock_contention.c:253:50: error: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'u64 {aka long long unsigned int}' [-Werror=format=]
snprintf(name_buf, sizeof(name_buf), "cgroup:%lu", cgrp_id);
~~^
%llu
cc1: all warnings being treated as errors
Fixes:
|
||
![]() |
b539deafba |
perf report: Add s390 raw data interpretation for PAI counters
Commit |
||
![]() |
c06547d020 |
perf probe: Convert to check dwarf_getcfi feature
Now it has a feature check for the dwarf_getcfi(), use it and convert the code to check HAVE_DWARF_CFI_SUPPORT definition. Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231110000012.3538610-10-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
3f5928e461 |
perf dwarf-aux: Add die_find_variable_by_reg() helper
The die_find_variable_by_reg() will search for a variable or a parameter sub-DIE in the given scope DIE where the location matches to the given register. For the simplest and most common case, memory access usually happens with a base register and an offset to the field so the register holds a pointer in a variable or function parameter. Then we can find one if it has a location expression at the (instruction) address. This function only handles such a simple case for now. In this case, the expression has a DW_OP_regN operation where N < 32. If the register index (N) is greater than or equal to 32, DW_OP_regx operation with an operand which saves the value for the N would be used. It rejects expressions with more operations. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231110000012.3538610-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
981620fd27 |
perf dwarf-aux: Add die_get_scopes() alternative to dwarf_getscopes()
The die_get_scopes() returns the number of enclosing DIEs for the given address and it fills an array of DIEs like dwarf_getscopes(). But it doesn't follow the abstract origin of inlined functions as we want information of the concrete instance. This is needed to check the location of parameters and local variables properly. Users can check the origin separately if needed. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231110000012.3538610-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
3796eba7c1 |
perf dwarf-aux: Move #else block of #ifdef HAVE_DWARF_GETLOCATIONS_SUPPORT code to the header file
It's a usual convention that the conditional code is handled in a header file. As I'm planning to add some more of them, let's move the current code to the header first. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231110000012.3538610-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
a65e8c0b78 |
perf dwarf-aux: Fix die_get_typename() for void *
The die_get_typename() is to return a C-like type name from DWARF debug entry and it follows data type if the target entry is a pointer type. But I found that void pointers don't have the type attribute to follow and then the function returns an error for that case. This results in a broken type string for void pointer types. For example, the following type entries are pointer types. <1><48c>: Abbrev Number: 4 (DW_TAG_pointer_type) <48d> DW_AT_byte_size : 8 <48d> DW_AT_type : <0x481> <1><491>: Abbrev Number: 211 (DW_TAG_pointer_type) <493> DW_AT_byte_size : 8 <1><494>: Abbrev Number: 4 (DW_TAG_pointer_type) <495> DW_AT_byte_size : 8 <495> DW_AT_type : <0x49e> The first one at offset 48c and the third one at offset 494 have type information. Then they are pointer types for the referenced types. But the second one at offset 491 doesn't have the type attribute. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231110000012.3538610-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
6f1b6291cf |
perf tools: Add util/debuginfo.[ch] files
Split debuginfo data structure and related functions into a separate file so that it can be used by other components than the probe-finder. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231110000012.3538610-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
fb7fd2a14a |
perf annotate: Move raw_comment and raw_func_start fields out of 'struct ins_operands'
Thoese two fields are used only for the jump_ops, so move them into the union to save some bytes. Also add jump__delete() callback not to free the fields as they didn't allocate new strings. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: WANG Rui <wangrui@loongson.cn> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231110000012.3538610-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
ded8c48497 |
perf annotate: Pass "-l" option to objdump conditionally
The "-l" option is to print line numbers in the objdump output. perf annotate TUI only can show the line numbers later but it causes big slow downs for the kernel binary. Similarly, showing source code also takes a long time and it already has an option to control it. $ time objdump ... -d -S -C vmlinux > /dev/null real 0m3.474s user 0m3.047s sys 0m0.428s $ time objdump ... -d -l -C vmlinux > /dev/null real 0m1.796s user 0m1.459s sys 0m0.338s $ time objdump ... -d -C vmlinux > /dev/null real 0m0.051s user 0m0.036s sys 0m0.016s As it's not needed for data type profiling, let's make it conditional so that it can skip the unnecessary work. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231110000012.3538610-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
dd678532f9 |
perf header: Additional note on AMD IBS for max_precise pmu cap
x86 core PMU exposes supported maximum precision level via max_precise PMU capability. Although, AMD core PMU does not support precise mode, certain core PMU events with precise_ip > 0 are allowed and forwarded to IBS OP PMU. Display a note about this in the 'perf report' header output and document the details in the perf-list man page. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ross Zwisler <zwisler@chromium.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231107083331.901-2-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
6512b6aa23 |
perf bpf: Don't synthesize BPF events when disabled
If BPF sideband events are disabled on the command line, don't synthesize BPF events too. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Song Liu <song@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231102175735.2272696-13-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
b753d48f53 |
perf annotate: Move offsets array from 'struct annotation' to 'struct annotated_source'
The offsets array keeps pointers to 'struct annotation_line' entries which are available in the 'struct annotated_source'. Let's move it to there. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231103191907.54531-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
0aae4c99c5 |
perf annotate: Move some source code related fields from 'struct annotation' to 'struct annotated_source'
Some fields in the 'struct annotation' are only used with 'struct annotated_source' so better to be moved there in order to reduce memory consumption for other symbols. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231103191907.54531-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
2b215ec71b |
perf annotate: Move max_coverage from 'struct annotation' to 'struct annotated_branch'
The max_coverage field is only used when branch stack info is available so it'd be natural to move to 'struct annotated_branch'. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231103191907.54531-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
b7f87e3259 |
perf annotate: Split branch stack cycles info from 'struct annotation'
The cycles info is only meaningful when sample has branch stacks. To save the memory for normal cases, move those fields to a new 'struct annotated_branch' and dynamically allocate it when needed. Also move cycles_hist from annotated_source as it's related here. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231103191907.54531-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
de2c7eb59c |
perf annotate: Split branch stack cycles information out of 'struct annotation_line'
The cycles info is used only when branch stack is provided. Separate them from 'struct annotation_line' into a separate struct and lazy allocate them to save some memory. Committer notes: Make annotation__compute_ipc() check if the lazy allocation works, bailing out if so, its callers already do error checking and propagation. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231103191907.54531-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
9ffa6c7512 |
perf machine thread: Remove exited threads by default
'struct thread' values hold onto references to mmaps, DSOs, etc. When a
thread exits it is necessary to clean all of this memory up by removing
the thread from the machine's threads. Some tools require this doesn't
happen, such as auxtrace events, 'perf report' if offcpu events exist or
if a task list is being generated, so add a 'struct symbol_conf' member
to make the behavior optional. When an exited thread is left in the
machine's threads, mark it as exited.
This change relates to commit
|
||
![]() |
1a27fc0170 |
perf record: Lazy load kernel symbols
Commit
|
||
![]() |
9fbb4b0230 |
perf tools: Add branch counter knob
Add a new branch filter, "counter", for the branch counter option. It is used to mark the events which should be logged in the branch. If it is applied with the -j option, the counters of all the events should be logged in the branch. If the legacy kernel doesn't support the new branch sample type, switching off the branch counter filter. The stored counter values in each branch are displayed right after the regular branch stack information via perf report -D. Usage examples: # perf record -e "{branch-instructions,branch-misses}:S" -j any,counter Only the first event, branch-instructions, collect the LBR. Both branch-instructions and branch-misses are marked as logged events. The occurrences information of them can be found in the branch stack extension space of each branch. # perf record -e "{cpu/branch-instructions,branch_type=any/,cpu/branch-misses,branch_type=counter/}" Only the first event, branch-instructions, collect the LBR. Only the branch-misses event is marked as a logged event. Committer notes: I noticed 'perf test "Sample parsing"' failing, reported to the list and Kan provided a patch that checks if the evsel has a leader and that evsel->evlist is set, the comment in the source code further explains it. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tinghao Zhang <tinghao.zhang@intel.com> Link: https://lore.kernel.org/r/20231025201626.3000228-8-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
ac9cd7245f |
perf header: Support num and width of branch counters
To support the branch counters feature, the information of the maximum number of supported counters and the width of the counters is exposed in the sysfs caps folder. The perf tool can use the information to parse the logged counters in each branch. Store the information in the perf_env for later usage. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tinghao Zhang <tinghao.zhang@intel.com> Link: https://lore.kernel.org/r/20231025201626.3000228-7-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
7ab89417ed |
perf tools changes for v6.7
Build ----- * Compile BPF programs by default if clang (>= 12.0.1) is available to enable more features like kernel lock contention, off-cpu profiling, kwork, sample filtering and so on. It can be disabled by passing BUILD_BPF_SKEL=0 to make. * Produce better error messages for bison on debug build (make DEBUG=1) by defining YYDEBUG symbol internally. perf record ----------- * Track sideband events (like FORK/MMAP) from all CPUs even if perf record targets a subset of CPUs only (using -C option). Otherwise it may lose some information happened on a CPU out of the target list. * Fix checking raw sched_switch tracepoint argument using system BTF. This affects off-cpu profiling which attaches a BPF program to the raw tracepoint. perf lock contention -------------------- * Add --lock-cgroup option to see contention by cgroups. This should be used with BPF only (using -b option). $ sudo perf lock con -ab --lock-cgroup -- sleep 1 contended total wait max wait avg wait cgroup 835 14.06 ms 41.19 us 16.83 us /system.slice/led.service 25 122.38 us 13.77 us 4.89 us / 44 23.73 us 3.87 us 539 ns /user.slice/user-657345.slice/session-c4.scope 1 491 ns 491 ns 491 ns /system.slice/connectd.service * Add -G/--cgroup-filter option to see contention only for given cgroups. This can be useful when you identified a cgroup in the above command and want to investigate more on it. It also works with other output options like -t/--threads and -l/--lock-addr. $ sudo perf lock con -ab -G /user.slice/user-657345.slice/session-c4.scope -- sleep 1 contended total wait max wait avg wait type caller 8 77.11 us 17.98 us 9.64 us spinlock futex_wake+0xc8 2 24.56 us 14.66 us 12.28 us spinlock tick_do_update_jiffies64+0x25 1 4.97 us 4.97 us 4.97 us spinlock futex_q_lock+0x2a * Use per-cpu array for better spinlock tracking. This is to improve performance of the BPF program and to avoid nested contention on a lock in the BPF hash map. * Update callstack check for PowerPC. To find a representative caller of a lock, it needs to look up the call stacks. It ends the lookup when it sees 0 in the call stack buffer. However, PowerPC call stacks can have 0 values in the beginning so skip them when it expects valid call stacks after. perf kwork ---------- * Support 'sched' class (for -k option) so that it can see task scheduling event (using sched_switch tracepoint) as well as irq and workqueue items. * Add perf kwork top subcommand to show more accurate cpu utilization with sched class above. It works both with a recorded data (using perf kwork record command) and BPF (using -b option). Unlike perf top command, it does not support interactive mode (yet). $ sudo perf kwork top -b -k sched Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 160702.425 ms, 8 cpus %Cpu(s): 36.00% id, 0.00% hi, 0.00% si %Cpu0 [|||||||||||||||||| 61.66%] %Cpu1 [|||||||||||||||||| 61.27%] %Cpu2 [||||||||||||||||||| 66.40%] %Cpu3 [|||||||||||||||||| 61.28%] %Cpu4 [|||||||||||||||||| 61.82%] %Cpu5 [||||||||||||||||||||||| 77.41%] %Cpu6 [|||||||||||||||||| 61.73%] %Cpu7 [|||||||||||||||||| 63.25%] PID SPID %CPU RUNTIME COMMMAND ------------------------------------------------------------- 0 0 38.72 8089.463 ms [swapper/1] 0 0 38.71 8084.547 ms [swapper/3] 0 0 38.33 8007.532 ms [swapper/0] 0 0 38.26 7992.985 ms [swapper/6] 0 0 38.17 7971.865 ms [swapper/4] 0 0 36.74 7447.765 ms [swapper/7] 0 0 33.59 6486.942 ms [swapper/2] 0 0 22.58 3771.268 ms [swapper/5] 9545 9351 2.48 447.136 ms sched-messaging 9574 9351 2.09 418.583 ms sched-messaging 9724 9351 2.05 372.407 ms sched-messaging 9531 9351 2.01 368.804 ms sched-messaging 9512 9351 2.00 362.250 ms sched-messaging 9514 9351 1.95 357.767 ms sched-messaging 9538 9351 1.86 384.476 ms sched-messaging 9712 9351 1.84 386.490 ms sched-messaging 9723 9351 1.83 380.021 ms sched-messaging 9722 9351 1.82 382.738 ms sched-messaging 9517 9351 1.81 354.794 ms sched-messaging 9559 9351 1.79 344.305 ms sched-messaging 9725 9351 1.77 365.315 ms sched-messaging <SNIP> * Add hard/soft-irq statistics to perf kwork top. This will show the total CPU utilization with IRQ stats like below: $ sudo perf kwork top -b -k sched,irq,softirq Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 12554.889 ms, 8 cpus %Cpu(s): 96.23% id, 0.10% hi, 0.19% si <---- here %Cpu0 [| 4.60%] %Cpu1 [| 4.59%] %Cpu2 [ 2.73%] %Cpu3 [| 3.81%] <SNIP> perf bench ---------- * Add -G/--cgroups option to perf bench sched pipe. The pipe bench is good to measure context switch overhead. With this option, it puts the reader and writer tasks in separate cgroups to enforce context switch between two different cgroups. Also it needs to set CPU affinity of the tasks in a CPU to accurately measure the impact of cgroup context switches. $ sudo perf stat -e context-switches,cgroup-switches -- \ > taskset -c 0 perf bench sched pipe -l 100000 # Running 'sched/pipe' benchmark: # Executed 100000 pipe operations between two processes Total time: 0.307 [sec] 3.078180 usecs/op 324867 ops/sec Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 100000': 200,026 context-switches 63 cgroup-switches 0.321637922 seconds time elapsed You can see small number of cgroup-switches because both write and read tasks are in the same cgroup. $ sudo mkdir /sys/fs/cgroup/{AAA,BBB} $ sudo perf stat -e context-switches,cgroup-switches -- \ > taskset -c 0 perf bench sched pipe -l 100000 -G AAA,BBB # Running 'sched/pipe' benchmark: # Executed 100000 pipe operations between two processes Total time: 0.351 [sec] 3.512990 usecs/op 284657 ops/sec Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 100000 -G AAA,BBB': 200,020 context-switches 200,019 cgroup-switches 0.365034567 seconds time elapsed Now context-switches and cgroup-switches are almost same. And you can see the pipe operation took little more. * Kill child processes when perf bench sched messaging exited abnormally. Otherwise it'd leave the child doing unnecessary work. perf test --------- * Fix various shellcheck issues on the tests written in shell script. * Skip tests when condition is not satisfied: - object code reading test for non-text section addresses. - CoreSight test if cs_etm// event is not available. - lock contention test if not enough CPUs. Event parsing ------------- * Make PMU alias name loading lazy to reduce the startup time in the event parsing code for perf record, stat and others in the general case. * Lazily compute PMU default config. In the same sense, delay PMU initialization until it's really needed to reduce the startup cost. * Fix event term values that are raw events. The event specification can have several terms including event name. But sometimes it clashes with raw event encoding which starts with 'r' and has hex-digits. For example, an event named 'read' should be processed as a normal event but it was mis-treated as a raw encoding and caused a failure. $ perf stat -e 'uncore_imc_free_running/event=read/' -a sleep 1 event syntax error: '..nning/event=read/' \___ parser error Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events Event metrics ------------- * Add "Compat" regex to match event with multiple identifiers. * Usual updates for Intel, Power10, Arm telemetry/CMN and AmpereOne. Misc ---- * Assorted memory leak fixes and footprint reduction. * Add "bpf_skeletons" to perf version --build-options so that users can check whether their perf tools have BPF support easily. * Fix unaligned access in Intel-PT packet decoder found by undefined-behavior sanitizer. * Avoid frequency mode for the dummy event. Surprisingly it'd impact kernel timer tick handler performance by force iterating all PMU events. * Update bash shell completion for events and metrics. Signed-off-by: Namhyung Kim <namhyung@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZUMg7wAKCRCMstVUGiXM g8FvAQC9KED6H8rlH7UTvxE6fM947EJbldwGrNA1zGx++Ucd3gD/ewA2A6SUcIh6 Tua/XovmYOQbuDYOwlRHe+sdDag0sgg= =GrCE -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v6.7-1-2023-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Namhyung Kim: "Build: - Compile BPF programs by default if clang (>= 12.0.1) is available to enable more features like kernel lock contention, off-cpu profiling, kwork, sample filtering and so on. This can be disabled by passing BUILD_BPF_SKEL=0 to make. - Produce better error messages for bison on debug build (make DEBUG=1) by defining YYDEBUG symbol internally. perf record: - Track sideband events (like FORK/MMAP) from all CPUs even if perf record targets a subset of CPUs only (using -C option). Otherwise it may lose some information happened on a CPU out of the target list. - Fix checking raw sched_switch tracepoint argument using system BTF. This affects off-cpu profiling which attaches a BPF program to the raw tracepoint. perf lock contention: - Add --lock-cgroup option to see contention by cgroups. This should be used with BPF only (using -b option). $ sudo perf lock con -ab --lock-cgroup -- sleep 1 contended total wait max wait avg wait cgroup 835 14.06 ms 41.19 us 16.83 us /system.slice/led.service 25 122.38 us 13.77 us 4.89 us / 44 23.73 us 3.87 us 539 ns /user.slice/user-657345.slice/session-c4.scope 1 491 ns 491 ns 491 ns /system.slice/connectd.service - Add -G/--cgroup-filter option to see contention only for given cgroups. This can be useful when you identified a cgroup in the above command and want to investigate more on it. It also works with other output options like -t/--threads and -l/--lock-addr. $ sudo perf lock con -ab -G /user.slice/user-657345.slice/session-c4.scope -- sleep 1 contended total wait max wait avg wait type caller 8 77.11 us 17.98 us 9.64 us spinlock futex_wake+0xc8 2 24.56 us 14.66 us 12.28 us spinlock tick_do_update_jiffies64+0x25 1 4.97 us 4.97 us 4.97 us spinlock futex_q_lock+0x2a - Use per-cpu array for better spinlock tracking. This is to improve performance of the BPF program and to avoid nested contention on a lock in the BPF hash map. - Update callstack check for PowerPC. To find a representative caller of a lock, it needs to look up the call stacks. It ends the lookup when it sees 0 in the call stack buffer. However, PowerPC call stacks can have 0 values in the beginning so skip them when it expects valid call stacks after. perf kwork: - Support 'sched' class (for -k option) so that it can see task scheduling event (using sched_switch tracepoint) as well as irq and workqueue items. - Add perf kwork top subcommand to show more accurate cpu utilization with sched class above. It works both with a recorded data (using perf kwork record command) and BPF (using -b option). Unlike perf top command, it does not support interactive mode (yet). $ sudo perf kwork top -b -k sched Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 160702.425 ms, 8 cpus %Cpu(s): 36.00% id, 0.00% hi, 0.00% si %Cpu0 [|||||||||||||||||| 61.66%] %Cpu1 [|||||||||||||||||| 61.27%] %Cpu2 [||||||||||||||||||| 66.40%] %Cpu3 [|||||||||||||||||| 61.28%] %Cpu4 [|||||||||||||||||| 61.82%] %Cpu5 [||||||||||||||||||||||| 77.41%] %Cpu6 [|||||||||||||||||| 61.73%] %Cpu7 [|||||||||||||||||| 63.25%] PID SPID %CPU RUNTIME COMMMAND ------------------------------------------------------------- 0 0 38.72 8089.463 ms [swapper/1] 0 0 38.71 8084.547 ms [swapper/3] 0 0 38.33 8007.532 ms [swapper/0] 0 0 38.26 7992.985 ms [swapper/6] 0 0 38.17 7971.865 ms [swapper/4] 0 0 36.74 7447.765 ms [swapper/7] 0 0 33.59 6486.942 ms [swapper/2] 0 0 22.58 3771.268 ms [swapper/5] 9545 9351 2.48 447.136 ms sched-messaging 9574 9351 2.09 418.583 ms sched-messaging 9724 9351 2.05 372.407 ms sched-messaging 9531 9351 2.01 368.804 ms sched-messaging 9512 9351 2.00 362.250 ms sched-messaging 9514 9351 1.95 357.767 ms sched-messaging 9538 9351 1.86 384.476 ms sched-messaging 9712 9351 1.84 386.490 ms sched-messaging 9723 9351 1.83 380.021 ms sched-messaging 9722 9351 1.82 382.738 ms sched-messaging 9517 9351 1.81 354.794 ms sched-messaging 9559 9351 1.79 344.305 ms sched-messaging 9725 9351 1.77 365.315 ms sched-messaging <SNIP> - Add hard/soft-irq statistics to perf kwork top. This will show the total CPU utilization with IRQ stats like below: $ sudo perf kwork top -b -k sched,irq,softirq Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 12554.889 ms, 8 cpus %Cpu(s): 96.23% id, 0.10% hi, 0.19% si <---- here %Cpu0 [| 4.60%] %Cpu1 [| 4.59%] %Cpu2 [ 2.73%] %Cpu3 [| 3.81%] <SNIP> perf bench: - Add -G/--cgroups option to perf bench sched pipe. The pipe bench is good to measure context switch overhead. With this option, it puts the reader and writer tasks in separate cgroups to enforce context switch between two different cgroups. Also it needs to set CPU affinity of the tasks in a CPU to accurately measure the impact of cgroup context switches. $ sudo perf stat -e context-switches,cgroup-switches -- \ > taskset -c 0 perf bench sched pipe -l 100000 # Running 'sched/pipe' benchmark: # Executed 100000 pipe operations between two processes Total time: 0.307 [sec] 3.078180 usecs/op 324867 ops/sec Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 100000': 200,026 context-switches 63 cgroup-switches 0.321637922 seconds time elapsed You can see small number of cgroup-switches because both write and read tasks are in the same cgroup. $ sudo mkdir /sys/fs/cgroup/{AAA,BBB} $ sudo perf stat -e context-switches,cgroup-switches -- \ > taskset -c 0 perf bench sched pipe -l 100000 -G AAA,BBB # Running 'sched/pipe' benchmark: # Executed 100000 pipe operations between two processes Total time: 0.351 [sec] 3.512990 usecs/op 284657 ops/sec Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 100000 -G AAA,BBB': 200,020 context-switches 200,019 cgroup-switches 0.365034567 seconds time elapsed Now context-switches and cgroup-switches are almost same. And you can see the pipe operation took little more. - Kill child processes when perf bench sched messaging exited abnormally. Otherwise it'd leave the child doing unnecessary work. perf test: - Fix various shellcheck issues on the tests written in shell script. - Skip tests when condition is not satisfied: - object code reading test for non-text section addresses. - CoreSight test if cs_etm// event is not available. - lock contention test if not enough CPUs. Event parsing: - Make PMU alias name loading lazy to reduce the startup time in the event parsing code for perf record, stat and others in the general case. - Lazily compute PMU default config. In the same sense, delay PMU initialization until it's really needed to reduce the startup cost. - Fix event term values that are raw events. The event specification can have several terms including event name. But sometimes it clashes with raw event encoding which starts with 'r' and has hex-digits. For example, an event named 'read' should be processed as a normal event but it was mis-treated as a raw encoding and caused a failure. $ perf stat -e 'uncore_imc_free_running/event=read/' -a sleep 1 event syntax error: '..nning/event=read/' \___ parser error Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events Event metrics: - Add "Compat" regex to match event with multiple identifiers. - Usual updates for Intel, Power10, Arm telemetry/CMN and AmpereOne. Misc: - Assorted memory leak fixes and footprint reduction. - Add "bpf_skeletons" to perf version --build-options so that users can check whether their perf tools have BPF support easily. - Fix unaligned access in Intel-PT packet decoder found by undefined-behavior sanitizer. - Avoid frequency mode for the dummy event. Surprisingly it'd impact kernel timer tick handler performance by force iterating all PMU events. - Update bash shell completion for events and metrics" * tag 'perf-tools-for-v6.7-1-2023-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (187 commits) perf vendor events intel: Update tsx_cycles_per_elision metrics perf vendor events intel: Update bonnell version number to v5 perf vendor events intel: Update westmereex events to v4 perf vendor events intel: Update meteorlake events to v1.06 perf vendor events intel: Update knightslanding events to v16 perf vendor events intel: Add typo fix for ivybridge FP perf vendor events intel: Update a spelling in haswell/haswellx perf vendor events intel: Update emeraldrapids to v1.01 perf vendor events intel: Update alderlake/alderlake events to v1.23 perf build: Disable BPF skeletons if clang version is < 12.0.1 perf callchain: Fix spelling mistake "statisitcs" -> "statistics" perf report: Fix spelling mistake "heirachy" -> "hierarchy" perf python: Fix binding linkage due to rename and move of evsel__increase_rlimit() perf tests: test_arm_coresight: Simplify source iteration perf vendor events intel: Add tigerlake two metrics perf vendor events intel: Add broadwellde two metrics perf vendor events intel: Fix broadwellde tma_info_system_dram_bw_use metric perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit perf callchain: Minor layout changes to callchain_list perf callchain: Make brtype_stat in callchain_list optional ... |
||
![]() |
45b890f768 |
KVM/arm64 updates for 6.7
- Generalized infrastructure for 'writable' ID registers, effectively allowing userspace to opt-out of certain vCPU features for its guest - Optimization for vSGI injection, opportunistically compressing MPIDR to vCPU mapping into a table - Improvements to KVM's PMU emulation, allowing userspace to select the number of PMCs available to a VM - Guest support for memory operation instructions (FEAT_MOPS) - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing bugs and getting rid of useless code - Changes to the way the SMCCC filter is constructed, avoiding wasted memory allocations when not in use - Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing the overhead of errata mitigations - Miscellaneous kernel and selftest fixes -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCZUFJRgAKCRCivnWIJHzd FtgYAP9cMsc1Mhlw3jNQnTc6q0cbTulD/SoEDPUat1dXMqjs+gEAnskwQTrTX834 fgGQeCAyp7Gmar+KeP64H0xm8kPSpAw= =R4M7 -----END PGP SIGNATURE----- Merge tag 'kvmarm-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.7 - Generalized infrastructure for 'writable' ID registers, effectively allowing userspace to opt-out of certain vCPU features for its guest - Optimization for vSGI injection, opportunistically compressing MPIDR to vCPU mapping into a table - Improvements to KVM's PMU emulation, allowing userspace to select the number of PMCs available to a VM - Guest support for memory operation instructions (FEAT_MOPS) - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing bugs and getting rid of useless code - Changes to the way the SMCCC filter is constructed, avoiding wasted memory allocations when not in use - Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing the overhead of errata mitigations - Miscellaneous kernel and selftest fixes |
||
![]() |
fed3a1be64 |
perf tools fixes for v6.6: 2nd batch
- Fix regression in reading scale and unit files from sysfs for PMU events, so that we can use that info to pretty print instead of printing raw numbers: # perf stat -e power/energy-ram/,power/energy-gpu/ sleep 2 Performance counter stats for 'system wide': 1.64 Joules power/energy-ram/ 0.20 Joules power/energy-gpu/ 2.001228914 seconds time elapsed # # grep -m1 "model name" /proc/cpuinfo model name : Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz # - The small llvm.cpp file used to check if the llvm devel files are present was incorrectly deleted when removing the BPF event in 'perf trace', put it back as it is also used by tools/bpf/bpftool, that uses llvm routines to do disassembly of BPF object files. - Fix use of addr_location__exit() in dlfilter__object_code(), making sure that it is only used to pair a previous addr_location__init() call. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZTKh5AAKCRCyPKLppCJ+ J/g/AP0f6SNyHJz21JzDTzyjXAeSdMzKwic0LXv+kATQy31HJAD+Kf7UKQieUeZB fxvp60aKyFN8IVIgpYiAjZMS3k9XPAY= =N7Gv -----END PGP SIGNATURE----- Merge tag 'perf-tools-fixes-for-v6.6-2-2023-10-20' into perf-tools-next To get the latest fixes in the perf tools including perf stat output, dlfilter and LLVM feature detection. Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
ee40490dd7 |
perf callchain: Fix spelling mistake "statisitcs" -> "statistics"
There are a couple of spelling mistakes in perror messages. Fix them. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Cc: kernel-janitors@vger.kernel.org Link: https://lore.kernel.org/r/20231027084633.1167530-1-colin.i.king@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
93c65d6143 |
perf python: Fix binding linkage due to rename and move of evsel__increase_rlimit()
The changes in ("perf evsel: Rename evsel__increase_rlimit to
rlimit__increase_nofile") ended up breaking the python binding that now
references the rlimit__increase_nofile function, add the util/rlimit.o
to the tools/perf/util/python-ext-sources to cure that.
This was detected by the 'perf test python' regression test:
$ perf test python
14: 'import perf' in python : FAILED!
$ perf test -v python
Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc
14: 'import perf' in python :
--- start ---
test child forked, pid 2912462
python usage test: "echo "import sys ; sys.path.insert(0, '/tmp/build/perf-tools-next/python'); import perf" | '/usr/bin/python3' "
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: /tmp/build/perf-tools-next/python/perf.cpython-311-x86_64-linux-gnu.so: undefined symbol: rlimit__increase_nofile
test child finished with -1
---- end ----
'import perf' in python: FAILED!
$
Fixes:
|
||
![]() |
56e144fe98 |
perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit
Fix leak where mem_info__put wouldn't release the maps/map as used by perf mem. Add exit functions and use elsewhere that the maps and map are released. Signed-off-by: Ian Rogers <irogers@google.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: liuwenyu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20231024222353.3024098-12-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
dec07fe5d4 |
perf callchain: Minor layout changes to callchain_list
Avoid 6 byte hole for padding. Place more frequently used fields first in an attempt to use just 1 cacheline in the common case. Before: ``` struct callchain_list { u64 ip; /* 0 8 */ struct map_symbol ms; /* 8 24 */ struct { _Bool unfolded; /* 32 1 */ _Bool has_children; /* 33 1 */ }; /* 32 2 */ /* XXX 6 bytes hole, try to pack */ u64 branch_count; /* 40 8 */ u64 from_count; /* 48 8 */ u64 predicted_count; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ u64 abort_count; /* 64 8 */ u64 cycles_count; /* 72 8 */ u64 iter_count; /* 80 8 */ u64 iter_cycles; /* 88 8 */ struct branch_type_stat * brtype_stat; /* 96 8 */ const char * srcline; /* 104 8 */ struct list_head list; /* 112 16 */ /* size: 128, cachelines: 2, members: 13 */ /* sum members: 122, holes: 1, sum holes: 6 */ }; ``` After: ``` struct callchain_list { struct list_head list; /* 0 16 */ u64 ip; /* 16 8 */ struct map_symbol ms; /* 24 24 */ const char * srcline; /* 48 8 */ u64 branch_count; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ u64 from_count; /* 64 8 */ u64 cycles_count; /* 72 8 */ u64 iter_count; /* 80 8 */ u64 iter_cycles; /* 88 8 */ struct branch_type_stat * brtype_stat; /* 96 8 */ u64 predicted_count; /* 104 8 */ u64 abort_count; /* 112 8 */ struct { _Bool unfolded; /* 120 1 */ _Bool has_children; /* 121 1 */ }; /* 120 2 */ /* size: 128, cachelines: 2, members: 13 */ /* padding: 6 */ }; ``` Signed-off-by: Ian Rogers <irogers@google.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: liuwenyu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20231024222353.3024098-11-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
6ba29fbb0b |
perf callchain: Make brtype_stat in callchain_list optional
struct callchain_list is 352bytes in size, 232 of which are brtype_stat. brtype_stat is only used for certain callchain_list items so make it optional, allocating when necessary. So that printing doesn't need to deal with an optional brtype_stat, pass an empty/zero version. Before: ``` struct callchain_list { u64 ip; /* 0 8 */ struct map_symbol ms; /* 8 24 */ struct { _Bool unfolded; /* 32 1 */ _Bool has_children; /* 33 1 */ }; /* 32 2 */ /* XXX 6 bytes hole, try to pack */ u64 branch_count; /* 40 8 */ u64 from_count; /* 48 8 */ u64 predicted_count; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ u64 abort_count; /* 64 8 */ u64 cycles_count; /* 72 8 */ u64 iter_count; /* 80 8 */ u64 iter_cycles; /* 88 8 */ struct branch_type_stat brtype_stat; /* 96 232 */ /* --- cacheline 5 boundary (320 bytes) was 8 bytes ago --- */ const char * srcline; /* 328 8 */ struct list_head list; /* 336 16 */ /* size: 352, cachelines: 6, members: 13 */ /* sum members: 346, holes: 1, sum holes: 6 */ /* last cacheline: 32 bytes */ }; ``` After: ``` struct callchain_list { u64 ip; /* 0 8 */ struct map_symbol ms; /* 8 24 */ struct { _Bool unfolded; /* 32 1 */ _Bool has_children; /* 33 1 */ }; /* 32 2 */ /* XXX 6 bytes hole, try to pack */ u64 branch_count; /* 40 8 */ u64 from_count; /* 48 8 */ u64 predicted_count; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ u64 abort_count; /* 64 8 */ u64 cycles_count; /* 72 8 */ u64 iter_count; /* 80 8 */ u64 iter_cycles; /* 88 8 */ struct branch_type_stat * brtype_stat; /* 96 8 */ const char * srcline; /* 104 8 */ struct list_head list; /* 112 16 */ /* size: 128, cachelines: 2, members: 13 */ /* sum members: 122, holes: 1, sum holes: 6 */ }; ``` Signed-off-by: Ian Rogers <irogers@google.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: liuwenyu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20231024222353.3024098-10-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
d47d876d72 |
perf callchain: Make display use of branch_type_stat const
Display code doesn't modify the branch_type_stat so switch uses to const. This is done to aid refactoring struct callchain_list where current the branch_type_stat is embedded even if not used. Signed-off-by: Ian Rogers <irogers@google.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: liuwenyu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20231024222353.3024098-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
67a3ebf1c3 |
perf offcpu: Add missed btf_free
Caught by address/leak sanitizer. Signed-off-by: Ian Rogers <irogers@google.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: liuwenyu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20231024222353.3024098-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
7b2e444b76 |
perf threads: Remove unused dead thread list
Commit
|
||
![]() |
c1149037f6 |
perf hist: Add missing puts to hist__account_cycles
Caught using reference count checking on perf top with
"--call-graph=lbr". After this no memory leaks were detected.
Fixes:
|
||
![]() |
78c32f4cb1 |
libperf rc_check: Add RC_CHK_EQUAL
Comparing pointers with reference count checking is tricky to avoid a SEGV. Add a convenience macro to simplify and use. Signed-off-by: Ian Rogers <irogers@google.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: liuwenyu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20231024222353.3024098-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
ab8ce15078 |
perf machine: Avoid out of bounds LBR memory read
Running perf top with address sanitizer and "--call-graph=lbr" fails
due to reading sample 0 when no samples exist. Add a guard to prevent
this.
Fixes:
|
||
![]() |
7a8f349e9d |
perf rwsem: Add debug mode that uses a mutex
Mutex error check will capture trying to take the lock recursively and other problems that rwlock won't. At the expense of concurrency, adda debug mode that uses a mutex in place of a rwsem. Signed-off-by: Ian Rogers <irogers@google.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: liuwenyu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20231024222353.3024098-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
b5711042a1 |
perf lock contention: Use per-cpu array map for spinlocks
Currently lock contention timestamp is maintained in a hash map keyed by pid. That means it needs to get and release a map element (which is proctected by spinlock!) on each contention begin and end pair. This can impact on performance if there are a lot of contention (usually from spinlocks). It used to go with task local storage but it had an issue on memory allocation in some critical paths. Although it's addressed in recent kernels IIUC, the tool should support old kernels too. So it cannot simply switch to the task local storage at least for now. As spinlocks create lots of contention and they disabled preemption during the spinning, it can use per-cpu array to keep the timestamp to avoid overhead in hashmap update and delete. In contention_begin, it's easy to check the lock types since it can see the flags. But contention_end cannot see it. So let's try to per-cpu array first (unconditionally) if it has an active element (lock != 0). Then it should be used and per-task tstamp map should not be used until the per-cpu array element is cleared which means nested spinlock contention (if any) was finished and it nows see (the outer) lock. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Hao Luo <haoluo@google.com> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20231020204741.1869520-3-namhyung@kernel.org |
||
![]() |
6a070573f2 |
perf lock contention: Check race in tstamp elem creation
When pelem is NULL, it'd create a new entry with zero data. But it might be preempted by IRQ/NMI just before calling bpf_map_update_elem() then there's a chance to call it twice for the same pid. So it'd be better to use BPF_NOEXIST flag and check the return value to prevent the race. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Hao Luo <haoluo@google.com> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20231020204741.1869520-2-namhyung@kernel.org |
||
![]() |
d99317f214 |
perf lock contention: Clear lock addr after use
It checks the current lock to calculated the delta of contention time. The address is saved in the tstamp map which is allocated at begining of contention and released at end of contention. But it's possible for bpf_map_delete_elem() to fail. In that case, the element in the tstamp map kept for the current lock and it makes the next contention for the same lock tracked incorrectly. Specificially the next contention begin will see the existing element for the task and it'd just return. Then the next contention end will see the element and calculate the time using the timestamp for the previous begin. This can result in a large value for two small contentions happened from time to time. Let's clear the lock address so that it can be updated next time even if the bpf_map_delete_elem() failed. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Hao Luo <haoluo@google.com> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20231020204741.1869520-1-namhyung@kernel.org |
||
![]() |
e093a222d7 |
perf evsel: Rename evsel__increase_rlimit to rlimit__increase_nofile
evsel__increase_rlimit() helper does nothing with evsel, and description of the functionality is inaccurate, rename it and move to util/rlimit.c. By the way, fix a checkppatch warning about misplaced license tag: WARNING: Misplaced SPDX-License-Identifier tag - use line 1 instead #160: FILE: tools/perf/util/rlimit.h:3: /* SPDX-License-Identifier: LGPL-2.1 */ No functional change. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231023033144.1011896-1-yangjihong1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
c4a852635e |
perf data: Increase RLIMIT_NOFILE limit when open too many files in perf_data__create_dir()
If using parallel threads to collect data, perf record needs at least 6 fds per CPU. (one for sys_perf_event_open, four for pipe msg and ack of the pipe, see record__thread_data_open_pipes(), and one for open perf.data.XXX) For an environment with more than 100 cores, if perf record uses both `-a` and `--threads` options, it is easy to exceed the upper limit of the file descriptor number, when we run out of them try to increase the limits. Before: $ ulimit -n 1024 $ lscpu | grep 'On-line CPU(s)' On-line CPU(s) list: 0-159 $ perf record --threads -a sleep 1 Failed to create data directory: Too many open files After: $ ulimit -n 1024 $ lscpu | grep 'On-line CPU(s)' On-line CPU(s) list: 0-159 $ perf record --threads -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.394 MB perf.data (1576 samples) ] Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20231013075945.698874-1-yangjihong1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
5069211e2f |
perf trace: Use the right bpf_probe_read(_str) variant for reading user data
Perf test case 111 Check open filename arg using perf trace + vfs_getname
fails on s390. This is caused by a failing function
bpf_probe_read() in file util/bpf_skel/augmented_raw_syscalls.bpf.c.
The root cause is the lookup by address. Function bpf_probe_read()
is used. This function works only for architectures
with ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.
On s390 is not possible to determine from the address to which
address space the address belongs to (user or kernel space).
Replace bpf_probe_read() by bpf_probe_read_kernel()
and bpf_probe_read_str() by bpf_probe_read_user_str() to
explicity specify the address space the address refers to.
Output before:
# ./perf trace -eopen,openat -- touch /tmp/111
libbpf: prog 'sys_enter': BPF program load failed: Invalid argument
libbpf: prog 'sys_enter': -- BEGIN PROG LOAD LOG --
reg type unsupported for arg#0 function sys_enter#75
0: R1=ctx(off=0,imm=0) R10=fp0
; int sys_enter(struct syscall_enter_args *args)
0: (bf) r6 = r1 ; R1=ctx(off=0,imm=0) R6_w=ctx(off=0,imm=0)
; return bpf_get_current_pid_tgid();
1: (85) call bpf_get_current_pid_tgid#14 ; R0_w=scalar()
2: (63) *(u32 *)(r10 -8) = r0 ; R0_w=scalar() R10=fp0 fp-8=????mmmm
3: (bf) r2 = r10 ; R2_w=fp0 R10=fp0
;
.....
lines deleted here
.....
23: (bf) r3 = r6 ; R3_w=ctx(off=0,imm=0) R6=ctx(off=0,imm=0)
24: (85) call bpf_probe_read#4
unknown func bpf_probe_read#4
processed 23 insns (limit 1000000) max_states_per_insn 0 \
total_states 2 peak_states 2 mark_read 2
-- END PROG LOAD LOG --
libbpf: prog 'sys_enter': failed to load: -22
libbpf: failed to load object 'augmented_raw_syscalls_bpf'
libbpf: failed to load BPF skeleton 'augmented_raw_syscalls_bpf': -22
....
Output after:
# ./perf test -Fv 111
111: Check open filename arg using perf trace + vfs_getname :
--- start ---
1.085 ( 0.011 ms): touch/320753 openat(dfd: CWD, filename: \
"/tmp/temporary_file.SWH85", \
flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) = 3
---- end ----
Check open filename arg using perf trace + vfs_getname: Ok
#
Test with the sleep command shows:
Output before:
# ./perf trace -e *sleep sleep 1.234567890
0.000 (1234.681 ms): sleep/63114 clock_nanosleep(rqtp: \
{ .tv_sec: 0, .tv_nsec: 0 }, rmtp: 0x3ffe0979720) = 0
#
Output after:
# ./perf trace -e *sleep sleep 1.234567890
0.000 (1234.686 ms): sleep/64277 clock_nanosleep(rqtp: \
{ .tv_sec: 1, .tv_nsec: 234567890 }, rmtp: 0x3fff3df9ea0) = 0
#
Fixes:
|
||
![]() |
e2bdd172e6 |
perf build: Generate arm64's sysreg-defs.h and add to include path
Start generating sysreg-defs.h in anticipation of updating sysreg.h to a version that needs the generated output. Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231011195740.3349631-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev> |
||
![]() |
1f36b190ad |
perf tools: Do not ignore the default vmlinux.h
The recent change made it possible to generate vmlinux.h from BTF and
to ignore the file. But we also have a minimal vmlinux.h that will be
used by default. It should not be ignored by GIT.
Fixes:
|
||
![]() |
0197da7aff |
perf pmu: Lazily compute default config
The default config is computed during creation of the PMU and may do things like scanning sysfs, when the PMU may just be used as part of scanning. Change default_config to perf_event_attr_init_default, a callback that is used when a default config needs initializing. This avoids holding onto the memory for a perf_event_attr and copying. On a tigerlake laptop running the pmu-scan benchmark: Before: Running 'internals/pmu-scan' benchmark: Computing performance of sysfs PMU event scan for 100 times Average core PMU scanning took: 28.780 usec (+- 0.503 usec) Average PMU scanning took: 283.480 usec (+- 18.471 usec) Number of openat syscalls: 30,227 After: Running 'internals/pmu-scan' benchmark: Computing performance of sysfs PMU event scan for 100 times Average core PMU scanning took: 27.880 usec (+- 0.169 usec) Average PMU scanning took: 245.260 usec (+- 15.758 usec) Number of openat syscalls: 28,914 Over 3 runs it is a nearly 12% reduction in execution time and a 4.3% of openat calls. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Link: https://lore.kernel.org/r/20231012175645.1849503-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
63883cb063 |
perf pmu: Const-ify perf_pmu__config_terms
Add const to related APIs, this is so they can be used to default initialize a perf_event_attr from a const pmu. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Link: https://lore.kernel.org/r/20231012175645.1849503-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
3a42f4c796 |
perf pmu: Const-ify file APIs
File APIs don't alter the struct pmu so allow const ones to be passed. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Link: https://lore.kernel.org/r/20231012175645.1849503-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
aa61360155 |
perf pmu: Rename perf_pmu__get_default_config to perf_pmu__arch_init
Assign default_config as part of the init. perf_pmu__get_default_config was doing more than just getting the default config and so this is intended to better align with the code. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Link: https://lore.kernel.org/r/20231012175645.1849503-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
661ce78105 |
perf intel-pt: Prefer get_unaligned_le64 to memcpy_le64
Use get_unaligned_le64() instead of memcpy_le64(..., 8) because it produces simpler code. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20231005190451.175568-6-adrian.hunter@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
3b4fa67fc6 |
perf intel-pt: Use get_unaligned_le16() etc
Avoid unaligned access by using get_unaligned_le16(), get_unaligned_le32() and get_unaligned_le64(). Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20231005190451.175568-5-adrian.hunter@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
f058fa5b07 |
perf intel-pt: Use existing definitions of le16_to_cpu() etc
Use definitions from tools/include/linux/kernel.h Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20231005190451.175568-4-adrian.hunter@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
1d2dbce9bb |
perf intel-pt: Simplify intel_pt_get_vmcs()
Simplify and remove unnecessary constant expressions. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20231005190451.175568-3-adrian.hunter@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
a16afcc58a |
perf cs-etm: Fix incorrect or missing decoder for raw trace
The decoder creation for raw trace uses metadata from the first CPU. On per-cpu mode, this metadata is incorrectly used for every decoder. On per-process/per-thread traces, the first CPU is CPU0. If CPU0 trace is not enabled, its metadata will be marked unused and the decoder is not created. Perf report dump skips the decoding part because the decoder is missing. To fix this, use metadata of the CPU associated with sample object. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: suzuki.poulose@arm.com Cc: mike.leach@linaro.org Cc: jonathanh@nvidia.com Cc: rwiley@nvidia.com Cc: treding@nvidia.com Cc: vsethi@nvidia.com Cc: ywan@nvidia.com Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Cc: linux-tegra@vger.kernel.org Link: https://lore.kernel.org/r/20231010234803.5419-1-bwicaksono@nvidia.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
b84b3f4792 |
perf bpf_counter: Fix a few memory leaks
Memory leaks were detected by clang-tidy. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: llvm@lists.linux.dev Cc: Ming Wang <wangming01@loongson.cn> Cc: Tom Rix <trix@redhat.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20231009183920.200859-20-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
1052545017 |
perf header: Fix various error path memory leaks
Memory leaks were detected by clang-tidy. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: llvm@lists.linux.dev Cc: Ming Wang <wangming01@loongson.cn> Cc: Tom Rix <trix@redhat.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20231009183920.200859-19-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
97fe038374 |
perf trace-event-info: Avoid passing NULL value to closedir
If opendir failed then closedir was passed NULL which is erroneous. Caught by clang-tidy. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: llvm@lists.linux.dev Cc: Ming Wang <wangming01@loongson.cn> Cc: Tom Rix <trix@redhat.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20231009183920.200859-18-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
7875c72c8b |
perf parse-events: Fix unlikely memory leak when cloning terms
Add missing free on an error path as detected by clang-tidy. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: llvm@lists.linux.dev Cc: Ming Wang <wangming01@loongson.cn> Cc: Tom Rix <trix@redhat.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20231009183920.200859-16-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
63d471979e |
perf svghelper: Avoid memory leak
On success path the sib_core and sib_thr values weren't being freed. Detected by clang-tidy. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: llvm@lists.linux.dev Cc: Ming Wang <wangming01@loongson.cn> Cc: Tom Rix <trix@redhat.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20231009183920.200859-14-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
52a5ad12f2 |
perf dlfilter: Be defensive against potential NULL dereference
In the unlikely case of having a symbol without a mapping, avoid a NULL dereference that clang-tidy warns about. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: llvm@lists.linux.dev Cc: Ming Wang <wangming01@loongson.cn> Cc: Tom Rix <trix@redhat.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20231009183920.200859-11-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
85f73c377b |
perf mem-events: Avoid uninitialized read
pmu should be initialized to NULL before perf_pmus__scan loop. Fix and
shrink the scope of pmu at the same time. Issue detected by clang-tidy.
Fixes:
|
||
![]() |
b3aa09ee78 |
perf jitdump: Avoid memory leak
jit_repipe_unwinding_info is called in a loop by jit_process_dump, avoid leaking unwinding_data by free-ing before overwriting. Error detected by clang-tidy. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: llvm@lists.linux.dev Cc: Ming Wang <wangming01@loongson.cn> Cc: Tom Rix <trix@redhat.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20231009183920.200859-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
e237213670 |
perf env: Remove unnecessary NULL tests
clang-tidy was warning: ``` util/env.c:334:23: warning: Access to field 'nr_pmu_mappings' results in a dereference of a null pointer (loaded from variable 'env') [clang-analyzer-core.NullDereference] env->nr_pmu_mappings = pmu_num; ``` As functions are called potentially when !env was true. This condition could never be true as it would produce a segv, so remove the unnecessary NULL tests and silence clang-tidy. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: llvm@lists.linux.dev Cc: Ming Wang <wangming01@loongson.cn> Cc: Tom Rix <trix@redhat.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20231009183920.200859-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
b20576fd7f |
perf parse-events: Fix for term values that are raw events
Raw events can be strings like 'r0xead' but the 0x is optional so they
can also be 'read'. On IcelakeX uncore_imc_free_running has an event
called 'read' which may be programmed as:
```
$ perf stat -e 'uncore_imc_free_running/event=read/' -a sleep 1
```
However, the PE_RAW type isn't allowed on the right of a term, even
though in this case we just want to interpret it as a string. This
leads to the following error on IcelakeX:
```
$ perf stat -e 'uncore_imc_free_running/event=read/' -a sleep 1
event syntax error: '..nning/event=read/'
\___ parser error
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
```
Fix this by allowing raw types on the right of terms and treat them as
strings, just as is already done for PE_LEGACY_CACHE. Make this
consistent by just entirely removing name_or_legacy and always using
name_or_raw that covers all three cases.
Fixes:
|
||
![]() |
29a2fd7c72 |
perf symbols: Add 'intel_idle_ibrs' to the list of idle symbols
This is a longstanding to do list entry: we need a way to see that a sample took place while in idle state, as the current way to do it is to infer that by the name of the functions that in such state have more samples, IOW: a hack. Maybe we can do flip a bit in samples that take place inside the enter/exit idle section in do_idle()? But till then, add one more :-\ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/ZR66Qgbcltt+zG7F@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
03ff4c6b3e |
perf parse-events: Avoid erange from hex numbers
We specify that a "num_hex" comprises 1 or more digits, however, that allows strtoull to fail with ERANGE. Limit the number of hex digits to being between 1 and 16. Before: ``` $ perf stat -e 'cpu/rE7574c47490475745/' true perf: util/parse-events.c:215: fix_raw: Assertion `errno == 0' failed. Aborted (core dumped) ``` After: ``` $ perf stat -e 'cpu/rE7574c47490475745/' true event syntax error: 'cpu/rE7574c47490475745/' \___ Bad event or PMU Unable to find PMU or event on a PMU of 'cpu' Initial error: event syntax error: 'cpu/rE7574c47490475745/' \___ unknown term 'rE7574c47490475745' for pmu 'cpu' valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,config3,name,period,percore,metric-id Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events ``` Issue found through fuzz testing. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20230907210533.3712979-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
87cd3d4819 |
perf tools fixes for v6.6: 1st batch
Build: - Update header files in the tools/**/include directory to sync with the kernel sources as usual. - Remove unused bpf-prologue files. While it's not strictly a fix, but the functionality was removed in this cycle so better to get rid of the code together. - Other minor build fixes. Misc: - Fix uninitialized memory access in PMU parsing code - Fix segfaults on software event Signed-off-by: Namhyung Kim <namhyung@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZRIFKAAKCRCMstVUGiXM g/pXAP9HLB2s+beBTK5iQU4/NfqmAVSl303QCoR9xLByo38vfAEAlLiRIh061pTi PRlXVuY9bUQPyCSYsiBHv/fmLqdQdwU= =ti6G -----END PGP SIGNATURE----- Merge tag 'perf-tools-fixes-for-v6.6-1-2023-09-25' into perf-tools-next To pick up the 'perf bench sched-seccomp-notify' changes to allow us to continue build testing perf-tools-next with the set of distro containers, where some older ones don't have a recent enough seccomp.h UAPI header that contains defines needed by this new 'perf bench' workload. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
6be5d82862 |
tools/perf: Add "is_kmod" to struct dso to check if it is kernel module
Update "struct dso" to include new member "is_kmod". This new field will determine if the file is a kernel module or not. To resolve the address from a sample, perf looks at the DSO maps. In case of address from a kernel module, there were some address found to be not resolved. This was observed while running perf test for "Object code reading". Though the ip falls beteen the start address of the loaded module (perf map->start ) and end address ( perf map->end), it was unresolved. This was happening because in some cases for kernel modules, address from sample points to stub instructions. To identify if the DSO is a kernel module, the new field "is_kmod" is added to "struct dso". Reported-by: Disha Goel <disgoel@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: kjain@linux.ibm.com Cc: maddy@linux.ibm.com Cc: disgoel@linux.vnet.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230928075213.84392-2-atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
26a5262d30 |
tools/perf: Add text_end to "struct dso" to save .text section size
Update "struct dso" to include new member "text_end". This new field will represent the offset for end of text section for a dso. For elf, this value is derived as: sh_size (Size of section in byes) + sh_offset (Section file offst) of the elf header for text. For bfd, this value is derived as: 1. For PE file, section->size + ( section->vma - dso->text_offset) 2. Other cases: section->filepos (file position) + section->size (size of section) To resolve the address from a sample, perf looks at the DSO maps. In case of address from a kernel module, there were some address found to be not resolved. This was observed while running perf test for "Object code reading". Though the ip falls beteen the start address of the loaded module (perf map->start ) and end address ( perf map->end), it was unresolved. Example: Reading object code for memory address: 0xc008000007f0142c File is: /lib/modules/6.5.0-rc3+/kernel/fs/xfs/xfs.ko On file address is: 0x1114cc Objdump command is: objdump -z -d --start-address=0x11142c --stop-address=0x1114ac /lib/modules/6.5.0-rc3+/kernel/fs/xfs/xfs.ko objdump read too few bytes: 128 test child finished with -1 Here, module is loaded at: # cat /proc/modules | grep xfs xfs 2228224 3 - Live 0xc008000007d00000 From objdump for xfs module, text section is: text 0010f7bc 0000000000000000 0000000000000000 000000a0 2**4 Here the offset for 0xc008000007f0142c ie 0x112074 falls out .text section which is up to 0x10f7bc. In this case for module, the address 0xc008000007e11fd4 is pointing to stub instructions. This address range represents the module stubs which is allocated on module load and hence is not part of DSO offset. To identify such address, which falls out of text section and within module end, added the new field "text_end" to "struct dso". Reported-by: Disha Goel <disgoel@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: maddy@linux.ibm.com Cc: disgoel@linux.vnet.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230928075213.84392-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
be7a4caa7c |
perf hisi-ptt: Fix memory leak in lseek failure handling
In the previous code, there was a memory leak issue where the previously allocated memory was not freed upon a failed lseek operation. This patch addresses the problem by releasing the old memory before returning -errno in case of a lseek failure. This ensures that memory is properly managed and avoids potential memory leaks. Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: yangyicong@hisilicon.com Cc: jonathan.cameron@huawei.com Link: https://lore.kernel.org/r/20230930072719.1267784-1-visitorckw@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
f2d87895cb |
perf intel-pt: Fix async branch flags
Ensure PERF_IP_FLAG_ASYNC is set always for asynchronous branches (i.e.
interrupts etc).
Fixes:
|
||
![]() |
7a48b58eb5 |
perf dlfilter: Fix use of addr_location__exit() in dlfilter__object_code()
Stop calling addr_location__exit() when addr_location__init() was not
called.
Fixes:
|
||
![]() |
b1f05622fe |
perf pmus: Make PMU alias name loading lazy
PMU alias names were computed when the first perf_pmu is created, scanning all PMUs in event sources for a file called alias that generally doesn't exist. Switch to trying to load the file when all PMU related files are loaded in lookup. This would cause a PMU name lookup of an alias name to fail if no PMUs were loaded, so in that case all PMUs are loaded and the find repeated. The overhead is similar but in the (very) general case not all PMUs are scanned for the alias file. As the overhead occurs once per invocation it doesn't show in perf bench internals pmu-scan. On a tigerlake machine, the number of openat system calls for an event of cpu/cycles/ with perf stat reduces from 94 to 69 (ie 25 fewer openat calls). Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20230925062323.840799-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
54409997d4 |
perf metric: "Compat" supports regular expression matching identifiers
The jevent "Compat" is used for uncore PMU alias or metric definitions. The same PMU driver has different PMU identifiers due to different hardware versions and types, but they may have some common PMU metric. Since a Compat value can only match one identifier, when adding the same metric to PMUs with different identifiers, each identifier needs to be defined once, which is not streamlined enough. So let "Compat" support using regular expression to match multiple identifiers for uncore PMU metric. Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Shuai Xue <xueshuai@linux.alibaba.com> Cc: Zhuo Song <zhuo.song@linux.alibaba.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-doc@vger.kernel.org Link: https://lore.kernel.org/r/1695794391-34817-3-git-send-email-renyu.zj@linux.alibaba.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
2879ff36f5 |
perf pmu: "Compat" supports regular expression matching identifiers
The jevent "Compat" is used for uncore PMU alias or metric definitions. The same PMU driver has different PMU identifiers due to different hardware versions and types, but they may have some common PMU event. Since a Compat value can only match one identifier, when adding the same event alias to PMUs with different identifiers, each identifier needs to be defined once, which is not streamlined enough. So let "Compat" support using regular expression to match identifiers for uncore PMU alias. For example, if the "Compat" value is set to "43401|43c01", it would be able to match PMU identifiers such as "43401" or "43c01", which correspond to CMN600_r0p0 or CMN700_r0p0. Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Shuai Xue <xueshuai@linux.alibaba.com> Cc: Zhuo Song <zhuo.song@linux.alibaba.com> Cc: John Garry <john.g.garry@oracle.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-doc@vger.kernel.org Link: https://lore.kernel.org/r/1695794391-34817-2-git-send-email-renyu.zj@linux.alibaba.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
0e501a65d3 |
perf record: Fix BTF type checks in the off-cpu profiling
The BTF func proto for a tracepoint has one more argument than the
actual tracepoint function since it has a context argument at the
begining. So it should compare to 5 when the tracepoint has 4
arguments.
typedef void (*btf_trace_sched_switch)(void *, bool, struct task_struct *, struct task_struct *, unsigned int);
Also, recent change in the perf tool would use a hand-written minimal
vmlinux.h to generate BTF in the skeleton. So it won't have the info
of the tracepoint. Anyway it should use the kernel's vmlinux BTF to
check the type in the kernel.
Fixes:
|
||
![]() |
f9cdeb58a9 |
perf evlist: Avoid frequency mode for the dummy event
Dummy events are created with an attribute where the period and freq
are zero. evsel__config will then see the uninitialized values and
initialize them in evsel__default_freq_period. As fequency mode is
used by default the dummy event would be set to use frequency
mode. However, this has no effect on the dummy event but does cause
unnecessary timers/interrupts. Avoid this overhead by setting the
period to 1 for dummy events.
evlist__add_aux_dummy calls evlist__add_dummy then sets freq=0 and
period=1. This isn't necessary after this change and so the setting is
removed.
From Stephane:
The dummy event is not counting anything. It is used to collect mmap
records and avoid a race condition during the synthesize mmap phase of
perf record. As such, it should not cause any overhead during active
profiling. Yet, it did. Because of a bug the dummy event was
programmed as a sampling event in frequency mode. Events in that mode
incur more kernel overheads because on timer tick, the kernel has to
look at the number of samples for each event and potentially adjust
the sampling period to achieve the desired frequency. The dummy event
was therefore adding a frequency event to task and ctx contexts we may
otherwise not have any, e.g.,
perf record -a -e cpu/event=0x3c,period=10000000/.
On each timer tick the perf_adjust_freq_unthr_context() is invoked and
if ctx->nr_freq is non-zero, then the kernel will loop over ALL the
events of the context looking for frequency mode ones. In doing, so it
locks the context, and enable/disable the PMU of each hw event. If all
the events of the context are in period mode, the kernel will have to
traverse the list for nothing incurring overhead. The overhead is
multiplied by a very large factor when this happens in a guest kernel.
There is no need for the dummy event to be in frequency mode, it does
not count anything and therefore should not cause extra overhead for
no reason.
Fixes:
|
||
![]() |
48a3adcf47 |
perf pmu: Fix perf stat output with correct scale and unit
The perf_pmu__parse_* functions for the sysfs files of pmu event’s scale, unit, per-pkg and snapshot were updated in commit |
||
![]() |
ede72dca45 |
perf parse-events: Fix tracepoint name memory leak
Fuzzing found that an invalid tracepoint name would create a memory
leak with an address sanitizer build:
```
$ perf stat -e '*:o/' true
event syntax error: '*:o/'
\___ parser error
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
=================================================================
==59380==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 4 byte(s) in 2 object(s) allocated from:
#0 0x7f38ac07077b in __interceptor_strdup ../../../../src/libsanitizer/asan/asan_interceptors.cpp:439
#1 0x55f2f41be73b in str util/parse-events.l:49
#2 0x55f2f41d08e8 in parse_events_lex util/parse-events.l:338
#3 0x55f2f41dc3b1 in parse_events_parse util/parse-events-bison.c:1464
#4 0x55f2f410b8b3 in parse_events__scanner util/parse-events.c:1822
#5 0x55f2f410d1b9 in __parse_events util/parse-events.c:2094
#6 0x55f2f410e57f in parse_events_option util/parse-events.c:2279
#7 0x55f2f4427b56 in get_value tools/lib/subcmd/parse-options.c:251
#8 0x55f2f4428d98 in parse_short_opt tools/lib/subcmd/parse-options.c:351
#9 0x55f2f4429d80 in parse_options_step tools/lib/subcmd/parse-options.c:539
#10 0x55f2f442acb9 in parse_options_subcommand tools/lib/subcmd/parse-options.c:654
#11 0x55f2f3ec99fc in cmd_stat tools/perf/builtin-stat.c:2501
#12 0x55f2f4093289 in run_builtin tools/perf/perf.c:322
#13 0x55f2f40937f5 in handle_internal_command tools/perf/perf.c:375
#14 0x55f2f4093bbd in run_argv tools/perf/perf.c:419
#15 0x55f2f409412b in main tools/perf/perf.c:535
SUMMARY: AddressSanitizer: 4 byte(s) leaked in 2 allocation(s).
```
Fix by adding the missing destructor.
Fixes:
|
||
![]() |
3ecf87b2d8 |
perf kwork top: Simplify bool conversion
./tools/perf/util/bpf_kwork_top.c:120:53-58: WARNING: conversion to bool not needed here Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230915063832.120274-1-yang.lee@linux.alibaba.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
eaaebb01a7 |
perf pmu: Ensure all alias variables are initialized
Fix an error detected by memory sanitizer:
```
==4033==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x55fb0fbedfc7 in read_alias_info tools/perf/util/pmu.c:457:6
#1 0x55fb0fbea339 in check_info_data tools/perf/util/pmu.c:1434:2
#2 0x55fb0fbea339 in perf_pmu__check_alias tools/perf/util/pmu.c:1504:9
#3 0x55fb0fbdca85 in parse_events_add_pmu tools/perf/util/parse-events.c:1429:32
#4 0x55fb0f965230 in parse_events_parse tools/perf/util/parse-events.y:299:6
#5 0x55fb0fbdf6b2 in parse_events__scanner tools/perf/util/parse-events.c:1822:8
#6 0x55fb0fbdf8c1 in __parse_events tools/perf/util/parse-events.c:2094:8
#7 0x55fb0fa8ffa9 in parse_events tools/perf/util/parse-events.h:41:9
#8 0x55fb0fa8ffa9 in test_event tools/perf/tests/parse-events.c:2393:8
#9 0x55fb0fa8f458 in test__pmu_events tools/perf/tests/parse-events.c:2551:15
#10 0x55fb0fa6d93f in run_test tools/perf/tests/builtin-test.c:242:9
#11 0x55fb0fa6d93f in test_and_print tools/perf/tests/builtin-test.c:271:8
#12 0x55fb0fa6d082 in __cmd_test tools/perf/tests/builtin-test.c:442:5
#13 0x55fb0fa6d082 in cmd_test tools/perf/tests/builtin-test.c:564:9
#14 0x55fb0f942720 in run_builtin tools/perf/perf.c:322:11
#15 0x55fb0f942486 in handle_internal_command tools/perf/perf.c:375:8
#16 0x55fb0f941dab in run_argv tools/perf/perf.c:419:2
#17 0x55fb0f941dab in main tools/perf/perf.c:535:3
```
Fixes:
|
||
![]() |
33b725ce7b |
perf trace: Avoid compile error wrt redefining bool
Make part of an existing TODO conditional to avoid the following build error: ``` tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c:26:14: error: cannot combine with previous 'char' declaration specifier 26 | typedef char bool; | ^ include/stdbool.h:20:14: note: expanded from macro 'bool' 20 | #define bool _Bool | ^ tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c:26:1: error: typedef requires a name [-Werror,-Wmissing-declarations] 26 | typedef char bool; | ^~~~~~~~~~~~~~~~~ 2 errors generated. ``` Signed-off-by: Ian Rogers <irogers@google.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230913184957.230076-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
4a73fca226 |
perf bpf-prologue: Remove unused file
Commit |
||
![]() |
70360fad91 |
perf pmu: Remove unused function
pmu_events_table__find() is no longer used so remove it and its Arm specific version. Signed-off-by: James Clark <james.clark@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230913153355.138331-4-james.clark@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
105e5b433e |
perf pmus: Simplify perf_pmus__find_core_pmu()
Currently the while loop always either exits on the first iteration with a core PMU, or exits with NULL on heterogeneous systems or when not all CPUs are online. Both of the latter behaviors are undesirable for platforms other than Arm so simplify it to always return the first core PMU, or NULL if none exist. This behavior was depended on by the Arm version of pmu_metrics_table__find(), so the logic has been moved there instead. Signed-off-by: James Clark <james.clark@arm.com> Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230913153355.138331-3-james.clark@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
3d0f5f456a |
perf pmu: Move pmu__find_core_pmu() to pmus.c
pmu__find_core_pmu() more logically belongs in pmus.c because it iterates over all PMUs, so move it to pmus.c At the same time rename it to perf_pmus__find_core_pmu() to match the naming convention in this file. list_prepare_entry() can't be used in perf_pmus__scan_core() anymore now that it's called from the same compilation unit. This is with -O2 (specifically -O1 -ftree-vrp -finline-functions -finline-small-functions) which allow the bounds of the array access to be determined at compile time. list_prepare_entry() subtracts the offset of the 'list' member in struct perf_pmu from &core_pmus, which isn't a struct perf_pmu. The compiler sees that pmu results in &core_pmus - 8 and refuses to compile. At runtime this works because list_for_each_entry_continue() always adds the offset back again before dereferencing ->next, but it's technically undefined behavior. With -fsanitize=undefined an additional warning is generated. Using list_first_entry_or_null() to get the first entry here avoids doing &core_pmus - 8 but has the same result and fixes both the compile warning and the undefined behavior warning. There are other uses of list_prepare_entry() in pmus.c, but the compiler doesn't seem to be able to see that they can also be called with &core_pmus, so I won't change any at this time. Signed-off-by: James Clark <james.clark@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230913153355.138331-2-james.clark@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
21ce931e55 |
perf symbol: Avoid an undefined behavior warning
The node (nd) may be NULL and pointer arithmetic on NULL is undefined behavior. Move the computation of next below the NULL check on the node. Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20230914044233.1550195-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
999b81b907 |
perf bpf-filter: Add YYDEBUG
YYDEBUG enables line numbers and other error helpers in the generated bpf-filter-bison.c. Conditionally enabled only for debug builds. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230911170559.4037734-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
f0f4cd1003 |
perf pmu: Add YYDEBUG
YYDEBUG enables line numbers and other error helpers in the generated pmu-bison.c. Conditionally enabled only for debug builds. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230911170559.4037734-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
1344a7077d |
perf expr: Make YYDEBUG dependent on doing a debug build
YYDEBUG enables line numbers and other error helpers in the generated expr-bison.c. These shouldn't be generated when debugging isn't enabled. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230911170559.4037734-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
d4ce60190e |
perf parse-events: Make YYDEBUG dependent on doing a debug build
YYDEBUG enables line numbers and other error helpers in the generated parse-events-bison.c. These shouldn't be generated when debugging isn't enabled. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230911170559.4037734-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
dc2cfef9a9 |
perf parse-events: Remove unused header files
The fnmatch header is now used in the PMU matching logic in pmu.c. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230911170559.4037734-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
8a55c1e2c9 |
perf util: Add a function for replacing characters in a string
It finds all occurrences of a single character and replaces them with a multi character string. This will be used in a test in a following commit. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chen Zhongjin <chenzhongjin@huawei.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230904095104.1162928-4-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
6bd8c2ea6b |
perf list pfm: Retry supported test with exclude_kernel
With paranoia set at 2 evsel__open will fail with EACCES for non-root users. To avoid this stopping libpfm4 events from being printed, retry with exclude_kernel enabled - copying the regular is_event_supported test. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kang Minchul <tegongkang@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20230906234416.3472339-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
4f19fc1839 |
perf list: Avoid a hardcoded cpu PMU name
Use the first core PMU instead. On a Raspberry Pi, before: $ perf list ... cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor] [(see 'man perf-list' on how to encode it)] ... After: $ perf list ... armv8_cortex_a72/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor] [(see 'man perf-list' on how to encode it)] ... ``` Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kang Minchul <tegongkang@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20230906234416.3472339-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
4fd06bd2dc |
perf lock contention: Add -G/--cgroup-filter option
The -G/--cgroup-filter is to limit lock contention collection on the tasks in the specific cgroups only. $ sudo ./perf lock con -abt -G /user.slice/.../vte-spawn-52221fb8-b33f-4a52-b5c3-e35d1e6fc0e0.scope \ ./perf bench sched messaging # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 0.174 [sec] contended total wait max wait avg wait pid comm 4 114.45 us 60.06 us 28.61 us 214847 sched-messaging 2 111.40 us 60.84 us 55.70 us 214848 sched-messaging 2 106.09 us 59.42 us 53.04 us 214837 sched-messaging 1 81.70 us 81.70 us 81.70 us 214709 sched-messaging 68 78.44 us 6.83 us 1.15 us 214633 sched-messaging 69 73.71 us 2.69 us 1.07 us 214632 sched-messaging 4 72.62 us 60.83 us 18.15 us 214850 sched-messaging 2 71.75 us 67.60 us 35.88 us 214840 sched-messaging 2 69.29 us 67.53 us 34.65 us 214804 sched-messaging 2 69.00 us 68.23 us 34.50 us 214826 sched-messaging ... Export cgroup__new() function as it's needed from outside. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230906174903.346486-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
4d1792d0a2 |
perf lock contention: Add --lock-cgroup option
The --lock-cgroup option shows lock contention stats break down by cgroups. Add LOCK_AGGR_CGROUP mode and use it instead of use_cgroup field. $ sudo ./perf lock con -ab --lock-cgroup sleep 1 contended total wait max wait avg wait cgroup 8 15.70 us 6.34 us 1.96 us / 2 1.48 us 747 ns 738 ns /user.slice/.../app.slice/app-gnome-google\x2dchrome-6442.scope 1 848 ns 848 ns 848 ns /user.slice/.../session.slice/org.gnome.Shell@x11.service 1 220 ns 220 ns 220 ns /user.slice/.../session.slice/pipewire-pulse.service For now, the cgroup mode only works with BPF (-b). Committer notes: Remove -g as it is used in the other tools with a clear meaning of collect/show callchains. As agreed with Namhyung off list. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230906174903.346486-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
d0c502e46e |
perf lock contention: Prepare to handle cgroups
Save cgroup info and display cgroup names if requested. This is a preparation for the next patch. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230906174903.346486-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
2bc12abce8 |
perf tools: Add read_all_cgroups() and __cgroup_find()
The read_all_cgroups() is to build a tree of cgroups in the system and users can look up a cgroup using __cgroup_find(). Committer notes: Had to do this to cover that #else block: -static inline u64 __read_cgroup_id(const char *path) { return -1ULL; } +static inline u64 __read_cgroup_id(const char *path __maybe_unused) { return -1ULL; } Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230906174903.346486-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
36019dff30 |
perf kwork top: Add BPF-based statistics on softirq event support
Use BPF to collect statistics on softirq events based on perf BPF skeletons. Example usage: # perf kwork top -b Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 135445.704 ms, 8 cpus %Cpu(s): 28.35% id, 0.00% hi, 0.25% si %Cpu0 [|||||||||||||||||||| 69.85%] %Cpu1 [|||||||||||||||||||||| 74.10%] %Cpu2 [||||||||||||||||||||| 71.18%] %Cpu3 [|||||||||||||||||||| 69.61%] %Cpu4 [|||||||||||||||||||||| 74.05%] %Cpu5 [|||||||||||||||||||| 69.33%] %Cpu6 [|||||||||||||||||||| 69.71%] %Cpu7 [|||||||||||||||||||||| 73.77%] PID SPID %CPU RUNTIME COMMMAND ------------------------------------------------------------- 0 0 30.43 5271.005 ms [swapper/5] 0 0 30.17 5226.644 ms [swapper/3] 0 0 30.08 5210.257 ms [swapper/6] 0 0 29.89 5177.177 ms [swapper/0] 0 0 28.51 4938.672 ms [swapper/2] 0 0 25.93 4223.464 ms [swapper/7] 0 0 25.69 4181.411 ms [swapper/4] 0 0 25.63 4173.804 ms [swapper/1] 16665 16265 2.16 360.600 ms sched-messaging 16537 16265 2.05 356.275 ms sched-messaging 16503 16265 2.01 343.063 ms sched-messaging 16424 16265 1.97 336.876 ms sched-messaging 16580 16265 1.94 323.658 ms sched-messaging 16515 16265 1.92 321.616 ms sched-messaging 16659 16265 1.91 325.538 ms sched-messaging 16634 16265 1.88 327.766 ms sched-messaging 16454 16265 1.87 326.843 ms sched-messaging 16382 16265 1.87 322.591 ms sched-messaging 16642 16265 1.86 320.506 ms sched-messaging 16582 16265 1.86 320.164 ms sched-messaging 16315 16265 1.86 326.872 ms sched-messaging 16637 16265 1.85 323.766 ms sched-messaging 16506 16265 1.82 311.688 ms sched-messaging 16512 16265 1.81 304.643 ms sched-messaging 16560 16265 1.80 314.751 ms sched-messaging 16320 16265 1.80 313.405 ms sched-messaging 16442 16265 1.80 314.403 ms sched-messaging 16626 16265 1.78 295.380 ms sched-messaging 16600 16265 1.77 309.444 ms sched-messaging 16550 16265 1.76 301.161 ms sched-messaging 16525 16265 1.75 296.560 ms sched-messaging 16314 16265 1.75 298.338 ms sched-messaging 16595 16265 1.74 304.390 ms sched-messaging 16555 16265 1.74 287.564 ms sched-messaging 16520 16265 1.74 295.734 ms sched-messaging 16507 16265 1.73 293.956 ms sched-messaging 16593 16265 1.72 296.443 ms sched-messaging 16531 16265 1.72 299.950 ms sched-messaging 16281 16265 1.72 301.339 ms sched-messaging <SNIP> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20230812084917.169338-17-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
d2956b3acf |
perf kwork top: Add BPF-based statistics on hardirq event support
Use BPF to collect statistics on hardirq events based on perf BPF skeletons. Example usage: # perf kwork top -k sched,irq -b Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 136717.945 ms, 8 cpus %Cpu(s): 17.10% id, 0.01% hi, 0.00% si %Cpu0 [||||||||||||||||||||||||| 84.26%] %Cpu1 [||||||||||||||||||||||||| 84.77%] %Cpu2 [|||||||||||||||||||||||| 83.22%] %Cpu3 [|||||||||||||||||||||||| 80.37%] %Cpu4 [|||||||||||||||||||||||| 81.49%] %Cpu5 [||||||||||||||||||||||||| 84.68%] %Cpu6 [||||||||||||||||||||||||| 84.48%] %Cpu7 [|||||||||||||||||||||||| 80.21%] PID SPID %CPU RUNTIME COMMMAND ------------------------------------------------------------- 0 0 19.78 3482.833 ms [swapper/7] 0 0 19.62 3454.219 ms [swapper/3] 0 0 18.50 3258.339 ms [swapper/4] 0 0 16.76 2842.749 ms [swapper/2] 0 0 15.71 2627.905 ms [swapper/0] 0 0 15.51 2598.206 ms [swapper/6] 0 0 15.31 2561.820 ms [swapper/5] 0 0 15.22 2548.708 ms [swapper/1] 13253 13018 2.95 513.108 ms sched-messaging 13092 13018 2.67 454.167 ms sched-messaging 13401 13018 2.66 454.790 ms sched-messaging 13240 13018 2.64 454.587 ms sched-messaging 13251 13018 2.61 442.273 ms sched-messaging 13075 13018 2.61 438.932 ms sched-messaging 13220 13018 2.60 443.245 ms sched-messaging 13235 13018 2.59 443.268 ms sched-messaging 13222 13018 2.50 426.344 ms sched-messaging 13410 13018 2.49 426.191 ms sched-messaging 13228 13018 2.46 425.121 ms sched-messaging 13379 13018 2.38 409.950 ms sched-messaging 13236 13018 2.37 413.159 ms sched-messaging 13095 13018 2.36 396.572 ms sched-messaging 13325 13018 2.35 408.089 ms sched-messaging 13242 13018 2.32 394.750 ms sched-messaging 13386 13018 2.31 396.997 ms sched-messaging 13046 13018 2.29 383.833 ms sched-messaging 13109 13018 2.28 388.482 ms sched-messaging 13388 13018 2.28 393.576 ms sched-messaging 13238 13018 2.26 388.487 ms sched-messaging <SNIP> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20230812084917.169338-16-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
8c98420987 |
perf kwork top: Implements BPF-based cpu usage statistics
Use BPF to collect statistics on the CPU usage based on perf BPF skeletons. Example usage: # perf kwork top -h Usage: perf kwork top [<options>] -b, --use-bpf Use BPF to measure task cpu usage -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): rate, runtime, tid --time <str> Time span for analysis (start,stop) # # perf kwork -k sched top -b Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 160702.425 ms, 8 cpus %Cpu(s): 36.00% id, 0.00% hi, 0.00% si %Cpu0 [|||||||||||||||||| 61.66%] %Cpu1 [|||||||||||||||||| 61.27%] %Cpu2 [||||||||||||||||||| 66.40%] %Cpu3 [|||||||||||||||||| 61.28%] %Cpu4 [|||||||||||||||||| 61.82%] %Cpu5 [||||||||||||||||||||||| 77.41%] %Cpu6 [|||||||||||||||||| 61.73%] %Cpu7 [|||||||||||||||||| 63.25%] PID SPID %CPU RUNTIME COMMMAND ------------------------------------------------------------- 0 0 38.72 8089.463 ms [swapper/1] 0 0 38.71 8084.547 ms [swapper/3] 0 0 38.33 8007.532 ms [swapper/0] 0 0 38.26 7992.985 ms [swapper/6] 0 0 38.17 7971.865 ms [swapper/4] 0 0 36.74 7447.765 ms [swapper/7] 0 0 33.59 6486.942 ms [swapper/2] 0 0 22.58 3771.268 ms [swapper/5] 9545 9351 2.48 447.136 ms sched-messaging 9574 9351 2.09 418.583 ms sched-messaging 9724 9351 2.05 372.407 ms sched-messaging 9531 9351 2.01 368.804 ms sched-messaging 9512 9351 2.00 362.250 ms sched-messaging 9514 9351 1.95 357.767 ms sched-messaging 9538 9351 1.86 384.476 ms sched-messaging 9712 9351 1.84 386.490 ms sched-messaging 9723 9351 1.83 380.021 ms sched-messaging 9722 9351 1.82 382.738 ms sched-messaging 9517 9351 1.81 354.794 ms sched-messaging 9559 9351 1.79 344.305 ms sched-messaging 9725 9351 1.77 365.315 ms sched-messaging <SNIP> # perf kwork -k sched top -b -n perf Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 151563.332 ms, 8 cpus %Cpu(s): 26.49% id, 0.00% hi, 0.00% si %Cpu0 [ 0.01%] %Cpu1 [ 0.00%] %Cpu2 [ 0.00%] %Cpu3 [ 0.00%] %Cpu4 [ 0.00%] %Cpu5 [ 0.00%] %Cpu6 [ 0.00%] %Cpu7 [ 0.00%] PID SPID %CPU RUNTIME COMMMAND ------------------------------------------------------------- 9754 9754 0.01 2.303 ms perf # # perf kwork -k sched top -b -C 2,3,4 Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 48016.721 ms, 3 cpus %Cpu(s): 27.82% id, 0.00% hi, 0.00% si %Cpu2 [|||||||||||||||||||||| 74.68%] %Cpu3 [||||||||||||||||||||| 71.06%] %Cpu4 [||||||||||||||||||||| 70.91%] PID SPID %CPU RUNTIME COMMMAND ------------------------------------------------------------- 0 0 29.08 4734.998 ms [swapper/4] 0 0 28.93 4710.029 ms [swapper/3] 0 0 25.31 3912.363 ms [swapper/2] 10248 10158 1.62 264.931 ms sched-messaging 10253 10158 1.62 265.136 ms sched-messaging 10158 10158 1.60 263.013 ms bash 10360 10158 1.49 243.639 ms sched-messaging 10413 10158 1.48 238.604 ms sched-messaging 10531 10158 1.47 234.067 ms sched-messaging 10400 10158 1.47 240.631 ms sched-messaging 10355 10158 1.47 230.586 ms sched-messaging 10377 10158 1.43 234.835 ms sched-messaging 10526 10158 1.42 232.045 ms sched-messaging 10298 10158 1.41 222.396 ms sched-messaging 10410 10158 1.38 221.853 ms sched-messaging 10364 10158 1.38 226.042 ms sched-messaging 10480 10158 1.36 213.633 ms sched-messaging 10370 10158 1.36 223.620 ms sched-messaging 10553 10158 1.34 217.169 ms sched-messaging 10291 10158 1.34 211.516 ms sched-messaging 10251 10158 1.34 218.813 ms sched-messaging 10522 10158 1.33 218.498 ms sched-messaging 10288 10158 1.33 216.787 ms sched-messaging <SNIP> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20230812084917.169338-15-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
e29090d28c |
perf kwork top: Add statistics on softirq event support
Calculate the runtime of the softirq events and subtract it from the corresponding task runtime to improve the precision. Example usage: # perf kwork -k sched,irq,softirq record -- perf record -e cpu-clock -o perf_record.data -a sleep 10 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.467 MB perf_record.data (7154 samples) ] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.152 MB perf.data (22846 samples) ] # perf kwork top Total : 136601.588 ms, 8 cpus %Cpu(s): 95.66% id, 0.04% hi, 0.05% si %Cpu0 [ 0.02%] %Cpu1 [ 0.01%] %Cpu2 [| 4.61%] %Cpu3 [ 0.04%] %Cpu4 [ 0.01%] %Cpu5 [||||| 17.31%] %Cpu6 [ 0.51%] %Cpu7 [||| 11.42%] PID %CPU RUNTIME COMMMAND ---------------------------------------------------- 0 99.98 17073.515 ms swapper/4 0 99.98 17072.173 ms swapper/1 0 99.93 17064.229 ms swapper/3 0 99.62 17011.013 ms swapper/0 0 99.47 16985.180 ms swapper/6 0 95.17 16250.874 ms swapper/2 0 88.51 15111.684 ms swapper/7 0 82.62 14108.577 ms swapper/5 4342 33.00 5644.045 ms perf 4344 0.43 74.351 ms perf 16 0.13 22.296 ms rcu_preempt 4345 0.05 10.093 ms perf 4343 0.05 8.769 ms perf 4341 0.02 4.882 ms perf 4095 0.02 4.605 ms kworker/7:1 75 0.02 4.261 ms kworker/2:1 120 0.01 1.909 ms systemd-journal 98 0.01 2.540 ms jbd2/sda-8 61 0.01 3.404 ms kcompactd0 667 0.01 2.542 ms kworker/u16:2 4340 0.00 1.052 ms kworker/7:2 97 0.00 0.489 ms kworker/7:1H 51 0.00 0.209 ms ksoftirqd/7 50 0.00 0.646 ms migration/7 76 0.00 0.753 ms kworker/6:1 45 0.00 0.572 ms migration/6 87 0.00 0.145 ms kworker/5:1H 73 0.00 0.596 ms kworker/5:1 41 0.00 0.041 ms ksoftirqd/5 40 0.00 0.718 ms migration/5 64 0.00 0.115 ms kworker/4:1 35 0.00 0.556 ms migration/4 353 0.00 2.600 ms sshd 74 0.00 0.205 ms kworker/3:1 33 0.00 1.576 ms kworker/3:0H 30 0.00 0.996 ms migration/3 26 0.00 1.665 ms ksoftirqd/2 25 0.00 0.662 ms migration/2 397 0.00 0.057 ms kworker/1:1 20 0.00 1.005 ms migration/1 2909 0.00 1.053 ms kworker/0:2 17 0.00 0.720 ms migration/0 15 0.00 0.039 ms ksoftirqd/0 Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20230812084917.169338-13-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
2f21f5e4b4 |
perf kwork top: Add statistics on hardirq event support
Calculate the runtime of the hardirq events and subtract it from the corresponding task runtime to improve the precision. Example usage: # perf kwork -k sched,irq record -- perf record -o perf_record.data -a sleep 10 [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 1.054 MB perf_record.data (18019 samples) ] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.798 MB perf.data (16334 samples) ] # # perf kwork top Total : 139240.869 ms, 8 cpus %Cpu(s): 94.91% id, 0.05% hi %Cpu0 [ 0.05%] %Cpu1 [| 5.00%] %Cpu2 [ 0.43%] %Cpu3 [ 0.57%] %Cpu4 [ 1.19%] %Cpu5 [|||||| 20.46%] %Cpu6 [ 0.48%] %Cpu7 [||| 12.10%] PID %CPU RUNTIME COMMMAND ---------------------------------------------------- 0 99.54 17325.622 ms swapper/2 0 99.54 17327.527 ms swapper/0 0 99.51 17319.909 ms swapper/6 0 99.42 17304.934 ms swapper/3 0 98.80 17197.385 ms swapper/4 0 94.99 16534.991 ms swapper/1 0 87.89 15295.264 ms swapper/7 0 79.53 13843.182 ms swapper/5 4252 36.50 6361.768 ms perf 4256 1.17 205.215 ms bash 151 0.53 93.298 ms systemd-resolve 4254 0.39 69.468 ms perf 423 0.34 59.368 ms bash 412 0.29 51.204 ms sshd 249 0.20 35.288 ms sd-resolve 16 0.17 30.287 ms rcu_preempt 153 0.09 17.266 ms systemd-timesyn 1 0.09 17.078 ms systemd 4253 0.07 12.457 ms perf 4255 0.06 11.559 ms perf 4234 0.03 6.105 ms kworker/u16:1 69 0.03 6.259 ms kworker/1:1H 4251 0.02 4.615 ms perf 4095 0.02 4.890 ms kworker/7:1 61 0.02 4.005 ms kcompactd0 75 0.02 3.546 ms kworker/2:1 97 0.01 3.106 ms kworker/7:1H 98 0.01 1.995 ms jbd2/sda-8 4088 0.01 1.779 ms kworker/u16:3 2909 0.01 1.795 ms kworker/0:2 4246 0.00 1.117 ms kworker/7:2 51 0.00 0.327 ms ksoftirqd/7 50 0.00 0.369 ms migration/7 102 0.00 0.160 ms kworker/6:1H 76 0.00 0.609 ms kworker/6:1 45 0.00 0.779 ms migration/6 87 0.00 0.504 ms kworker/5:1H 73 0.00 1.130 ms kworker/5:1 41 0.00 0.152 ms ksoftirqd/5 40 0.00 0.702 ms migration/5 64 0.00 0.316 ms kworker/4:1 35 0.00 0.791 ms migration/4 353 0.00 2.211 ms sshd 74 0.00 0.272 ms kworker/3:1 30 0.00 0.819 ms migration/3 25 0.00 0.784 ms migration/2 397 0.00 0.539 ms kworker/1:1 21 0.00 1.600 ms ksoftirqd/1 20 0.00 0.773 ms migration/1 17 0.00 1.682 ms migration/0 15 0.00 0.076 ms ksoftirqd/0 Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20230812084917.169338-12-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
a8792242e4 |
perf evsel: Add evsel__intval_common() helper
Add evsel__intval_common() helper to search for common_field in tracepoint format. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20230812084917.169338-11-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
55c40e5052 |
perf kwork top: Introduce new top utility
Some common tools for collecting statistics on CPU usage, such as top, obtain statistics from timer interrupt sampling, and then periodically read statistics from /proc/stat. This method has some deviations: 1. In the tick interrupt, the time between the last tick and the current tick is counted in the current task. However, the task may be running only part of the time. 2. For each task, the top tool periodically reads the /proc/{PID}/status information. For tasks with a short life cycle, it may be missed. In conclusion, the top tool cannot accurately collect statistics on the CPU usage and running time of tasks. The statistical method based on sched_switch tracepoint can accurately calculate the CPU usage of all tasks. This method is applicable to scenarios where performance comparison data is of high precision. Example usage: # perf kwork Usage: perf kwork [<options>] {record|report|latency|timehist|top} -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -k, --kwork <kwork> list of kwork to profile (irq, softirq, workqueue, sched, etc) -v, --verbose be more verbose (show symbol address, etc) # perf kwork -k sched record -- perf bench sched messaging -g 1 -l 10000 # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 1 groups == 40 processes run Total time: 14.074 [sec] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 15.886 MB perf.data (129472 samples) ] # perf kwork top Total : 115708.178 ms, 8 cpus %Cpu(s): 9.78% id %Cpu0 [||||||||||||||||||||||||||| 90.55%] %Cpu1 [||||||||||||||||||||||||||| 90.51%] %Cpu2 [|||||||||||||||||||||||||| 88.57%] %Cpu3 [||||||||||||||||||||||||||| 91.18%] %Cpu4 [||||||||||||||||||||||||||| 91.09%] %Cpu5 [||||||||||||||||||||||||||| 90.88%] %Cpu6 [|||||||||||||||||||||||||| 88.64%] %Cpu7 [||||||||||||||||||||||||||| 90.28%] PID %CPU RUNTIME COMMMAND ---------------------------------------------------- 4113 22.23 3221.547 ms sched-messaging 4105 21.61 3131.495 ms sched-messaging 4119 21.53 3120.937 ms sched-messaging 4103 21.39 3101.614 ms sched-messaging 4106 21.37 3095.209 ms sched-messaging 4104 21.25 3077.269 ms sched-messaging 4115 21.21 3073.188 ms sched-messaging 4109 21.18 3069.022 ms sched-messaging 4111 20.78 3010.033 ms sched-messaging 4114 20.74 3007.073 ms sched-messaging 4108 20.73 3002.137 ms sched-messaging 4107 20.47 2967.292 ms sched-messaging 4117 20.39 2955.335 ms sched-messaging 4112 20.34 2947.080 ms sched-messaging 4118 20.32 2942.519 ms sched-messaging 4121 20.23 2929.865 ms sched-messaging 4110 20.22 2930.078 ms sched-messaging 4122 20.15 2919.542 ms sched-messaging 4120 19.77 2866.032 ms sched-messaging 4116 19.72 2857.660 ms sched-messaging 4127 16.19 2346.334 ms sched-messaging 4142 15.86 2297.600 ms sched-messaging 4141 15.62 2262.646 ms sched-messaging 4136 15.41 2231.408 ms sched-messaging 4130 15.38 2227.008 ms sched-messaging 4129 15.31 2217.692 ms sched-messaging 4126 15.21 2201.711 ms sched-messaging 4139 15.19 2200.722 ms sched-messaging 4137 15.10 2188.633 ms sched-messaging 4134 15.06 2182.082 ms sched-messaging 4132 15.02 2177.530 ms sched-messaging 4131 14.73 2131.973 ms sched-messaging 4125 14.68 2125.439 ms sched-messaging 4128 14.66 2122.255 ms sched-messaging 4123 14.65 2122.113 ms sched-messaging 4135 14.56 2107.144 ms sched-messaging 4133 14.51 2103.549 ms sched-messaging 4124 14.27 2066.671 ms sched-messaging 4140 14.17 2052.251 ms sched-messaging 4138 13.81 2000.361 ms sched-messaging 0 11.42 1652.009 ms swapper/2 0 11.35 1641.694 ms swapper/6 0 9.71 1405.108 ms swapper/7 0 9.48 1372.338 ms swapper/1 0 9.44 1366.013 ms swapper/0 0 9.11 1318.382 ms swapper/5 0 8.90 1287.582 ms swapper/4 0 8.81 1274.356 ms swapper/3 4100 2.61 379.328 ms perf 4101 1.16 169.487 ms perf-exec 151 0.65 94.741 ms systemd-resolve 249 0.36 53.030 ms sd-resolve 153 0.14 21.405 ms systemd-timesyn 1 0.10 16.200 ms systemd 16 0.09 15.785 ms rcu_preempt 4102 0.06 9.727 ms perf 4095 0.03 5.464 ms kworker/7:1 98 0.02 3.231 ms jbd2/sda-8 353 0.02 4.115 ms sshd 75 0.02 3.889 ms kworker/2:1 73 0.01 1.552 ms kworker/5:1 64 0.01 1.591 ms kworker/4:1 74 0.01 1.952 ms kworker/3:1 61 0.01 2.608 ms kcompactd0 397 0.01 1.602 ms kworker/1:1 69 0.01 1.817 ms kworker/1:1H 10 0.01 2.553 ms kworker/u16:0 2909 0.01 2.684 ms kworker/0:2 1211 0.00 0.426 ms kworker/7:0 97 0.00 0.153 ms kworker/7:1H 51 0.00 0.100 ms ksoftirqd/7 120 0.00 0.856 ms systemd-journal 76 0.00 1.414 ms kworker/6:1 46 0.00 0.246 ms ksoftirqd/6 45 0.00 0.164 ms migration/6 41 0.00 0.098 ms ksoftirqd/5 40 0.00 0.207 ms migration/5 86 0.00 1.339 ms kworker/4:1H 36 0.00 0.252 ms ksoftirqd/4 35 0.00 0.090 ms migration/4 31 0.00 0.156 ms ksoftirqd/3 30 0.00 0.073 ms migration/3 26 0.00 0.180 ms ksoftirqd/2 25 0.00 0.085 ms migration/2 21 0.00 0.106 ms ksoftirqd/1 20 0.00 0.118 ms migration/1 302 0.00 1.440 ms systemd-logind 17 0.00 0.132 ms migration/0 15 0.00 0.255 ms ksoftirqd/0 Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20230812084917.169338-10-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
38d8d013a5 |
perf kwork: Add sched record support
The kwork_class type of sched is added to support recording and parsing of sched_switch events. As follows: # perf kwork -h Usage: perf kwork [<options>] {record|report|latency|timehist} -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -k, --kwork <kwork> list of kwork to profile (irq, softirq, workqueue, sched, etc) -v, --verbose be more verbose (show symbol address, etc) # perf kwork -k sched record true [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.083 MB perf.data (47 samples) ] # perf evlist sched:sched_switch dummy:HG # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20230812084917.169338-8-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
95064b3352 |
perf kwork: Add kwork and src_type to work_init() for 'struct kwork_class'
To support different types of reports, two parameters `struct perf_kwork * kwork` and `enum kwork_trace_type src_type` are added to work_init() of struct kwork_class for initialization in different scenarios. No functional change intended. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20230812084917.169338-5-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
9c95e4ef06 |
perf evlist: Add evlist__findnew_tracking_event() helper
Currently, intel-bts, intel-pt, and arm-spe may add tracking event to the evlist. We may need to search for the tracking event for some settings. Therefore, add evlist__findnew_tracking_event() helper. If system_wide is true, evlist__findnew_tracking_event() set the cpu map of the evsel to all online CPUs. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20230904023340.12707-3-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
a6e414a4cb |
perf tools: Update copy of libbpf's hashmap.c
To pick the changes in:
|
||
![]() |
0d3f0e6f94 |
perf parse-events: Introduce 'struct parse_events_terms'
parse_events_terms() existed in function names but was passed a 'struct list_head'. As many parse_events functions take an evsel_config list as well as a parse_event_term list, and the naming head_terms and head_config is inconsistent, there's a potential to switch the lists and get errors. Introduce a 'struct parse_events_terms', that just wraps a list_head, to avoid this. Add the regular init/exit functions and transition the code to use them. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230901233949.2930562-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
727adeed06 |
perf parse-events: Copy fewer term lists
When trying to add events to multiple PMUs the term list is copied first as adding the event will rewrite the event's name term into the sysfs and/or json encoding terms (see perf_pmu__check_alias). Change the parse events add API so the passed in term list is const, then copy the list when modification is necessary. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230901233949.2930562-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
4163644818 |
perf parse-events: Avoid enum casts
Add term_type to union of values returned by the lexer to avoid casts to and from an integer. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230901233949.2930562-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
8f91662ef8 |
perf parse-events: Tidy up str parameter
Add a const and rename str to event_name. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230901233949.2930562-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
6fcfe54d2c |
perf parse-events: Remove unnecessary __maybe_unused
The parameter head_terms is always used in get_config_terms. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230901233949.2930562-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
6066622c97 |
perf machine: Use true and false for bool variable
Fix the following coccicheck warnings: ./tools/perf/util/machine.c:2000:9-10: WARNING: return of 0/1 in function 'symbol__match_regex' with return type bool. Committer notes: Found this in the pile, it was already returning bool, but this patch simplifies it further, from 3 lines to just 1. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Suggested-by: David Laight <David.Laight@ACULAB.COM> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: https://lore.kernel.org/r/1614247483-102665-1-git-send-email-jiapeng.chong@linux.alibaba.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
535a265d7f |
perf tools changes for v6.6:
perf tools maintainership: - Add git information for perf-tools and perf-tools-next trees/branches to the MAINTAINERS file. That is where development now takes place and myself and Namhyung Kim have write access, more people to come as we emulate other maintainer groups. perf record: - Record kernel data maps when 'perf record --data' is used, so that global variables can be resolved and used in tools that do data profiling. perf trace: - Remove the old, experimental support for BPF events in which a .c file was passed as an event: "perf trace -e hello.c" to then get compiled and loaded. The only known usage for that, that shipped with the kernel as an example for such events, augmented the raw_syscalls tracepoints and was converted to a libbpf skeleton, reusing all the user space components and the BPF code connected to the syscalls. In the end just the way to glue the BPF part and the user space type beautifiers changed, now being performed by libbpf skeletons. The next step is to use BTF to do pretty printing of all syscall types, as discussed with Alan Maguire and others. Now, on a perf built with BUILD_BPF_SKEL=1 we get most if not all path/filenames/strings, some of the networking data structures, perf_event_attr, etc, i.e. systemwide tracing of nanosleep calls and perf_event_open syscalls while 'perf stat' runs 'sleep' for 5 seconds: # perf trace -a -e *nanosleep,perf* perf stat -e cycles,instructions sleep 5 0.000 ( 9.034 ms): perf/327641 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 327642 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3 9.039 ( 0.006 ms): perf/327641 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x1 (PERF_COUNT_HW_INSTRUCTIONS), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 327642 (perf-exec), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4 ? ( ): gpm/991 ... [continued]: clock_nanosleep()) = 0 10.133 ( ): sleep/327642 clock_nanosleep(rqtp: { .tv_sec: 5, .tv_nsec: 0 }, rmtp: 0x7ffd36f83ed0) ... ? ( ): pool-gsd-smart/3051 ... [continued]: clock_nanosleep()) = 0 30.276 ( ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ... 223.215 (1000.430 ms): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) = 0 30.276 (2000.394 ms): gpm/991 ... [continued]: clock_nanosleep()) = 0 1230.814 ( ): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) ... 1230.814 (1000.404 ms): pool-gsd-smart/3051 ... [continued]: clock_nanosleep()) = 0 2030.886 ( ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ... 2237.709 (1000.153 ms): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) = 0 ? ( ): crond/1172 ... [continued]: clock_nanosleep()) = 0 3242.699 ( ): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) ... 2030.886 (2000.385 ms): gpm/991 ... [continued]: clock_nanosleep()) = 0 3728.078 ( ): crond/1172 clock_nanosleep(rqtp: { .tv_sec: 60, .tv_nsec: 0 }, rmtp: 0x7ffe0971dcf0) ... 3242.699 (1000.158 ms): pool-gsd-smart/3051 ... [continued]: clock_nanosleep()) = 0 4031.409 ( ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ... 10.133 (5000.375 ms): sleep/327642 ... [continued]: clock_nanosleep()) = 0 Performance counter stats for 'sleep 5': 2,617,347 cycles 1,855,997 instructions # 0.71 insn per cycle 5.002282128 seconds time elapsed 0.000855000 seconds user 0.000852000 seconds sys # perf annotate: - Building with binutils' libopcode now is opt-in (BUILD_NONDISTRO=1) for licensing reasons, and we missed a build test on tools/perf/tests makefile. Since we now default to NDEBUG=1, we ended up segfaulting when building with BUILD_NONDISTRO=1 because a needed initialization routine was being "error checked" via an assert. Fix it by explicitly checking the result and aborting instead if it fails. We better back propagate the error, but at least 'perf annotate' on samples collected for a BPF program is back working when perf is built with BUILD_NONDISTRO=1. perf report/top: - Add back TUI hierarchy mode header, that is seen when using 'perf report/top --hierarchy'. - Fix the number of entries for 'e' key in the TUI that was preventing navigation of lines when expanding an entry. perf report/script: - Support cross platform register handling, allowing a perf.data file collected on one architecture to have registers sampled correctly displayed when analysis tools such as 'perf report' and 'perf script' are used on a different architecture. - Fix handling of event attributes in pipe mode, i.e. when one uses: perf record -o - | perf report -i - When no perf.data files are used. - Handle files generated via pipe mode with a version of perf and then read also via pipe mode with a different version of perf, where the event attr record may have changed, use the record size field to properly support this version mismatch. perf probe: - Accessing global variables from uprobes isn't supported, make the error message state that instead of stating that some minimal kernel version is needed to have that feature. This seems just a tool limitation, the kernel probably has all that is needed. perf tests: - Fix a reference count related leak in the dlfilter v0 API where the result of a thread__find_symbol_fb() is not matched with an addr_location__exit() to drop the reference counts of the resolved components (machine, thread, map, symbol, etc). Add a dlfilter test to make sure that doesn't regresses. - Lots of fixes for the 'perf test' written in shell script related to problems found with the shellcheck utility. - Fixes for 'perf test' shell scripts testing features enabled when perf is built with BUILD_BPF_SKEL=1, such as 'perf stat' bpf counters. - Add perf record sample filtering test, things like the following example, that gets implemented as a BPF filter attached to the event: # perf record -e task-clock -c 10000 --filter 'ip < 0xffffffff00000000' - Improve the way the task_analyzer test checks if libtraceevent is linked, using 'perf version --build-options' instead of the more expensinve 'perf record -e "sched:sched_switch"'. - Add support for riscv in the mmap-basic test. (This went as well via the RiscV tree, same contents). libperf: - Implement riscv mmap support (This went as well via the RiscV tree, same contents). perf script: - New tool that converts perf.data files to the firefox profiler format so that one can use the visualizer at https://profiler.firefox.com/. Done by Anup Sharma as part of this year's Google Summer of Code. One can generate the output and upload it to the web interface but Anup also automated everything: perf script gecko -F 99 -a sleep 60 - Support syscall name parsing on arm64. - Print "cgroup" field on the same line as "comm". perf bench: - Add new 'uprobe' benchmark to measure the overhead of uprobes with/without BPF programs attached to it. - breakpoints are not available on power9, skip that test. perf stat: - Add #num_cpus_online literal to be used in 'perf stat' metrics, and add this extra 'perf test' check that exemplifies its purpose: TEST_ASSERT_VAL("#num_cpus_online", expr__parse(&num_cpus_online, ctx, "#num_cpus_online") == 0); TEST_ASSERT_VAL("#num_cpus", expr__parse(&num_cpus, ctx, "#num_cpus") == 0); TEST_ASSERT_VAL("#num_cpus >= #num_cpus_online", num_cpus >= num_cpus_online); Miscellaneous: - Improve tool startup time by lazily reading PMU, JSON, sysfs data. - Improve error reporting in the parsing of events, passing YYLTYPE to error routines, so that the output can show were the parsing error was found. - Add 'perf test' entries to check the parsing of events improvements. - Fix various leak for things detected by -fsanitize=address, mostly things that would be freed at tool exit, including: - Free evsel->filter on the destructor. - Allow tools to register a thread->priv destructor and use it in 'perf trace'. - Free evsel->priv in 'perf trace'. - Free string returned by synthesize_perf_probe_point() when the caller fails to do all it needs. - Adjust various compiler options to not consider errors some warnings when building with broken headers found in things like python, flex, bison, as we otherwise build with -Werror. Some for gcc, some for clang, some for some specific version of those, some for some specific version of flex or bison, or some specific combination of these components, bah. - Allow customization of clang options for BPF target, this helps building on gentoo where there are other oddities where BPF targets gets passed some compiler options intended for the native build, so building with WERROR=0 helps while these oddities are fixed. - Dont pass ERR_PTR() values to perf_session__delete() in 'perf top' and 'perf lock', fixing some segfaults when handling some odd failures. - Add LTO build option. - Fix format of unordered lists in the perf docs (tools/perf/Documentation). - Overhaul the bison files, using constructs such as YYNOMEM. - Remove unused tokens from the bison .y files. - Add more comments to various structs. - A few LoongArch enablement patches. Vendor events (JSON): - Add JSON metrics for Yitian 710 DDR (aarch64). Things like: EventName, BriefDescription visible_window_limit_reached_rd, "At least one entry in read queue reaches the visible window limit.", visible_window_limit_reached_wr, "At least one entry in write queue reaches the visible window limit.", op_is_dqsosc_mpc , "A DQS Oscillator MPC command to DRAM.", op_is_dqsosc_mrr , "A DQS Oscillator MRR command to DRAM.", op_is_tcr_mrr , "A Temperature Compensated Refresh(TCR) MRR command to DRAM.", - Add AmpereOne metrics (aarch64). - Update N2 and V2 metrics (aarch64) and events using Arm telemetry repo. - Update scale units and descriptions of common topdown metrics on aarch64. Things like: - "MetricExpr": "stall_slot_frontend / (#slots * cpu_cycles)", - "BriefDescription": "Frontend bound L1 topdown metric", + "MetricExpr": "100 * (stall_slot_frontend / (#slots * cpu_cycles))", + "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the frontend of the processor.", - Update events for intel: meteorlake to 1.04, sapphirerapids to 1.15, Icelake+ metric constraints. - Update files for the power10 platform. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZPfJZgAKCRCyPKLppCJ+ J1/eAP9lgtavD0V75wy1p5zyotkceOmPTkk1DYFVx2Euhxa/lAD/YW/JvuVSo0Gr HqJP52XaV0tF8gG+YxL+Lay/Ke0P5AQ= =d12c -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v6.6-1-2023-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Arnaldo Carvalho de Melo: "perf tools maintainership: - Add git information for perf-tools and perf-tools-next trees and branches to the MAINTAINERS file. That is where development now takes place and myself and Namhyung Kim have write access, more people to come as we emulate other maintainer groups. perf record: - Record kernel data maps when 'perf record --data' is used, so that global variables can be resolved and used in tools that do data profiling. perf trace: - Remove the old, experimental support for BPF events in which a .c file was passed as an event: "perf trace -e hello.c" to then get compiled and loaded. The only known usage for that, that shipped with the kernel as an example for such events, augmented the raw_syscalls tracepoints and was converted to a libbpf skeleton, reusing all the user space components and the BPF code connected to the syscalls. In the end just the way to glue the BPF part and the user space type beautifiers changed, now being performed by libbpf skeletons. The next step is to use BTF to do pretty printing of all syscall types, as discussed with Alan Maguire and others. Now, on a perf built with BUILD_BPF_SKEL=1 we get most if not all path/filenames/strings, some of the networking data structures, perf_event_attr, etc, i.e. systemwide tracing of nanosleep calls and perf_event_open syscalls while 'perf stat' runs 'sleep' for 5 seconds: # perf trace -a -e *nanosleep,perf* perf stat -e cycles,instructions sleep 5 0.000 ( 9.034 ms): perf/327641 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 327642 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3 9.039 ( 0.006 ms): perf/327641 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x1 (PERF_COUNT_HW_INSTRUCTIONS), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 327642 (perf-exec), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4 ? ( ): gpm/991 ... [continued]: clock_nanosleep()) = 0 10.133 ( ): sleep/327642 clock_nanosleep(rqtp: { .tv_sec: 5, .tv_nsec: 0 }, rmtp: 0x7ffd36f83ed0) ... ? ( ): pool-gsd-smart/3051 ... [continued]: clock_nanosleep()) = 0 30.276 ( ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ... 223.215 (1000.430 ms): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) = 0 30.276 (2000.394 ms): gpm/991 ... [continued]: clock_nanosleep()) = 0 1230.814 ( ): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) ... 1230.814 (1000.404 ms): pool-gsd-smart/3051 ... [continued]: clock_nanosleep()) = 0 2030.886 ( ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ... 2237.709 (1000.153 ms): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) = 0 ? ( ): crond/1172 ... [continued]: clock_nanosleep()) = 0 3242.699 ( ): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) ... 2030.886 (2000.385 ms): gpm/991 ... [continued]: clock_nanosleep()) = 0 3728.078 ( ): crond/1172 clock_nanosleep(rqtp: { .tv_sec: 60, .tv_nsec: 0 }, rmtp: 0x7ffe0971dcf0) ... 3242.699 (1000.158 ms): pool-gsd-smart/3051 ... [continued]: clock_nanosleep()) = 0 4031.409 ( ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ... 10.133 (5000.375 ms): sleep/327642 ... [continued]: clock_nanosleep()) = 0 Performance counter stats for 'sleep 5': 2,617,347 cycles 1,855,997 instructions # 0.71 insn per cycle 5.002282128 seconds time elapsed 0.000855000 seconds user 0.000852000 seconds sys perf annotate: - Building with binutils' libopcode now is opt-in (BUILD_NONDISTRO=1) for licensing reasons, and we missed a build test on tools/perf/tests makefile. Since we now default to NDEBUG=1, we ended up segfaulting when building with BUILD_NONDISTRO=1 because a needed initialization routine was being "error checked" via an assert. Fix it by explicitly checking the result and aborting instead if it fails. We better back propagate the error, but at least 'perf annotate' on samples collected for a BPF program is back working when perf is built with BUILD_NONDISTRO=1. perf report/top: - Add back TUI hierarchy mode header, that is seen when using 'perf report/top --hierarchy'. - Fix the number of entries for 'e' key in the TUI that was preventing navigation of lines when expanding an entry. perf report/script: - Support cross platform register handling, allowing a perf.data file collected on one architecture to have registers sampled correctly displayed when analysis tools such as 'perf report' and 'perf script' are used on a different architecture. - Fix handling of event attributes in pipe mode, i.e. when one uses: perf record -o - | perf report -i - When no perf.data files are used. - Handle files generated via pipe mode with a version of perf and then read also via pipe mode with a different version of perf, where the event attr record may have changed, use the record size field to properly support this version mismatch. perf probe: - Accessing global variables from uprobes isn't supported, make the error message state that instead of stating that some minimal kernel version is needed to have that feature. This seems just a tool limitation, the kernel probably has all that is needed. perf tests: - Fix a reference count related leak in the dlfilter v0 API where the result of a thread__find_symbol_fb() is not matched with an addr_location__exit() to drop the reference counts of the resolved components (machine, thread, map, symbol, etc). Add a dlfilter test to make sure that doesn't regresses. - Lots of fixes for the 'perf test' written in shell script related to problems found with the shellcheck utility. - Fixes for 'perf test' shell scripts testing features enabled when perf is built with BUILD_BPF_SKEL=1, such as 'perf stat' bpf counters. - Add perf record sample filtering test, things like the following example, that gets implemented as a BPF filter attached to the event: # perf record -e task-clock -c 10000 --filter 'ip < 0xffffffff00000000' - Improve the way the task_analyzer test checks if libtraceevent is linked, using 'perf version --build-options' instead of the more expensinve 'perf record -e "sched:sched_switch"'. - Add support for riscv in the mmap-basic test. (This went as well via the RiscV tree, same contents). libperf: - Implement riscv mmap support (This went as well via the RiscV tree, same contents). perf script: - New tool that converts perf.data files to the firefox profiler format so that one can use the visualizer at https://profiler.firefox.com/. Done by Anup Sharma as part of this year's Google Summer of Code. One can generate the output and upload it to the web interface but Anup also automated everything: perf script gecko -F 99 -a sleep 60 - Support syscall name parsing on arm64. - Print "cgroup" field on the same line as "comm". perf bench: - Add new 'uprobe' benchmark to measure the overhead of uprobes with/without BPF programs attached to it. - breakpoints are not available on power9, skip that test. perf stat: - Add #num_cpus_online literal to be used in 'perf stat' metrics, and add this extra 'perf test' check that exemplifies its purpose: TEST_ASSERT_VAL("#num_cpus_online", expr__parse(&num_cpus_online, ctx, "#num_cpus_online") == 0); TEST_ASSERT_VAL("#num_cpus", expr__parse(&num_cpus, ctx, "#num_cpus") == 0); TEST_ASSERT_VAL("#num_cpus >= #num_cpus_online", num_cpus >= num_cpus_online); Miscellaneous: - Improve tool startup time by lazily reading PMU, JSON, sysfs data. - Improve error reporting in the parsing of events, passing YYLTYPE to error routines, so that the output can show were the parsing error was found. - Add 'perf test' entries to check the parsing of events improvements. - Fix various leak for things detected by -fsanitize=address, mostly things that would be freed at tool exit, including: - Free evsel->filter on the destructor. - Allow tools to register a thread->priv destructor and use it in 'perf trace'. - Free evsel->priv in 'perf trace'. - Free string returned by synthesize_perf_probe_point() when the caller fails to do all it needs. - Adjust various compiler options to not consider errors some warnings when building with broken headers found in things like python, flex, bison, as we otherwise build with -Werror. Some for gcc, some for clang, some for some specific version of those, some for some specific version of flex or bison, or some specific combination of these components, bah. - Allow customization of clang options for BPF target, this helps building on gentoo where there are other oddities where BPF targets gets passed some compiler options intended for the native build, so building with WERROR=0 helps while these oddities are fixed. - Dont pass ERR_PTR() values to perf_session__delete() in 'perf top' and 'perf lock', fixing some segfaults when handling some odd failures. - Add LTO build option. - Fix format of unordered lists in the perf docs (tools/perf/Documentation) - Overhaul the bison files, using constructs such as YYNOMEM. - Remove unused tokens from the bison .y files. - Add more comments to various structs. - A few LoongArch enablement patches. Vendor events (JSON): - Add JSON metrics for Yitian 710 DDR (aarch64). Things like: EventName, BriefDescription visible_window_limit_reached_rd, "At least one entry in read queue reaches the visible window limit.", visible_window_limit_reached_wr, "At least one entry in write queue reaches the visible window limit.", op_is_dqsosc_mpc , "A DQS Oscillator MPC command to DRAM.", op_is_dqsosc_mrr , "A DQS Oscillator MRR command to DRAM.", op_is_tcr_mrr , "A Temperature Compensated Refresh(TCR) MRR command to DRAM.", - Add AmpereOne metrics (aarch64). - Update N2 and V2 metrics (aarch64) and events using Arm telemetry repo. - Update scale units and descriptions of common topdown metrics on aarch64. Things like: - "MetricExpr": "stall_slot_frontend / (#slots * cpu_cycles)", - "BriefDescription": "Frontend bound L1 topdown metric", + "MetricExpr": "100 * (stall_slot_frontend / (#slots * cpu_cycles))", + "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the frontend of the processor.", - Update events for intel: meteorlake to 1.04, sapphirerapids to 1.15, Icelake+ metric constraints. - Update files for the power10 platform" * tag 'perf-tools-for-v6.6-1-2023-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (217 commits) perf parse-events: Fix driver config term perf parse-events: Fixes relating to no_value terms perf parse-events: Fix propagation of term's no_value when cloning perf parse-events: Name the two term enums perf list: Don't print Unit for "default_core" perf vendor events intel: Fix modifier in tma_info_system_mem_parallel_reads for skylake perf dlfilter: Avoid leak in v0 API test use of resolve_address() perf metric: Add #num_cpus_online literal perf pmu: Remove str from perf_pmu_alias perf parse-events: Make common term list to strbuf helper perf parse-events: Minor help message improvements perf pmu: Avoid uninitialized use of alias->str perf jevents: Use "default_core" for events with no Unit perf test stat_bpf_counters_cgrp: Enhance perf stat cgroup BPF counter test perf test shell stat_bpf_counters: Fix test on Intel perf test shell record_bpf_filter: Skip 6.2 kernel libperf: Get rid of attr.id field perf tools: Convert to perf_record_header_attr_id() libperf: Add perf_record_header_attr_id() perf tools: Handle old data in PERF_RECORD_ATTR ... |
||
![]() |
45fc4628c1 |
perf parse-events: Fix driver config term
Inadvertently deleted in commit |
||
![]() |
9ea150a8d0 |
perf parse-events: Fixes relating to no_value terms
A term may have no value in which case it is assumed to have a value of 1. It doesn't just apply to alias/event terms so change the parse_events_term__to_strbuf assert. Commit |
||
![]() |
64199ae4b8 |
perf parse-events: Fix propagation of term's no_value when cloning
The no_value field in 'struct parse_events_term' indicates that the val variable isn't used, the case for an event name. Cloning wasn't propagating this, making cloned event name terms appearing to have a constant assinged to them. Working around the bug would check for a value of 1 assigned to value, but then this meant a user value of 1 couldn't be differentiated causing the value to be lost in debug printing and perf list. The change fixes the cloning and updates the "val.num ==/!= 1" tests to use no_value instead. To better check the no_value is set appropriately parameter comments are added for constant values. This found that no_value wasn't set correctly in parse_events_multi_pmu_add, which matters now that no_value is used to indicate an event name. Fixes: |
||
![]() |
58d3a4cea4 |
perf parse-events: Name the two term enums
Name the enums used by 'struct parse_events_term' to parse_events__term_val_type and parse_events__term_type. This allows greater compile time error checking. Fix -Wswitch related issues by explicitly listing all enum values prior to default. Add config_term_name to safely look up a parse_events__term_type name, bounds checking the array access first. Add documentation to 'struct parse_events_terms' and reorder to save space. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230831071421.2201358-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
45210e1ada |
perf dlfilter: Avoid leak in v0 API test use of resolve_address()
The introduction of reference counting causes the v0 API perf_dlfilter_fns.resolve_address() to leak. v2 API introduced perf_dlfilter_fns.al_cleanup() to prevent that. For the v0 API, avoid the leak by exiting the addr_location immediately, since the documentation makes it clear that pointers obtained via perf_dlfilter_fns are not necessarily valid (dereferenceable) after 'filter_event' and 'filter_event_early' return. Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Closes: https://lore.kernel.org/oe-lkp/202308232146.94d82cb4-oliver.sang@intel.com Link: http://lore.kernel.org/lkml/20230830090539.68206-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
f0005f1732 |
perf metric: Add #num_cpus_online literal
Returns the number of CPUs online, unlike #num_cpus that returns the number present. Add a test of the property. This will be used in future Intel metrics. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20230830073026.1829912-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
30f0b435bb |
perf pmu: Remove str from perf_pmu_alias
Currently the value is only used in perf list. Compute the value just when needed to avoid unnecessary overhead. Recycle the strbuf to avoid memory allocation overhead. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20230830070753.1821629-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
7a6e916447 |
perf parse-events: Make common term list to strbuf helper
A term list is turned into a string for debug output and for the str value in the alias. Add a helper to do this based on existing code, but then fix for situations like events being identified. Use strbuf to manage the dynamic memory allocation and remove the 256 byte limit. Use in various places the string of the term list is required. Before: $ sudo perf stat -vv -e inst_retired.any true Using CPUID GenuineIntel-6-8D-1 intel_pt default config: tsc,mtc,mtc_period=3,psb_period=3,pt,branch Attempting to add event pmu 'cpu' with 'inst_retired.any,' that may result in non-fatal errors After aliases, add event pmu 'cpu' with 'event,period,' that may result in non-fatal errors inst_retired.any -> cpu/inst_retired.any/ ... After: $ sudo perf stat -vv -e inst_retired.any true Using CPUID GenuineIntel-6-8D-1 intel_pt default config: tsc,mtc,mtc_period=3,psb_period=3,pt,branch Attempt to add: cpu/inst_retired.any/ ..after resolving event: cpu/event=0xc0,period=0x1e8483/ inst_retired.any -> cpu/event=0xc0,period=0x1e8483/ ... Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20230830070753.1821629-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
6beb6cfddf |
perf parse-events: Minor help message improvements
Be more specific and fix a typo. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20230830070753.1821629-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
196e355877 |
perf pmu: Avoid uninitialized use of alias->str
alias is allocated with malloc allowing uninitialized memory to be
accessed.
The initialization of str was moved late after it could have been
updated by a JSON event, however, this create a potential for an
uninitialized use.
Fix this by assigning str to NULL early.
Testing on ARM (Raspberry Pi) showed a memory leak in the same code so
add a zfree.
Fixes:
|
||
![]() |
d2045f8715 |
perf jevents: Use "default_core" for events with no Unit
The JSON Unit field encodes the name of the PMU to match the events
to. When no name is given it has meant the "cpu" core PMU except for
tests.
On ARM, Intel hybrid and s390 the core PMU is named differently which
means that using "cpu" for this case causes the events not to get
matched to the PMU.
Introduce a new "default_core" string for this case and in the
pmu__name_match force all core PMUs to match this name.
Fixes:
|
||
![]() |
f174341d0d |
perf tools: Convert to perf_record_header_attr_id()
Instead of accessing the attr.id directly, use the perf_record_header_attr_id() helper to handle old versions. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230825152552.112913-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
9bf63282ea |
perf tools: Handle old data in PERF_RECORD_ATTR
The PERF_RECORD_ATTR is used for a pipe mode to describe an event with
attribute and IDs. The ID table comes after the attr and it calculate
size of the table using the total record size and the attr size.
n_ids = (total_record_size - end_of_the_attr_field) / sizeof(u64)
This is fine for most use cases, but sometimes it saves the pipe output
in a file and then process it later. And it becomes a problem if there
is a change in attr size between the record and report.
$ perf record -o- > perf-pipe.data # old version
$ perf report -i- < perf-pipe.data # new version
For example, if the attr size is 128 and it has 4 IDs, then it would
save them in 168 byte like below:
8 byte: perf event header { .type = PERF_RECORD_ATTR, .size = 168 },
128 byte: perf event attr { .size = 128, ... },
32 byte: event IDs [] = { 1234, 1235, 1236, 1237 },
But when report later, it thinks the attr size is 136 then it only read
the last 3 entries as ID.
8 byte: perf event header { .type = PERF_RECORD_ATTR, .size = 168 },
136 byte: perf event attr { .size = 136, ... },
24 byte: event IDs [] = { 1235, 1236, 1237 }, // 1234 is missing
So it should use the recorded version of the attr. The attr has the
size field already then it should honor the size when reading data.
Fixes:
|
||
![]() |
cd4e1efbbc |
perf pmus: Skip duplicate PMUs and don't print list suffix by default
Add a PMUs scan that ignores duplicates. When there are multiple PMUs that differ only by suffix, by default just list the first one and skip all others. The scan routine checks that the PMU names match but doesn't enforce that the numbers are consecutive as for some PMUs there are gaps. If "-v" is passed to "perf list" then list all PMUs. With the previous change duplicate PMUs are no longer printed but the suffix of the first is printed. When duplicate PMUs are being skipped avoid printing the suffix. Before: $ perf list ... uncore_imc_free_running_0/data_read/ [Kernel PMU event] uncore_imc_free_running_0/data_total/ [Kernel PMU event] uncore_imc_free_running_0/data_write/ [Kernel PMU event] uncore_imc_free_running_1/data_read/ [Kernel PMU event] uncore_imc_free_running_1/data_total/ [Kernel PMU event] uncore_imc_free_running_1/data_write/ [Kernel PMU event] After: $ perf list ... uncore_imc_free_running/data_read/ [Kernel PMU event] uncore_imc_free_running/data_total/ [Kernel PMU event] uncore_imc_free_running/data_write/ [Kernel PMU event] ... $ perf list -v uncore_imc_free_running_0/data_read/ [Kernel PMU event] uncore_imc_free_running_0/data_total/ [Kernel PMU event] uncore_imc_free_running_0/data_write/ [Kernel PMU event] uncore_imc_free_running_1/data_read/ [Kernel PMU event] uncore_imc_free_running_1/data_total/ [Kernel PMU event] uncore_imc_free_running_1/data_write/ [Kernel PMU event] ... Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20230825135237.921058-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
8d9f5146f5 |
perf pmus: Sort pmus by name then suffix
Sort PMUs by name. If two PMUs have the same name but differ by suffix, sort the suffixes numerically. For example, "breakpoint" comes before "cpu", "uncore_imc_free_running_0" comes before "uncore_imc_free_running_1". Suffixes need to be treated specially as otherwise they will be ordered like 0, 1, 10, 11, .., 2, 20, 21, .., etc. Only PMUs starting 'uncore_' are considered to have a potential suffix. Sorting of PMUs is done so that later patches can skip duplicate uncore PMUs that differ only by there suffix. Committer notes: Used the more compact, intention revealing strstarts() function we got from the kernel sources: - if (strncmp(str, "uncore_", 7)) + if (!strstarts(str, "uncore_")) Also in pmus_cmp() the lhs_num and rhs_num variables may end up not being set for non "uncore_" prefixed PMUs in pmu_name_len_no_suffix(), or at least gcc 7.5 in some distros (opensuse 15.5, to be EOLed in Dec/2024) thins so, so initialize both to zero. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20230825135237.921058-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
c56f286f24 |
perf tools: Allow to use cpuinfo on LoongArch
Define these macros so that the CPU name can be displayed when running 'perf report' and 'perf timechart'. Committer notes: No need to have: if (strcasestr(buf, "Model Name")) { strlcpy(cpu_m, &buf[13], 255); break; } else if (strcasestr(buf, "model name")) { strlcpy(cpu_m, &buf[13], 255); break; } As the point of strcasestr() is to be case insensitive to both the haystack and the needle, so simplify the above to just: if (strcasestr(buf, "model name")) { strlcpy(cpu_m, &buf[13], 255); break; } Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: loongarch@lists.linux.dev Cc: loongson-kernel@lists.loongnix.cn Link: https://lore.kernel.org/r/db968a186a10e4629fe10c26a1210f7126ad41ec.1692962043.git.siyanteng@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
7512e96957 |
perf build-id: Simplify build_id_cache__cachedir()
Initialize realname to NULL, rather than name. This avoids a cast and as realpath is either NULL or an allocated string, free can be called unconditionally. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wei Li <liwei391@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230825024002.801955-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
b7823045ec |
perf pmu: Make id const and add missing free
The struct pmu id is initialized from pmu_id that is read into allocated memory from a file, as such it needs free-ing in pmu__delete(). Make the id value const so that we can remove casts in tests. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wei Li <liwei391@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230825024002.801955-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
970ef02e98 |
perf parse-events: Make term's config const
This avoids casts in tests. Use zfree in a few places to avoid warnings about a freeing a const pointer. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wei Li <liwei391@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230825024002.801955-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
c091ee9089 |
perf pmu: Remove logic for PMU name being NULL
The PMU name could be NULL in the case of the fake_pmu. Initialize the name for the fake_pmu to "fake" so that all other logic can assume it is initialized. Add a const to the type of name so that a literal can be used to avoid additional initialization code. Propagate the cost through related routines and remove now unnecessary "(char *)" casts. Doing this located a bug in builtin-list for the pmu_glob that was missing a strdup. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20230825024002.801955-3-irogers@google.com Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Wei Li <liwei391@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Ming Wang <wangming01@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
9897009eec |
perf header: Fix missing PMU caps
PMU caps are written as HEADER_PMU_CAPS or for the special case of the
PMU "cpu" as HEADER_CPU_PMU_CAPS. As the PMU "cpu" is special, and not
any "core" PMU, the logic had become broken and core PMUs not called
"cpu" were not having their caps written.
This affects ARM and s390 non-hybrid PMUs.
Simplify the PMU caps writing logic to scan one fewer time and to be
more explicit in its behavior.
Fixes:
|
||
![]() |
8d4b6d37ea |
perf pmu: Lazily load sysfs aliases
Don't load sysfs aliases for a PMU when the PMU is first created, defer until an alias needs to be found. For the pmu-scan benchmark, average core PMU scanning is reduced by 30.8%, and average PMU scanning by 12.6%. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230824041330.266337-17-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
7b723dbb96 |
perf pmu: Be lazy about loading event info files from sysfs
Event info is only needed when an event is parsed or when merging data from an JSON and sysfs event. Be lazy in its loading to reduce file accesses. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230824041330.266337-16-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
88ed91848d |
perf pmu: Scan type early to fail an invalid PMU quickly
Scan sysfs PMU's type early so that format and aliases aren't attempted to be loaded if the PMU name is invalid. This is the case for event_pmu tokens in parse-events.y where a wildcard name is first assumed to be a PMU name. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230824041330.266337-15-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
e6ff1eed35 |
perf pmu: Lazily add JSON events
Rather than scanning all JSON events and adding them when a PMU is created, add the alias when the JSON event is needed. Average core PMU scanning run time reduced by 60.2%. Average PMU scanning run time reduced by 15%. Page faults with no events reduced by 74 page faults, 4% of total. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230824041330.266337-14-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
7c52f10c0d |
perf pmu: Cache JSON events table
Cache the JSON events table so that finding it isn't done per event/alias. Change the events table find so that when the PMU is given, if the PMU has no JSON events return null. Update usage to always use the PMU variable. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230824041330.266337-13-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
f63a536f03 |
perf pmu: Merge JSON events with sysfs at load time
Rather than load all sysfs events then parsing all JSON events and merging with ones that already exist. When a sysfs event is loaded, look for a corresponding JSON event and merge immediately. To simplify the logic, early exit the perf_pmu__new_alias function if an alias is attempted to be added twice - as merging has already been explicitly handled. Fix the copying of terms to a merged alias and some ENOMEM paths. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230824041330.266337-12-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
f26d22f1ba |
perf pmu: Prefer passing pmu to aliases list
The aliases list is part of the PMU. Rather than pass the aliases list, pass the full PMU simplifying some callbacks. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230824041330.266337-11-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
edb217ff14 |
perf pmu: Parse sysfs events directly from a file
Rather than read a sysfs events file into a 256 byte char buffer, pass the FILE* directly to the lex/yacc parser. This avoids there being a maximum events file size. While changing the API, constify some arguments to remove unnecessary casts. Allocating the read buffer decreases the performance of pmu-scan by around 3%. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230824041330.266337-10-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
e3edd6cf63 |
perf pmu-events: Reduce processed events by passing PMU
Pass the PMU to pmu_events_table__for_each_event so that entries that don't match don't need to be processed by callback. If a NULL PMU is passed then all PMUs are processed. 'perf bench internals pmu-scan's "Average PMU scanning" performance is reduced by about 5% on an Intel tigerlake. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230824041330.266337-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
c4ac7f7542 |
perf s390 s390_cpumcfdg_dump: Don't scan all PMUs
Rather than scanning all PMUs for a counter name, scan the PMU associated with the evsel of the sample. This is done to remove a dependence on pmu-events.h. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230824041330.266337-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
9d31cb9395 |
perf parse-events: Improve error message for double setting
Double setting information for an event would produce an error message associated with the PMU rather than the term that was double setting. Improve the error message to be on the term. Before: $ perf stat -e 'cpu/inst_retired.any,inst_retired.any/' true event syntax error: 'cpu/inst_retired.any,inst_retired.any/' \___ Bad event or PMU Unabled to find PMU or event on a PMU of 'cpu' Run 'perf list' for a list of valid events $ After: $ perf stat -e 'cpu/inst_retired.any,inst_retired.any/' true event syntax error: '..etired.any,inst_retired.any/' \___ Bad event or PMU Unabled to find PMU or event on a PMU of 'cpu' Initial error: event syntax error: '..etired.any,inst_retired.any/' \___ Attempt to set event's scale twice Run 'perf list' for a list of valid events Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230824041330.266337-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
4000519eb0 |
perf pmu-events: Add extra underscore to function names
Add extra underscore before "for" of pmu_events_table_for_each_event and pmu_metrics_table_for_each_metric. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230824041330.266337-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
c3245d2093 |
perf pmu: Abstract alias/event struct
In order to be able to lazily compute aliases/events for a PMU, move the struct perf_pmu_alias into pmu.c. Add perf_pmu__find_event and perf_pmu__for_each_event that take a callback that is called for the found event or for each event. The layout of struct pmu and the event/alias list is unchanged but the API is altered so that aliases are no longer directly accessed, allowing for later changes. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230824041330.266337-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
5040264121 |
perf pmu: Make the loading of formats lazy
The sysfs format files are loaded eagerly in a PMU. Add a flag so that we create the format but only load the contents when necessary. Reduce the size of the value in struct perf_pmu_format and avoid holes so there is no additional space requirement. For "perf stat -e cycles true" this reduces the number of openat calls from 648 to 573 (about 12%). The benchmark pmu scan speed is improved by roughly 5%. Before: $ perf bench internals pmu-scan Computing performance of sysfs PMU event scan for 100 times Average core PMU scanning took: 1061.100 usec (+- 9.965 usec) Average PMU scanning took: 4725.300 usec (+- 260.599 usec) After: $ perf bench internals pmu-scan Computing performance of sysfs PMU event scan for 100 times Average core PMU scanning took: 989.170 usec (+- 6.873 usec) Average PMU scanning took: 4520.960 usec (+- 251.272 usec) Committer testing: On a AMD Ryzen 5950x: Before: $ perf bench internals pmu-scan -i1000 # Running 'internals/pmu-scan' benchmark: Computing performance of sysfs PMU event scan for 1000 times Average core PMU scanning took: 563.466 usec (+- 1.008 usec) Average PMU scanning took: 1619.174 usec (+- 23.627 usec) $ perf stat -r5 perf bench internals pmu-scan -i1000 # Running 'internals/pmu-scan' benchmark: Computing performance of sysfs PMU event scan for 1000 times Average core PMU scanning took: 583.401 usec (+- 2.098 usec) Average PMU scanning took: 1677.352 usec (+- 24.636 usec) # Running 'internals/pmu-scan' benchmark: Computing performance of sysfs PMU event scan for 1000 times Average core PMU scanning took: 553.254 usec (+- 0.825 usec) Average PMU scanning took: 1635.655 usec (+- 24.312 usec) # Running 'internals/pmu-scan' benchmark: Computing performance of sysfs PMU event scan for 1000 times Average core PMU scanning took: 557.733 usec (+- 0.980 usec) Average PMU scanning took: 1600.659 usec (+- 23.344 usec) # Running 'internals/pmu-scan' benchmark: Computing performance of sysfs PMU event scan for 1000 times Average core PMU scanning took: 554.906 usec (+- 0.774 usec) Average PMU scanning took: 1595.338 usec (+- 23.288 usec) # Running 'internals/pmu-scan' benchmark: Computing performance of sysfs PMU event scan for 1000 times Average core PMU scanning took: 551.798 usec (+- 0.967 usec) Average PMU scanning took: 1623.213 usec (+- 23.998 usec) Performance counter stats for 'perf bench internals pmu-scan -i1000' (5 runs): 3276.82 msec task-clock:u # 0.990 CPUs utilized ( +- 0.82% ) 0 context-switches:u # 0.000 /sec 0 cpu-migrations:u # 0.000 /sec 1008 page-faults:u # 307.615 /sec ( +- 0.04% ) 12049614778 cycles:u # 3.677 GHz ( +- 0.07% ) (83.34%) 117507478 stalled-cycles-frontend:u # 0.98% frontend cycles idle ( +- 0.33% ) (83.32%) 27106761 stalled-cycles-backend:u # 0.22% backend cycles idle ( +- 9.55% ) (83.36%) 33294953848 instructions:u # 2.76 insn per cycle # 0.00 stalled cycles per insn ( +- 0.03% ) (83.31%) 6849825049 branches:u # 2.090 G/sec ( +- 0.03% ) (83.37%) 71533903 branch-misses:u # 1.04% of all branches ( +- 0.20% ) (83.30%) 3.3088 +- 0.0302 seconds time elapsed ( +- 0.91% ) $ After: $ perf stat -r5 perf bench internals pmu-scan -i1000 # Running 'internals/pmu-scan' benchmark: Computing performance of sysfs PMU event scan for 1000 times Average core PMU scanning took: 550.702 usec (+- 0.958 usec) Average PMU scanning took: 1566.577 usec (+- 22.747 usec) # Running 'internals/pmu-scan' benchmark: Computing performance of sysfs PMU event scan for 1000 times Average core PMU scanning took: 548.315 usec (+- 0.555 usec) Average PMU scanning took: 1565.499 usec (+- 22.760 usec) # Running 'internals/pmu-scan' benchmark: Computing performance of sysfs PMU event scan for 1000 times Average core PMU scanning took: 548.073 usec (+- 0.555 usec) Average PMU scanning took: 1586.097 usec (+- 23.299 usec) # Running 'internals/pmu-scan' benchmark: Computing performance of sysfs PMU event scan for 1000 times Average core PMU scanning took: 561.184 usec (+- 2.709 usec) Average PMU scanning took: 1567.153 usec (+- 22.548 usec) # Running 'internals/pmu-scan' benchmark: Computing performance of sysfs PMU event scan for 1000 times Average core PMU scanning took: 546.987 usec (+- 0.553 usec) Average PMU scanning took: 1562.814 usec (+- 22.729 usec) Performance counter stats for 'perf bench internals pmu-scan -i1000' (5 runs): 3170.86 msec task-clock:u # 0.992 CPUs utilized ( +- 0.22% ) 0 context-switches:u # 0.000 /sec 0 cpu-migrations:u # 0.000 /sec 1010 page-faults:u # 318.526 /sec ( +- 0.04% ) 11890047674 cycles:u # 3.750 GHz ( +- 0.14% ) (83.27%) 119090499 stalled-cycles-frontend:u # 1.00% frontend cycles idle ( +- 0.46% ) (83.40%) 32502449 stalled-cycles-backend:u # 0.27% backend cycles idle ( +- 8.32% ) (83.30%) 33119141261 instructions:u # 2.79 insn per cycle # 0.00 stalled cycles per insn ( +- 0.01% ) (83.37%) 6812816561 branches:u # 2.149 G/sec ( +- 0.01% ) (83.29%) 70157855 branch-misses:u # 1.03% of all branches ( +- 0.28% ) (83.38%) 3.19710 +- 0.00826 seconds time elapsed ( +- 0.26% ) $ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230824041330.266337-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
838a8c5f40 |
perf pmu: Pass PMU rather than aliases and format
Pass the pmu so the aliases and format list can be better abstracted and later lazily loaded. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230823080828.1460376-9-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
da6a5afda5 |
perf pmu: Avoid passing format list to perf_pmu__format_bits()
Pass the PMU so the format list can be better abstracted and later lazily loaded. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230823080828.1460376-8-irogers@google.com [ Did missing conversions in tools/perf/arch/arm*/util/cs-etm.c ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
7eb5473314 |
perf pmu: Avoid passing format list to perf_pmu__format_type
Pass the pmu so the format list can be better abstracted and later lazily loaded. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230823080828.1460376-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
804fee5d0f |
perf pmu: Avoid passing format list to perf_pmu__config_terms()
Abstract the format list better, hiding it in the PMU, by changing perf_pmu__config_terms() the PMU rather than the format list in the PMU. Change the PMU test to pass a dummy PMU for this purpose. Changing the test allows perf_pmu__del_formats() to become static. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230823080828.1460376-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
6f2f6eafcd |
perf pmu: Reduce scope of perf_pmu_error()
Move declaration from header file to pmu.y and make static. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230823080828.1460376-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
cc5adb7347 |
perf pmu: Move perf_pmu__set_format to pmu.y
Avoid having the function in the C and header file, as it is only used locally by pmu.y. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230823080828.1460376-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
e1a3aad31c |
perf pmu: Avoid a path name copy
Rather than read a base path and append into a 2nd path, read the base path directly into output buffer and append to that. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230823080828.1460376-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
91e2e9f0b8 |
perf script ibs: Remove unused include
Done to reduce dependencies on pmu-events.h. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230823080828.1460376-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
7a46404b3c |
perf lzma: Convert some pr_err() to pr_debug() as callers already use pr_debug()
I noticed some error with: # perf list ex_ret_brn lzma: fopen failed on /usr/lib/modules/5.15.14-100.fc34.x86_64/kernel/net/bluetooth/bnep/bnep.ko.xz: 'No such file or directory' lzma: fopen failed on /usr/lib/modules/5.16.16-200.fc35.x86_64/kernel/drivers/gpu/drm/drm_kms_helper.ko.xz: 'No such file or directory' lzma: fopen failed on /usr/lib/modules/5.18.16-200.fc36.x86_64/kernel/arch/x86/crypto/crct10dif-pclmul.ko.xz: 'No such file or directory' lzma: fopen failed on /usr/lib/modules/5.16.16-200.fc35.x86_64/kernel/drivers/i2c/busses/i2c-piix4.ko.xz: 'No such file or directory' <BIG SNIP> Then using 'perf probe' + 'perf trace' to debug 'perf list', it seems its some inconsistency in the ~/.debug/ cache where broken build id symlinks that ends up making it try to uncompress some kernel modules using the lzma routines: 395.309 perf/3594447 probe_perf:lzma_decompress_to_file(__probe_ip: 6118448, input_string: "/usr/lib/modules/5.18.17-200.fc36.x86_64/kernel/drivers/nvme/host/nvme.ko.xz") lzma_decompress_to_file (/var/home/acme/bin/perf) filename__decompress (/var/home/acme/bin/perf) filename__read_build_id (/var/home/acme/bin/perf) filename__sprintf_build_id (inlined) build_id_cache__valid_id (inlined) build_id_cache__list_all (/var/home/acme/bin/perf) print_sdt_events (/var/home/acme/bin/perf) cmd_list (/var/home/acme/bin/perf) run_builtin (/var/home/acme/bin/perf) handle_internal_command (inlined) run_argv (inlined) main (/var/home/acme/bin/perf) __libc_start_call_main (/usr/lib64/libc.so.6) __libc_start_main@@GLIBC_2.34 (/usr/lib64/libc.so.6) _start (/var/home/acme/bin/perf) But callers of filename__decompress() already check its return and use pr_debug(), so be consistent and make functions it calls also use pr_debug(). Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZOUD0+GkuCVkYF7n@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
58a8d2edd5 |
perf stat-display: Check if snprintf()'s fmt argument is NULL
It is undefined behavior to pass NULL as snprintf()'s fmt argument. Here is an example to trigger the problem: $ perf stat --metric-only -x, -e instructions -- sleep 1 insn per cycle, Segmentation fault (core dumped) With this patch: $ perf stat --metric-only -x, -e instructions -- sleep 1 insn per cycle, , Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kaige Ye <ye@kaige.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/01CA7674B690CA24+20230804020907.144562-2-ye@kaige.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
7d9642311b |
perf bpf augmented_raw_syscalls: Add an assert to make sure sizeof(augmented_arg->value) is a power of two.
Similar to what was done in the previous cset for sizeof(saddr), we need to make sure sizeof(augmented_arg->value) is a power of two to do bounds checking using &=: augmented_len &= sizeof(augmented_arg->value) - 1; Suggested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZONrPo0NSqdbXiGx@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
262b54b6c9 |
perf bpf augmented_raw_syscalls: Add an assert to make sure sizeof(saddr) is a power of two.
We're using the BPF verifier suggestion: 22: (85) call bpf_probe_read#4 R2 min value is negative, either use unsigned or 'var &= const' That works only when const is a (power of two - 1) so add an assert to make sure that that is the case. Suggested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZONrFmJBNlQpSpZj@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
9d5da30e4a |
perf jevents: Add a new expression builtin strcmp_cpuid_str()
This will allow writing formulas that are conditional on a specific CPU type or CPU version. It calls through to the existing strcmp_cpuid_str() function in Perf which has a default weak version, and an arch specific version for x86 and arm64. The function takes an 'ID' type value, which is a string. But in this case Arm CPU IDs are hex numbers prefixed with '0x'. metric.py assumes strings are only used by event names, and that they can't start with a number ('0'), so an additional change has to be made to the regex to convert hex numbers back to 'ID' types. Signed-off-by: James Clark <james.clark@arm.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Forrington <nick.forrington@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Sohom Datta <sohomdatta1@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230816114841.1679234-5-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
1836480429 |
perf bpf_skel augmented_raw_syscalls: Cap the socklen parameter using &= sizeof(saddr)
This works with: $ clang -v clang version 14.0.5 (Fedora 14.0.5-2.fc36) $ But not with: $ clang -v clang version 16.0.6 (Fedora 16.0.6-2.fc38) $ [root@quaco ~]# perf trace -e connect*,sendto* ping -c 10 localhost libbpf: prog 'sys_enter_sendto': BPF program load failed: Permission denied libbpf: prog 'sys_enter_sendto': -- BEGIN PROG LOAD LOG -- reg type unsupported for arg#0 function sys_enter_sendto#59 0: R1=ctx(off=0,imm=0) R10=fp0 ; int sys_enter_sendto(struct syscall_enter_args *args) 0: (bf) r6 = r1 ; R1=ctx(off=0,imm=0) R6_w=ctx(off=0,imm=0) 1: (b7) r1 = 0 ; R1_w=0 ; int key = 0; 2: (63) *(u32 *)(r10 -4) = r1 ; R1_w=0 R10=fp0 fp-8=0000???? 3: (bf) r2 = r10 ; R2_w=fp0 R10=fp0 ; 4: (07) r2 += -4 ; R2_w=fp-4 ; return bpf_map_lookup_elem(&augmented_args_tmp, &key); 5: (18) r1 = 0xffff8de5a5b8bc00 ; R1_w=map_ptr(off=0,ks=4,vs=8272,imm=0) 7: (85) call bpf_map_lookup_elem#1 ; R0_w=map_value_or_null(id=1,off=0,ks=4,vs=8272,imm=0) 8: (bf) r7 = r0 ; R0_w=map_value_or_null(id=1,off=0,ks=4,vs=8272,imm=0) R7_w=map_value_or_null(id=1,off=0,ks=4,vs=8272,imm=0) 9: (b7) r0 = 1 ; R0_w=1 ; if (augmented_args == NULL) 10: (15) if r7 == 0x0 goto pc+25 ; R7_w=map_value(off=0,ks=4,vs=8272,imm=0) ; unsigned int socklen = args->args[5]; 11: (79) r1 = *(u64 *)(r6 +56) ; R1_w=scalar() R6_w=ctx(off=0,imm=0) ; 12: (bf) r2 = r1 ; R1_w=scalar(id=2) R2_w=scalar(id=2) 13: (67) r2 <<= 32 ; R2_w=scalar(smax=9223372032559808512,umax=18446744069414584320,var_off=(0x0; 0xffffffff00000000),s32_min=0,s32_max=0,u32_max=0) 14: (77) r2 >>= 32 ; R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) 15: (b7) r8 = 128 ; R8=128 ; if (socklen > sizeof(augmented_args->saddr)) 16: (25) if r2 > 0x80 goto pc+1 ; R2=scalar(umax=128,var_off=(0x0; 0xff)) 17: (bf) r8 = r1 ; R1=scalar(id=2) R8_w=scalar(id=2) ; const void *sockaddr_arg = (const void *)args->args[4]; 18: (79) r3 = *(u64 *)(r6 +48) ; R3_w=scalar() R6=ctx(off=0,imm=0) ; bpf_probe_read(&augmented_args->saddr, socklen, sockaddr_arg); 19: (bf) r1 = r7 ; R1_w=map_value(off=0,ks=4,vs=8272,imm=0) R7=map_value(off=0,ks=4,vs=8272,imm=0) 20: (07) r1 += 64 ; R1_w=map_value(off=64,ks=4,vs=8272,imm=0) ; bpf_probe_read(&augmented_args->saddr, socklen, sockaddr_arg); 21: (bf) r2 = r8 ; R2_w=scalar(id=2) R8_w=scalar(id=2) 22: (85) call bpf_probe_read#4 R2 min value is negative, either use unsigned or 'var &= const' processed 22 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1 -- END PROG LOAD LOG -- libbpf: prog 'sys_enter_sendto': failed to load: -13 libbpf: failed to load object 'augmented_raw_syscalls_bpf' libbpf: failed to load BPF skeleton 'augmented_raw_syscalls_bpf': -13 So use the suggested &= variant since sizeof(saddr) == 128 bytes. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
ff382c1ce8 |
perf parse-regs: Move out arch specific header from util/perf_regs.h
util/perf_regs.h includes another perf_regs.h: #include <perf_regs.h> Here it includes architecture specific header, for example, if we build arm64 target, the header tools/perf/arch/arm64/include/perf_regs.h is included. We use this implicit way to include architecture specific header, which is not directive; furthermore, util/perf_regs.c is coupled with the architecture specific definitions. This patch moves out arch specific header from util/perf_regs.h for generalizing the 'util' folder, as a result, the source files in 'arch' folder explicitly include architecture's perf_regs.h. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Eric Lin <eric.lin@sifive.com> Cc: Fangrui Song <maskray@google.com> Cc: Guo Ren <guoren@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-csky@vger.kernel.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20230606014559.21783-7-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
856caabf72 |
perf parse-regs: Remove PERF_REGS_{MAX|MASK} from common code
The macros PERF_REGS_MAX and PERF_REGS_MASK are architecture specific, let's remove them from the common file util/perf_regs.c. As a side effect, the weak functions arch__intr_reg_mask() and arch__user_reg_mask() just return zeros, every arch defines its own functions in the 'arch' folder for returning right values. Note, we don't need to return intr/user register masks dynamically, this is because these two functions are invoked during recording phase but not decoding phase, they are always invoked on the native environment, thus we don't need to parse them dynamically. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Eric Lin <eric.lin@sifive.com> Cc: Fangrui Song <maskray@google.com> Cc: Guo Ren <guoren@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-csky@vger.kernel.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20230606014559.21783-6-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
d8f69fb6fa |
perf unwind: Use perf_arch_reg_{ip|sp}() to substitute macros
We use perf_arch_reg_ip() and perf_arch_reg_sp() to substitute macros for obtaining the register numbers of SP and IP. This modification enables cross analysis in the unwinding, therefore, the unwinding is not restricted to the predefined values by the macros. Consequently, the macros LIBUNWIND__ARCH_REG_{IP|SP} are removed since they are no longer used. Committer notes: Add missing "util/env.h" header to make sure we have the definition for perf_env__arch(), that when built with NO_LIBUNWIND=1 isn't available, i.e. it was being included by sheer luck. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Eric Lin <eric.lin@sifive.com> Cc: Fangrui Song <maskray@google.com> Cc: Guo Ren <guoren@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-csky@vger.kernel.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20230606014559.21783-4-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
34af56afac |
perf parse-regs: Introduce functions perf_arch_reg_{ip|sp}()
The current code uses macros PERF_REG_IP and PERF_REG_SP for parsing registers and we build perf with these macros statically, which means it only can correctly analyze CPU registers for the native architecture and fails to support cross analysis (e.g. we build perf on x86 and cannot analyze Arm64's registers). We need to generalize util/perf_regs.c for support multi architectures, as a first step, this commit introduces new functions perf_arch_reg_ip() and perf_arch_reg_sp(), these two functions dynamically return IP and SP register index respectively according to the parameter "arch". Every architecture has its own functions (like __perf_reg_ip_arm64 and __perf_reg_sp_arm64), these architecture specific functions are defined in each arch source file under folder util/perf-regs-arch; at the end all of them are built into the tool for cross analysis. Committer notes: Make DWARF_MINIMAL_REGS() an inline function, so that we can use the __maybe_unused attribute for the 'arch' parameter, as this will avoid a build failure when that variable is unused in the callers. That happens when building on unsupported architectures, the ones without HAVE_PERF_REGS_SUPPORT defined. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Eric Lin <eric.lin@sifive.com> Cc: Fangrui Song <maskray@google.com> Cc: Guo Ren <guoren@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-csky@vger.kernel.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20230606014559.21783-3-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
5000e7f61a |
perf parse-regs: Refactor arch register parsing functions
Every architecture has a specific register parsing function for returning register name based on register index, to support cross analysis (e.g. we use perf x86 binary to parse Arm64's perf data), we build all these register parsing functions into the tool, this is why we place all related functions into util/perf_regs.c. Unfortunately, since util/perf_regs.c needs to include every arch's perf_regs.h, this easily introduces duplicated definitions coming from multiple headers, finally it's fragile for building and difficult for maintenance. We cannot simply move these register parsing functions into the corresponding 'arch' folder, the folder is only conditionally built based on the target architecture. Therefore, this commit creates a new folder util/perf-regs-arch/ and uses a dedicated source file to keep every architecture's register parsing function to avoid definition conflicts. This is only a refactoring, no functionality change is expected. Committer notes: Had to add util/perf-regs-arch/*.c to tools/perf/util/python-ext-sources to keep 'perf test python' passing. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Eric Lin <eric.lin@sifive.com> Cc: Fangrui Song <maskray@google.com> Cc: Guo Ren <guoren@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-csky@vger.kernel.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20230606014559.21783-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
a4b6452af7 |
perf cs-etm: Don't duplicate FIELD_GET()
linux/bitfield.h can be included as long as linux/kernel.h is included first, so change the order of the includes and drop the duplicate macro. Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Forrington <nick.forrington@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Sohom Datta <sohomdatta1@gmail.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230811144017.491628-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
82b0a10390 |
perf dlfilter: Add al_cleanup()
Add perf_dlfilter_fns.al_cleanup() to do addr_location__exit() on data
passed via perf_dlfilter_fns.resolve_address().
Add dlfilter-test-api-v2 to the "dlfilter C API" test to test it.
Update documentation, clarifying that data returned by APIs should not
be dereferenced after filter_event() and filter_event_early() return.
Fixes:
|
||
![]() |
42c6dd9d23 |
perf dlfilter: Initialize addr_location before passing it to thread__find_symbol_fb()
As thread__find_symbol_fb() will end up calling thread__find_map() and
it in turn will call these on uninitialized memory:
maps__zput(al->maps);
map__zput(al->map);
thread__zput(al->thread);
Fixes:
|
||
![]() |
d095ad45e2 |
perf evsel: Remove duplicate check for field in evsel__intval()
The `file` parameter in evsel__intval() is checked repeatedly, fix it. No functional change. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20230815221009.3641751-1-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
dc7f01f1bc |
perf bpf-filter: Fix sample flag check with ||
For logical OR operator, the actual sample_flags are in the 'groups'
list so it needs to check entries in the list instead. Otherwise it
would show the following error message.
$ sudo perf record -a -e cycles:p --filter 'period > 100 || weight > 0' sleep 1
Error: cycles:p event does not have sample flags 0
failed to set filter "BPF" on event cycles:p with 2 (No such file or directory)
Actually it should warn on 'weight' is used without WEIGHT flag.
Error: cycles:p event does not have PERF_SAMPLE_WEIGHT
Hint: please add -W option to perf record
failed to set filter "BPF" on event cycles:p with 2 (No such file or directory)
Fixes:
|
||
![]() |
cd2cece61a |
perf trace: Tidy comments related to BPF + syscall augmentation
Now tools/perf/examples/bpf/augmented_syscalls.c is tools/perf/util/bpf_skel/augmented_syscalls.bpf.c and not enabled as a BPF event, tidy the comments to reflect this. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: He Kuang <hekuang@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Tom Rix <trix@redhat.com> Cc: Wang Nan <wangnan0@huawei.com> Cc: Wang ShaoBo <bobo.shaobowang@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yonghong Song <yhs@fb.com> Cc: YueHaibing <yuehaibing@huawei.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230810184853.2860737-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
5e6da6be30 |
perf trace: Migrate BPF augmentation to use a skeleton
Previously a BPF event of augmented_raw_syscalls.c could be used to enable augmentation of syscalls by perf trace. As BPF events are no longer supported, switch to using a BPF skeleton which when attached explicitly opens the sysenter and sysexit tracepoints. The dump map is removed as debugging wasn't supported by the augmentation and bpf_printk can be used when necessary. Remove tools/perf/examples/bpf/augmented_raw_syscalls.c so that the rename/migration to a BPF skeleton captures that this was the source. Committer notes: Some minor stylistic changes to help visualizing the diff. Use libbpf_strerror when failing to load the augmented raw syscalls BPF. Use bpf_object__for_each_program(prog, trace.skel->obj) to disable auto attachment for all but the sys_enter, sys_exit tracepoints, to avoid having to add extra lines as we go adding support for more pointer receiving syscalls. Committer testing: # perf trace -e open* --max-events=10 0.000 ( 0.022 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 11 208.833 ( ): gnome-terminal/3223 openat(dfd: CWD, filename: "/proc/51250/cmdline") ... 249.993 ( 0.024 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 11 250.118 ( 0.030 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.pressure", flags: RDONLY|CLOEXEC) = 11 250.205 ( 0.016 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.current", flags: RDONLY|CLOEXEC) = 11 250.244 ( 0.014 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.min", flags: RDONLY|CLOEXEC) = 11 250.282 ( 0.014 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.low", flags: RDONLY|CLOEXEC) = 11 250.320 ( 0.014 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.swap.current", flags: RDONLY|CLOEXEC) = 11 250.355 ( 0.014 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.stat", flags: RDONLY|CLOEXEC) = 11 250.717 ( 0.016 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/memory.pressure", flags: RDONLY|CLOEXEC) = 11 # # perf trace -e *nanosleep* --max-events=10 ? ( ): SCTP timer/28304 ... [continued]: clock_nanosleep()) = 0 0.007 (10.058 ms): SCTP timer/28304 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }, rmtp: 0x7f0466b78de0) = 0 10.069 ( ): SCTP timer/28304 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }, rmtp: 0x7f0466b78de0) ... 10.069 (10.056 ms): SCTP timer/28304 ... [continued]: clock_nanosleep()) = 0 17.059 ( ): podman/3572 nanosleep(rqtp: 0x7fc4f4d75be0) ... 17.059 (10.061 ms): podman/3572 ... [continued]: nanosleep()) = 0 20.131 (10.059 ms): SCTP timer/28304 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }, rmtp: 0x7f0466b78de0) = 0 30.195 (10.038 ms): SCTP timer/28304 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }, rmtp: 0x7f0466b78de0) = 0 40.238 (10.057 ms): SCTP timer/28304 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }, rmtp: 0x7f0466b78de0) = 0 50.301 ( ): SCTP timer/28304 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }, rmtp: 0x7f0466b78de0) ... # # perf trace -e perf_event* -- perf stat -e instructions,cycles,cache-misses sleep 0.1 0.000 ( 0.011 ms): perf/51331 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x1 (PERF_COUNT_HW_INSTRUCTIONS), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 51332 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3 0.013 ( 0.003 ms): perf/51331 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 51332 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4 0.017 ( 0.002 ms): perf/51331 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x3 (PERF_COUNT_HW_CACHE_MISSES), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 51332 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 5 Performance counter stats for 'sleep 0.1': 1,495,051 instructions # 1.11 insn per cycle 1,347,641 cycles 35,424 cache-misses 0.100935279 seconds time elapsed 0.000924000 seconds user 0.000000000 seconds sys # # perf trace -e connect* ssh localhost 0.000 ( 0.012 ms): ssh/51346 connect(fd: 4, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused) 0.118 ( 0.004 ms): ssh/51346 connect(fd: 6, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused) 0.399 ( 0.007 ms): ssh/51346 connect(fd: 4, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused) 0.426 ( 0.003 ms): ssh/51346 connect(fd: 4, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused) 0.754 ( 0.009 ms): ssh/51346 connect(fd: 4, uservaddr: { .family: INET, port: 22, addr: 127.0.0.1 }, addrlen: 16) = 0 0.771 ( 0.010 ms): ssh/51346 connect(fd: 4, uservaddr: { .family: INET6, port: 22, addr: ::1 }, addrlen: 28) = 0 0.798 ( 0.053 ms): ssh/51346 connect(fd: 4, uservaddr: { .family: INET6, port: 22, addr: ::1 }, addrlen: 28) = 0 0.870 ( 0.004 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused) 0.904 ( 0.003 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused) 0.930 ( 0.003 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused) 0.957 ( 0.003 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused) 0.981 ( 0.003 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused) 1.006 ( 0.004 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused) 1.036 ( 0.005 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused) 65.077 ( 0.022 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/run/.heim_org.h5l.kcm-socket }, addrlen: 110) = 0 66.608 ( 0.014 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/run/.heim_org.h5l.kcm-socket }, addrlen: 110) = 0 root@localhost's password: # # perf trace -e sendto* ping -c 2 localhost PING localhost(localhost (::1)) 56 data bytes 64 bytes from localhost (::1): icmp_seq=1 ttl=64 time=0.024 ms 0.000 ( 0.011 ms): ping/51357 sendto(fd: 5, buff: 0x7ffcca35e620, len: 20, addr: { .family: NETLINK }, addr_len: 0xc) = 20 0.135 ( 0.026 ms): ping/51357 sendto(fd: 4, buff: 0x5601398f7b20, len: 64, addr: { .family: INET6, port: 58, addr: ::1 }, addr_len: 0x1c) = 64 1014.929 ( 0.050 ms): ping/51357 sendto(fd: 4, buff: 0x5601398f7b20, len: 64, flags: CONFIRM, addr: { .family: INET6, port: 58, addr: ::1 }, addr_len: 0x1c) = 64 64 bytes from localhost (::1): icmp_seq=2 ttl=64 time=0.046 ms --- localhost ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 1015ms rtt min/avg/max/mdev = 0.024/0.035/0.046/0.011 ms # Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: He Kuang <hekuang@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Tom Rix <trix@redhat.com> Cc: Wang Nan <wangnan0@huawei.com> Cc: Wang ShaoBo <bobo.shaobowang@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yonghong Song <yhs@fb.com> Cc: YueHaibing <yuehaibing@huawei.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230810184853.2860737-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
3d6dfae889 |
perf parse-events: Remove BPF event support
New features like the BPF --filter support in perf record have made the
BPF event functionality somewhat redundant. As shown by commit
fcb027c1a4f6 ("perf tools: Revert enable indices setting syntax for BPF
map") and commit
|
||
![]() |
56b11a2126 |
perf bpf: Remove support for embedding clang for compiling BPF events (-e foo.c)
This never was in the default build for perf, is difficult to maintain as it uses clang/llvm internals so ditch it, keeping, for now, the external compilation of .c BPF into .o bytecode and its subsequent loading, that is also going to be removed, do it separately to help bisection and to properly document what is being removed and why. Committer notes: Extracted from a larger patch and removed some leftovers, namely deleting these now unused feature tests: tools/build/feature/test-clang.cpp tools/build/feature/test-cxx.cpp tools/build/feature/test-llvm-version.cpp tools/build/feature/test-llvm.cpp Testing the use of BPF events after applying this patch: To use the external clang/llvm toolchain to compile a .c event and then use libbpf to load it, to get the syscalls:sys_enter_open* tracepoints and read the filename pointer, putting it into the ring buffer right after the usual tracepoint payload for 'perf trace' to then print it: [root@quaco ~]# perf trace -e /home/acme/git/perf-tools-next/tools/perf/examples/bpf/augmented_raw_syscalls.c,open* --max-events=10 0.000 systemd-oomd/959 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 12 0.083 abrt-dump-jour/1453 openat(dfd: CWD, filename: "/var/log/journal/d6a97235307247e09f13f326fb607e3c/system.journal", flags: RDONLY|CLOEXEC|NONBLOCK) = 4 0.063 abrt-dump-jour/1454 openat(dfd: CWD, filename: "/var/log/journal/d6a97235307247e09f13f326fb607e3c/system.journal", flags: RDONLY|CLOEXEC|NONBLOCK) = 4 0.082 abrt-dump-jour/1455 openat(dfd: CWD, filename: "/var/log/journal/d6a97235307247e09f13f326fb607e3c/system.journal", flags: RDONLY|CLOEXEC|NONBLOCK) = 4 250.124 systemd-oomd/959 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 12 250.521 systemd-oomd/959 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/memory.pressure", flags: RDONLY|CLOEXEC) = 12 251.047 systemd-oomd/959 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/memory.current", flags: RDONLY|CLOEXEC) = 12 251.162 systemd-oomd/959 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/memory.min", flags: RDONLY|CLOEXEC) = 12 251.242 systemd-oomd/959 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/memory.low", flags: RDONLY|CLOEXEC) = 12 251.353 systemd-oomd/959 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/memory.swap.current", flags: RDONLY|CLOEXEC) = 12 [root@quaco ~]# Same thing, but with a prebuilt .o BPF bytecode: [root@quaco ~]# perf trace -e /home/acme/git/perf-tools-next/tools/perf/examples/bpf/augmented_raw_syscalls.o,open* --max-events=10 0.000 systemd-oomd/959 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 12 0.083 abrt-dump-jour/1453 openat(dfd: CWD, filename: "/var/log/journal/d6a97235307247e09f13f326fb607e3c/system.journal", flags: RDONLY|CLOEXEC|NONBLOCK) = 4 0.083 abrt-dump-jour/1455 openat(dfd: CWD, filename: "/var/log/journal/d6a97235307247e09f13f326fb607e3c/system.journal", flags: RDONLY|CLOEXEC|NONBLOCK) = 4 0.062 abrt-dump-jour/1454 openat(dfd: CWD, filename: "/var/log/journal/d6a97235307247e09f13f326fb607e3c/system.journal", flags: RDONLY|CLOEXEC|NONBLOCK) = 4 249.985 systemd-oomd/959 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 12 466.763 thermald/1234 openat(dfd: CWD, filename: "/sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:2/energy_uj") = 13 467.145 thermald/1234 openat(dfd: CWD, filename: "/sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj") = 13 467.311 thermald/1234 openat(dfd: CWD, filename: "/sys/class/thermal/thermal_zone2/temp") = 13 500.040 cgroupify/24006 openat(dfd: 4, filename: ".", flags: RDONLY|CLOEXEC|DIRECTORY|NONBLOCK) = 5 500.295 cgroupify/24006 openat(dfd: 4, filename: "24616/cgroup.procs") = 5 [root@quaco ~]# Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: He Kuang <hekuang@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Tom Rix <trix@redhat.com> Cc: Wang Nan <wangnan0@huawei.com> Cc: Wang ShaoBo <bobo.shaobowang@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yonghong Song <yhs@fb.com> Cc: YueHaibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/lkml/ZNZWsAXg2px1sm2h@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
833fd800bf |
x86/retpoline,kprobes: Skip optprobe check for indirect jumps with retpolines and IBT
The kprobes optimization check can_optimize() calls
insn_is_indirect_jump() to detect indirect jump instructions in
a target function. If any is found, creating an optprobe is disallowed
in the function because the jump could be from a jump table and could
potentially land in the middle of the target optprobe.
With retpolines, insn_is_indirect_jump() additionally looks for calls to
indirect thunks which the compiler potentially used to replace original
jumps. This extra check is however unnecessary because jump tables are
disabled when the kernel is built with retpolines. The same is currently
the case with IBT.
Based on this observation, remove the logic to look for calls to
indirect thunks and skip the check for indirect jumps altogether if the
kernel is built with retpolines or IBT. Remove subsequently the symbols
__indirect_thunk_start and __indirect_thunk_end which are no longer
needed.
Dropping this logic indirectly fixes a problem where the range
[__indirect_thunk_start, __indirect_thunk_end] wrongly included also the
return thunk. It caused that machines which used the return thunk as
a mitigation and didn't have it patched by any alternative ended up not
being able to use optprobes in any regular function.
Fixes:
|
||
![]() |
33d9c50621 |
perf script python: Add stub for PMU symbol to the python binding
Fix missing symbol seen in: ``` 19: 'import perf' in python : --- start --- test child forked, pid 2640936 python usage test: "echo "import sys ; sys.path.insert(0, 'python'); import perf" | '/usr/bin/python3' " Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: tools/perf/python/perf.cpython-311-x86_64-linux-gnu.so: undefined symbol: perf_pmus__supports_extended_type test child finished with -1 ---- end ---- 'import perf' in python: FAILED! ``` Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20230810180944.2794188-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
e59fea47f8 |
perf symbols: Fix DSO kernel load and symbol process to correctly map DSO to its long_name, type and adjust_symbols
Test "object code reading" fails sometimes for kernel address as below: Reading object code for memory address: 0xc000000000004c3c File is: [kernel.kallsyms] On file address is: 0x14c3c dso__data_read_offset failed test child finished with -1 ---- end ---- Object code reading: FAILED! Here dso__data_read_offset() fails for symbol address 0xc000000000004c3c. This is because the DSO long_name here is "[kernel.kallsyms]" and hence open_dso() fails to open this file. There is an incorrect DSO to map handling here. The key points here are: - The DSO long_name is set to "[kernel.kallsyms]". This file is not present and hence returns error - The DSO binary type is set to DSO_BINARY_TYPE__NOT_FOUND - The DSO adjust_symbols member is set to zero In the end dso__data_read_offset() returns -1 and the address 0x14c3c can not be resolved. Hence the test fails. But the address actually maps to the kernel DSO # objdump -z -d --start-address=0xc000000000004c3c --stop-address=0xc000000000004cbc /home/athira/linux/vmlinux /home/athira/linux/vmlinux: file format elf64-powerpcle Disassembly of section .head.text: c000000000004c3c <exc_virt_0x4c00_system_call+0x3c>: c000000000004c3c: a6 02 9b 7d mfsrr1 r12 c000000000004c40: 78 13 42 7c mr r2,r2 c000000000004c44: 18 00 4d e9 ld r10,24(r13) c000000000004c48: 60 c6 4a 61 ori r10,r10,50784 c000000000004c4c: a6 03 49 7d mtctr r10 Fix dso__process_kernel_symbol() to set the binary_type and adjust_symbols members. dso->adjust_symbols is used by map__rip_2objdump() which converts the symbol start address to the objdump address. Also set dso->long_name in dso__load_vmlinux(). Suggested-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230811051546.70039-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
878460e8d0 |
perf build: Remove -Wno-unused-but-set-variable from the flex flags when building with clang < 13.0.0
clang < 13.0.0 doesn't grok -Wno-unused-but-set-variable, so just remove
it to avoid:
error: unknown warning option '-Wno-unused-but-set-variable'; did you mean '-Wno-unused-const-variable'? [-Werror,-Wunknown-warning-option]
make[4]: *** [/git/perf-6.5.0-rc4/tools/build/Makefile.build:128: /tmp/build/perf/util/pmu-flex.o] Error 1
make[4]: *** Waiting for unfinished jobs....
Fixes:
|
||
![]() |
55b2905019 |
Merge remote-tracking branch 'torvalds/master' into perf-tools-next
To pick up some more fixes that went upstream via the perf-tools fixes branch. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
487ae3b42d |
perf stat: Don't display zero tool counts
Andi reported (see link below) a regression when printing the 'duration_time' tool event, where it gets printed as "not counted" for most of the CPUs, fix it by skipping zero counts for tool events. Reported-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Andi Kleen <ak@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Claire Jensen <cjense@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/all/ZMlrzcVrVi1lTDmn@tassilo/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
c0b067588a |
Revert "perf report: Append inlines to non-DWARF callchains"
This reverts commit
|
||
![]() |
aeb50d3f2c |
perf probe: Make synthesize_perf_probe_point() private to probe-event.c
Not used in any other place, so just make it static. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZM0pjfOe6R4X%2Fcql@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
a612bbf8b8 |
perf probe: Free string returned by synthesize_perf_probe_point() on failure in synthesize_perf_probe_command()
Building perf with EXTRA_CFLAGS="-fsanitize=address" a leak was detected elsewhere and lead to an audit, where we found that synthesize_perf_probe_command() may leak synthesize_perf_probe_point() return on failure, fix it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZM0mzpQktHnhXJXr@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
7bc0153c53 |
perf probe: Free string returned by synthesize_perf_probe_point() on failure to add a probe
Building perf with EXTRA_CFLAGS="-fsanitize=address" a leak is detect when trying to add a probe to a non-existent function: # perf probe -x ~/bin/perf dso__neW Probe point 'dso__neW' not found. Error: Failed to add events. ================================================================= ==296634==ERROR: LeakSanitizer: detected memory leaks Direct leak of 128 byte(s) in 1 object(s) allocated from: #0 0x7f67642ba097 in calloc (/lib64/libasan.so.8+0xba097) #1 0x7f67641a76f1 in allocate_cfi (/lib64/libdw.so.1+0x3f6f1) Direct leak of 65 byte(s) in 1 object(s) allocated from: #0 0x7f67642b95b5 in __interceptor_realloc.part.0 (/lib64/libasan.so.8+0xb95b5) #1 0x6cac75 in strbuf_grow util/strbuf.c:64 #2 0x6ca934 in strbuf_init util/strbuf.c:25 #3 0x9337d2 in synthesize_perf_probe_point util/probe-event.c:2018 #4 0x92be51 in try_to_find_probe_trace_events util/probe-event.c:964 #5 0x93d5c6 in convert_to_probe_trace_events util/probe-event.c:3512 #6 0x93d6d5 in convert_perf_probe_events util/probe-event.c:3529 #7 0x56f37f in perf_add_probe_events /var/home/acme/git/perf-tools-next/tools/perf/builtin-probe.c:354 #8 0x572fbc in __cmd_probe /var/home/acme/git/perf-tools-next/tools/perf/builtin-probe.c:738 #9 0x5730f2 in cmd_probe /var/home/acme/git/perf-tools-next/tools/perf/builtin-probe.c:766 #10 0x635d81 in run_builtin /var/home/acme/git/perf-tools-next/tools/perf/perf.c:323 #11 0x6362c1 in handle_internal_command /var/home/acme/git/perf-tools-next/tools/perf/perf.c:377 #12 0x63667a in run_argv /var/home/acme/git/perf-tools-next/tools/perf/perf.c:421 #13 0x636b8d in main /var/home/acme/git/perf-tools-next/tools/perf/perf.c:537 #14 0x7f676302950f in __libc_start_call_main (/lib64/libc.so.6+0x2950f) SUMMARY: AddressSanitizer: 193 byte(s) leaked in 2 allocation(s). # synthesize_perf_probe_point() returns a "detachec" strbuf, i.e. a malloc'ed string that needs to be free'd. An audit will be performed to find other such cases. Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZM0l1Oxamr4SVjfY@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
bf1842996a |
Merge remote-tracking branch 'torvalds/master' into perf-tools-next
To pick up the fixes that were just merged from perf-tools/perf-tools for v6.5. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
979e9c9fc9 |
perf annotate bpf: Don't enclose non-debug code with an assert()
In |
||
![]() |
c43888e739 |
perf script python: Cope with declarations after statements found in Python.h
With -Werror the build was failing on fedora rawhide: [perfbuilder@27cfe44d67ed perf-6.5.0-rc2]$ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/13/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none OFFLOAD_TARGET_DEFAULT=1 Target: x86_64-redhat-linux Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,m2,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --enable-libstdcxx-backtrace --with-libstdcxx-zoneinfo=/usr/share/zoneinfo --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-13.2.1-20230728/obj-x86_64-redhat-linux/isl-install --enable-offload-targets=nvptx-none --without-cuda-driver --enable-offload-defaulted --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux --with-build-config=bootstrap-lto --enable-link-serialization=1 Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 13.2.1 20230728 (Red Hat 13.2.1-1) (GCC) [perfbuilder@27cfe44d67ed perf-6.5.0-rc2]$ In file included from /usr/include/python3.12/Python.h:44, from scripts/python/Perf-Trace-Util/Context.c:14: /usr/include/python3.12/object.h: In function 'Py_SIZE': /usr/include/python3.12/object.h:217:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] 217 | PyVarObject *var_ob = _PyVarObject_CAST(ob); | ^~~~~~~~~~~ In file included from /usr/include/python3.12/Python.h:53: /usr/include/python3.12/cpython/longintrepr.h: In function '_PyLong_CompactValue': /usr/include/python3.12/cpython/longintrepr.h:121:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] 121 | Py_ssize_t sign = 1 - (op->long_value.lv_tag & _PyLong_SIGN_MASK); | ^~~~~~~~~~ <SNIP> In file included from /usr/include/python3.12/Python.h:44, from util/scripting-engines/trace-event-python.c:22: /usr/include/python3.12/object.h: In function 'Py_SIZE': /usr/include/python3.12/object.h:217:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] 217 | PyVarObject *var_ob = _PyVarObject_CAST(ob); | ^~~~~~~~~~~ CC /tmp/build/perf/util/units.o CC /tmp/build/perf/util/time-utils.o In file included from /usr/include/python3.12/Python.h:53: /usr/include/python3.12/cpython/longintrepr.h: In function '_PyLong_CompactValue': /usr/include/python3.12/cpython/longintrepr.h:121:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] 121 | Py_ssize_t sign = 1 - (op->long_value.lv_tag & _PyLong_SIGN_MASK); | ^~~~~~~~~~ So add -Wno-declaration-after-statement to the python scripting CFLAGS. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZMpdKeO8gU%2FcWDqH@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
a7789d3f2e |
perf python: Cope with declarations after statements found in Python.h
With -Werror the build was failing on fedora rawhide: [perfbuilder@27cfe44d67ed perf-6.5.0-rc2]$ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/13/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none OFFLOAD_TARGET_DEFAULT=1 Target: x86_64-redhat-linux Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,m2,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --enable-libstdcxx-backtrace --with-libstdcxx-zoneinfo=/usr/share/zoneinfo --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-13.2.1-20230728/obj-x86_64-redhat-linux/isl-install --enable-offload-targets=nvptx-none --without-cuda-driver --enable-offload-defaulted --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux --with-build-config=bootstrap-lto --enable-link-serialization=1 Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 13.2.1 20230728 (Red Hat 13.2.1-1) (GCC) [perfbuilder@27cfe44d67ed perf-6.5.0-rc2]$ In file included from /usr/include/python3.12/Python.h:44, from /git/perf-6.5.0-rc2/tools/perf/util/python.c:2: /usr/include/python3.12/object.h: In function ‘Py_SIZE’: /usr/include/python3.12/object.h:217:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] 217 | PyVarObject *var_ob = _PyVarObject_CAST(ob); | ^~~~~~~~~~~ LD /tmp/build/perf/arch/perf-in.o In file included from /usr/include/python3.12/Python.h:53: /usr/include/python3.12/cpython/longintrepr.h: In function ‘_PyLong_CompactValue’: /usr/include/python3.12/cpython/longintrepr.h:121:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] 121 | Py_ssize_t sign = 1 - (op->long_value.lv_tag & _PyLong_SIGN_MASK); | ^~~~~~~~~~ So add -Wno-declaration-after-statement to the python binding CFLAGS. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZMpcTMvnQns81YWA@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
e8ca4f0f8c |
perf probe: Show correct error message about @symbol usage for uprobe
Since @symbol variable access is not supported by uprobe event, it must be correctly warn user instead of kernel version update. Committer testing: With/without the patch: [root@quaco ~]# perf probe -x ~/bin/perf -L sigtrap_handler <sigtrap_handler@/home/acme/git/perf-tools-next/tools/perf/tests/sigtrap.c:0> 0 sigtrap_handler(int signum __maybe_unused, siginfo_t *info, void *ucontext __maybe_unused) 1 { 2 if (!__atomic_fetch_add(&ctx.signal_count, 1, __ATOMIC_RELAXED)) 3 ctx.first_siginfo = *info; 4 __atomic_fetch_sub(&ctx.tids_want_signal, syscall(SYS_gettid), __ATOMIC_RELAXED); 5 } static void *test_thread(void *arg) { [root@quaco ~]# perf probe -x ~/bin/perf sigtrap_handler:4 "ctx.signal_count" Without the patch: [root@quaco ~]# perf probe -x ~/bin/perf sigtrap_handler:4 "ctx.signal_count" Failed to write event: Invalid argument Please upgrade your kernel to at least 3.14 to have access to feature @ctx Error: Failed to add events. [root@quaco ~]# With the patch: [root@quaco ~]# Failed to write event: Invalid argument @ctx accesses a variable by symbol name, but that is not supported for user application probe. Error: Failed to add events. [root@quaco ~]# Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Closes: https://lore.kernel.org/all/ZLWDEjvFjrrEJODp@kernel.org/ Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/169055397023.67089.12693645664676964310.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
c9b57eb8dc |
perf parse-events: Remove array remnants
parse_events_array was set up by event term parsing, which no longer exists. Remove this struct and references to it. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: He Kuang <hekuang@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: Wang ShaoBo <bobo.shaobowang@huawei.com> Cc: YueHaibing <yuehaibing@huawei.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230728001212.457900-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
30f4ade33d |
perf tools: Revert enable indices setting syntax for BPF map
This reverts commit
|
||
![]() |
c76a1444c0 |
perf parse-event: Avoid BPF test SEGV
loc is passed as NULL in tools/perf/tests/bpf.c do_test, meaning errors trigger a SEGV when trying to access. Add the missing NULL check. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: He Kuang <hekuang@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: Wang ShaoBo <bobo.shaobowang@huawei.com> Cc: YueHaibing <yuehaibing@huawei.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230728001212.457900-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
c7e97f215a |
perf build: Include generated header files properly
The flex and bison generate header files from the source. When user
specified a build directory with O= option, it'd generate files under
the directory. The build command has -I option to specify the header
include directory.
But the -I option only affects the files included like <...>. Let's
change the flex and bison headers to use it instead of "...".
Fixes:
|
||
![]() |
f776b0435e |
perf build: Remove -Wno-redundant-decls in 2 cases
Properly fix a warning and remove the -Wno-redundant-decls C flag. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Tom Rix <trix@redhat.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230728064917.767761-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
ddc8e4c966 |
perf build: Disable fewer bison warnings
If bison is version 3.8.2, reduce the number of bison C warnings disabled. Earlier bison versions have all C warnings disabled. Avoid implicit declarations of yylex by adding the declaration in the C file. A header can't be included as a circular dependency would occur due to the lexer using the bison defined tokens. Committer notes: Some recent versions of gcc and clang (noticed on Alpine Linux 3.17, edge, clearlinux, fedora 37, etc. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Tom Rix <trix@redhat.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230728064917.767761-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
10c775afa5 |
perf build: Disable fewer flex warnings
If flex is version 2.6.4, reduce the number of flex C warnings disabled. Earlier flex versions have all C warnings disabled. Committer notes: Added this to the list of ignored warnings to get it building on a Fedora 36 machine with flex 2.6.4: -Wno-misleading-indentation Noticed when building with: $ make LLVM=1 -C tools/perf NO_BPF_SKEL=1 DEBUG=1 Take two: We can't just try to canonicalize flex versions by just removing the dots, as we end up with: 2.6.4 >= 2.5.37 becoming: 264 >= 2537 Failing the build on flex 2.5.37, so instead use the back to the past added $(call version_ge3,$(FLEX_VERSION),2.6.4) variant to check for that. Making sure $(FLEX_VERSION) keeps the dots as we may want to use 'sort -V' or something nicer when available everywhere. Some other tweaks for other flex versions and combinations with gcc and clang versions were added, notes on the patch. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Tom Rix <trix@redhat.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230728064917.767761-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
07d2b820fd |
perf test parse-events: Test complex name has required event format
test__checkevent_complex_name will use an "event" format which if not
present, such as with a placeholder PMU, will cause test failures. Skip
the test in this case to avoid failures in restricted environments.
Add perf_pmu__has_format utility as a general PMU utility.
Fixes:
|
||
![]() |
34bc65d6d8 |
perf pmus: Create placholder regardless of scanning core_only
If scanning all PMUs the placeholder is still necessary if no core PMU
is found. This situation occurs in perf test's parse-events test,
when uncore events appear before core.
Fixes:
|
||
![]() |
e5764ae4c9 |
perf build: Add Wextra for C++ compilation
Commit
|
||
![]() |
1134f290d0 |
perf bpf-loader: Remove unneeded diagnostic pragma
Added during the progress to libbpf 1.0 the deprecated functions are no longer used and so the pragma can be removed. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Tom Rix <trix@redhat.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230728064917.767761-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
69a87a32f5 |
perf machine: Include data symbols in the kernel map
When 'perf record -d' is used, it needs data mmaps to symbolize global data. But it missed to collect kernel data maps so it cannot symbolize them. Instead of having a separate map, just increase the kernel map size to include the data section. Probably we can have a separate kernel map for data, but the current code assumes a single kernel map. So it'd require more changes in other places and looks error-prone. I decided not to go that way for now. Also it seems the kernel module size already includes the data section. For example, my system has the following. $ grep -e _stext -e _etext -e _edata /proc/kallsyms ffffffff99800000 T _stext ffffffff9a601ac8 T _etext ffffffff9b446a00 D _edata Size of the text section is (0x9a601ac8 - 0x99800000 = 0xe01ac8) and size including data section is (0x9b446a00 - 0x99800000 = 0x1c46a00). Before: $ perf record -d true $ perf report -D | grep MMAP | head -1 0 0 0x460 [0x60]: PERF_RECORD_MMAP -1/0: [0xffffffff99800000(0xe01ac8) @ 0xffffffff99800000]: x [kernel.kallsyms]_text ^^^^^^^^ here After: $ perf report -D | grep MMAP | head -1 0 0 0x460 [0x60]: PERF_RECORD_MMAP -1/0: [0xffffffff99800000(0x1c46a00) @ 0xffffffff99800000]: x [kernel.kallsyms]_text ^^^^^^^^^ Instead of just replacing it to _edata, try _edata first and then fall back to _etext just in case. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230725001929.368041-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
f9dd531c5b |
perf symbols: Add kallsyms__get_symbol_start()
The kallsyms__get_symbol_start() to get any symbol address from kallsyms. The existing kallsyms__get_function_start() only allows text symbols so create this to allow data symbols too. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230725001929.368041-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
4c11adff67 |
perf parse-events: Remove ABORT_ON
Prefer informative messages rather than none with ABORT_ON. Document one failure mode and add an error message for another. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230627181030.95608-14-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
81a4e31f8c |
perf parse-events: Improve location for add pmu
Improve the location for add PMU for cases when PMUs aren't found. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230627181030.95608-13-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
d81fa63b09 |
perf parse-events: Populate error column for BPF/tracepoint events
Follow convention from parse_events_terms__num/str and pass the YYLTYPE for the location. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230627181030.95608-12-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
b30d4f0b69 |
perf parse-events: Additional error reporting
When no events or PMUs match report an error for event_pmu:
Before:
```
$ perf stat -e 'asdfasdf' -a sleep 1
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
```
After:
```
$ perf stat -e 'asdfasdf' -a sleep 1
event syntax error: 'asdfasdf'
\___ Bad event name
Unabled to find PMU or event on a PMU of 'asdfasdf'
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
```
Fixes the inadvertent removal when hybrid parsing was modified.
Fixes:
|
||
![]() |
b52cb995f1 |
perf parse-events: Separate ENOMEM memory handling
Add PE_ABORT that will YYNOMEM or YYABORT accordingly. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230627181030.95608-10-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
77cdd787fc |
perf parse-events: Move instances of YYABORT to YYNOMEM
Migration to improve error reporting as YYABORT cases should carry event parsing errors. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230627181030.95608-9-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
a7a3252dad |
perf parse-events: Separate YYABORT and YYNOMEM cases
Split cases in event_pmu for greater accuracy. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230627181030.95608-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
9462e4de62 |
perf parse-event: Add memory allocation test for name terms
If the name memory allocation fails then propagate to the parser. Committer notes: Use $(BISON_FALLBACK_FLAGS) on the bison call so that we continue building with older bison versions, before 3.81, where YYNOMEM isn't present. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230627181030.95608-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
88cc47e245 |
perf build: Define YYNOMEM as YYNOABORT for bison < 3.81
YYNOMEM was introduced in bison 3.81, so define it as YYABORT for older versions, which should provide the previous perf behaviour. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
b161f25fa3 |
perf parse-events: Only move force grouped evsels when sorting
Prior to this change, events without a group would be sorted as if they
were from the location of the first event without a group. For example
instructions and cycles are without a group:
instructions,{imc_free_running/data_read/,imc_free_running/data_write/},cycles
parse events would create an eventual evlist like:
instructions,cycles,{uncore_imc_free_running_0/data_read/,uncore_imc_free_running_1/data_read/,uncore_imc_free_running_0/data_write/,uncore_imc_free_running_1/data_write/}
This is done so that perf metric events, that must always be in a
group, will be adjacent and so can be forced into a group.
This change modifies the sorting so that only force grouped events,
like perf metrics, are sorted and all other events keep their position
with respect to groups in the evlist. The location of the force
grouped event is chosen to match the first force grouped event.
For architectures without force grouped events, ie anything not Intel
Icelake or newer, this should mean sorting and fixing doesn't modify
the event positions except when fixing the grouping for PMUs of things
like uncore events.
Fixes:
|
||
![]() |
e8d38345da |
perf parse-events: When fixing group leaders always set the leader
The evsel grouping fix iterates over evsels tracking the leader group
and the current position's group, updating the current position's leader
if an evsel is being forced into a group or groups changed. However,
groups changing isn't a sufficient condition as sorting may have
reordered events and the leader may no longer come first. For this
reason update all leaders whenever they disagree.
This change breaks certain Icelake+ metrics due to bugs in the
kernel. For example, tma_l3_bound with threshold enabled tries to
program the events:
{topdown-retiring,slots,CYCLE_ACTIVITY.STALLS_L2_MISS,topdown-fe-bound,EXE_ACTIVITY.BOUND_ON_STORES,EXE_ACTIVITY.1_PORTS_UTIL,topdown-be-bound,cpu/INT_MISC.RECOVERY_CYCLES,cmask=1,edge/,CYCLE_ACTIVITY.STALLS_L3_MISS,CPU_CLK_UNHALTED.THREAD,CYCLE_ACTIVITY.STALLS_MEM_ANY,EXE_ACTIVITY.2_PORTS_UTIL,CYCLE_ACTIVITY.STALLS_TOTAL,topdown-bad-spec}:W
fixing the perf metric event order gives:
{slots,topdown-retiring,topdown-fe-bound,topdown-be-bound,topdown-bad-spec,CYCLE_ACTIVITY.STALLS_L2_MISS,EXE_ACTIVITY.BOUND_ON_STORES,EXE_ACTIVITY.1_PORTS_UTIL,cpu/INT_MISC.RECOVERY_CYCLES,cmask=1,edge/,CYCLE_ACTIVITY.STALLS_L3_MISS,CPU_CLK_UNHALTED.THREAD,CYCLE_ACTIVITY.STALLS_MEM_ANY,EXE_ACTIVITY.2_PORTS_UTIL,CYCLE_ACTIVITY.STALLS_TOTAL}:W
Both of these return "<not counted>" for all events, whilst they work
with the group removed respecting that the perf metric events must still
be grouped. A vendor events update will need to add METRIC_NO_GROUP to
these metrics to workaround the kernel PMU driver issue.
Fixes:
|
||
![]() |
5c49b6c3f2 |
perf parse-events: Extra care around force grouped events
Perf metric (topdown) events on Intel Icelake+ machines require a
group, however, they may be next to events that don't require a group.
Consider:
cycles,slots,topdown-fe-bound
The cycles event needn't be grouped but slots and topdown-fe-bound need
grouping.
Prior to this change, as slots and topdown-fe-bound need a group forcing
and all events share the same PMU, slots and topdown-fe-bound would be
forced into a group with cycles.
This is a bug on two fronts, cycles wasn't supposed to be grouped and
cycles can't be a group leader with a perf metric event.
This change adds recognition that cycles isn't force grouped and so it
shouldn't be force grouped with slots and topdown-fe-bound.
Fixes:
|
||
![]() |
93d7e9c8fb |
perf parse-events: Avoid regrouped warning for wild card events
There is logic to avoid printing the regrouping warning for wild card PMUs, this logic also needs to apply for wild card events. Before: ``` $ perf stat -e '{data_read,data_write}' -a sleep 1 WARNING: events were regrouped to match PMUs Performance counter stats for 'system wide': 2,979.16 MiB data_read 410.26 MiB data_write 1.001541923 seconds time elapsed ``` After: ``` $ perf stat -e '{data_read,data_write}' -a sleep 1 Performance counter stats for 'system wide': 2,975.94 MiB data_read 432.05 MiB data_write 1.001119499 seconds time elapsed ``` Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230627181030.95608-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
22881e2b45 |
perf parse-events: Add more comments to 'struct parse_events_state'
Improve documentation of 'struct parse_events_state'. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230627181030.95608-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
7e34daa550 |
perf parse-events: Remove two unused tokens
The tokens PE_PREFIX_RAW and PE_PREFIX_GROUP are unused so remove them. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230627181030.95608-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
bf7d46b3a0 |
perf parse-events: Remove unused PE_KERNEL_PMU_EVENT token
Removed by commit
|
||
![]() |
84efbdb7fb |
perf parse-events: Remove unused PE_PMU_EVENT_FAKE token
Removed by commit
|
||
![]() |
0f97a3a0de |
perf parse-events: Avoid use uninitialized warning
With GCC LTO a potential use uninitialized is spotted: ``` In function ‘parse_events_config_bpf’, inlined from ‘parse_events_load_bpf’ at util/parse-events.c:874:8: util/parse-events.c:792:37: error: ‘error_pos’ may be used uninitialized [-Werror=maybe-uninitialized] 792 | idx = term->err_term + error_pos; | ^ util/parse-events.c: In function ‘parse_events_load_bpf’: util/parse-events.c:765:13: note: ‘error_pos’ was declared here 765 | int error_pos; | ^ ``` So initialize at declaration. Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Fangrui Song <maskray@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Tom Rix <trix@redhat.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230724201247.748146-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
91f88a0ac8 |
perf stat: Avoid uninitialized use of perf_stat_config
perf_event__read_stat_config will assign values based on number of tags and tag values. Initialize the structs to zero before they are assigned so that no uninitialized values can be seen. This potential error was reported by GCC with LTO enabled. Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Fangrui Song <maskray@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Tom Rix <trix@redhat.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230724201247.748146-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
7b47623b8c |
perf bench uprobe trace_printk: Add entry attaching an BPF program that does a trace_printk
[root@five ~]# perf bench uprobe all # Running uprobe/baseline benchmark... # Executed 1,000 usleep(1000) calls Total time: 1,053,963 usecs 1,053.963 usecs/op # Running uprobe/empty benchmark... # Executed 1,000 usleep(1000) calls Total time: 1,056,293 usecs +2,330 to baseline 1,056.293 usecs/op 2.330 usecs/op to baseline # Running uprobe/trace_printk benchmark... # Executed 1,000 usleep(1000) calls Total time: 1,056,977 usecs +3,014 to baseline +684 to previous 1,056.977 usecs/op 3.014 usecs/op to baseline 0.684 usecs/op to previous [root@five ~]# Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andre Fredette <anfredet@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Dave Tucker <datucker@redhat.com> Cc: Derek Barbosa <debarbos@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20230719204910.539044-6-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
6af5e4cf3a |
perf bench uprobe empty: Add entry attaching an empty BPF program
Using libbpf and a BPF skel: # perf bench uprobe all # Running uprobe/baseline benchmark... # Executed 1,000 usleep(1000) calls Total time: 1,055,618 usecs 1,055.618 usecs/op # Running uprobe/empty benchmark... # Executed 1,000 usleep(1000) calls Total time: 1,057,146 usecs +1,528 to baseline 1,057.146 usecs/op # Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andre Fredette <anfredet@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Dave Tucker <datucker@redhat.com> Cc: Derek Barbosa <debarbos@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20230719204910.539044-5-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
04cb4fc4d4 |
perf thread: Allow tools to register a thread->priv destructor
So that when thread__delete() runs it can be called and free stuff tools stashed into thread->priv, like 'perf trace' does and will use this new facility to plug some leaks. Added an assert(thread__priv_destructor == NULL) as suggested in Ian's review. Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/CAP-5=fV3Er=Ek8=iE=bSGbEBmM56_PJffMWot1g_5Bh8B5hO7A@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
3f6a74bd62 |
perf evsel: Free evsel->filter on the destructor
Noticed with: make EXTRA_CFLAGS="-fsanitize=address" BUILD_BPF_SKEL=1 CORESIGHT=1 O=/tmp/build/perf-tools-next -C tools/perf install-bin Direct leak of 45 byte(s) in 1 object(s) allocated from: #0 0x7f213f87243b in strdup (/lib64/libasan.so.8+0x7243b) #1 0x63d15f in evsel__set_filter util/evsel.c:1371 #2 0x63d15f in evsel__append_filter util/evsel.c:1387 #3 0x63d15f in evsel__append_tp_filter util/evsel.c:1400 #4 0x62cd52 in evlist__append_tp_filter util/evlist.c:1145 #5 0x62cd52 in evlist__append_tp_filter_pids util/evlist.c:1196 #6 0x541e49 in trace__set_filter_loop_pids /home/acme/git/perf-tools/tools/perf/builtin-trace.c:3646 #7 0x541e49 in trace__set_filter_pids /home/acme/git/perf-tools/tools/perf/builtin-trace.c:3670 #8 0x541e49 in trace__run /home/acme/git/perf-tools/tools/perf/builtin-trace.c:3970 #9 0x541e49 in cmd_trace /home/acme/git/perf-tools/tools/perf/builtin-trace.c:5141 #10 0x5ef1a2 in run_builtin /home/acme/git/perf-tools/tools/perf/perf.c:323 #11 0x4196da in handle_internal_command /home/acme/git/perf-tools/tools/perf/perf.c:377 #12 0x4196da in run_argv /home/acme/git/perf-tools/tools/perf/perf.c:421 #13 0x4196da in main /home/acme/git/perf-tools/tools/perf/perf.c:537 #14 0x7f213e84a50f in __libc_start_call_main (/lib64/libc.so.6+0x2750f) Free it on evsel__exit(). Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20230719202951.534582-2-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
5b10c18d1b |
perf parse-events: Avoid SEGV if PMU lookup fails for legacy cache terms
libfuzzer found the following command could SEGV:
$ perf stat -e cpu/L2,L2/ true
This is because the L2 term rewrites the perf_event_attr type to
PERF_TYPE_HW_CACHE which then fails the PMU lookup for the second
legacy cache term.
The new failure is consistent with repeated hardware terms:
$ perf stat -e cpu/L2,L2/ true
event syntax error: 'cpu/L2,L2/'
\___ Failed to find PMU for type 3
Initial error:
event syntax error: 'cpu/L2,L2/'
\___ Failed to find PMU for type 3
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
$ perf stat -e cpu/cycles,cycles/ true
event syntax error: 'cpu/cycles,cycles/'
\___ Failed to find PMU for type 0
Initial error:
event syntax error: 'cpu/cycles,cycles/'
\___ Failed to find PMU for type 0
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
Committer testing:
Before:
$ perf stat -e cpu/L2,L2/ true
Segmentation fault (core dumped)
$
After:
$ perf stat -e cpu/L2,L2/ true
event syntax error: 'cpu/L2,L2/'
\___ Failed to find PMU for type 3
Initial error:
event syntax error: 'cpu/L2,L2/'
\___ Failed to find PMU for type 3
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
$
Fixes:
|
||
![]() |
c66e1c68c1 |
perf probe: Read DWARF files from the correct CU
After switching from dwarf_decl_file() to die_get_decl_file(), it is not
possible to add probes for certain functions:
$ perf probe -x /usr/lib/systemd/systemd-logind match_unit_removed
A function DIE doesn't have decl_line. Maybe broken DWARF?
A function DIE doesn't have decl_line. Maybe broken DWARF?
Probe point 'match_unit_removed' not found.
Error: Failed to add events.
The problem is that die_get_decl_file() uses the wrong CU to search for
the file. elfutils commit e1db5cdc9f has some good explanation for this:
dwarf_decl_file uses dwarf_attr_integrate to get the DW_AT_decl_file
attribute. This means the attribute might come from a different DIE
in a different CU. If so, we need to use the CU associated with the
attribute, not the original DIE, to resolve the file name.
This patch uses the same source of information as elfutils: use attribute
DW_AT_decl_file and use this CU to search for the file.
Fixes:
|
||
![]() |
c206353dfd |
perf tools changes and fixes for v6.5: 2nd batch
Build: - Allow to generate vmlinux.h from BTF using `make GEN_VMLINUX_H=1` and skip if the vmlinux has no BTF. - Replace deprecated clang -target xxx option by --target=xxx. perf record: - Print event attributes with well known type and config symbols in the debug output like below: # perf record -e cycles,cpu-clock -C0 -vv true <SNIP> ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0 (PERF_COUNT_HW_CPU_CYCLES) { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5 ------------------------------------------------------------ perf_event_attr: type 1 (PERF_TYPE_SOFTWARE) size 136 config 0 (PERF_COUNT_SW_CPU_CLOCK) { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 - Update AMD IBS event error message since it now support per-process profiling but no priviledge filters. $ sudo perf record -e ibs_op//k -C 0 Error: AMD IBS doesn't support privilege filtering. Try again without the privilege modifiers (like 'k') at the end. perf lock contention: - Support CSV style output using -x option $ sudo perf lock con -ab -x, sleep 1 # output: contended, total wait, max wait, avg wait, type, caller 19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0 15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e 4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d 1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135 8, 67608, 27404, 8451, spinlock, __queue_work+0x174 3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff 3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248 2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad 1, 24619, 24619, 24619, spinlock, rcu_core+0xd4 - Add --output option to save the data to a file not to be interfered by other debug messages. Test: - Fix event parsing test on ARM where there's no raw PMU nor supports PERF_PMU_CAP_EXTENDED_HW_TYPE. - Update the lock contention test case for CSV output. - Fix a segfault in the daemon command test. Vendor events (JSON): - Add has_event() to check if the given event is available on system at runtime. On Intel machines, some transaction events may not be present when TSC extensions are disabled. - Update Intel event metrics. Misc: - Sort symbols by name using an external array of pointers instead of a rbtree node in the symbol. This will save 16-bytes or 24-bytes per symbol whether the sorting is actually requested or not. - Fix unwinding DWARF callstacks using libdw when --symfs option is used. Signed-off-by: Namhyung Kim <namhyung@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZKb4mwAKCRCMstVUGiXM g1QqAPwKZow/DhAzyN7KvzdNd+SojRGpUMl6RkVphY/9ntDqPAD+L3V5aXLTiC1L 8kUzdpRX5VMjqdR9U7TycUOi4QU40QA= =dEF1 -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next Pull more perf tools updates from Namhyung Kim: "These are remaining changes and fixes for this cycle. Build: - Allow generating vmlinux.h from BTF using `make GEN_VMLINUX_H=1` and skip if the vmlinux has no BTF. - Replace deprecated clang -target xxx option by --target=xxx. perf record: - Print event attributes with well known type and config symbols in the debug output like below: # perf record -e cycles,cpu-clock -C0 -vv true <SNIP> ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0 (PERF_COUNT_HW_CPU_CYCLES) { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5 ------------------------------------------------------------ perf_event_attr: type 1 (PERF_TYPE_SOFTWARE) size 136 config 0 (PERF_COUNT_SW_CPU_CLOCK) { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 - Update AMD IBS event error message since it now support per-process profiling but no priviledge filters. $ sudo perf record -e ibs_op//k -C 0 Error: AMD IBS doesn't support privilege filtering. Try again without the privilege modifiers (like 'k') at the end. perf lock contention: - Support CSV style output using -x option $ sudo perf lock con -ab -x, sleep 1 # output: contended, total wait, max wait, avg wait, type, caller 19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0 15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e 4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d 1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135 8, 67608, 27404, 8451, spinlock, __queue_work+0x174 3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff 3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248 2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad 1, 24619, 24619, 24619, spinlock, rcu_core+0xd4 - Add --output option to save the data to a file not to be interfered by other debug messages. Test: - Fix event parsing test on ARM where there's no raw PMU nor supports PERF_PMU_CAP_EXTENDED_HW_TYPE. - Update the lock contention test case for CSV output. - Fix a segfault in the daemon command test. Vendor events (JSON): - Add has_event() to check if the given event is available on system at runtime. On Intel machines, some transaction events may not be present when TSC extensions are disabled. - Update Intel event metrics. Misc: - Sort symbols by name using an external array of pointers instead of a rbtree node in the symbol. This will save 16-bytes or 24-bytes per symbol whether the sorting is actually requested or not. - Fix unwinding DWARF callstacks using libdw when --symfs option is used" * tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next: (38 commits) perf test: Fix event parsing test when PERF_PMU_CAP_EXTENDED_HW_TYPE isn't supported. perf test: Fix event parsing test on Arm perf evsel amd: Fix IBS error message perf: unwind: Fix symfs with libdw perf symbol: Fix uninitialized return value in symbols__find_by_name() perf test: Test perf lock contention CSV output perf lock contention: Add --output option perf lock contention: Add -x option for CSV style output perf lock: Remove stale comments perf vendor events intel: Update tigerlake to 1.13 perf vendor events intel: Update skylakex to 1.31 perf vendor events intel: Update skylake to 57 perf vendor events intel: Update sapphirerapids to 1.14 perf vendor events intel: Update icelakex to 1.21 perf vendor events intel: Update icelake to 1.19 perf vendor events intel: Update cascadelakex to 1.19 perf vendor events intel: Update meteorlake to 1.03 perf vendor events intel: Add rocketlake events/metrics perf vendor metrics intel: Make transaction metrics conditional perf jevents: Support for has_event function ... |
||
![]() |
b2ad9549bf |
perf evsel amd: Fix IBS error message
AMD IBS can do per-process profiling[1] and is no longer restricted to per-cpu or systemwide only. Remove stale error message. Also, checking just exclude_kernel is not sufficient since IBS does not support any privilege filters. So include all exclude_* checks. And finally, move these checks under tools/perf/arch/x86/ from generic code. Before: $ sudo ./perf record -e ibs_op//k -C 0 Error: AMD IBS may only be available in system-wide/per-cpu mode. Try using -a, or -C and workload affinity After: $ sudo ./perf record -e ibs_op//k -C 0 Error: AMD IBS doesn't support privilege filtering. Try again without the privilege modifiers (like 'k') at the end. [1] https://git.kernel.org/torvalds/c/30093056f7b2 Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: ananth.narayan@amd.com Cc: sandipan.das@amd.com Cc: santosh.shukla@amd.com Cc: irogers@google.com Cc: peterz@infradead.org Cc: adrian.hunter@intel.com Cc: acme@kernel.org Cc: jolsa@kernel.org Link: https://lore.kernel.org/r/20230630085230.437-1-ravi.bangoria@amd.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
5f06267b6e |
perf: unwind: Fix symfs with libdw
Pass the full path including the symfs (if any) to libdw. Without this unwinding fails with errors like this when a symfs is used: unwind: failed with 'No such file or directory'" Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: kernel@axis.com Cc: Ian Rogers <irogers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230630-perf-libdw-symfs-v2-1-469760dd4d5b@axis.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
78a175c462 |
perf symbol: Fix uninitialized return value in symbols__find_by_name()
found_idx and s aren't initialized, so if no symbol is found then the
assert at the end will index off the end of the array causing a
segfault. The function also doesn't return NULL when the symbol isn't
found even if the assert passes. Fix it by initializing the values and
only setting them when something is found.
Fixes the following test failure:
$ perf test 1
1: vmlinux symtab matches kallsyms : FAILED!
Fixes:
|
||
![]() |
b30d7a77c5 |
perf tools changes and fixes for v6.5: 1st batch
Internal cleanup: - Refactor PMU data management to handle hybrid systems in a generic way. Do more work in the lexer so that legacy event types parse more easily. A side-effect of this is that if a PMU is specified, scanning sysfs is avoided improving start-up time. - Fix hybrid metrics, for example, the TopdownL1 works for both performance and efficiency cores on Intel machines. To support this, sort and regroup events after parsing. - Add reference count checking for the 'thread' data structure. - Lots of fixes for memory leaks in various places thanks to the ASAN and Ian's refcount checker. - Reduce the binary size by replacing static variables with local or dynamically allocated memory. - Introduce shared_mutex for annotate data to reduce memory footprint. - Make filesystem access library functions more thread safe. Test: - Organize cpu_map tests into a single suite. - Add metric value validation test to check if the values are within correct value ranges. - Add perf stat stdio output test to check if event and metric names match. - Add perf data converter JSON output test. - Fix a lot of issues reported by shellcheck(1). This is a preparation to enable shellcheck by default. - Make the large x86 new instructions test optional at build time using EXTRA_TESTS=1. - Add a test for libpfm4 events. perf script: - Add 'dsoff' outpuf field to display offset from the DSO. $ perf script -F comm,pid,event,ip,dsoff ls 2695501 cycles: 152cc73ef4b5 (/usr/lib/x86_64-linux-gnu/ld-2.31.so+0x1c4b5) ls 2695501 cycles: ffffffff99045b3e ([kernel.kallsyms]) ls 2695501 cycles: ffffffff9968e107 ([kernel.kallsyms]) ls 2695501 cycles: ffffffffc1f54afb ([kernel.kallsyms]) ls 2695501 cycles: ffffffff9968382f ([kernel.kallsyms]) ls 2695501 cycles: ffffffff99e00094 ([kernel.kallsyms]) ls 2695501 cycles: 152cc718a8d0 (/usr/lib/x86_64-linux-gnu/libselinux.so.1+0x68d0) ls 2695501 cycles: ffffffff992a6db0 ([kernel.kallsyms]) - Adjust width for large PID/TID values. perf report: - Robustify reading addr2line output for srcline by checking sentinel output before the actual data and by using timeout of 1 second. - Allow config terms (like 'name=ABC') with breakpoint events. $ perf record -e mem:0x55feb98dd169:x/name=breakpoint/ -p 19646 -- sleep 1 perf annotate: - Handle x86 instruction suffix like 'l' in 'movl' generally. - Parse instruction operands properly even with a whitespace. This is needed for llvm-objdump output. - Support RISC-V binutils lookup using the triplet prefixes. - Add '<' and '>' key to navigate to prev/next symbols in TUI. - Fix instruction association and parsing for LoongArch. perf stat: - Add --per-cache aggregation option, optionally specify a cache level like `--per-cache=L2`. $ sudo perf stat --per-cache -a -e ls_dmnd_fills_from_sys.ext_cache_remote --\ taskset -c 0-15,64-79,128-143,192-207\ perf bench sched messaging -p -t -l 100000 -g 8 # Running 'sched/messaging' benchmark: # 20 sender and receiver threads per group # 8 groups == 320 threads run Total time: 7.648 [sec] Performance counter stats for 'system wide': S0-D0-L3-ID0 16 17,145,912 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID8 16 14,977,628 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID16 16 262,539 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID24 16 3,140 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID32 16 27,403 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID40 16 17,026 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID48 16 7,292 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID56 16 2,464 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID64 16 22,489,306 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID72 16 21,455,257 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID80 16 11,619 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID88 16 30,978 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID96 16 37,628 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID104 16 13,594 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID112 16 10,164 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID120 16 11,259 ls_dmnd_fills_from_sys.ext_cache_remote 7.779171484 seconds time elapsed - Change default (no event/metric) formatting for default metrics so that events are hidden and the metric and group appear. Performance counter stats for 'ls /': 1.85 msec task-clock # 0.594 CPUs utilized 0 context-switches # 0.000 /sec 0 cpu-migrations # 0.000 /sec 97 page-faults # 52.517 K/sec 2,187,173 cycles # 1.184 GHz 2,474,459 instructions # 1.13 insn per cycle 531,584 branches # 287.805 M/sec 13,626 branch-misses # 2.56% of all branches TopdownL1 # 23.5 % tma_backend_bound # 11.5 % tma_bad_speculation # 39.1 % tma_frontend_bound # 25.9 % tma_retiring - Allow --cputype option to have any PMU name (not just hybrid). - Fix output value not to added when it runs multiple times with -r option. perf list: - Show metricgroup description from JSON file called metricgroups.json. - Allow 'pfm' argument to list only libpfm4 events and check each event is supported before showing it. JSON vendor events: - Avoid event grouping using "NO_GROUP_EVENTS" constraints. The topdown events are correctly grouped even if no group exists. - Add "Default" metric group to print it in the default output. And use "DefaultMetricgroupName" to indicate the real metric group name. - Add AmpereOne core PMU events. Misc: - Define man page date correctly. - Track exception level properly on ARM CoreSight ETM. - Allow anonymous struct, union or enum when retrieving type names from DWARF. - Fix incorrect filename when calling `perf inject --jit`. - Handle PLT size correctly on LoongArch. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZJxT3gAKCRCMstVUGiXM g3//AQDyH3tbAVxU6JkvEOjjDvK7MWeXef7GQh8MP8D9Wkxk1AD9HgyxZWXn+mer wxzBMntnxlr9+mkBerrVwUzYMd/IJQk= =hPh8 -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v6.5-1-2023-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next Pull perf tools updates from Namhyung Kim: "Internal cleanup: - Refactor PMU data management to handle hybrid systems in a generic way. Do more work in the lexer so that legacy event types parse more easily. A side-effect of this is that if a PMU is specified, scanning sysfs is avoided improving start-up time. - Fix hybrid metrics, for example, the TopdownL1 works for both performance and efficiency cores on Intel machines. To support this, sort and regroup events after parsing. - Add reference count checking for the 'thread' data structure. - Lots of fixes for memory leaks in various places thanks to the ASAN and Ian's refcount checker. - Reduce the binary size by replacing static variables with local or dynamically allocated memory. - Introduce shared_mutex for annotate data to reduce memory footprint. - Make filesystem access library functions more thread safe. Test: - Organize cpu_map tests into a single suite. - Add metric value validation test to check if the values are within correct value ranges. - Add perf stat stdio output test to check if event and metric names match. - Add perf data converter JSON output test. - Fix a lot of issues reported by shellcheck(1). This is a preparation to enable shellcheck by default. - Make the large x86 new instructions test optional at build time using EXTRA_TESTS=1. - Add a test for libpfm4 events. perf script: - Add 'dsoff' outpuf field to display offset from the DSO. $ perf script -F comm,pid,event,ip,dsoff ls 2695501 cycles: 152cc73ef4b5 (/usr/lib/x86_64-linux-gnu/ld-2.31.so+0x1c4b5) ls 2695501 cycles: ffffffff99045b3e ([kernel.kallsyms]) ls 2695501 cycles: ffffffff9968e107 ([kernel.kallsyms]) ls 2695501 cycles: ffffffffc1f54afb ([kernel.kallsyms]) ls 2695501 cycles: ffffffff9968382f ([kernel.kallsyms]) ls 2695501 cycles: ffffffff99e00094 ([kernel.kallsyms]) ls 2695501 cycles: 152cc718a8d0 (/usr/lib/x86_64-linux-gnu/libselinux.so.1+0x68d0) ls 2695501 cycles: ffffffff992a6db0 ([kernel.kallsyms]) - Adjust width for large PID/TID values. perf report: - Robustify reading addr2line output for srcline by checking sentinel output before the actual data and by using timeout of 1 second. - Allow config terms (like 'name=ABC') with breakpoint events. $ perf record -e mem:0x55feb98dd169:x/name=breakpoint/ -p 19646 -- sleep 1 perf annotate: - Handle x86 instruction suffix like 'l' in 'movl' generally. - Parse instruction operands properly even with a whitespace. This is needed for llvm-objdump output. - Support RISC-V binutils lookup using the triplet prefixes. - Add '<' and '>' key to navigate to prev/next symbols in TUI. - Fix instruction association and parsing for LoongArch. perf stat: - Add --per-cache aggregation option, optionally specify a cache level like `--per-cache=L2`. $ sudo perf stat --per-cache -a -e ls_dmnd_fills_from_sys.ext_cache_remote --\ taskset -c 0-15,64-79,128-143,192-207\ perf bench sched messaging -p -t -l 100000 -g 8 # Running 'sched/messaging' benchmark: # 20 sender and receiver threads per group # 8 groups == 320 threads run Total time: 7.648 [sec] Performance counter stats for 'system wide': S0-D0-L3-ID0 16 17,145,912 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID8 16 14,977,628 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID16 16 262,539 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID24 16 3,140 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID32 16 27,403 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID40 16 17,026 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID48 16 7,292 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID56 16 2,464 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID64 16 22,489,306 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID72 16 21,455,257 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID80 16 11,619 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID88 16 30,978 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID96 16 37,628 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID104 16 13,594 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID112 16 10,164 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID120 16 11,259 ls_dmnd_fills_from_sys.ext_cache_remote 7.779171484 seconds time elapsed - Change default (no event/metric) formatting for default metrics so that events are hidden and the metric and group appear. Performance counter stats for 'ls /': 1.85 msec task-clock # 0.594 CPUs utilized 0 context-switches # 0.000 /sec 0 cpu-migrations # 0.000 /sec 97 page-faults # 52.517 K/sec 2,187,173 cycles # 1.184 GHz 2,474,459 instructions # 1.13 insn per cycle 531,584 branches # 287.805 M/sec 13,626 branch-misses # 2.56% of all branches TopdownL1 # 23.5 % tma_backend_bound # 11.5 % tma_bad_speculation # 39.1 % tma_frontend_bound # 25.9 % tma_retiring - Allow --cputype option to have any PMU name (not just hybrid). - Fix output value not to added when it runs multiple times with -r option. perf list: - Show metricgroup description from JSON file called metricgroups.json. - Allow 'pfm' argument to list only libpfm4 events and check each event is supported before showing it. JSON vendor events: - Avoid event grouping using "NO_GROUP_EVENTS" constraints. The topdown events are correctly grouped even if no group exists. - Add "Default" metric group to print it in the default output. And use "DefaultMetricgroupName" to indicate the real metric group name. - Add AmpereOne core PMU events. Misc: - Define man page date correctly. - Track exception level properly on ARM CoreSight ETM. - Allow anonymous struct, union or enum when retrieving type names from DWARF. - Fix incorrect filename when calling `perf inject --jit`. - Handle PLT size correctly on LoongArch" * tag 'perf-tools-for-v6.5-1-2023-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next: (269 commits) perf test: Skip metrics w/o event name in stat STD output linter perf test: Reorder event name checks in stat STD output linter perf pmu: Remove a hard coded cpu PMU assumption perf pmus: Add notion of default PMU for JSON events perf unwind: Fix map reference counts perf test: Set PERF_EXEC_PATH for script execution perf script: Initialize buffer for regs_map() perf tests: Fix test_arm_callgraph_fp variable expansion perf symbol: Add LoongArch case in get_plt_sizes() perf test: Remove x permission from lib/stat_output.sh perf test: Rerun failed metrics with longer workload perf test: Add skip list for metrics known would fail perf test: Add metric value validation test perf jit: Fix incorrect file name in DWARF line table perf annotate: Fix instruction association and parsing for LoongArch perf annotation: Switch lock from a mutex to a sharded_mutex perf sharded_mutex: Introduce sharded_mutex tools: Fix incorrect calculation of object size by sizeof perf subcmd: Fix missing check for return value of malloc() in add_cmdname() perf parse-events: Remove unneeded semicolon ... |
||
![]() |
4a4a9bf907 |
perf expr: Add has_event function
Some events are dependent on firmware/kernel enablement. Allow such events to be detected when the metric is parsed so that the metric's event parsing doesn't fail. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Sohom Datta <sohomdatta1@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230623151016.4193660-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
36cee69f57 |
perf tools: Do not remove addr_location.thread in thread__find_map()
The thread__find_map() is to find a map for a given address in the
given thread's address space. It searches maps based on the cpu mode
and fills various information in the addr_location data structure.
It might change al->maps and al->map, but not al->thread. Then I
think no reason to not set the al->thread at the beginning.
Also get rid of the duplicate 'al->map = NULL' part.
Fixes:
|
||
![]() |
628eaa4e87 |
perf pmus: Add placeholder core PMU
If loading a core PMU fails, legacy hardware/cache events may segv due to there being no PMU. Create a placeholder empty PMU for this case. This was discussed in: https://lore.kernel.org/lkml/20230614151625.2077-1-yangjihong1@huawei.com/ Reported-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230627182834.117565-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
6aeadf7896 |
Move the arm64 architecture documentation under Documentation/arch/. This
brings some order to the documentation directory, declutters the top-level directory, and makes the documentation organization more closely match that of the source. -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmSbDsIPHGNvcmJldEBs d24ubmV0AAoJEBdDWhNsDH5Y+ksH/2Xqun1ipPvu66+bBdPIf8N9AVFatl2q3mt4 tgX3A4RH3Ejklb4GbRLOIP23PmCxt7LRv4P05ttw8VpTP3A+Cw1d1s2RxiXGvfDE j7IW6hrpUmVoDdiDCRGtjdIa7MVI5aAsj8CCTjEFywGi5CQe0Uzq4aTUKoxJDEnu GYVy2CwDNEt4GTQ6ClPpFx2rc4UZf/H2XqXsnod9ef8A5Nkt3EtgoS1hh3o1QZGA Mqx2HAOVS1tb6GUVUbVLCdj40+YjBLjXFlsH4dA+wsFFdUlZLKuTesdiAMg2X6eT E8C/6oRT+OiWbrnXUTJEn8z98Ds8VHn7D4n97O9bIQ+R9AFtmPI= =H/+D -----END PGP SIGNATURE----- Merge tag 'docs-arm64-move' of git://git.lwn.net/linux Pull arm64 documentation move from Jonathan Corbet: "Move the arm64 architecture documentation under Documentation/arch/. This brings some order to the documentation directory, declutters the top-level directory, and makes the documentation organization more closely match that of the source" * tag 'docs-arm64-move' of git://git.lwn.net/linux: perf arm-spe: Fix a dangling Documentation/arm64 reference mm: Fix a dangling Documentation/arm64 reference arm64: Fix dangling references to Documentation/arm64 dt-bindings: fix dangling Documentation/arm64 reference docs: arm64: Move arm64 documentation under Documentation/arch/ |
||
![]() |
78987bb02a |
perf: Replace deprecated -target with --target= for Clang
-target has been deprecated since Clang 3.4 in 2013. Use the preferred
--target=bpf form instead. This matches how we use --target= in
scripts/Makefile.clang.
Signed-off-by: Fangrui Song <maskray@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: llvm@lists.linux.dev
Cc: Ingo Molnar <mingo@redhat.com>
Cc: bpf@vger.kernel.org
Link:
|
||
![]() |
710dffc969 |
perf pmu: Correct auto_merge_stats test
The original logic was to check is_pmu_hybrid() like in the below.
It just checks the name of PMU specifically for Intel hybrid systems
which means uncore PMU events should return false.
https://lore.kernel.org/all/20230527072210.2900565-35-irogers@google.com/
The is_pmu_hybrid() was replaced by arch-agnostic way but with the
incorrect condition which was fixed for core PMUs but not uncore.
This change fixes both.
Fixes:
|
||
![]() |
929ff679b6 |
perf tools: Add printing perf_event_attr config symbol in perf_event_attr__fprintf()
When printing perf_event_attr, always display perf_event_attr config and its symbol to improve the readability of debugging information. Before: # perf --debug verbose=2 record -e cycles,cpu-clock,sched:sched_switch,branch-load-misses,r101,mem:0x0 -C 0 true <SNIP> ------------------------------------------------------------ perf_event_attr: size 136 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5 ------------------------------------------------------------ perf_event_attr: type 1 size 136 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6 ------------------------------------------------------------ perf_event_attr: type 2 size 136 config 0x143 { sample_period, sample_freq } 1 sample_type IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER read_format ID disabled 1 inherit 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 7 ------------------------------------------------------------ perf_event_attr: type 3 size 136 config 0x10005 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 9 ------------------------------------------------------------ perf_event_attr: type 4 size 136 config 0x101 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 10 ------------------------------------------------------------ perf_event_attr: type 5 size 136 { sample_period, sample_freq } 1 sample_type IP|TID|TIME|CPU|IDENTIFIER read_format ID disabled 1 inherit 1 sample_id_all 1 exclude_guest 1 bp_type 3 { bp_len, config2 } 0x4 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 11 <SNIP> After: # perf --debug verbose=2 record -e cycles,cpu-clock,sched:sched_switch,branch-load-misses,r101,mem:0x0 -C 0 true <SNIP> ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0 (PERF_COUNT_HW_CPU_CYCLES) { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5 ------------------------------------------------------------ perf_event_attr: type 1 (PERF_TYPE_SOFTWARE) size 136 config 0 (PERF_COUNT_SW_CPU_CLOCK) { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6 ------------------------------------------------------------ perf_event_attr: type 2 (PERF_TYPE_TRACEPOINT) size 136 config 0x143 (sched:sched_switch) { sample_period, sample_freq } 1 sample_type IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER read_format ID disabled 1 inherit 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 7 ------------------------------------------------------------ perf_event_attr: type 3 (PERF_TYPE_HW_CACHE) size 136 config 0x10005 (PERF_COUNT_HW_CACHE_RESULT_MISS | PERF_COUNT_HW_CACHE_OP_READ | PERF_COUNT_HW_CACHE_BPU) { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 9 ------------------------------------------------------------ perf_event_attr: type 4 (PERF_TYPE_RAW) size 136 config 0x101 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 10 ------------------------------------------------------------ perf_event_attr: type 5 (PERF_TYPE_BREAKPOINT) size 136 config 0 { sample_period, sample_freq } 1 sample_type IP|TID|TIME|CPU|IDENTIFIER read_format ID disabled 1 inherit 1 sample_id_all 1 exclude_guest 1 bp_type 3 { bp_len, config2 } 0x4 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 11 ------------------------------------------------------------ perf_event_attr: type 1 (PERF_TYPE_SOFTWARE) size 136 config 0x9 (PERF_COUNT_SW_DUMMY) { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID inherit 1 mmap 1 comm 1 freq 1 task 1 sample_id_all 1 mmap2 1 comm_exec 1 ksymbol 1 bpf_event 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 12 <SNIP> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: anshuman.khandual@arm.com Cc: mark.rutland@arm.com Cc: irogers@google.com Cc: jesussanp@google.com Cc: peterz@infradead.org Cc: acme@kernel.org Cc: jolsa@kernel.org Cc: alexander.shishkin@linux.intel.com Cc: mingo@redhat.com Link: https://lore.kernel.org/r/20230623054416.160858-5-yangjihong1@huawei.com [ fix perf import test by adding a dummy tracepoint_id__to_name() ] Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
2e4dd65d7d |
perf tools: Add printing perf_event_attr type symbol in perf_event_attr__fprintf()
When printing perf_event_attr, always display perf_event_attr type and its symbol to improve the readability of debugging information. Before: # perf --debug verbose=2 record -e cycles,cpu-clock,sched:sched_switch,branch-load-misses,r101,mem:0x0 -C 0 true <SNIP> ------------------------------------------------------------ perf_event_attr: size 136 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5 ------------------------------------------------------------ perf_event_attr: type 1 size 136 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6 ------------------------------------------------------------ perf_event_attr: type 2 size 136 config 0x143 { sample_period, sample_freq } 1 sample_type IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER read_format ID disabled 1 inherit 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 7 ------------------------------------------------------------ perf_event_attr: type 3 size 136 config 0x10005 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 9 ------------------------------------------------------------ perf_event_attr: type 4 size 136 config 0x101 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 10 ------------------------------------------------------------ perf_event_attr: type 5 size 136 { sample_period, sample_freq } 1 sample_type IP|TID|TIME|CPU|IDENTIFIER read_format ID disabled 1 inherit 1 sample_id_all 1 exclude_guest 1 bp_type 3 { bp_len, config2 } 0x4 ------------------------------------------------------------ <SNIP> After: # perf --debug verbose=2 record -e cycles,cpu-clock,sched:sched_switch,branch-load-misses,r101,mem:0x0 -C 0 true <SNIP> ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5 ------------------------------------------------------------ perf_event_attr: type 1 (PERF_TYPE_SOFTWARE) size 136 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6 ------------------------------------------------------------ perf_event_attr: type 2 (PERF_TYPE_TRACEPOINT) size 136 config 0x143 { sample_period, sample_freq } 1 sample_type IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER read_format ID disabled 1 inherit 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 7 ------------------------------------------------------------ perf_event_attr: type 3 (PERF_TYPE_HW_CACHE) size 136 config 0x10005 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 9 ------------------------------------------------------------ perf_event_attr: type 4 (PERF_TYPE_RAW) size 136 config 0x101 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 10 ------------------------------------------------------------ perf_event_attr: type 5 (PERF_TYPE_BREAKPOINT) size 136 { sample_period, sample_freq } 1 sample_type IP|TID|TIME|CPU|IDENTIFIER read_format ID disabled 1 inherit 1 sample_id_all 1 exclude_guest 1 bp_type 3 { bp_len, config2 } 0x4 ------------------------------------------------------------ <SNIP> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: anshuman.khandual@arm.com Cc: mark.rutland@arm.com Cc: irogers@google.com Cc: jesussanp@google.com Cc: peterz@infradead.org Cc: acme@kernel.org Cc: jolsa@kernel.org Cc: alexander.shishkin@linux.intel.com Cc: mingo@redhat.com Link: https://lore.kernel.org/r/20230623054416.160858-4-yangjihong1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
5492e72500 |
perf tools: Extend PRINT_ATTRf to support printing of members with a value of 0
When printing attr, members whose value is 0 will not be printed, we want to print the case where attr->type is 0(PERF_TYPE_HARDWARE), add `_a` param to PRINT_ATTRf macro to always print member when it is true No functional change. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: anshuman.khandual@arm.com Cc: mark.rutland@arm.com Cc: irogers@google.com Cc: jesussanp@google.com Cc: peterz@infradead.org Cc: acme@kernel.org Cc: jolsa@kernel.org Cc: alexander.shishkin@linux.intel.com Cc: mingo@redhat.com Link: https://lore.kernel.org/r/20230623054416.160858-3-yangjihong1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
220f88b5a1 |
perf trace-event-info: Add tracepoint_id_to_name() helper
Add tracepoint_id_to_name() helper to search for the trace events directory by given event id and return the corresponding tracepoint. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: anshuman.khandual@arm.com Cc: mark.rutland@arm.com Cc: irogers@google.com Cc: jesussanp@google.com Cc: peterz@infradead.org Cc: acme@kernel.org Cc: jolsa@kernel.org Cc: alexander.shishkin@linux.intel.com Cc: mingo@redhat.com Link: https://lore.kernel.org/r/20230623054416.160858-2-yangjihong1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
d82257d7f6 |
perf symbol: Remove now unused symbol_conf.sort_by_name
Previously used to specify symbol_name_rb_node was in use. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Jason Wang <wangborong@cdjrlc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/20230623054520.4118442-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
259dce914e |
perf symbol: Remove symbol_name_rb_node
Most perf commands want to sort symbols by name and this is done via an invasive rbtree that on 64-bit systems costs 24 bytes. Sorting the symbols in a DSO by name is optional and not done by default, however, if sorting is requested the 24 bytes is allocated for every symbol. This change removes the rbtree and uses a sorted array of symbol pointers instead (costing 8 bytes per symbol). As the array is created on demand then there are further memory savings. The complexity of sorting the array and using the rbtree are the same. To support going to the next symbol, the index of the current symbol needs to be passed around as a pair with the current symbol. This requires some API changes. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Jason Wang <wangborong@cdjrlc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/20230623054520.4118442-3-irogers@google.com [ minimize change in symbols__sort_by_name() ] Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
ce5b293405 |
perf dso: Sort symbols under lock
Determine if symbols are sorted, set the sorted flag and sort under the dso lock. Done in the interest of thread safety. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Jason Wang <wangborong@cdjrlc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/20230623054520.4118442-2-irogers@google.com [ handle the similar code in util/probe-event.c ] Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
5c45b21047 |
perf bpf: Move the declaration of struct rq
struct rq is defined in vmlinux.h when the vmlinux.h is generated,
this causes a redefinition failure if it is declared in
lock_contention.bpf.c. Move the definition to vmlinux.h for
consistency with the generated version.
Fixes:
|
||
![]() |
b7a2d774c9 |
perf build: Add ability to build with a generated vmlinux.h
Commit |
||
![]() |
d06593aa00 |
perf pmu: Remove a hard coded cpu PMU assumption
The property of "cpu" when it has no cpu map is true on S390 with the PMU cpum_cf. Rather than maintain a list of such PMUs, reuse the is_core test result from the caller. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Link: https://lore.kernel.org/r/20230623043843.4080180-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
d685819b40 |
perf pmus: Add notion of default PMU for JSON events
JSON events created in pmu-events.c by jevents.py may not specify a PMU they are associated with, in which case it is implied that it is the first core PMU. Care is needed to select this for regular 'cpu', s390 'cpum_cf' and ARMs many names as at the point the name is first needed the core PMUs list hasn't been initialized. Add a helper in perf_pmus to create this value, in the worst case by scanning sysfs. v2. Add missing close if fdopendir fails. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230623043843.4080180-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
33941dbd14 |
perf unwind: Fix map reference counts
The result of thread__find_map is the map in the passed in addr_location. Calling addr_location__exit puts that map and so copies need to do a map__get. Add in the corresponding map__puts. v2. Add missing map__put when dso is missing. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230623043107.4077510-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
2d7f5540b8 |
perf script: Initialize buffer for regs_map()
The buffer is used to save register mapping in a sample. Normally
perf samples don't have any register so the string should be empty.
But it missed to initialize the buffer when the size is 0. And it's
passed to PyUnicode_FromString() with a garbage data.
So it returns NULL due to invalid input (instead of an empty unicode
string object) which causes a segfault like below:
Thread 2.1 "perf" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7c83780 (LWP 193775)]
0x00007ffff6dbca2e in PyDict_SetItem () from /lib/x86_64-linux-gnu/libpython3.11.so.1.0
(gdb) bt
#0 0x00007ffff6dbca2e in PyDict_SetItem () from /lib/x86_64-linux-gnu/libpython3.11.so.1.0
#1 0x00007ffff6dbf848 in PyDict_SetItemString () from /lib/x86_64-linux-gnu/libpython3.11.so.1.0
#2 0x000055555575824d in pydict_set_item_string_decref (val=0x0, key=0x5555557f96e3 "iregs", dict=0x7ffff5f7f780)
at util/scripting-engines/trace-event-python.c:145
#3 set_regs_in_dict (evsel=0x555555efc370, sample=0x7fffffffb870, dict=0x7ffff5f7f780)
at util/scripting-engines/trace-event-python.c:776
#4 get_perf_sample_dict (sample=sample@entry=0x7fffffffb870, evsel=evsel@entry=0x555555efc370, al=al@entry=0x7fffffffb2e0,
addr_al=addr_al@entry=0x0, callchain=callchain@entry=0x7ffff63ef440) at util/scripting-engines/trace-event-python.c:923
#5 0x0000555555758ec1 in python_process_tracepoint (sample=0x7fffffffb870, evsel=0x555555efc370, al=0x7fffffffb2e0, addr_al=0x0)
at util/scripting-engines/trace-event-python.c:1044
#6 0x00005555555c5db8 in process_sample_event (tool=<optimized out>, event=<optimized out>, sample=<optimized out>,
evsel=0x555555efc370, machine=0x555555ef4d68) at builtin-script.c:2421
#7 0x00005555556b7793 in perf_session__deliver_event (session=0x555555ef4b60, event=0x7ffff62ff7d0, tool=0x7fffffffc150,
file_offset=30672, file_path=0x555555efb8a0 "perf.data") at util/session.c:1639
#8 0x00005555556bc864 in do_flush (show_progress=true, oe=0x555555efb700) at util/ordered-events.c:245
#9 __ordered_events__flush (oe=oe@entry=0x555555efb700, how=how@entry=OE_FLUSH__FINAL, timestamp=timestamp@entry=0)
at util/ordered-events.c:324
#10 0x00005555556bd06e in ordered_events__flush (oe=oe@entry=0x555555efb700, how=how@entry=OE_FLUSH__FINAL)
at util/ordered-events.c:342
#11 0x00005555556b9d63 in __perf_session__process_events (session=0x555555ef4b60) at util/session.c:2465
#12 perf_session__process_events (session=0x555555ef4b60) at util/session.c:2627
#13 0x00005555555cb1d0 in __cmd_script (script=0x7fffffffc150) at builtin-script.c:2839
#14 cmd_script (argc=<optimized out>, argv=<optimized out>) at builtin-script.c:4365
#15 0x0000555555650811 in run_builtin (p=p@entry=0x555555ed8948 <commands+456>, argc=argc@entry=4, argv=argv@entry=0x7fffffffe240)
at perf.c:323
#16 0x0000555555597eb3 in handle_internal_command (argv=0x7fffffffe240, argc=4) at perf.c:377
#17 run_argv (argv=<synthetic pointer>, argcp=<synthetic pointer>) at perf.c:421
#18 main (argc=4, argv=0x7fffffffe240) at perf.c:537
Fixes:
|
||
![]() |
765be32b97 |
perf symbol: Add LoongArch case in get_plt_sizes()
We can see the following definitions in bfd/elfnn-loongarch.c: #define PLT_HEADER_INSNS 8 #define PLT_HEADER_SIZE (PLT_HEADER_INSNS * 4) #define PLT_ENTRY_INSNS 4 #define PLT_ENTRY_SIZE (PLT_ENTRY_INSNS * 4) so plt header size is 32 and plt entry size is 16 on LoongArch, let us add LoongArch case in get_plt_sizes(). Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Reviewed-by: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: loongarch@lists.linux.dev Cc: loongson-kernel@lists.loongnix.cn Cc: Ingo Molnar <mingo@redhat.com> Link: https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elfnn-loongarch.c Link: https://lore.kernel.org/r/1684835873-15956-1-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
f40f97aaf7 |
perf arm-spe: Fix a dangling Documentation/arm64 reference
The arm64 documentation has moved under Documentation/arch/. Fix up a dangling reference to match. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> |
||
![]() |
362f9c907f |
perf jit: Fix incorrect file name in DWARF line table
Fixes an issue where an incorrect filename was added in the DWARF line table of an ELF object file when calling 'perf inject --jit' due to not checking the filename of a debug entry against the repeated name marker (/xff/0). The marker is mentioned in the tools/perf/util/jitdump.h header, which describes the jitdump binary format, and indicitates that the filename in a debug entry is the same as the previous enrty. In the function emit_lineno_info(), in the file tools/perf/util/genelf-debug.c, the debug entry filename gets compared to the previous entry filename. If they are not the same, a new filename is added to the DWARF line table. However, since there is no check against '\xff\0', in some cases '\xff\0' is inserted as the filename into the DWARF line table. This can be seen with `objdump --dwarf=line` on the ELF file after `perf inject --jit`. It also makes no source code information show up in 'perf annotate'. Signed-off-by: Elisabeth Panholzer <elisabeth@leaningtech.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20230602123815.255001-1-paniii94@gmail.com [ Fixed a trailing white space, removed a subject prefix ] Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
4ca0d340ce |
perf annotate: Fix instruction association and parsing for LoongArch
In the perf annotate view for LoongArch, there is no arrowed line pointing to the target from the branch instruction. This issue is caused by incorrect instruction association and parsing. $ perf record alloc-6276705c94ad1398 # rust benchmark $ perf report 0.28 │ ori $a1, $zero, 0x63 │ move $a2, $zero 10.55 │ addi.d $a3, $a2, 1(0x1) │ sltu $a4, $a3, $s7 9.53 │ masknez $a4, $s7, $a4 │ sub.d $a3, $a3, $a4 12.12 │ st.d $a1, $fp, 24(0x18) │ st.d $a3, $fp, 16(0x10) 16.29 │ slli.d $a2, $a2, 0x2 │ ldx.w $a2, $s8, $a2 12.77 │ st.w $a2, $sp, 724(0x2d4) │ st.w $s0, $sp, 720(0x2d0) 7.03 │ addi.d $a2, $sp, 720(0x2d0) │ addi.d $a1, $a1, -1(0xfff) 12.03 │ move $a2, $a3 │ → bne $a1, $s3, -52(0x3ffcc) # 82ce8 <test::bench::Bencher::iter+0x3f4> 2.50 │ addi.d $a0, $a0, 1(0x1) This patch fixes instruction association issues, such as associating branch instructions with jump_ops instead of call_ops, and corrects false instruction matches. It also implements branch instruction parsing specifically for LoongArch. With this patch, we will be able to see the arrowed line. 0.79 │3ec: ori $a1, $zero, 0x63 │ move $a2, $zero 10.32 │3f4:┌─→addi.d $a3, $a2, 1(0x1) │ │ sltu $a4, $a3, $s7 10.44 │ │ masknez $a4, $s7, $a4 │ │ sub.d $a3, $a3, $a4 14.17 │ │ st.d $a1, $fp, 24(0x18) │ │ st.d $a3, $fp, 16(0x10) 13.15 │ │ slli.d $a2, $a2, 0x2 │ │ ldx.w $a2, $s8, $a2 11.00 │ │ st.w $a2, $sp, 724(0x2d4) │ │ st.w $s0, $sp, 720(0x2d0) 8.00 │ │ addi.d $a2, $sp, 720(0x2d0) │ │ addi.d $a1, $a1, -1(0xfff) 11.99 │ │ move $a2, $a3 │ └──bne $a1, $s3, 3f4 3.17 │ addi.d $a0, $a0, 1(0x1) Signed-off-by: WANG Rui <wangrui@loongson.cn> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: loongarch@lists.linux.dev Cc: loongson-kernel@lists.loongnix.cn Cc: Huacai Chen <chenhuacai@loongson.cn> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: WANG Xuerui <kernel@xen0n.name> Link: https://lore.kernel.org/r/20230620132025.105563-1-wangrui@loongson.cn Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
2e9f9d4a72 |
perf annotation: Switch lock from a mutex to a sharded_mutex
Remove the "struct mutex lock" variable from annotation that is allocated per symbol. This removes in the region of 40 bytes per symbol allocation. Use a sharded mutex where the number of shards is set to the number of CPUs. Assuming good hashing of the annotation (done based on the pointer), this means in order to contend there needs to be more threads than CPUs, which is not currently true in any perf command. Were contention an issue it is straightforward to increase the number of shards in the mutex. On my Debian/glibc based machine, this reduces the size of struct annotation from 136 bytes to 96 bytes, or nearly 30%. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Andres Freund <andres@anarazel.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Yuan Can <yuancan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230615040715.2064350-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
0650b2b2e6 |
perf sharded_mutex: Introduce sharded_mutex
Per object mutexes may come with significant memory cost while a global mutex can suffer from unnecessary contention. A sharded mutex is a compromise where objects are hashed and then a particular mutex for the hash of the object used. Contention can be controlled by the number of shards. v2. Use hashmap.h's hash_bits in case of contention from alignment of objects. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Andres Freund <andres@anarazel.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Yuan Can <yuancan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230615040715.2064350-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
5e37ef5c2a |
tools: Fix incorrect calculation of object size by sizeof
What we need to calculate is the size of the object, not the size of the
pointer.
Fixed:
|
||
![]() |
240de691dd |
perf parse-events: Remove unneeded semicolon
./tools/perf/util/parse-events.c:1466:2-3: Unneeded semicolon Signed-off-by: Mingtong Bao <baomingtong001@208suo.com> Link: https://lore.kernel.org/r/2c733a91717eae93119ba2226420fd8f@208suo.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
bc06026d14 |
perf parse: Add missing newline to pr_debug message in evsel__compute_group_pmu_name()
The newline is missing for pr_debug message in evsel__compute_group_pmu_name(), fix it. Before: # perf --debug verbose=2 record -e cpu-clock true <SNIP> No PMU found for 'cycles:u'No PMU found for 'instructions:u'------------------------------------------------------------ perf_event_attr: type 1 size 136 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|PERIOD read_format ID|LOST disabled 1 inherit 1 mmap 1 comm 1 freq 1 enable_on_exec 1 task 1 sample_id_all 1 exclude_guest 1 mmap2 1 comm_exec 1 ksymbol 1 bpf_event 1 ------------------------------------------------------------ <SNIP> After: # perf --debug verbose=2 record -e cpu-clock true <SNIP> No PMU found for 'cycles:u' No PMU found for 'instructions:u' ------------------------------------------------------------ perf_event_attr: type 1 size 136 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|PERIOD read_format ID|LOST disabled 1 inherit 1 mmap 1 comm 1 freq 1 enable_on_exec 1 task 1 sample_id_all 1 exclude_guest 1 mmap2 1 comm_exec 1 ksymbol 1 bpf_event 1 ------------------------------------------------------------ <SNIP> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: mark.rutland@arm.com Cc: irogers@google.com Cc: peterz@infradead.org Cc: adrian.hunter@intel.com Cc: acme@kernel.org Cc: jolsa@kernel.org Cc: alexander.shishkin@linux.intel.com Cc: kan.liang@linux.intel.com Cc: mingo@redhat.com Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Link: https://lore.kernel.org/r/20230616024515.80814-1-yangjihong1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
![]() |
82fe2e45cd |
perf pmus: Check if we can encode the PMU number in perf_event_attr.type
In some architectures we can't encode the PMU number in perf_event_attr.type and thus can't just ask for the same event in multiple CPUs (and thus PMUs), that is what we want in hybrid systems but we can't when that encoding isn't understood by the kernel, such as in ARM64's big.LITTLE. If that is the case, fallback to the previous behaviour till we find a better solution to have consistent output accross architectures with hybrid CPU configurations. Co-developed-with: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/linux-perf-users/ZIzYgImv61OGK1wA@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
e2be06662c |
perf print-events: Export is_event_supported()
Will be used when checking if we can encode the PMU number in perf_event_attr.type, part of the logic to use in hybrid systems (multiple types of CPUs, such as Intel's (Alder Lake, etc) or ARM's big.LITTLE). Co-developed-with: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/linux-perf-users/ZIzYgImv61OGK1wA@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
5752c20f37 |
perf mem: Scan all PMUs instead of just core ones
Scanning only core PMUs is not sufficient on platforms like AMD since perf mem on AMD uses IBS OP PMU, which is independent of core PMU. Scan all PMUs instead of just core PMUs. There should be negligible performance overhead because of scanning all PMUs, so we should be okay. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20230615051700.1833-4-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
f0dc208267 |
perf mem amd: Fix perf_pmus__num_mem_pmus()
perf mem/c2c on AMD internally uses IBS OP PMU, not the core PMU. Also, AMD platforms does not have heterogeneous PMUs. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20230615051700.1833-3-ravi.bangoria@amd.com [ Added the improved comment for perf_pmus__num_mem_pmus() as b4 didn't from the per-patch (not series) newer version ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
cddfc5fb3f |
perf pmus: Describe semantics of 'core_pmus' and 'other_pmus'
Notion of 'core_pmus' and 'other_pmus' are independent of hw core and uncore pmus. For example, AMD IBS PMUs are present in each SMT-thread but they belongs to 'other_pmus'. Add a comment describing what these list contains and how they are treated. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20230615051700.1833-2-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
dada1a1f5f |
perf stat: Show average value on multiple runs
When -r option is used, perf stat runs the command multiple times and update stats in the evsel->stats.res_stats for global aggregation. But the value is never used and the value it prints at the end is just the value from the last run. I think we should print the average number of multiple runs. Add evlist__copy_res_stats() to update the aggr counter (for display) using the values in the evsel->stats.res_stats. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230616073211.1057936-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
6fbd67b0f0 |
perf test: fix failing test cases on linux-next for s390
In linux-next tree the many test cases fail on s390x when running the
perf test suite, sometime the perf tool dumps core.
Output before:
6.1: Test event parsing : FAILED!
10.3: Parsing of PMU event table metrics : FAILED!
10.4: Parsing of PMU event table metrics with fake PMUs: FAILED!
17: Setup struct perf_event_attr : FAILED!
24: Number of exit events of a simple workload : FAILED!
26: Object code reading : FAILED!
28: Use a dummy software event to keep tracking : FAILED!
35: Track with sched_switch : FAILED!
42.3: BPF prologue generation : FAILED!
66: Parse and process metrics : FAILED!
68: Event expansion for cgroups : FAILED!
69.2: Perf time to TSC : FAILED!
74: build id cache operations : FAILED!
86: Zstd perf.data compression/decompression : FAILED!
87: perf record tests : FAILED!
106: Test java symbol : FAILED!
The reason for all these failure is a missing PMU. On s390x the PMU is
named cpum_cf which is not detected as core PMU. A similar patch was
added before, see commit
|
||
![]() |
66dc1920f6 |
perf annotate: Work with vmlinux outside symfs
It is currently possible to use --symfs along with a vmlinux which lies outside of the symfs by passing an absolute path to --vmlinux, thanks to the check in dso__load_vmlinux() which handles this explicitly. However, the annotate code lacks this check and thus 'perf annotate' does not work ("Internal error: Invalid -1 error code") for kernel functions with this combination. Add the missing handling. Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel@axis.com Link: https://lore.kernel.org/r/20221125114210.2353820-1-vincent.whitchurch@axis.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
6a80d794d7 |
perf stat: New metricgroup output for the default mode
In the default mode, the current output of the metricgroup include both events and metrics, which is not necessary and just makes the output hard to read. Since different ARCHs (even different generations in the same ARCH) may use different events. The output also vary on different platforms. For a metricgroup, only outputting the value of each metric is good enough. Add a new field default_metricgroup in evsel to indicate an event of the default metricgroup. For those events, printout() should print the metricgroup name rather than each event. Add perf_stat__skip_metric_event() to skip the evsel in the Default metricgroup, if it's not running or not the metric event. Add print_metricgroup_header_t to pass the functions which print the display name of each metricgroup in the Default metricgroup. Support all three output methods. Factor out perf_stat__print_shadow_stats_metricgroup() to print out each metrics. On SPR: Before: ./perf_old stat sleep 1 Performance counter stats for 'sleep 1': 0.54 msec task-clock:u # 0.001 CPUs utilized 0 context-switches:u # 0.000 /sec 0 cpu-migrations:u # 0.000 /sec 68 page-faults:u # 125.445 K/sec 540,970 cycles:u # 0.998 GHz 556,325 instructions:u # 1.03 insn per cycle 123,602 branches:u # 228.018 M/sec 6,889 branch-misses:u # 5.57% of all branches 3,245,820 TOPDOWN.SLOTS:u # 18.4 % tma_backend_bound # 17.2 % tma_retiring # 23.1 % tma_bad_speculation # 41.4 % tma_frontend_bound 564,859 topdown-retiring:u 1,370,999 topdown-fe-bound:u 603,271 topdown-be-bound:u 744,874 topdown-bad-spec:u 12,661 INT_MISC.UOP_DROPPING:u # 23.357 M/sec 1.001798215 seconds time elapsed 0.000193000 seconds user 0.001700000 seconds sys After: $ ./perf stat sleep 1 Performance counter stats for 'sleep 1': 0.51 msec task-clock:u # 0.001 CPUs utilized 0 context-switches:u # 0.000 /sec 0 cpu-migrations:u # 0.000 /sec 68 page-faults:u # 132.683 K/sec 545,228 cycles:u # 1.064 GHz 555,509 instructions:u # 1.02 insn per cycle 123,574 branches:u # 241.120 M/sec 6,957 branch-misses:u # 5.63% of all branches TopdownL1 # 17.5 % tma_backend_bound # 22.6 % tma_bad_speculation # 42.7 % tma_frontend_bound # 17.1 % tma_retiring TopdownL2 # 21.8 % tma_branch_mispredicts # 11.5 % tma_core_bound # 13.4 % tma_fetch_bandwidth # 29.3 % tma_fetch_latency # 2.7 % tma_heavy_operations # 14.5 % tma_light_operations # 0.8 % tma_machine_clears # 6.1 % tma_memory_bound 1.001712086 seconds time elapsed 0.000151000 seconds user 0.001618000 seconds sys Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20230616031420.3751973-3-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
1c0e47956a |
perf metrics: Sort the Default metricgroup
The new default mode will print the metrics as a metric group. The metrics from the same metric group must be adjacent to each other in the metric list. But the metric_list_cmp() sorts metrics by the number of events. Add a new sort for the Default metricgroup, which sorts by default_metricgroup_name and metric_name. Add is_default in the struct metric_event to indicate that it's from the Default metricgroup. Store the displayed metricgroup name of the Default metricgroup into the metric expr for output. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20230616031420.3751973-2-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
b0a9e8f81f |
perf stat,jevents: Introduce Default tags for the default mode
Introduce a new metricgroup, Default, to tag all the metric groups which will be collected in the default mode. Add a new field, DefaultMetricgroupName, in the JSON file to indicate the real metric group name. It will be printed in the default output to replace the event names. There is nothing changed for the output format. On SPR, both TopdownL1 and TopdownL2 are displayed in the default output. On ARM, Intel ICL and later platforms (before SPR), only TopdownL1 is displayed in the default output. Suggested-by: Stephane Eranian <eranian@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230615135315.3662428-4-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
e15e4a3d7d |
perf evsel: Fix the annotation for hardware events on hybrid
The annotation for hardware events is wrong on hybrid. For example, # ./perf stat -a sleep 1 Performance counter stats for 'system wide': 32,148.85 msec cpu-clock # 32.000 CPUs utilized 374 context-switches # 11.633 /sec 33 cpu-migrations # 1.026 /sec 295 page-faults # 9.176 /sec 18,979,960 cpu_core/cycles/ # 590.378 K/sec 261,230,783 cpu_atom/cycles/ # 8.126 M/sec (54.21%) 17,019,732 cpu_core/instructions/ # 529.404 K/sec 38,020,470 cpu_atom/instructions/ # 1.183 M/sec (63.36%) 3,296,743 cpu_core/branches/ # 102.546 K/sec 6,692,338 cpu_atom/branches/ # 208.167 K/sec (63.40%) 96,421 cpu_core/branch-misses/ # 2.999 K/sec 1,016,336 cpu_atom/branch-misses/ # 31.613 K/sec (63.38%) The hardware events have extended type on hybrid, but the evsel__match() doesn't take it into account. Filter the config on hybrid before checking. With the patch, # ./perf stat -a sleep 1 Performance counter stats for 'system wide': 32,139.90 msec cpu-clock # 32.003 CPUs utilized 343 context-switches # 10.672 /sec 32 cpu-migrations # 0.996 /sec 73 page-faults # 2.271 /sec 13,712,841 cpu_core/cycles/ # 0.000 GHz 258,301,691 cpu_atom/cycles/ # 0.008 GHz (54.20%) 12,428,163 cpu_core/instructions/ # 0.91 insn per cycle 37,786,557 cpu_atom/instructions/ # 2.76 insn per cycle (63.35%) 2,418,826 cpu_core/branches/ # 75.259 K/sec 6,965,962 cpu_atom/branches/ # 216.739 K/sec (63.38%) 72,150 cpu_core/branch-misses/ # 2.98% of all branches 1,032,746 cpu_atom/branch-misses/ # 42.70% of all branches (63.35%) Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20230615135315.3662428-2-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
e90208e9ff |
perf srcline: Fix handling of inline functions
We write an address then a ',' to addr2line. With inline data we
generally get back (// are my comments):
0x1234 // address
foo // function name
foo.c:123 // filename:line
bar // function name
bar.c:123 // filename:line
0x000000000000000 // sentinel address created by ','
?? // unknown function name
??:0 // unknown filename:line
The code was assuming the inline data also had the address, which is
incorrect. This means the first inline function name (bar above) needs
to be checked to see if it is the sentinel, otherwise to be treated as
a function name. The regression was caused by the addition of
addresses as the kernel is reporting a symbol at address 0 (used by
GNU binutils when it interprets ',').
Committer testing:
Using:
# perf trace --call-graph=dwarf -e lock:contention_*
<SNIP>
1244.615 TaskCon~ller #/2645281 lock:contention_begin(lock_addr: 0xffff8e6748da5ab0, flags: 2)
__preempt_count_dec_and_test (inlined)
trace_contention_begin (inlined)
trace_contention_begin (inlined)
rwsem_down_read_slowpath ([kernel.kallsyms])
__preempt_count_dec_and_test (inlined)
trace_contention_begin (inlined)
trace_contention_begin (inlined)
rwsem_down_read_slowpath ([kernel.kallsyms])
__down_read_common (inlined)
__down_read (inlined)
down_read ([kernel.kallsyms])
arch_static_branch (inlined)
static_key_false (inlined)
__mmap_lock_trace_acquire_returned (inlined)
mmap_read_lock (inlined)
do_user_addr_fault ([kernel.kallsyms])
arch_local_irq_disable (inlined)
handle_page_fault (inlined)
exc_page_fault ([kernel.kallsyms])
asm_exc_page_fault ([kernel.kallsyms])
[0x4def008] (/usr/lib64/firefox/libxul.so)
1244.619 TaskCon~ller #/2645281 lock:contention_end(lock_addr: 0xffff8e6748da5ab0)
__preempt_count_dec_and_test (inlined)
trace_contention_end (inlined)
trace_contention_end (inlined)
rwsem_down_read_slowpath ([kernel.kallsyms])
__preempt_count_dec_and_test (inlined)
trace_contention_end (inlined)
trace_contention_end (inlined)
rwsem_down_read_slowpath ([kernel.kallsyms])
__down_read_common (inlined)
__down_read (inlined)
down_read ([kernel.kallsyms])
arch_static_branch (inlined)
static_key_false (inlined)
__mmap_lock_trace_acquire_returned (inlined)
mmap_read_lock (inlined)
do_user_addr_fault ([kernel.kallsyms])
arch_local_irq_disable (inlined)
handle_page_fault (inlined)
exc_page_fault ([kernel.kallsyms])
asm_exc_page_fault ([kernel.kallsyms])
<SNIP>
Fixes:
|
||
![]() |
701677b957 |
perf srcline: Add a timeout to reading from addr2line
addr2line may fail to send expected values causing perf to wait indefinitely. Add a 1 second timeout (twice the timeout for reading from /proc/pid/maps) so that such reads don't cause perf to appear to lock up. There are already checks that the file for addr2line contains a debug section but this isn't always sufficient. The problem was observed when a valid elf file would set the configuration for binutils addr2line, then a later read of vmlinux with ELF debug sections would cause a failing write/read which would block indefinitely. As a service to future readers, if the io hits eof or an error, cleanup the addr2line process. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20230608061812.3715566-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
6ec9503f45 |
perf parse-events: Avoid string for PE_BP_COLON, PE_BP_SLASH
There's no need to read the string ':' or '/' for PE_BP_COLON or
PE_BP_SLASH and doing so causes parse-events.y to leak memory.
The original patch has a committer note about not using these tokens
presumably as yacc spotted they were a memory leak because no
%destructor could be run. Remove the unused token workaround as there
is now no value associated with these tokens.
Fixes:
|
||
![]() |
e4c4e8a538 |
perf metric: Fix no group check
The no group check fails if there is more than one meticgroup in the
metricgroup_no_group.
The first parameter of the match_metric() should be the string, while
the substring should be the second parameter.
Fixes:
|
||
![]() |
8dc26b6f71 |
perf srcline: Make sentinel reading for binutils addr2line more robust
The addr2line process is sent an address then multiple function, filename:line "records" are read. To detect the end of output a ',' is sent and for llvm-addr2line a ',' is then read back showing the end of addrline's output. For binutils addr2line the ',' translates to address 0 and we expect the bogus filename marker "??:0" (see filename_split) to be sent from addr2line. For some kernels address 0 may have a mapping and so a seemingly valid inline output is given and breaking the sentinel discovery: ``` $ addr2line -e vmlinux -f -i , __per_cpu_start ./arch/x86/kernel/cpu/common.c:1850 ``` To avoid this problem enable the address dumping for addr2line (the -a option). If an address of 0x0000000000000000 is read then this is the sentinel value working around the problem above. The filename_split still needs to check for "??:0" as bogus non-zero addresses also need handling. Reported-by: Changbin Du <changbin.du@huawei.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Changbin Du <changbin.du@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Rix <trix@redhat.com> Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230613034817.1356114-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
c7a0023a14 |
perf srcline: Make addr2line configuration failure more verbose
To aid debugging why it fails. Also, combine the loops for reading a line for the llvm/binutils cases. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Changbin Du <changbin.du@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Rix <trix@redhat.com> Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230613034817.1356114-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
7f911905ff |
perf dwarf-aux: Allow unnamed struct/union/enum
It's possible some struct/union/enum type don't have type name. Allow the empty name after "struct"/"union"/"enum" string rather than fail. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230612234102.3909116-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
3abfcfd847 |
perf dwarf-aux: Fix off-by-one in die_get_varname()
The die_get_varname() returns "(unknown_type)" string if it failed to
find a type for the variable. But it had a space before the opening
parenthesis and it made the closing parenthesis cut off due to the
off-by-one in the string length (14).
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Fixes:
|
||
![]() |
d15b8c76c9 |
perf pfm: Remove duplicate util/cpumap.h include
Fixes:
|
||
![]() |
103b3d2f94 |
perf annotate: Allow whitespace between insn operands
The llvm-objdump adds a space between the operands while GNU objdump does not. Allow a space to handle the both. In GNU objdump: Disassembly of section .text: here | ffffffff81000000 <_stext>: v ffffffff81000000: 48 8d 25 51 1f 40 01 lea 0x1401f51(%rip),%rsp ffffffff81000007: e8 d4 00 00 00 call ffffffff810000e0 <verify_cpu> ffffffff8100000c: 48 8d 3d ed ff ff ff lea -0x13(%rip),%rdi In llvm-objdump: Disassembly of section .text: here | ffffffff81000000 <startup_64>: v ffffffff81000000: 48 8d 25 51 1f 40 01 leaq 20979537(%rip), %rsp ffffffff81000007: e8 d4 00 00 00 callq 0xffffffff810000e0 <verify_cpu> ffffffff8100000c: 48 8d 3d ed ff ff ff leaq -19(%rip), %rdi Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230612230026.3887586-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
0f0d1354a5 |
perf help: Ensure clean_cmds is called on all paths
Avoid potential memory leaks. Committer notes: This is right before calling exit(1), so just to clean up memory leak checker detection. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: André Almeida <andrealmeid@igalia.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230611233610.953456-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
d927ef5004 |
perf cs-etm: Add exception level consistency check
Assert that our own tracking of the exception level matches what OpenCSD provides. OpenCSD doesn't distinguish between EL0 and EL1 in the memory access callback so the extra tracking was required. But a rough assert can still be done. Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230612111403.100613-6-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
8d3031d39f |
perf cs-etm: Track exception level
Currently we assume all trace belongs to the host machine so when the decoder should be looking at the guest kernel maps it can crash because it looks at the host ones instead. Avoid one scenario (guest kernel running at EL1) by assigning the default guest machine to this trace. For userspace trace it's still not possible to determine guest vs host, but the PIDs should help in this case. Committer notes: Fixed up conflict with: perf addr_location: Add init/exit/copy functions That was only on tmp.perf-tools-next. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230612111403.100613-5-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
5414b53261 |
perf cs-etm: Make PID format accessible from struct cs_etm_auxtrace
To avoid every user of PID format having to use their own static local variable, cache it on initialisation and change the accessor to take struct cs_etm_auxtrace. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230612111403.100613-4-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
d67d8c87d0 |
perf cs-etm: Use previous thread for branch sample source IP
Branch samples currently use the IP of the previous packet as the from IP, and the IP of the current packet as the to IP. But it incorrectly uses the current thread. In some cases like a jump into a different exception level this will attribute to the incorrect process. Fix it by tracking the previous thread in the same way the previous packet is tracked. Committer notes: Resolved conflicts with: perf addr_location: Add init/exit/copy functions perf thread: Add accessor functions for thread Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20230612111403.100613-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
951ccccdc7 |
perf cs-etm: Only track threads instead of PID and TIDs
PIDs and TIDs are already contained within the thread struct, so to avoid inconsistencies drop the extra members on the etm queue and only use the thread struct. At the same time stop using the 'unknown' thread. In a later commit we will be making samples from multiple machines so it will be better to use the idle thread of each machine rather than overlapping unknown threads. Using the idle thread is also better because kernel addresses with a previously unknown thread will now be assigned to a real kernel thread. Committer notes: Resolved conflicts with: perf addr_location: Add init/exit/copy functions perf thread: Add accessor functions for thread perf thread: Remove notion of dead threads That were present in tmp.perf-tools.next only. Reviewed-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20230612111403.100613-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
0d98a7af4b |
perf map: Fix double 'struct map' reference free found with -DREFCNT_CHECKING=1
When quitting after running a 'perf report', the refcount checker finds some double frees. The issue is that map__put() is called on a function argument so it removes the refcount wrapper that someone else was using. Fix it by only calling map__put() on a reference that is owned by this function. Committer notes: Narrowed the map_ref scope as suggested by Ian, removed the symbol-elf part as it was already fixed by another patch, from Ian. Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230612150424.198914-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
922db21d7e |
perf srcline: Optimize comparision against SRCLINE_UNKNOWN
This is a string constant that gets returned and then strcmp() around, we can instead just do a pointer comparision. That requires a new global variable to comply with these warnings from some versions of clang and gcc: 41 68.95 fedora:rawhide : FAIL clang version 16.0.4 (Fedora 16.0.4-1.fc39) result of comparison against a string literal is unspecified (use an explicit string comparison function instead) [-Werror,-Wstring-compare] if (start_line != SRCLINE_UNKNOWN && ^ ~~~~~~~~~~~~~~~ 41 Ack comments: Agreed, the strcmps make me nervous as they won't distinguish heap from a global meaning we could end up with things like pointers to freed memory. The comparison with the global is always going to be same imo. Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Link: https://lore.kernel.org/lkml/ZIcoJytUEz4UgQYR@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
834631ee77 |
perf hist: Fix srcline memory leak
srcline isn't freed if it is SRCLINE_UNKNOWN. Avoid strduping in this case as such strdups are redundant and leak memory. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-27-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
625db36e6c |
perf srcline: Change free_srcline to zfree_srcline
Make use after free more unlikely. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-26-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
8ab12a2038 |
perf callchain: Use pthread keys for tls callchain_cursor
Pthread keys are more portable than __thread and allow the association of a destructor with the key. Use the destructor to clean up TLS callchain cursors to aid understanding memory leaks. Committer notes: Had to fixup a series of unconverted places and also check for the return of get_tls_callchain_cursor() as it may fail and return NULL. In that unlikely case we now either print something to a file, if the caller was expecting to print a callchain, or return an error code to state that resolving the callchain isn't possible. In some cases this was made easier because thread__resolve_callchain() already can fail for other reasons, so this new one (cursor == NULL) can be added and the callers don't have to explicitely check for this new condition. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-25-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
d7ba60a4e5 |
perf header: Avoid out-of-bounds read
intel-pt tests were failing: -- Test virtual LBR --- Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.126 MB /tmp/perf-test-intel-pt-sh.FW57CXnCqQ/test-perf.data ] Failed with virtual lbr ... ``` The root cause is an out-of-bounds read in header (where maxbrstack.py is from test_intel_pt.sh): ``` $ perf --no-pager script --itrace=L -s maxbrstack.py ================================================================= ==3907930==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000095a8 at pc 0x563c26c840bb bp 0x7fff43582710 sp 0x7fff43582708 READ of size 4 at 0x6020000095a8 thread T0 #0 0x563c26c840ba in process_group_desc util/header.c:2847 #1 0x563c26c8bc78 in perf_file_section__process util/header.c:4037 #2 0x563c26c8aa9b in perf_header__process_sections util/header.c:3813 #3 0x563c26c8d028 in perf_session__read_header util/header.c:4286 #4 0x563c26cbab29 in perf_session__open util/session.c:113 #5 0x563c26cbb3d0 in __perf_session__new util/session.c:221 #6 0x563c26aacb14 in perf_session__new util/session.h:73 #7 0x563c26acf7f1 in cmd_script tools/perf/builtin-script.c:4212 #8 0x563c26bb58ff in run_builtin tools/perf/perf.c:323 #9 0x563c26bb5e70 in handle_internal_command tools/perf/perf.c:377 #10 0x563c26bb6238 in run_argv tools/perf/perf.c:421 #11 0x563c26bb67a0 in main tools/perf/perf.c:537 #12 0x7f34bde46189 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 #13 0x7f34bde46244 in __libc_start_main_impl ../csu/libc-start.c:381 #14 0x563c26a33390 in _start (/tmp/perf/perf+0x1eb390) 0x6020000095a8 is located 8 bytes to the right of 16-byte region [0x602000009590,0x6020000095a0) allocated by thread T0 here: #0 0x7f34beeb83b7 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77 #1 0x563c26c83df8 in process_group_desc util/header.c:2824 #2 0x563c26c8bc78 in perf_file_section__process util/header.c:4037 #3 0x563c26c8aa9b in perf_header__process_sections util/header.c:3813 #4 0x563c26c8d028 in perf_session__read_header util/header.c:4286 #5 0x563c26cbab29 in perf_session__open util/session.c:113 #6 0x563c26cbb3d0 in __perf_session__new util/session.c:221 #7 0x563c26aacb14 in perf_session__new util/session.h:73 #8 0x563c26acf7f1 in cmd_script tools/perf/builtin-script.c:4212 #9 0x563c26bb58ff in run_builtin tools/perf/perf.c:323 #10 0x563c26bb5e70 in handle_internal_command tools/perf/perf.c:377 #11 0x563c26bb6238 in run_argv tools/perf/perf.c:421 #12 0x563c26bb67a0 in main tools/perf/perf.c:537 #13 0x7f34bde46189 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 ``` Avoid the out-of-bounds read checking for the leader. Leave the 'nr' check intact as nr will be 0 or the counting down and evsel be a group member. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/lkml/20230608232823.4027869-24-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
d3d53b2e96 |
perf annotate: Fix parse_objdump_line memory leak
fileloc is used to hold a previous line, before overwriting it ensure the previous contents is freed. Free the storage once done in symbol__disassemble. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-22-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
bffb5b0c09 |
perf map/maps/thread: Changes to reference counting
Fix missed reference count gets and puts as detected with leak sanitizer and reference count checking. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-21-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
1981da1fe2 |
perf machine: Don't leak module maps
machine__addnew_module_map requires a put on its result. Add this and narrow the scope of map to make the correctness more obvious. This leak was caught with leak sanitizer and the reference count checker. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-20-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
34b29bd61d |
perf machine: Fix leak of kernel dso
The kernel dso may be found by searching dsos or allocating if not found. The allocation returns with a reference count of 2, once for the dsos list and once for the returned value. The list search has a reference count of 1, once for the dsos list. To make the reference counts consistent, increase the dsos list search reference count to 2 with a dso__get, and do a put when the scope ends for either the allocated or found dso. This issue was found with leak sanitizer and reference count checking. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-19-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
814a656870 |
perf maps: Fix overlapping memory leak
Add a missed free detected by leak sanitizer. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-18-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
fe8fec1028 |
perf symbol-elf: Correct holding a reference
If a reference is held, don't put it as this will confuse reference count checking. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-17-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
5cedd1e29d |
perf jit: Fix two thread leaks
As reported by leak sanitizer with reference count checking. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-16-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
51cfe7a3e8 |
perf python: Avoid 2 leak sanitizer issues
Leak sanitizer complains about the variable size bf allocation and store to bf if sized 0. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-15-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
ac873ac326 |
perf evlist: Free stats in all evlist destruction
There is no evsel free stats, freeing in the evlist__delete ensures memory leaks are avoided. Issues detected with "perf stat report" and leak sanitizer, perf stat uses perf_session__delete to free the evlist. Add dummy symbol for python build. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-14-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
084770f55a |
perf intel-pt: Fix missed put and leak
Add missing put and free, detected with leak sanitizer. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-13-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
f8e502b9d1 |
perf header: Ensure bitmaps are freed
memory_node bitmaps need a bitmap_free to avoid memory leaks. Caught by leak sanitizer. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-11-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
cf078c8381 |
perf machine: Make delete_threads part of machine__exit
The code required threads to be deleted before machine__exit was called or the threads would be leaked. This was error prone so move the delete_threads into machine__exit. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-9-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
f6005cafeb |
perf thread: Add reference count checking
Modify struct declaration and accessor functions for the reference count checkers additional layer of indirection. Make sure pid_cmp in builtin-sched.c uses the underlying/original struct in pointer arithmetic, and not the temporary get/put indirection. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
0dd5041c9a |
perf addr_location: Add init/exit/copy functions
struct addr_location holds references to multiple reference counted objects. Add init/exit functions to make maintenance of those more consistent with the rest of the code and to try to avoid leaks. Modification of thread reference counts isn't included in this change. Committer notes: I needed to initialize result to sample->ip to make sure is set to something, fixing a compile time error, mostly keeping the previous logic as build_alloc_func_list() already does debugging/error prints about what went wrong if it takes the 'goto out'. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
620be847f4 |
perf addr_location: Move to its own header
addr_location is a common abstraction, move it into its own header and source file in preparation for wider clean up. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
46125590e0 |
perf maps: Make delete static, always use put
Address/leak sanitizer with reference count checking can identify the location of leaks, so use put rather than delete to avoid free-ing memory when the reference count is >1. Add maps__zput to ensure the variable is cleared. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
ee84a3032b |
perf thread: Add accessor functions for thread
Using accessors will make it easier to add reference count checking in later patches. Committer notes: thread->nsinfo wasn't wrapped as it is used together with nsinfo__zput(), where does a trick to set the field with a refcount being dropped to NULL, and that doesn't work well with using thread__nsinfo(thread), that loses the &thread->nsinfo pointer. When refcount checking is added to 'struct thread', later in this series, nsinfo__zput(RC_CHK_ACCESS(thread)->nsinfo) will be used to check the thread pointer. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
7ee227f674 |
perf thread: Make threads rbtree non-invasive
Separate the rbtree out of thread and into a new struct thread_rb_node. The refcnt is in thread and the rbtree is responsible for a single count. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
40826c45eb |
perf thread: Remove notion of dead threads
The dead thread list is best effort. Threads live on it until the reference count hits zero and they are removed. With correct reference counting this should never happen. It is, however, part of the 'perf sched' output that is now removed. If this is an issue we should implement tracking of dead threads in a robust not best-effort way. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
d1f1cecc92 |
perf list: Check if libpfm4 event is supported
Some of its event info cannot be used directly due to missing default attributes. Let's check if the event is supported before printing like we do for hw and cache events. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers>@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20230608232400.3056312-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
f0617f526c |
perf parse: Allow config terms with breakpoints
Add config terms to the parsing of breakpoint events. Extend "Test event parsing" to also cover using a confg term. This makes breakpoint events consistent with other events which already support config terms. Example: $ cat dr_test.c #include <unistd.h> #include <stdio.h> void func0(void) { } int main() { printf("func0 %p\n", &func0); while (1) { func0(); usleep(100000); } return 0; } $ gcc -g -O0 -o dr_test dr_test.c $ ./dr_test & [2] 19646 func0 0x55feb98dd169 $ perf record -e mem:0x55feb98dd169:x/name=breakpoint/ -p 19646 -- sleep 0.5 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.017 MB perf.data (5 samples) ] $ perf script dr_test 19646 5632.956628: 1 breakpoint: 55feb98dd169 func0+0x0 (/home/ahunter/git/work/dr_test) dr_test 19646 5633.056866: 1 breakpoint: 55feb98dd169 func0+0x0 (/home/ahunter/git/work/dr_test) dr_test 19646 5633.157084: 1 breakpoint: 55feb98dd169 func0+0x0 (/home/ahunter/git/work/dr_test) dr_test 19646 5633.257309: 1 breakpoint: 55feb98dd169 func0+0x0 (/home/ahunter/git/work/dr_test) dr_test 19646 5633.357532: 1 breakpoint: 55feb98dd169 func0+0x0 (/home/ahunter/git/work/dr_test) $ sudo perf test "Test event parsing" 6: Parse event definition strings : 6.1: Test event parsing : Ok $ sudo perf test -v "Test event parsing" |& grep mem running test 8 'mem:0' running test 9 'mem:0:x' running test 10 'mem:0:r' running test 11 'mem:0:w' running test 19 'mem:0:u' running test 20 'mem:0:x:k' running test 21 'mem:0:r:hp' running test 22 'mem:0:w:up' running test 26 'mem:0:rw' running test 27 'mem:0:rw:kp' running test 42 'mem:0/1' running test 43 'mem:0/2:w' running test 44 'mem:0/4:rw:u' running test 58 'mem:0/name=breakpoint/' running test 59 'mem:0:x/name=breakpoint/' running test 60 'mem:0:r/name=breakpoint/' running test 61 'mem:0:w/name=breakpoint/' running test 62 'mem:0/name=breakpoint/u' running test 63 'mem:0:x/name=breakpoint/k' running test 64 'mem:0:r/name=breakpoint/hp' running test 65 'mem:0:w/name=breakpoint/up' running test 66 'mem:0:rw/name=breakpoint/' running test 67 'mem:0:rw/name=breakpoint/kp' running test 68 'mem:0/1/name=breakpoint/' running test 69 'mem:0/2:w/name=breakpoint/' running test 70 'mem:0/4:rw/name=breakpoint/u' running test 71 'mem:0/1/name=breakpoint1/,mem:0/4:rw/name=breakpoint2/' Committer notes: Folded follow up patch (see 2nd link below) to address warnings about unused tokens: perf tools: Suppress bison unused value warnings Patch "perf tools: Allow config terms with breakpoints" introduced parse tokens for colons and slashes within breakpoint parsing to prevent mix up with colons and slashes related to config terms. The token values are not needed but introduce bison "unused value" warnings. Suppress those warnings. Committer testing: # cat ~acme/c/mem_breakpoint.c #include <stdio.h> #include <unistd.h> void func1(void) { } void func2(void) { } void func3(void) { } void func4(void) { } void func5(void) { } int main() { printf("func1 %p\n", &func1); printf("func2 %p\n", &func2); printf("func3 %p\n", &func3); printf("func4 %p\n", &func4); printf("func5 %p\n", &func5); while (1) { func1(); func2(); func3(); func4(); func5(); usleep(100000); } return 0; } # ~acme/c/mem_breakpoint & [1] 3186153 func1 0x401136 func2 0x40113d func3 0x401144 func4 0x40114b func5 0x401152 # Trying to watch the first 4 functions for eXecutable access: # perf record -e mem:0x401136:x/name=breakpoint1/,mem:0x40113d:x/name=breakpoint2/,mem:0x401144:x/name=breakpoint3/,mem:0x40114b:x/name=breakpoint4/ -p 3186153 -- sleep 0.5 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.026 MB perf.data (20 samples) ] [root@five ~]# perf script mem_breakpoint 3186153 131612.864793: 1 breakpoint1: 401136 func1+0x0 (/var/home/acme/c/mem_breakpoint) mem_breakpoint 3186153 131612.864795: 1 breakpoint2: 40113d func2+0x0 (/var/home/acme/c/mem_breakpoint) mem_breakpoint 3186153 131612.864796: 1 breakpoint3: 401144 func3+0x0 (/var/home/acme/c/mem_breakpoint) mem_breakpoint 3186153 131612.864797: 1 breakpoint4: 40114b func4+0x0 (/var/home/acme/c/mem_breakpoint) mem_breakpoint 3186153 131612.964868: 1 breakpoint1: 401136 func1+0x0 (/var/home/acme/c/mem_breakpoint) mem_breakpoint 3186153 131612.964870: 1 breakpoint2: 40113d func2+0x0 (/var/home/acme/c/mem_breakpoint) mem_breakpoint 3186153 131612.964871: 1 breakpoint3: 401144 func3+0x0 (/var/home/acme/c/mem_breakpoint) mem_breakpoint 3186153 131612.964872: 1 breakpoint4: 40114b func4+0x0 (/var/home/acme/c/mem_breakpoint) mem_breakpoint 3186153 131613.064945: 1 breakpoint1: 401136 func1+0x0 (/var/home/acme/c/mem_breakpoint) mem_breakpoint 3186153 131613.064948: 1 breakpoint2: 40113d func2+0x0 (/var/home/acme/c/mem_breakpoint) mem_breakpoint 3186153 131613.064948: 1 breakpoint3: 401144 func3+0x0 (/var/home/acme/c/mem_breakpoint) mem_breakpoint 3186153 131613.064949: 1 breakpoint4: 40114b func4+0x0 (/var/home/acme/c/mem_breakpoint) mem_breakpoint 3186153 131613.165024: 1 breakpoint1: 401136 func1+0x0 (/var/home/acme/c/mem_breakpoint) mem_breakpoint 3186153 131613.165026: 1 breakpoint2: 40113d func2+0x0 (/var/home/acme/c/mem_breakpoint) mem_breakpoint 3186153 131613.165027: 1 breakpoint3: 401144 func3+0x0 (/var/home/acme/c/mem_breakpoint) mem_breakpoint 3186153 131613.165028: 1 breakpoint4: 40114b func4+0x0 (/var/home/acme/c/mem_breakpoint) mem_breakpoint 3186153 131613.265103: 1 breakpoint1: 401136 func1+0x0 (/var/home/acme/c/mem_breakpoint) mem_breakpoint 3186153 131613.265105: 1 breakpoint2: 40113d func2+0x0 (/var/home/acme/c/mem_breakpoint) mem_breakpoint 3186153 131613.265106: 1 breakpoint3: 401144 func3+0x0 (/var/home/acme/c/mem_breakpoint) mem_breakpoint 3186153 131613.265107: 1 breakpoint4: 40114b func4+0x0 (/var/home/acme/c/mem_breakpoint) # Then all the 5 functions: # perf record -e mem:0x401136:x/name=breakpoint1/,mem:0x40113d:x/name=breakpoint2/,mem:0x401144:x/name=breakpoint3/,mem:0x40114b:x/name=breakpoint4/,mem:0x401152:x/name=breakpoint5/ -p 3186153 -- sleep 0.5 Error: The sys_perf_event_open() syscall returned with 28 (No space left on device) for event (breakpoint5). /bin/dmesg | grep -i perf may provide additional information. # grep -m1 'model name' /proc/cpuinfo model name : AMD Ryzen 9 5950X 16-Core Processor # Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20230525082902.25332-2-adrian.hunter@intel.com Link: https://lore.kernel.org/r/f7228dc9-fe18-a8e3-7d3f-52922e0e1113@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
d0b3597998 |
perf annotate: Handle x86 instruction suffix generally
In AT&T asm syntax, most of x86 instructions can have size suffix like b, w, l or q. Instead of adding all these instructions in the table, we can handle them in a general way. For example, it can try to find an instruction as is. If not found, assuming it has a suffix and it'd try again without the suffix if it's one of the allowed suffixes. This way, we can reduce the instruction table size for duplicated entries of the same instructions with a different suffix. If an instruction xyz and others like xyz<suffix> are completely different ones, then they both need to be listed in the table so that they can be found before the second attempt (without the suffix). Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230524205054.3087004-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
6f765bbbfb |
perf expr: Make the evaluation of & and | logical and lazy
Currently the & and | operators are only used in metric thresholds like (from the tma_retiring metric): tma_retiring > 0.7 | tma_heavy_operations > 0.1 Thresholds are always computed when present, but a lack of events may mean the threshold can't be computed. This happens with the option --metric-no-threshold for say the metric tma_retiring on Tigerlake model CPUs. To fully compute the threshold tma_heavy_operations is needed and it needs the extra events of IDQ.MS_UOPS, UOPS_DECODED.DEC0, cpu/UOPS_DECODED.DEC0,cmask=1/ and IDQ.MITE_UOPS. So --metric-no-threshold is a useful option to reduce the number of events needed and potentially multiplexing of events. Rather than just fail threshold computations like this, we may know a result from just the left or right-hand side. So, for tma_retiring if its value is "> 0.7" we know it is over the threshold. This allows the metric to have the threshold coloring, when possible, without all the counters being programmed. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20230519063719.1029596-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
49f3806d89 |
perf tools: Declare syscalltbl_*[] as const for all archs
syscalltbl_*[] should never be changing, let us declare it as const. Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: loongarch@lists.linux.dev Link: https://lore.kernel.org/r/1685441401-8709-2-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
b9f010328c |
perf pmu: Warn about invalid config for all PMUs and configs
Don't just check the raw PMU type, the only core PMU on homogeneous x86, check raw and all dynamically added PMUs. Extend the perf_pmu__warn_invalid_config to check all 4 config values. Rather than process the format list once per event, store the computed masks for each config value. Don't ignore the mask being zero, which is likely for config2 and config3, add config_masks_present so config values can be ignored only when no format information is present. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20230601023644.587584-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
68c2504341 |
perf pmu: Only warn about unsupported formats once
Avoid scanning format list for each event parsed. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20230601023644.587584-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
251aa04024 |
perf parse-events: Wildcard most "numeric" events
Numeric events are either raw events or those with ABI defined numbers
matched by the lexer. PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE events
should wildcard match on hybrid systems. So "cycles" should match each
PMU type with an extended type, not just PERF_TYPE_HARDWARE.
Change wildcard matching to add the event even if wildcard PMU
scanning fails, there will be no extended type but this best matches
previous behavior.
Only set the extended type when the event type supports it and when
perf_pmus__supports_extended_type is true. This new function returns
true if >1 core PMU and avoids potential errors on older kernels.
Modify evsel__compute_group_pmu_name using a helper
perf_pmu__is_software to determine when grouping should occur. Try to
use PMUs, and evsel__find_pmu, as being more dependable than
evsel->pmu_name.
Set a parse events error if a hardware term's PMU lookup fails, to
provide extra diagnostics.
Fixes:
|
||
![]() |
1f4326bf83 |
perf evsel: Add verbose 3 print of evsel name when opening
It is often useful to know not just the attribute and perf_event_open() details when opening an evsel, but also the evsel's name. Add this debug output for verbose 3 so that it won't interfere with the current verbose 2 output. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20230601082954.754318-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
e23421426e |
perf pmu: Correct perf_pmu__auto_merge_stats() affecting hybrid
Flip the return value correcting a bug.
Fixes:
|
||
![]() |
d17ed982e4 |
perf tools fixes for v6.4: 2nd batch
- Fix BPF CO-RE naming convention for checking the availability of fields on 'union perf_mem_data_src' on the running kernel. - Remove the use of llvm-strip on BPF skel object files, not needed, fixes a build breakage when the llvm package, that contains it in most distros, isn't installed. - Fix tools that use both evsel->{bpf_counter_list,bpf_filters}, removing them from a union. - Remove extra "--" from the 'perf ftrace latency' --use-nsec option, previously it was working only when using the '-n' alternative. - Don't stop building when both binutils-devel and a C++ compiler isn't available to compile the alternative C++ demangle support code, disable that feature instead. - Sync the linux/in.h and coresight-pmu.h header copies with the kernel sources. - Fix relative include path to cs-etm.h. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZHY9egAKCRCyPKLppCJ+ JzbBAQCv+i0j/+garWKCTSe33yztzQGS6jxu/YzQrbQO427DMwEA9fJgp/r2OQC5 wMM5gng2fPaHe6Hs4cnPL/SzMxLC2gQ= =fHRR -----END PGP SIGNATURE----- Merge tag 'perf-tools-fixes-for-v6.4-2-2023-05-30' into perf-tools-next perf tools fixes for v6.4: 2nd batch - Fix BPF CO-RE naming convention for checking the availability of fields on 'union perf_mem_data_src' on the running kernel. - Remove the use of llvm-strip on BPF skel object files, not needed, fixes a build breakage when the llvm package, that contains it in most distros, isn't installed. - Fix tools that use both evsel->{bpf_counter_list,bpf_filters}, removing them from a union. - Remove extra "--" from the 'perf ftrace latency' --use-nsec option, previously it was working only when using the '-n' alternative. - Don't stop building when both binutils-devel and a C++ compiler isn't available to compile the alternative C++ demangle support code, disable that feature instead. - Sync the linux/in.h and coresight-pmu.h header copies with the kernel sources. - Fix relative include path to cs-etm.h. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
d9c26d45db |
perf scripting-engines: Move static to local variable, remove 16384 from .bss
Avoid 16,384 bytes in .bss by stack allocating two bitmaps. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Ross Zwisler <zwisler@chromium.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20230526183401.2326121-14-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
370ce164de |
perf path: Make mkpath thread safe, remove 16384 bytes from .bss
Avoid 4 static arrays for paths, pass in a char[] buffer to use. Makes mkpath thread safe for the small number of users. Also removes 16,384 bytes from .bss. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Ross Zwisler <zwisler@chromium.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20230526183401.2326121-13-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
5c6e7c21ae |
perf header: Make nodes dynamic in write_mem_topology()
Avoid a large static array, dynamically allocate the nodes avoiding a hard coded limited as well. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Ross Zwisler <zwisler@chromium.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20230526183401.2326121-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
![]() |
797b9ec8c4 |
perf evsel: Don't let for_each_group() treat the head of the list as one of its nodes
Address/memory sanitizer was reporting issues in evsel__group_pmu_name
because the for_each_group_evsel loop didn't terminate when the head
was reached, the head would then be cast and accessed as an evsel
leading to invalid memory accesses.
Fix for_each_group_member and for_each_group_evsel to terminate at the
list head. Note, evsel__group_pmu_name no longer iterates the group, but
the problem is present regardless.
Fixes:
|
||
![]() |
a90cc5a9ee |
perf evsel: Don't let evsel__group_pmu_name() traverse unsorted group
Previously the evsel__group_pmu_name would iterate the evsel's group,
however, the list of evsels aren't yet sorted and so the loop may
terminate prematurely. It is also not desirable to iterate the list of
evsels during list_sort as the list may be broken.
Precompute the group_pmu_name for the evsel before sorting, as part of
the computation and only if necessary, iterate the whole list looking
for group members so that being sorted isn't necessary.
Move the group pmu name computation to parse-events.c given the closer
dependency on the behavior of
parse_events__sort_events_and_fix_groups.
Fixes:
|