Merge tag 'perf-tools-for-v6.15-2025-03-27' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools updates from Namhyung Kim:
 "perf record:

   - Introduce latency profiling using scheduler information.

     Latency profiling shows the impact on wall-clock time rather than
     CPU time. By tracking context switches, it can weight samples and
     find which parts of the code contributed most to the execution
     latency.

     The value (period) of each sample is weighted by dividing it by the
     number of tasks running in parallel at that moment. The parallelism
     is tracked in perf report using sched-switch records. This reduces
     the weight of portions that run in parallel and, in turn, increases
     the weight of serial portions.

     For now, it is limited to profiling processes; in other words,
     system-wide profiling is not supported. Add the --latency option to
     enable it:

       $ perf record --latency -- make -C tools/perf

     I ran the above command for the perf build, which internally passes
     -j with the number of CPUs in the system to make. Normally, perf
     report would show something like this:

       $ perf report -F overhead,comm
       ...
       #
       # Overhead  Command
       # ........  ...............
       #
           78.97%  cc1
            6.54%  python3
            4.21%  shellcheck
            3.28%  ld
            1.80%  as
            1.37%  cc1plus
            0.80%  sh
            0.62%  clang
            0.56%  gcc
            0.44%  perl
            0.39%  make
       ...

     cc1 accounts for around 80% of the overhead because it is the actual
     compiler. However, it runs in parallel, so its contribution to
     latency may be smaller. Now perf report shows both overhead and
     latency (if --latency was given at record time), like below:

       $ perf report -s comm
       ...
       #
       # Overhead   Latency  Command
       # ........  ........  ...............
       #
           78.97%    48.66%  cc1
            6.54%    25.68%  python3
            4.21%     0.39%  shellcheck
            3.28%    13.70%  ld
            1.80%     2.56%  as
            1.37%     3.08%  cc1plus
            0.80%     0.98%  sh
            0.62%     0.61%  clang
            0.56%     0.33%  gcc
            0.44%     1.71%  perl
            0.39%     0.83%  make
       ...

     You can see that the latency of cc1 drops to around 50%, while
     python3 and ld contribute much more to latency than to overhead. Use
     the --latency option in perf report to get the same result, ordered
     by latency:

       $ perf report --latency -s comm
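
     Below is a minimal C sketch of the weighting described above. It is
     not perf's implementation. It assumes a simplified, time-ordered
     event stream in which a switch record only says that one task of the
     profiled workload went on or off a CPU. The running count of such
     tasks is the parallelism. Each sample's period is added unchanged to
     overhead and divided by the current parallelism for latency. The
     event and struct names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified event stream; not perf's record format. */
enum ev_type { EV_SWITCH_IN, EV_SWITCH_OUT, EV_SAMPLE };

struct event {
	enum ev_type type;
	int comm;       /* command index (meaningful for EV_SAMPLE) */
	double period;  /* sample period (meaningful for EV_SAMPLE) */
};

#define MAX_COMMS 8

struct profile {
	double overhead[MAX_COMMS]; /* plain period sums (CPU time view) */
	double latency[MAX_COMMS];  /* period / parallelism sums (wall time view) */
};

/*
 * Replay events in time order. Switch records maintain the number of
 * workload tasks currently on a CPU; each sample is weighted by 1/that.
 * @prof must be zero-initialized by the caller.
 */
static void replay(const struct event *ev, size_t n, struct profile *prof)
{
	int running = 0;

	for (size_t i = 0; i < n; i++) {
		switch (ev[i].type) {
		case EV_SWITCH_IN:
			running++;
			break;
		case EV_SWITCH_OUT:
			if (running > 0)
				running--;
			break;
		case EV_SAMPLE: {
			/* The sampled task is running, so count at least 1. */
			int parallelism = running > 0 ? running : 1;

			assert(ev[i].comm >= 0 && ev[i].comm < MAX_COMMS);
			prof->overhead[ev[i].comm] += ev[i].period;
			prof->latency[ev[i].comm] += ev[i].period / parallelism;
			break;
		}
		}
	}
}
```

     Consider four 100-unit samples of cc1 taken while four tasks run.
     They add 400 to its overhead but only 100 to its latency. A 100-unit
     sample of ld running alone adds 100 to both. ld's share of latency
     is therefore 50%, even though its share of overhead is only 20%.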

  perf report:

   - As a side effect of the latency profiling work, it adds a new
     output field 'latency' and a sort key 'parallelism'. Below is the
     result from my system with 64 CPUs. The build was well parallelized
     but contained some serial portions.

       $ perf report -s parallelism
       ...
       #
       # Overhead   Latency  Parallelism
       # ........  ........  ...........
       #
           16.95%     1.54%           62
           13.38%     1.24%           61
           12.50%    70.47%            1
           11.81%     1.06%           63
            7.59%     0.71%           60
            4.33%    12.20%            2
            3.41%     0.33%           59
            2.05%     0.18%           64
            1.75%     1.09%            9
            1.64%     1.85%            5
            ...

   - Support Fedora mini-debuginfo, which is an LZMA-compressed symbol
     table inside the ".gnu_debugdata" ELF section.
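
     Locating the mini-debuginfo is an ordinary ELF section lookup by
     name. Below is a rough C sketch that uses only <elf.h>. It assumes a
     64-bit ELF file in host byte order. find_section() is a hypothetical
     helper, not a perf function. Decompressing the section's LZMA data
     (for example with liblzma) and reading the embedded symbol table are
     left out.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/*
 * Find a section by name in a 64-bit ELF file (host byte order assumed).
 * On success, store its file offset and size and return 0. Return -1 if
 * the file can't be read, isn't ELF64, or has no such section.
 */
static int find_section(const char *path, const char *name,
			Elf64_Off *offset, Elf64_Xword *size)
{
	Elf64_Ehdr eh;
	Elf64_Shdr sh, strtab;
	char buf[64];
	int ret = -1;
	FILE *f;

	assert(path && name && offset && size);
	f = fopen(path, "rb");
	if (!f)
		return -1;

	/* ELF header: magic, 64-bit class, expected section header size. */
	if (fread(&eh, sizeof(eh), 1, f) != 1 ||
	    memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
	    eh.e_ident[EI_CLASS] != ELFCLASS64 ||
	    eh.e_shentsize != sizeof(Elf64_Shdr) ||
	    eh.e_shstrndx == SHN_UNDEF || eh.e_shstrndx >= eh.e_shnum)
		goto out;

	/* Header of .shstrtab, the string table holding section names. */
	if (fseek(f, (long)(eh.e_shoff + (Elf64_Off)eh.e_shstrndx * sizeof(sh)), SEEK_SET) != 0 ||
	    fread(&strtab, sizeof(strtab), 1, f) != 1)
		goto out;

	for (Elf64_Half i = 0; i < eh.e_shnum; i++) {
		size_t len;

		if (fseek(f, (long)(eh.e_shoff + (Elf64_Off)i * sizeof(sh)), SEEK_SET) != 0 ||
		    fread(&sh, sizeof(sh), 1, f) != 1)
			goto out;
		if (sh.sh_name >= strtab.sh_size ||
		    fseek(f, (long)(strtab.sh_offset + sh.sh_name), SEEK_SET) != 0)
			continue;
		/* Read at most 63 bytes of the NUL-terminated section name. */
		len = fread(buf, 1, sizeof(buf) - 1, f);
		buf[len] = '\0';
		if (strcmp(buf, name) == 0) {
			*offset = sh.sh_offset;
			*size = sh.sh_size;
			ret = 0;
			break;
		}
	}
out:
	fclose(f);
	return ret;
}
```

     Calling find_section(path, ".gnu_debugdata", &off, &size) on a
     Fedora binary would yield the byte range to hand to a decompressor.
     If the binary has no such section, it returns -1.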

  perf annotate:

   - Add the --code-with-type option to enable data-type profiling
     together with the usual annotate output.

     Instead of focusing on data structures, it shows the code annotation
     together with the data type each instruction accesses, when the
     instruction refers to a memory location (and the target data type
     could be resolved). Currently it only works with --stdio.

       $ perf annotate --stdio --code-with-type
       ...
        Percent |      Source code & Disassembly of vmlinux for cpu/mem-loads,ldlat=30/pp (18 samples, percent: local period)
       ----------------------------------------------------------------------------------------------------------------------
                : 0                0xffffffff81050610 <__fdget>:
           0.00 :   ffffffff81050610:        callq   0xffffffff81c01b80 <__fentry__>           # data-type: (stack operation)
           0.00 :   ffffffff81050615:        pushq   %rbp              # data-type: (stack operation)
           0.00 :   ffffffff81050616:        movq    %rsp, %rbp
           0.00 :   ffffffff81050619:        pushq   %r15              # data-type: (stack operation)
           0.00 :   ffffffff8105061b:        pushq   %r14              # data-type: (stack operation)
           0.00 :   ffffffff8105061d:        pushq   %rbx              # data-type: (stack operation)
           0.00 :   ffffffff8105061e:        subq    $0x10, %rsp
           0.00 :   ffffffff81050622:        movl    %edi, %ebx
           0.00 :   ffffffff81050624:        movq    %gs:0x7efc4814(%rip), %rax  # 0x14e40 <current_task>              # data-type: struct task_struct* +0
           0.00 :   ffffffff8105062c:        movq    0x8d0(%rax), %r14         # data-type: struct task_struct +0x8d0 (files)
           0.00 :   ffffffff81050633:        movl    (%r14), %eax              # data-type: struct files_struct +0 (count.counter)
           0.00 :   ffffffff81050636:        cmpl    $0x1, %eax
           0.00 :   ffffffff81050639:        je      0xffffffff810506a9 <__fdget+0x99>
           0.00 :   ffffffff8105063b:        movq    0x20(%r14), %rcx          # data-type: struct files_struct +0x20 (fdt)
           0.00 :   ffffffff8105063f:        movl    (%rcx), %eax              # data-type: struct fdtable +0 (max_fds)
           0.00 :   ffffffff81050641:        cmpl    %ebx, %eax
           0.00 :   ffffffff81050643:        jbe     0xffffffff810506ef <__fdget+0xdf>
           0.00 :   ffffffff81050649:        movl    %ebx, %r15d
           5.56 :   ffffffff8105064c:        movq    0x8(%rcx), %rdx           # data-type: struct fdtable +0x8 (fd)
       ...

     The "# data-type:" part was added with this change. The first few
     entries are not very interesting, but later you can see that it
     accesses a couple of fields in task_struct, files_struct and fdtable.
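
     An annotation such as 'struct files_struct +0x20 (fdt)' maps a byte
     offset within a known type to the member that covers it. Below is a
     toy C sketch of that last step. It assumes a hand-written member
     table built with offsetof(). perf derives type layouts from debug
     information instead. struct toy_files_struct is a made-up stand-in,
     not the kernel's files_struct.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Made-up type for illustration; not the kernel's layout. */
struct toy_files_struct {
	int count;
	int flags;
	long pad[3];
	void *fdt;
};

struct member {
	const char *name;
	size_t offset;
	size_t size;
};

/* Build a table entry from a struct member: name, offset and size. */
#define MEMBER(type, field) \
	{ #field, offsetof(struct type, field), sizeof(((struct type *)0)->field) }

static const struct member toy_members[] = {
	MEMBER(toy_files_struct, count),
	MEMBER(toy_files_struct, flags),
	MEMBER(toy_files_struct, pad),
	MEMBER(toy_files_struct, fdt),
};

/* Return the name of the member covering byte @off, or NULL if none does. */
static const char *member_at(const struct member *m, size_t n, size_t off)
{
	assert(m != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (off >= m[i].offset && off < m[i].offset + m[i].size)
			return m[i].name;
	return NULL;
}
```

     On LP64 targets, member_at(toy_members, 4, 0x20) returns "fdt".
     Offsets not covered by any member (padding, or past the end of the
     type) return NULL.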

  perf trace:

   - Support syscall tracing for different ABIs. For example, it can
     transparently trace system calls of 32-bit applications on a 64-bit
     kernel.

   - Add the --summary-mode=total option to show a global syscall
     summary. The default is 'thread', which shows a per-thread summary.
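
     The two summary modes differ only in where the per-syscall counters
     are aggregated. Below is a small C sketch. It assumes each thread
     already has per-syscall call counts and accumulated times. The
     struct names and the tiny NR_SYSCALLS bound are hypothetical, not
     perf trace's data structures. A per-thread summary would print each
     thread's table as is. A total summary first folds them into one
     table:

```c
#include <assert.h>
#include <stddef.h>

#define NR_SYSCALLS 4   /* hypothetical, tiny syscall table */

struct syscall_stats {
	unsigned long calls;
	double total_msec;
};

struct thread_stats {
	int tid;
	struct syscall_stats sc[NR_SYSCALLS];
};

/* Fold per-thread counters into a single system-wide table. */
static void summarize_total(const struct thread_stats *threads, size_t nr_threads,
			    struct syscall_stats total[NR_SYSCALLS])
{
	assert(threads != NULL || nr_threads == 0);
	for (int id = 0; id < NR_SYSCALLS; id++) {
		total[id].calls = 0;
		total[id].total_msec = 0.0;
	}
	for (size_t t = 0; t < nr_threads; t++) {
		for (int id = 0; id < NR_SYSCALLS; id++) {
			total[id].calls += threads[t].sc[id].calls;
			total[id].total_msec += threads[t].sc[id].total_msec;
		}
	}
}
```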

  Python support:

   - Add more interfaces to the 'perf' module to parse events, and to
     configure, enable or disable the event list properly, so that basic
     functionality can be implemented purely in Python. Example code for
     these new interfaces is in python/tracepoint.py.

   - Add mypy and pylint support to enable build-time checking, and fix
     some code based on their findings.

  Internals:

   - Introduce the io_dir__readdir() API to make directory traversal
     (usually of proc or sysfs) efficient with a smaller memory footprint.
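
     The approach taken in io_dir.h is to call getdents64(2) directly
     into a small fixed buffer embedded in struct io_dir. It then walks
     the variable-length records by d_reclen instead of going through
     opendir(3)/readdir(3). Below is a standalone C sketch of that loop.
     It assumes Linux and glibc's syscall(2) wrapper. count_entries() and
     struct dirent64_rec are illustrative names, not part of
     tools/lib/api.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout returned by getdents64(2). */
struct dirent64_rec {
	uint64_t d_ino;
	int64_t d_off;
	unsigned short d_reclen;
	unsigned char d_type;
	char d_name[];
};

/* Count entries other than "." and ".." in @path; return -1 on error. */
static long count_entries(const char *path)
{
	/* Small aligned stack buffer; the kernel fills it with whole records. */
	_Alignas(8) char buf[1024];
	long count = 0;
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
	if (fd < 0)
		return -1;
	for (;;) {
		long n = syscall(SYS_getdents64, fd, buf, sizeof(buf));

		if (n < 0) {
			count = -1;
			break;
		}
		if (n == 0)  /* end of directory */
			break;
		/* Walk the variable-length records in this batch. */
		for (long pos = 0; pos < n;) {
			struct dirent64_rec *d = (struct dirent64_rec *)(buf + pos);

			if (strcmp(d->d_name, ".") != 0 && strcmp(d->d_name, "..") != 0)
				count++;
			pos += d->d_reclen;
		}
	}
	close(fd);
	return count;
}
```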

  JSON vendor events:

   - Add events and metrics for ARM Neoverse N3 and V3

   - Update events and metrics on various Intel CPUs

   - Add/update events for a number of SiFive processors"

* tag 'perf-tools-for-v6.15-2025-03-27' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (229 commits)
  perf bpf-filter: Fix a parsing error with comma
  perf report: Fix a memory leak for perf_env on AMD
  perf trace: Fix wrong size to bpf_map__update_elem call
  perf tools: annotate asm_pure_loop.S
  perf python: Fix setup.py mypy errors
  perf test: Address attr.py mypy error
  perf build: Add pylint build tests
  perf build: Add mypy build tests
  perf build: Rename TEST_LOGS to SHELL_TEST_LOGS
  tools/build: Don't pass test log files to linker
  perf bench sched pipe: fix enforced blocking reads in worker_thread
  perf tools: Fix is_compat_mode build break in ppc64
  perf build: filter all combinations of -flto for libperl
  perf vendor events arm64 AmpereOneX: Fix frontend_bound calculation
  perf vendor events arm64: AmpereOne/AmpereOneX: Mark LD_RETIRED impacted by errata
  perf trace: Fix evlist memory leak
  perf trace: Fix BTF memory leak
  perf trace: Make syscall table stable
  perf syscalltbl: Mask off ABI type for MIPS system calls
  perf build: Remove Makefile.syscalls
  ...
Linus Torvalds committed Mar 31, 2025
2 parents 4e82c87 + 35d13f8 commit 802f0d5
Showing 519 changed files with 39,386 additions and 9,914 deletions.
2 changes: 1 addition & 1 deletion tools/arch/x86/lib/insn.c
@@ -13,7 +13,7 @@
 #endif
 #include "../include/asm/inat.h" /* __ignore_sync_check__ */
 #include "../include/asm/insn.h" /* __ignore_sync_check__ */
-#include "../include/linux/unaligned.h" /* __ignore_sync_check__ */
+#include <linux/unaligned.h> /* __ignore_sync_check__ */

 #include <linux/errno.h>
 #include <linux/kconfig.h>
6 changes: 5 additions & 1 deletion tools/build/Makefile.build
@@ -129,6 +129,10 @@ objprefix := $(subst ./,,$(OUTPUT)$(dir)/)
 obj-y := $(addprefix $(objprefix),$(obj-y))
 subdir-obj-y := $(addprefix $(objprefix),$(subdir-obj-y))

+# Separate out test log files from real build objects.
+test-y := $(filter %_log, $(obj-y))
+obj-y := $(filter-out %_log, $(obj-y))
+
 # Final '$(obj)-in.o' object
 in-target := $(objprefix)$(obj)-in.o

@@ -139,7 +143,7 @@ $(subdir-y):

 $(sort $(subdir-obj-y)): $(subdir-y) ;

-$(in-target): $(obj-y) FORCE
+$(in-target): $(obj-y) $(test-y) FORCE
 	$(call rule_mkdir)
 	$(call if_changed,$(host)ld_multi)

2 changes: 1 addition & 1 deletion tools/build/feature/test-backtrace.c
@@ -5,7 +5,7 @@
 int main(void)
 {
 	void *backtrace_fns[10];
-	size_t entries;
+	int entries;

 	entries = backtrace(backtrace_fns, 10);
 	backtrace_symbols_fd(backtrace_fns, entries, 1);
2 changes: 1 addition & 1 deletion tools/build/feature/test-bpf.c
@@ -44,5 +44,5 @@ int main(void)
 	 * Test existence of __NR_bpf and BPF_PROG_LOAD.
 	 * This call should fail if we run the testcase.
 	 */
-	return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
+	return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr)) == 0;
 }
2 changes: 1 addition & 1 deletion tools/build/feature/test-glibc.c
@@ -16,5 +16,5 @@ int main(void)
 	const char *version = XSTR(__GLIBC__) "." XSTR(__GLIBC_MINOR__);
 #endif

-	return (long)version;
+	return version == NULL;
 }
2 changes: 1 addition & 1 deletion tools/build/feature/test-libdebuginfod.c
@@ -4,5 +4,5 @@
 int main(void)
 {
 	debuginfod_client* c = debuginfod_begin();
-	return (long)c;
+	return !!c;
 }
2 changes: 1 addition & 1 deletion tools/build/feature/test-libdw.c
@@ -9,7 +9,7 @@ int test_libdw(void)
 {
 	Dwarf *dbg = dwarf_begin(0, DWARF_C_READ);

-	return (long)dbg;
+	return dbg == NULL;
 }

 int test_libdw_unwind(void)
2 changes: 1 addition & 1 deletion tools/build/feature/test-libelf-gelf_getnote.c
@@ -4,5 +4,5 @@

 int main(void)
 {
-	return gelf_getnote(NULL, 0, NULL, NULL, NULL);
+	return gelf_getnote(NULL, 0, NULL, NULL, NULL) == 0;
 }
2 changes: 1 addition & 1 deletion tools/build/feature/test-libelf.c
@@ -5,5 +5,5 @@ int main(void)
 {
 	Elf *elf = elf_begin(0, ELF_C_READ, 0);

-	return (long)elf;
+	return !!elf;
 }
2 changes: 1 addition & 1 deletion tools/build/feature/test-lzma.c
@@ -4,7 +4,7 @@
 int main(void)
 {
 	lzma_stream strm = LZMA_STREAM_INIT;
-	int ret;
+	lzma_ret ret;

 	ret = lzma_stream_decoder(&strm, UINT64_MAX, LZMA_CONCATENATED);
 	return ret ? -1 : 0;
2 changes: 1 addition & 1 deletion tools/lib/api/Makefile
@@ -95,7 +95,7 @@ install_lib: $(LIBFILE)
 		$(call do_install_mkdir,$(libdir_SQ)); \
 		cp -fpR $(LIBFILE) $(DESTDIR)$(libdir_SQ)

-HDRS := cpu.h debug.h io.h
+HDRS := cpu.h debug.h io.h io_dir.h
 FD_HDRS := fd/array.h
 FS_HDRS := fs/fs.h fs/tracing_path.h
 INSTALL_HDRS_PFX := $(DESTDIR)$(prefix)/include/api
105 changes: 105 additions & 0 deletions tools/lib/api/io_dir.h
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+/*
+ * Lightweight directory reading library.
+ */
+#ifndef __API_IO_DIR__
+#define __API_IO_DIR__
+
+#include <dirent.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <linux/limits.h>
+
+#if !defined(SYS_getdents64)
+#if defined(__x86_64__) || defined(__arm__)
+#define SYS_getdents64 217
+#elif defined(__i386__) || defined(__s390x__) || defined(__sh__)
+#define SYS_getdents64 220
+#elif defined(__alpha__)
+#define SYS_getdents64 377
+#elif defined(__mips__)
+#define SYS_getdents64 308
+#elif defined(__powerpc64__) || defined(__powerpc__)
+#define SYS_getdents64 202
+#elif defined(__sparc64__) || defined(__sparc__)
+#define SYS_getdents64 154
+#elif defined(__xtensa__)
+#define SYS_getdents64 60
+#else
+#define SYS_getdents64 61
+#endif
+#endif /* !defined(SYS_getdents64) */
+
+static inline ssize_t perf_getdents64(int fd, void *dirp, size_t count)
+{
+#ifdef MEMORY_SANITIZER
+	memset(dirp, 0, count);
+#endif
+	return syscall(SYS_getdents64, fd, dirp, count);
+}
+
+struct io_dirent64 {
+	ino64_t d_ino; /* 64-bit inode number */
+	off64_t d_off; /* 64-bit offset to next structure */
+	unsigned short d_reclen; /* Size of this dirent */
+	unsigned char d_type; /* File type */
+	char d_name[NAME_MAX + 1]; /* Filename (null-terminated) */
+};
+
+struct io_dir {
+	int dirfd;
+	ssize_t available_bytes;
+	struct io_dirent64 *next;
+	struct io_dirent64 buff[4];
+};
+
+static inline void io_dir__init(struct io_dir *iod, int dirfd)
+{
+	iod->dirfd = dirfd;
+	iod->available_bytes = 0;
+}
+
+static inline void io_dir__rewinddir(struct io_dir *iod)
+{
+	lseek(iod->dirfd, 0, SEEK_SET);
+	iod->available_bytes = 0;
+}
+
+static inline struct io_dirent64 *io_dir__readdir(struct io_dir *iod)
+{
+	struct io_dirent64 *entry;
+
+	if (iod->available_bytes <= 0) {
+		ssize_t rc = perf_getdents64(iod->dirfd, iod->buff, sizeof(iod->buff));
+
+		if (rc <= 0)
+			return NULL;
+		iod->available_bytes = rc;
+		iod->next = iod->buff;
+	}
+	entry = iod->next;
+	iod->next = (struct io_dirent64 *)((char *)entry + entry->d_reclen);
+	iod->available_bytes -= entry->d_reclen;
+	return entry;
+}
+
+static inline bool io_dir__is_dir(const struct io_dir *iod, struct io_dirent64 *dent)
+{
+	if (dent->d_type == DT_UNKNOWN) {
+		struct stat st;
+
+		if (fstatat(iod->dirfd, dent->d_name, &st, /*flags=*/0))
+			return false;
+
+		if (S_ISDIR(st.st_mode)) {
+			dent->d_type = DT_DIR;
+			return true;
+		}
+	}
+	return dent->d_type == DT_DIR;
+}
+
+#endif /* __API_IO_DIR__ */
12 changes: 3 additions & 9 deletions tools/lib/perf/Makefile
@@ -41,13 +41,6 @@ libdir_relative_SQ = $(subst ','\'',$(libdir_relative))

 TEST_ARGS := $(if $(V),-v)

-# Set compile option CFLAGS
-ifdef EXTRA_CFLAGS
-CFLAGS := $(EXTRA_CFLAGS)
-else
-CFLAGS := -g -Wall
-endif
-
 INCLUDES = \
 -I$(srctree)/tools/lib/perf/include \
 -I$(srctree)/tools/lib/ \
@@ -57,11 +50,12 @@ INCLUDES = \
 -I$(srctree)/tools/include/uapi

 # Append required CFLAGS
-override CFLAGS += $(EXTRA_WARNINGS)
-override CFLAGS += -Werror -Wall
+override CFLAGS += -g -Werror -Wall
 override CFLAGS += -fPIC
 override CFLAGS += $(INCLUDES)
 override CFLAGS += -fvisibility=hidden
+override CFLAGS += $(EXTRA_WARNINGS)
+override CFLAGS += $(EXTRA_CFLAGS)

 all:

8 changes: 4 additions & 4 deletions tools/lib/perf/cpumap.c
@@ -185,7 +185,7 @@ struct perf_cpu_map *perf_cpu_map__new(const char *cpu_list)
 	while (isdigit(*cpu_list)) {
 		p = NULL;
 		start_cpu = strtoul(cpu_list, &p, 0);
-		if (start_cpu >= INT_MAX
+		if (start_cpu >= INT16_MAX
 		    || (*p != '\0' && *p != ',' && *p != '-' && *p != '\n'))
 			goto invalid;

@@ -194,7 +194,7 @@ struct perf_cpu_map *perf_cpu_map__new(const char *cpu_list)
 			p = NULL;
 			end_cpu = strtoul(cpu_list, &p, 0);

-			if (end_cpu >= INT_MAX || (*p != '\0' && *p != ',' && *p != '\n'))
+			if (end_cpu >= INT16_MAX || (*p != '\0' && *p != ',' && *p != '\n'))
 				goto invalid;

 			if (end_cpu < start_cpu)
@@ -209,7 +209,7 @@ struct perf_cpu_map *perf_cpu_map__new(const char *cpu_list)
 		for (; start_cpu <= end_cpu; start_cpu++) {
 			/* check for duplicates */
 			for (i = 0; i < nr_cpus; i++)
-				if (tmp_cpus[i].cpu == (int)start_cpu)
+				if (tmp_cpus[i].cpu == (int16_t)start_cpu)
 					goto invalid;

 			if (nr_cpus == max_entries) {
@@ -219,7 +219,7 @@ struct perf_cpu_map *perf_cpu_map__new(const char *cpu_list)
 					goto invalid;
 				tmp_cpus = tmp;
 			}
-			tmp_cpus[nr_cpus++].cpu = (int)start_cpu;
+			tmp_cpus[nr_cpus++].cpu = (int16_t)start_cpu;
 		}
 		if (*p)
 			++p;
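
The hunks above tighten the range checks in perf_cpu_map__new() so that every parsed CPU number fits the int16_t that struct perf_cpu uses after this change. Below is a simplified C sketch of the same parse-and-validate pattern for lists like "0-3,8". parse_cpu_list() is a hypothetical helper. Unlike the real function, it writes into a fixed caller-provided array and builds no struct perf_cpu_map.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a CPU list such as "0-3,8" into @cpus (capacity @max).
 * Return the number of CPUs. Return -1 on malformed input, on a CPU number
 * that would not fit in int16_t, on duplicates, or when @max is exceeded.
 */
static int parse_cpu_list(const char *s, int16_t *cpus, int max)
{
	int nr = 0;

	assert(cpus != NULL && max >= 0);
	while (isdigit((unsigned char)*s)) {
		char *p = NULL;
		unsigned long start = strtoul(s, &p, 0);
		unsigned long end = start;

		/* Same bound as the diff: reject values >= INT16_MAX. */
		if (start >= INT16_MAX)
			return -1;
		if (*p == '-') {
			s = p + 1;
			end = strtoul(s, &p, 0);
			if (p == s || end >= INT16_MAX || end < start)
				return -1;
		}
		if (*p != '\0' && *p != ',' && *p != '\n')
			return -1;

		for (; start <= end; start++) {
			for (int i = 0; i < nr; i++)
				if (cpus[i] == (int16_t)start)
					return -1;  /* duplicate CPU */
			if (nr == max)
				return -1;
			cpus[nr++] = (int16_t)start;
		}
		s = *p ? p + 1 : p;  /* skip the ',' or '\n' separator */
	}
	return *s == '\0' ? nr : -1;
}
```

For example, "0-3,8" yields five CPUs (0, 1, 2, 3 and 8). A value such as "40000" is rejected because it would not survive the conversion to int16_t.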
3 changes: 2 additions & 1 deletion tools/lib/perf/include/perf/cpumap.h
@@ -4,10 +4,11 @@

 #include <perf/core.h>
 #include <stdbool.h>
+#include <stdint.h>

 /** A wrapper around a CPU to avoid confusion with the perf_cpu_map's map's indices. */
 struct perf_cpu {
-	int cpu;
+	int16_t cpu;
 };

 struct perf_cache {
32 changes: 29 additions & 3 deletions tools/perf/Build
@@ -65,14 +65,40 @@ gtk-y += ui/gtk/

 ifdef SHELLCHECK
 SHELL_TESTS := $(wildcard *.sh)
-TEST_LOGS := $(SHELL_TESTS:%=%.shellcheck_log)
+SHELL_TEST_LOGS := $(SHELL_TESTS:%=%.shellcheck_log)
 else
 SHELL_TESTS :=
-TEST_LOGS :=
+SHELL_TEST_LOGS :=
 endif

 $(OUTPUT)%.shellcheck_log: %
 	$(call rule_mkdir)
 	$(Q)$(call echo-cmd,test)shellcheck -s bash -a -S warning "$<" > $@ || (cat $@ && rm $@ && false)

-perf-y += $(TEST_LOGS)
+perf-y += $(SHELL_TEST_LOGS)
+
+ifdef MYPY
+PY_TESTS := $(shell find python -type f -name '*.py')
+MYPY_TEST_LOGS := $(PY_TESTS:python/%=python/%.mypy_log)
+else
+MYPY_TEST_LOGS :=
+endif
+
+$(OUTPUT)%.mypy_log: %
+	$(call rule_mkdir)
+	$(Q)$(call echo-cmd,test)mypy "$<" > $@ || (cat $@ && rm $@ && false)
+
+perf-y += $(MYPY_TEST_LOGS)
+
+ifdef PYLINT
+PY_TESTS := $(shell find python -type f -name '*.py')
+PYLINT_TEST_LOGS := $(PY_TESTS:python/%=python/%.pylint_log)
+else
+PYLINT_TEST_LOGS :=
+endif
+
+$(OUTPUT)%.pylint_log: %
+	$(call rule_mkdir)
+	$(Q)$(call echo-cmd,test)pylint "$<" > $@ || (cat $@ && rm $@ && false)
+
+perf-y += $(PYLINT_TEST_LOGS)
5 changes: 3 additions & 2 deletions tools/perf/Documentation/callchain-overhead-calculation.txt
@@ -1,7 +1,8 @@
 Overhead calculation
 --------------------
-The overhead can be shown in two columns as 'Children' and 'Self' when
-perf collects callchains. The 'self' overhead is simply calculated by
+The CPU overhead can be shown in two columns as 'Children' and 'Self'
+when perf collects callchains (and corresponding 'Wall' columns for
+wall-clock overhead). The 'self' overhead is simply calculated by
 adding all period values of the entry - usually a function (symbol).
 This is the value that perf shows traditionally and sum of all the
 'self' overhead values should be 100%.
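
To make the 'Self' versus 'Children' split concrete, here is a toy C sketch. It assumes each sample is reduced to a period plus a callchain of small integer function ids, leaf first; this is not perf's data layout. Self overhead credits the period only to the leaf. Children overhead credits it once to every distinct function on the chain, so a caller's children value includes its callees' samples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_FUNCS 8
#define MAX_DEPTH 16

struct sample {
	unsigned long period;
	size_t depth;
	int chain[MAX_DEPTH];  /* chain[0] is the leaf (the sampled function) */
};

static void compute_overhead(const struct sample *s, size_t n,
			     unsigned long self[NR_FUNCS],
			     unsigned long children[NR_FUNCS])
{
	for (int f = 0; f < NR_FUNCS; f++)
		self[f] = children[f] = 0;

	for (size_t i = 0; i < n; i++) {
		bool seen[NR_FUNCS] = { false };

		assert(s[i].depth > 0 && s[i].depth <= MAX_DEPTH);
		/* Self: only the function the sample landed in. */
		self[s[i].chain[0]] += s[i].period;

		/* Children: each function on the chain once, even if recursive. */
		for (size_t d = 0; d < s[i].depth; d++) {
			int f = s[i].chain[d];

			assert(f >= 0 && f < NR_FUNCS);
			if (!seen[f]) {
				seen[f] = true;
				children[f] += s[i].period;
			}
		}
	}
}
```

Take main=0, foo=1 and bar=2, with samples of period 100 (foo <- main), 200 (bar <- foo <- main) and 50 (main alone). Self comes out as 50/100/200, which sums to the 350 total. Children comes out as 350/300/200.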