Merge tag 'perf-tools-for-v6.6-1-2023-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools updates from Arnaldo Carvalho de Melo:
 "perf tools maintainership:

   - Add git information for perf-tools and perf-tools-next trees and
     branches to the MAINTAINERS file. That is where development now
     takes place. Namhyung Kim and I have write access, with more
     people to come as we emulate other maintainer groups.

  perf record:

   - Record kernel data maps when 'perf record --data' is used, so that
     global variables can be resolved and used in tools that do data
     profiling.

  perf trace:

   - Remove the old, experimental support for BPF events in which a .c
     file was passed as an event: "perf trace -e hello.c" to then get
     compiled and loaded.

     The only known usage for that, that shipped with the kernel as an
     example for such events, augmented the raw_syscalls tracepoints and
     was converted to a libbpf skeleton, reusing all the user space
     components and the BPF code connected to the syscalls.

     In the end just the way to glue the BPF part and the user space
     type beautifiers changed, now being performed by libbpf skeletons.

     The next step is to use BTF to do pretty printing of all syscall
     types, as discussed with Alan Maguire and others.

     Now, on a perf built with BUILD_BPF_SKEL=1 we get most if not all
     path/filenames/strings, some of the networking data structures,
     perf_event_attr, etc, i.e. systemwide tracing of nanosleep calls
     and perf_event_open syscalls while 'perf stat' runs 'sleep' for 5
     seconds:

      # perf trace -a -e *nanosleep,perf* perf stat -e cycles,instructions sleep 5
         0.000 (   9.034 ms): perf/327641 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 327642 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3
         9.039 (   0.006 ms): perf/327641 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x1 (PERF_COUNT_HW_INSTRUCTIONS), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 327642 (perf-exec), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
             ? (           ): gpm/991  ... [continued]: clock_nanosleep())               = 0
        10.133 (           ): sleep/327642 clock_nanosleep(rqtp: { .tv_sec: 5, .tv_nsec: 0 }, rmtp: 0x7ffd36f83ed0) ...
             ? (           ): pool-gsd-smart/3051  ... [continued]: clock_nanosleep())   = 0
        30.276 (           ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ...
       223.215 (1000.430 ms): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) = 0
        30.276 (2000.394 ms): gpm/991  ... [continued]: clock_nanosleep())               = 0
      1230.814 (           ): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) ...
      1230.814 (1000.404 ms): pool-gsd-smart/3051  ... [continued]: clock_nanosleep())   = 0
      2030.886 (           ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ...
      2237.709 (1000.153 ms): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) = 0
             ? (           ): crond/1172  ... [continued]: clock_nanosleep())            = 0
      3242.699 (           ): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) ...
      2030.886 (2000.385 ms): gpm/991  ... [continued]: clock_nanosleep())               = 0
      3728.078 (           ): crond/1172 clock_nanosleep(rqtp: { .tv_sec: 60, .tv_nsec: 0 }, rmtp: 0x7ffe0971dcf0) ...
      3242.699 (1000.158 ms): pool-gsd-smart/3051  ... [continued]: clock_nanosleep())   = 0
      4031.409 (           ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ...
        10.133 (5000.375 ms): sleep/327642  ... [continued]: clock_nanosleep())          = 0

      Performance counter stats for 'sleep 5':

             2,617,347      cycles
             1,855,997      instructions                     #    0.71  insn per cycle

           5.002282128 seconds time elapsed

           0.000855000 seconds user
           0.000852000 seconds sys

  perf annotate:

   - Building with binutils' libopcodes is now opt-in (BUILD_NONDISTRO=1)
     for licensing reasons, and we missed a build test in the
     tools/perf/tests makefile.

     Since we now default to NDEBUG=1, we ended up segfaulting when
     building with BUILD_NONDISTRO=1 because a needed initialization
     routine was being "error checked" via an assert.

     Fix it by explicitly checking the result and aborting instead if it
     fails.

     It would be better to propagate the error back to the caller, but
     at least 'perf annotate' on samples collected for a BPF program
     works again when perf is built with BUILD_NONDISTRO=1.
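
     The failure mode described above can be reproduced in isolation.
     The following is a minimal sketch, not the perf code: the routine
     fake_disasm_init() and both setup functions are hypothetical names
     made up for illustration. It shows that with NDEBUG defined the
     whole assert() expression disappears, including the call placed
     inside it, while an explicit check always runs the call.

```c
#include <stdio.h>
#include <stdlib.h>

int init_calls;			/* counts how often the init routine really ran */

/* Hypothetical stand-in for a library initialization routine. */
static int fake_disasm_init(void)
{
	init_calls++;
	return 0;		/* 0 means success */
}

#define NDEBUG			/* roughly what a build defaulting to NDEBUG=1 does */
#include <assert.h>

/* Buggy pattern: under NDEBUG the call inside assert() is never made. */
void setup_with_assert(void)
{
	assert(fake_disasm_init() == 0);
}

#undef NDEBUG
#include <assert.h>		/* re-including <assert.h> re-enables assert() */

/* Fixed pattern: the call always runs and failure is handled explicitly. */
void setup_with_check(void)
{
	if (fake_disasm_init() != 0) {
		fprintf(stderr, "initialization failed\n");
		abort();
	}
}
```

     Calling setup_with_assert() leaves init_calls untouched, whereas
     setup_with_check() increments it, which is why code that depends
     on the side effect must not live inside an assert().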

  perf report/top:

   - Add back TUI hierarchy mode header, that is seen when using 'perf
     report/top --hierarchy'.

   - Fix the number of entries for 'e' key in the TUI that was
     preventing navigation of lines when expanding an entry.

  perf report/script:

   - Support cross platform register handling, allowing a perf.data file
     collected on one architecture to have registers sampled correctly
     displayed when analysis tools such as 'perf report' and 'perf
     script' are used on a different architecture.

   - Fix handling of event attributes in pipe mode, i.e. when no
     perf.data file is used, as in:

  	perf record -o - | perf report -i -

   - Handle files generated via pipe mode with one version of perf and
     then read, also via pipe mode, with a different version of perf,
     where the event attr record may have changed. Use the record size
     field to properly support this version mismatch.
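
     The idea of locating trailing data by the size recorded in the
     stream, rather than by the reader's own struct layout, can be
     sketched as follows. All sketch_* names and the cut-down structs
     are hypothetical and are not the real perf definitions; the only
     assumption taken from this series is that the id array starts at
     the attr address plus the recorded attr size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; these are NOT the real perf structures. */
struct sketch_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;		/* size of the whole record, header included */
};

struct sketch_attr {
	uint32_t type;
	uint32_t size;		/* attr size as written by the producing tool */
	uint64_t config;
};

struct sketch_record_attr {
	struct sketch_header header;
	struct sketch_attr attr;
	/* the u64 ids start at (char *)&attr + attr.size, possibly past here */
};

/* Number of ids, derived only from sizes stored in the record itself. */
size_t sketch_nr_ids(const struct sketch_record_attr *rec)
{
	size_t used = sizeof(rec->header) + rec->attr.size;

	assert(rec->header.size >= used);
	return (rec->header.size - used) / sizeof(uint64_t);
}

/* Read id number idx, locating the array via the recorded attr size. */
uint64_t sketch_id(const struct sketch_record_attr *rec, size_t idx)
{
	const unsigned char *ids =
		(const unsigned char *)&rec->attr + rec->attr.size;
	uint64_t id;

	assert(idx < sketch_nr_ids(rec));
	memcpy(&id, ids + idx * sizeof(id), sizeof(id));
	return id;
}
```

     A reader built against a smaller attr than the one in the stream
     still finds the ids, because nothing depends on
     sizeof(struct sketch_attr).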

  perf probe:

   - Accessing global variables from uprobes isn't supported; make the
     error message state that instead of stating that some minimal
     kernel version is needed to have that feature. This seems to be
     just a tool limitation, the kernel probably has all that is needed.

  perf tests:

   - Fix a reference count related leak in the dlfilter v0 API where the
     result of a thread__find_symbol_fb() is not matched with an
     addr_location__exit() to drop the reference counts of the resolved
     components (machine, thread, map, symbol, etc). Add a dlfilter test
     to make sure that doesn't regress.

   - Lots of fixes for 'perf test' entries written in shell script,
     addressing problems found with the shellcheck utility.

   - Fixes for 'perf test' shell scripts testing features enabled when
     perf is built with BUILD_BPF_SKEL=1, such as 'perf stat' bpf
     counters.

   - Add perf record sample filtering test, things like the following
     example, that gets implemented as a BPF filter attached to the
     event:

       # perf record -e task-clock -c 10000 --filter 'ip < 0xffffffff00000000'

   - Improve the way the task_analyzer test checks if libtraceevent is
     linked, using 'perf version --build-options' instead of the more
     expensive 'perf record -e "sched:sched_switch"'.

   - Add support for riscv in the mmap-basic test. (This went as well
     via the RiscV tree, same contents).
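
     The dlfilter leak fix in the list above comes down to pairing
     every successful address resolution with a cleanup call. A
     minimal sketch of that calling pattern follows. The sketch_* and
     mock_* names are hypothetical, cut-down stand-ins and not the
     real perf_dlfilter API; only the rules "set al->size before
     calling" and "call al_cleanup if it is present" are taken from
     the documentation changed by this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down mirrors of the documented structures. */
struct sketch_al {
	uint32_t size;		/* caller sets this before resolve_address() */
	const char *sym;	/* resolved symbol name, or NULL */
	void *priv;		/* provider-private data; callers must not touch it */
};

struct sketch_fns {
	int (*resolve_address)(void *ctx, uint64_t address, struct sketch_al *al);
	void (*al_cleanup)(void *ctx, struct sketch_al *al);	/* NULL if not provided */
};

/* Mock provider: ctx points at an int counting outstanding references. */
int mock_resolve_address(void *ctx, uint64_t address, struct sketch_al *al)
{
	if (al->size < sizeof(*al) || address == 0)
		return -1;
	al->sym = "mock_symbol";
	al->priv = ctx;
	(*(int *)ctx)++;	/* take a reference */
	return 0;
}

void mock_al_cleanup(void *ctx, struct sketch_al *al)
{
	if (al->priv) {
		(*(int *)ctx)--;	/* drop the reference taken above */
		al->priv = NULL;
	}
}

/* Returns 1 if address resolves to a symbol, releasing resources afterwards. */
int address_has_symbol(const struct sketch_fns *fns, void *ctx, uint64_t address)
{
	struct sketch_al al = { .size = sizeof(al) };
	int found;

	assert(fns && fns->resolve_address);
	if (fns->resolve_address(ctx, address, &al) != 0)
		return 0;
	found = al.sym != NULL;
	if (fns->al_cleanup)	/* only call it when the provider offers it */
		fns->al_cleanup(ctx, &al);
	return found;
}
```

     With the mock provider, the reference counter returns to zero
     after each lookup; clearing al_cleanup to NULL leaves one
     reference outstanding, which is the kind of leak the new test is
     meant to catch.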

  libperf:

   - Implement riscv mmap support (This went as well via the RiscV tree,
     same contents).

  perf script:

   - New tool that converts perf.data files to the firefox profiler
     format so that one can use the visualizer at
     https://profiler.firefox.com/. Done by Anup Sharma as part of this
     year's Google Summer of Code.

     One can generate the output and upload it to the web interface but
     Anup also automated everything:

       perf script gecko -F 99 -a sleep 60

   - Support syscall name parsing on arm64.

   - Print "cgroup" field on the same line as "comm".

  perf bench:

   - Add new 'uprobe' benchmark to measure the overhead of uprobes
     with/without BPF programs attached to it.

   - Breakpoints are not available on power9; skip that test.

  perf stat:

   - Add #num_cpus_online literal to be used in 'perf stat' metrics, and
     add this extra 'perf test' check that exemplifies its purpose:

  	TEST_ASSERT_VAL("#num_cpus_online",
                         expr__parse(&num_cpus_online, ctx, "#num_cpus_online") == 0);
  	TEST_ASSERT_VAL("#num_cpus", expr__parse(&num_cpus, ctx, "#num_cpus") == 0);
  	TEST_ASSERT_VAL("#num_cpus >= #num_cpus_online", num_cpus >= num_cpus_online);
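
     The distinction between configured and online CPUs that the test
     above relies on can be illustrated outside perf. This sketch
     assumes a Linux/glibc system where sysconf() accepts
     _SC_NPROCESSORS_ONLN and _SC_NPROCESSORS_CONF (common extensions,
     not strict POSIX); it is not how perf itself evaluates the
     literals, and the function names are made up for illustration.

```c
#include <assert.h>
#include <unistd.h>

/* CPUs currently online; falls back to 1 if the query fails. */
long cpus_online(void)
{
	long n = sysconf(_SC_NPROCESSORS_ONLN);

	return n > 0 ? n : 1;
}

/* CPUs configured in the system, online or not; falls back to 1. */
long cpus_configured(void)
{
	long n = sysconf(_SC_NPROCESSORS_CONF);

	return n > 0 ? n : 1;
}

/* Scale a system-wide total to a per-online-CPU value, as a metric might. */
double per_online_cpu(double total)
{
	long online = cpus_online();

	assert(online > 0);
	return total / (double)online;
}
```

     On a machine with some CPUs taken offline the two counts differ,
     so a metric that divides by the configured count would understate
     per-CPU values.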

  Miscellaneous:

   - Improve tool startup time by lazily reading PMU, JSON, sysfs data.

   - Improve error reporting in the parsing of events, passing YYLTYPE
     to error routines, so that the output can show where the parsing
     error was found.

   - Add 'perf test' entries to check the parsing of events
     improvements.

   - Fix various leaks detected by -fsanitize=address, mostly of
     things that would be freed at tool exit, including:

       - Free evsel->filter on the destructor.

       - Allow tools to register a thread->priv destructor and use it in
         'perf trace'.

       - Free evsel->priv in 'perf trace'.

       - Free string returned by synthesize_perf_probe_point() when the
         caller fails to do all it needs.

   - Adjust various compiler options so that some warnings are not
     treated as errors when building with broken headers found in things
     like python, flex, bison, as we otherwise build with -Werror. Some
     for gcc, some for clang, some for some specific version of those,
     some for some specific version of flex or bison, or some specific
     combination of these components, bah.

   - Allow customization of clang options for BPF target, this helps
     building on gentoo where there are other oddities where BPF targets
     get passed some compiler options intended for the native build, so
     building with WERROR=0 helps while these oddities are fixed.

   - Don't pass ERR_PTR() values to perf_session__delete() in 'perf top'
     and 'perf lock', fixing some segfaults when handling some odd
     failures.

   - Add LTO build option.

   - Fix format of unordered lists in the perf docs
     (tools/perf/Documentation)

   - Overhaul the bison files, using constructs such as YYNOMEM.

   - Remove unused tokens from the bison .y files.

   - Add more comments to various structs.

   - A few LoongArch enablement patches.
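
     The ERR_PTR() item in the list above concerns a general pattern:
     a function that returns either a valid pointer or an errno value
     encoded as a pointer, and a caller that must not hand the encoded
     error to a destructor. The sketch below re-creates that pattern
     with hypothetical sketch_* helpers; they imitate the kernel-style
     ERR_PTR()/IS_ERR()/PTR_ERR() helpers but are not the perf or
     kernel definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_ERRNO 4095

/* Hypothetical re-creations of the kernel-style helpers, for illustration. */
static inline void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int sketch_is_err(const void *ptr)
{
	/* encoded errors occupy the last SKETCH_MAX_ERRNO addresses */
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct sketch_session {
	int fd;
};

/* Returns a session, or an errno value encoded as a pointer. */
struct sketch_session *sketch_session_new(int fail)
{
	struct sketch_session *s;

	if (fail)
		return sketch_err_ptr(-EINVAL);
	s = calloc(1, sizeof(*s));
	return s ? s : sketch_err_ptr(-ENOMEM);
}

/* Destructor: only valid pointers (or NULL) may be passed in. */
void sketch_session_delete(struct sketch_session *s)
{
	assert(!sketch_is_err(s));
	free(s);
}

/* Caller: check for an encoded error before any cleanup. */
int sketch_run(int fail)
{
	struct sketch_session *s = sketch_session_new(fail);

	if (sketch_is_err(s))
		return (int)sketch_ptr_err(s);	/* never delete an encoded error */
	sketch_session_delete(s);
	return 0;
}
```

     Passing the encoded error to free() would hand an invalid address
     to the allocator, which is the class of segfault the item
     describes.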

  Vendor events (JSON):

   - Add JSON metrics for Yitian 710 DDR (aarch64). Things like:

  	EventName, BriefDescription
  	visible_window_limit_reached_rd, "At least one entry in read queue reaches the visible window limit.",
  	visible_window_limit_reached_wr, "At least one entry in write queue reaches the visible window limit.",
  	op_is_dqsosc_mpc	       , "A DQS Oscillator MPC command to DRAM.",
  	op_is_dqsosc_mrr	       , "A DQS Oscillator MRR command to DRAM.",
  	op_is_tcr_mrr		       , "A Temperature Compensated Refresh(TCR) MRR command to DRAM.",

   - Add AmpereOne metrics (aarch64).

   - Update N2 and V2 metrics (aarch64) and events using Arm telemetry
     repo.

   - Update scale units and descriptions of common topdown metrics on
     aarch64. Things like:
       - "MetricExpr": "stall_slot_frontend / (#slots * cpu_cycles)",
       - "BriefDescription": "Frontend bound L1 topdown metric",
       + "MetricExpr": "100 * (stall_slot_frontend / (#slots * cpu_cycles))",
       + "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the frontend of the processor.",

   - Update events for intel: meteorlake to 1.04, sapphirerapids to
     1.15, Icelake+ metric constraints.

   - Update files for the power10 platform"
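
The rescaled aarch64 topdown expression quoted in the vendor events list above can be checked with a small calculation. This is only a sketch: the function names are hypothetical, the parameters stand for the terms of the metric expression, and the input values used with it are made up.

```c
#include <assert.h>

/* Old form of the expression: a ratio between 0 and 1. */
double frontend_bound_ratio(double stall_slot_frontend, double slots,
			    double cpu_cycles)
{
	assert(slots > 0 && cpu_cycles > 0);
	return stall_slot_frontend / (slots * cpu_cycles);
}

/* New form of the expression: the same quantity scaled to a percentage. */
double frontend_bound_percent(double stall_slot_frontend, double slots,
			      double cpu_cycles)
{
	return 100.0 * frontend_bound_ratio(stall_slot_frontend, slots,
					    cpu_cycles);
}
```

For example, 50 stalled frontend slots over 100 cycles with 4 slots per cycle is a ratio of 0.125, which the new form reports as 12.5 percent.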

* tag 'perf-tools-for-v6.6-1-2023-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (217 commits)
  perf parse-events: Fix driver config term
  perf parse-events: Fixes relating to no_value terms
  perf parse-events: Fix propagation of term's no_value when cloning
  perf parse-events: Name the two term enums
  perf list: Don't print Unit for "default_core"
  perf vendor events intel: Fix modifier in tma_info_system_mem_parallel_reads for skylake
  perf dlfilter: Avoid leak in v0 API test use of resolve_address()
  perf metric: Add #num_cpus_online literal
  perf pmu: Remove str from perf_pmu_alias
  perf parse-events: Make common term list to strbuf helper
  perf parse-events: Minor help message improvements
  perf pmu: Avoid uninitialized use of alias->str
  perf jevents: Use "default_core" for events with no Unit
  perf test stat_bpf_counters_cgrp: Enhance perf stat cgroup BPF counter test
  perf test shell stat_bpf_counters: Fix test on Intel
  perf test shell record_bpf_filter: Skip 6.2 kernel
  libperf: Get rid of attr.id field
  perf tools: Convert to perf_record_header_attr_id()
  libperf: Add perf_record_header_attr_id()
  perf tools: Handle old data in PERF_RECORD_ATTR
  ...
Linus Torvalds committed Sep 10, 2023
2 parents fd3a594 + 45fc462 commit 535a265
Showing 284 changed files with 8,011 additions and 9,119 deletions.
5 changes: 5 additions & 0 deletions Documentation/admin-guide/perf/alibaba_pmu.rst
@@ -88,6 +88,11 @@ data bandwidth::
 -e ali_drw_27080/hif_rmw/ \
 -e ali_drw_27080/cycle/ -- sleep 10

+Example usage of counting all memory read/write bandwidth by metric::
+
+perf stat -M ddr_read_bandwidth.all -- sleep 10
+perf stat -M ddr_write_bandwidth.all -- sleep 10
+
 The average DRAM bandwidth can be calculated as follows:

 - Read Bandwidth = perf_hif_rd * DDRC_WIDTH * DDRC_Freq / DDRC_Cycle
2 changes: 2 additions & 0 deletions MAINTAINERS
@@ -16763,6 +16763,8 @@ L: linux-kernel@vger.kernel.org
 S: Supported
 W: https://perf.wiki.kernel.org/
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools.git perf-tools
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git perf-tools-next
 F: arch/*/events/*
 F: arch/*/events/*/*
 F: arch/*/include/asm/perf_event.h
10 changes: 10 additions & 0 deletions tools/build/Makefile.build
@@ -117,6 +117,16 @@ $(OUTPUT)%.s: %.c FORCE
 $(call rule_mkdir)
 $(call if_changed_dep,cc_s_c)

+# bison and flex files are generated in the OUTPUT directory
+# so it needs a separate rule to depend on them properly
+$(OUTPUT)%-bison.o: $(OUTPUT)%-bison.c FORCE
+$(call rule_mkdir)
+$(call if_changed_dep,$(host)cc_o_c)
+
+$(OUTPUT)%-flex.o: $(OUTPUT)%-flex.c FORCE
+$(call rule_mkdir)
+$(call if_changed_dep,$(host)cc_o_c)
+
 # Gather build data:
 # obj-y - list of build objects
 # subdir-y - list of directories to nest
10 changes: 4 additions & 6 deletions tools/build/feature/Makefile
@@ -340,25 +340,23 @@ $(OUTPUT)test-jvmti-cmlr.bin:
 $(BUILD)

 $(OUTPUT)test-llvm.bin:
-$(BUILDXX) -std=gnu++14 \
+$(BUILDXX) -std=gnu++17 \
 -I$(shell $(LLVM_CONFIG) --includedir) \
 -L$(shell $(LLVM_CONFIG) --libdir) \
 $(shell $(LLVM_CONFIG) --libs Core BPF) \
 $(shell $(LLVM_CONFIG) --system-libs) \
 > $(@:.bin=.make.output) 2>&1

 $(OUTPUT)test-llvm-version.bin:
-$(BUILDXX) -std=gnu++14 \
+$(BUILDXX) -std=gnu++17 \
 -I$(shell $(LLVM_CONFIG) --includedir) \
 > $(@:.bin=.make.output) 2>&1

 $(OUTPUT)test-clang.bin:
-$(BUILDXX) -std=gnu++14 \
+$(BUILDXX) -std=gnu++17 \
 -I$(shell $(LLVM_CONFIG) --includedir) \
 -L$(shell $(LLVM_CONFIG) --libdir) \
--Wl,--start-group -lclangBasic -lclangDriver \
--lclangFrontend -lclangEdit -lclangLex \
--lclangAST -Wl,--end-group \
+-Wl,--start-group -lclang-cpp -Wl,--end-group \
 $(shell $(LLVM_CONFIG) --libs Core option) \
 $(shell $(LLVM_CONFIG) --system-libs) \
 > $(@:.bin=.make.output) 2>&1
28 changes: 0 additions & 28 deletions tools/build/feature/test-clang.cpp

This file was deleted.

16 changes: 0 additions & 16 deletions tools/build/feature/test-cxx.cpp

This file was deleted.

12 changes: 0 additions & 12 deletions tools/build/feature/test-llvm-version.cpp

This file was deleted.

14 changes: 0 additions & 14 deletions tools/build/feature/test-llvm.cpp

This file was deleted.

14 changes: 12 additions & 2 deletions tools/lib/perf/include/perf/event.h
@@ -148,8 +148,18 @@ struct perf_record_switch {
 struct perf_record_header_attr {
 struct perf_event_header header;
 struct perf_event_attr attr;
-__u64 id[];
-};
+/*
+* Array of u64 id follows here but we cannot use a flexible array
+* because size of attr in the data can be different then current
+* version. Please use perf_record_header_attr_id() below.
+*
+* __u64 id[]; // do not use this
+*/
+};
+
+/* Returns the pointer to id array based on the actual attr size. */
+#define perf_record_header_attr_id(evt) \
+((void *)&(evt)->attr.attr + (evt)->attr.attr.size)

 enum {
 PERF_CPU_MAP__CPUS = 0,
3 changes: 3 additions & 0 deletions tools/perf/Documentation/perf-bench.txt
@@ -67,6 +67,9 @@ SUBSYSTEM
 'internals'::
 Benchmark internal perf functionality.

+'uprobe'::
+Benchmark overhead of uprobe + BPF.
+
 'all'::
 All benchmark subsystems.

33 changes: 0 additions & 33 deletions tools/perf/Documentation/perf-config.txt
@@ -125,9 +125,6 @@ Given a $HOME/.perfconfig like this:
 group = true
 skip-empty = true

-[llvm]
-dump-obj = true
-clang-opt = -g

 You can hide source code of annotate feature setting the config to false with

@@ -657,36 +654,6 @@ ftrace.*::
 -F option is not specified. Possible values are 'function' and
 'function_graph'.

-llvm.*::
-llvm.clang-path::
-Path to clang. If omit, search it from $PATH.
-
-llvm.clang-bpf-cmd-template::
-Cmdline template. Below lines show its default value. Environment
-variable is used to pass options.
-"$CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS "\
-"-DLINUX_VERSION_CODE=$LINUX_VERSION_CODE " \
-"$CLANG_OPTIONS $PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS " \
-"-Wno-unused-value -Wno-pointer-sign " \
-"-working-directory $WORKING_DIR " \
-"-c \"$CLANG_SOURCE\" --target=bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE"
-
-llvm.clang-opt::
-Options passed to clang.
-
-llvm.kbuild-dir::
-kbuild directory. If not set, use /lib/modules/`uname -r`/build.
-If set to "" deliberately, skip kernel header auto-detector.
-
-llvm.kbuild-opts::
-Options passed to 'make' when detecting kernel header options.
-
-llvm.dump-obj::
-Enable perf dump BPF object files compiled by LLVM.
-
-llvm.opts::
-Options passed to llc.
-
 samples.*::

 samples.context::
22 changes: 20 additions & 2 deletions tools/perf/Documentation/perf-dlfilter.txt
@@ -64,6 +64,12 @@ internal filtering.
 If implemented, 'filter_description' should return a one-line description
 of the filter, and optionally a longer description.

+Do not assume the 'sample' argument is valid (dereferenceable)
+after 'filter_event' and 'filter_event_early' return.
+
+Do not assume data referenced by pointers in struct perf_dlfilter_sample
+is valid (dereferenceable) after 'filter_event' and 'filter_event_early' return.
+
 The perf_dlfilter_sample structure
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -150,7 +156,8 @@ struct perf_dlfilter_fns {
 const char *(*srcline)(void *ctx, __u32 *line_number);
 struct perf_event_attr *(*attr)(void *ctx);
 __s32 (*object_code)(void *ctx, __u64 ip, void *buf, __u32 len);
-void *(*reserved[120])(void *);
+void (*al_cleanup)(void *ctx, struct perf_dlfilter_al *al);
+void *(*reserved[119])(void *);
 };
 ----

@@ -161,7 +168,8 @@ struct perf_dlfilter_fns {
 'args' returns arguments from --dlarg options.

 'resolve_address' provides information about 'address'. al->size must be set
-before calling. Returns 0 on success, -1 otherwise.
+before calling. Returns 0 on success, -1 otherwise. Call al_cleanup() (if present,
+see below) when 'al' data is no longer needed.

 'insn' returns instruction bytes and length.

@@ -171,6 +179,12 @@ before calling. Returns 0 on success, -1 otherwise.

 'object_code' reads object code and returns the number of bytes read.

+'al_cleanup' must be called (if present, so check perf_dlfilter_fns.al_cleanup != NULL)
+after resolve_address() to free any associated resources.
+
+Do not assume pointers obtained via perf_dlfilter_fns are valid (dereferenceable)
+after 'filter_event' and 'filter_event_early' return.
+
 The perf_dlfilter_al structure
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -197,9 +211,13 @@ struct perf_dlfilter_al {
 /* Below members are only populated by resolve_ip() */
 __u8 filtered; /* true if this sample event will be filtered out */
 const char *comm;
+void *priv; /* Private data. Do not change */
 };
 ----

+Do not assume data referenced by pointers in struct perf_dlfilter_al
+is valid (dereferenceable) after 'filter_event' and 'filter_event_early' return.
+
 perf_dlfilter_sample flags
 ~~~~~~~~~~~~~~~~~~~~~~~~~~

16 changes: 9 additions & 7 deletions tools/perf/Documentation/perf-ftrace.txt
@@ -96,8 +96,9 @@ OPTIONS for 'perf ftrace trace'

 --func-opts::
 List of options allowed to set:
-call-graph - Display kernel stack trace for function tracer.
-irq-info - Display irq context info for function tracer.
+
+- call-graph - Display kernel stack trace for function tracer.
+- irq-info - Display irq context info for function tracer.

 -G::
 --graph-funcs=::
@@ -118,11 +119,12 @@ OPTIONS for 'perf ftrace trace'

 --graph-opts::
 List of options allowed to set:
-nosleep-time - Measure on-CPU time only for function_graph tracer.
-noirqs - Ignore functions that happen inside interrupt.
-verbose - Show process names, PIDs, timestamps, etc.
-thresh=<n> - Setup trace duration threshold in microseconds.
-depth=<n> - Set max depth for function graph tracer to follow.
+
+- nosleep-time - Measure on-CPU time only for function_graph tracer.
+- noirqs - Ignore functions that happen inside interrupt.
+- verbose - Show process names, PIDs, timestamps, etc.
+- thresh=<n> - Setup trace duration threshold in microseconds.
+- depth=<n> - Set max depth for function graph tracer to follow.


 OPTIONS for 'perf ftrace latency'