Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull more perf tooling updates from Thomas Gleixner:
 "Perf tool updates and fixes:

  perf stat:

   - Display user and system time for workload targets (Jiri Olsa)

  perf record:

   - Enable arbitrary event names thru name= modifier (Alexey Budankov)

  PowerPC:

   - Add a python script for hypervisor call statistics (Ravi Bangoria)

  Intel PT: (Adrian Hunter)

   - Fix sync_switch INTEL_PT_SS_NOT_TRACING

   - Fix decoding to accept CBR between FUP and corresponding TIP

   - Fix MTC timing after overflow

   - Fix "Unexpected indirect branch" error

  perf test:

   - record+probe_libc_inet_pton:
      - To get the symbol table for dynamic shared objects on ubuntu we
        need to pass the -D/--dynamic command line option, unlike with
        the fedora distros (Arnaldo Carvalho de Melo)

   - code-reading:
      - Fix perf_env setup for PTI entry trampolines (Adrian Hunter)

   - kmod-path:
      - Add tests for vdso32 and vdsox32 (Adrian Hunter)

   - Use header file util/debug.h (Thomas Richter)

  perf annotate:

   - Make the various UI backends (stdio, TUI, gtk) use more
     consistently structs with annotation options as specified by the
     user (Arnaldo Carvalho de Melo)

   - Move annotation specific knobs from the symbol_conf global kitchen
     sink to the annotation option structs (Arnaldo Carvalho de Melo)

  perf script:

   - Add more PMU fields to python scripts event handler dict (Jin Yao)

  Core:

   - Fix misleading error for some unparsable events mentioning PMUs
     when those are not involved in the problem (Jiri Olsa)

   - Consider BSS symbols when processing /proc/kallsyms ('B' and 'b')
     (Arnaldo Carvalho de Melo)

   - Be more robust when trying to use per-symbol histograms, checking
     for unlikely but possible cases where the space for the histograms
     wasn't allocated, print a debug message for such cases (Arnaldo
     Carvalho de Melo)

   - Fix symbol and object code resolution for vdso32 and vdsox32
     (Adrian Hunter)

   - No need to check for null when passing pointers to foo__get() style
     refcount grabbing helpers, just like in the kernel and with free(),
     it's safe to pass a NULL pointer to avoid having to check it before
     each and every foo__get() call (Arnaldo Carvalho de Melo)

   - Remove some dead code (quote.[ch]) (Arnaldo Carvalho de Melo)

   - Remove some needless globals, making them local (Arnaldo Carvalho
     de Melo)

   - Reduce usage of symbol_conf.use_callchain, using other means of
     finding out if callchains are in use or available for specific
     events, as we evolved this codebase to allow requesting callchains
     for just a subset of the monitored events. In time it will help
     polish recording and showing mixed sets across the various tools:

        perf record -e 'cycles/call-graph=fp/,cache-misses/call-graph=dwarf/,instructions'

     (Arnaldo Carvalho de Melo)

   - Consider PTI entry trampolines in map__rip_2objdump() (Adrian
     Hunter)"
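
The foo__get() item in the message above can be illustrated with a minimal sketch. `struct foo`, `struct holder`, and `holder__set_foo()` are hypothetical names, and a plain `int` stands in for whatever reference counter type the real helpers use. The only point is that the helper hands a NULL argument straight back, so callers can drop their own check, as the builtin-probe.c hunk does with nsinfo__get().

```c
#include <stddef.h>

/* Hypothetical refcounted object; not a real perf structure. */
struct foo {
	int refcnt;	/* simplified: real code would use an atomic counter */
};

/*
 * Take a reference and return the same pointer.
 * A NULL argument is passed straight through, in the same spirit as
 * free(NULL) being a harmless no-op.
 */
static struct foo *foo__get(struct foo *foo)
{
	if (foo)
		foo->refcnt++;
	return foo;
}

/* Hypothetical holder, used to show a call site without a NULL check. */
struct holder {
	struct foo *foo;
};

static void holder__set_foo(struct holder *h, struct foo *foo)
{
	h->foo = foo__get(foo);	/* no "if (foo)" guard needed here */
}
```

With the check inside the helper, every call site shrinks from two lines to one and cannot forget the guard.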

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  perf script python: Add dict fields introduction to Documentation
  perf script python: Add more PMU fields to event handler dict
  perf script python: Move dsoname code to a new function
  perf symbols: Add BSS symbols when reading from /proc/kallsyms
  perf annnotate: Make __symbol__inc_addr_samples handle src->histograms == NULL
  perf intel-pt: Fix "Unexpected indirect branch" error
  perf intel-pt: Fix MTC timing after overflow
  perf intel-pt: Fix decoding to accept CBR between FUP and corresponding TIP
  perf intel-pt: Fix sync_switch INTEL_PT_SS_NOT_TRACING
  perf script powerpc: Python script for hypervisor call statistics
  perf test record+probe_libc_inet_pton: Ask 'nm' for dynamic symbols
  perf map: Consider PTI entry trampolines in rip_2objdump()
  perf test code-reading: Fix perf_env setup for PTI entry trampolines
  perf tools: Fix pmu events parsing rule
  perf stat: Display user and system time
  perf record: Enable arbitrary event names thru name= modifier
  perf tools: Fix symbol and object code resolution for vdso32 and vdsox32
  perf tests kmod-path: Add tests for vdso32 and vdsox32
  perf hists: Check if a hist_entry has callchains before using them
  perf hists: Introduce hist_entry__has_callchain() method
  ...
Linus Torvalds committed Jun 10, 2018
2 parents 9f3fbe8 + 2696ec4 commit 2322d6c
Showing 59 changed files with 998 additions and 427 deletions.
6 changes: 5 additions & 1 deletion tools/perf/Documentation/perf-list.txt
@@ -124,7 +124,11 @@ The available PMUs and their raw parameters can be listed with
For example the raw event "LSD.UOPS" core pmu event above could
be specified as

perf stat -e cpu/event=0xa8,umask=0x1,name=LSD.UOPS_CYCLES,cmask=1/ ...
perf stat -e cpu/event=0xa8,umask=0x1,name=LSD.UOPS_CYCLES,cmask=0x1/ ...

or using extended name syntax

perf stat -e cpu/event=0xa8,umask=0x1,cmask=0x1,name=\'LSD.UOPS_CYCLES:cmask=0x1\'/ ...

PER SOCKET PMUS
---------------
3 changes: 3 additions & 0 deletions tools/perf/Documentation/perf-record.txt
@@ -57,6 +57,9 @@ OPTIONS
FP mode, "dwarf" for DWARF mode, "lbr" for LBR mode and
"no" for disable callgraph.
- 'stack-size': user stack size for dwarf mode
- 'name' : User defined event name. Single quotes (') may be used to
escape symbols in the name from parsing by shell and tool
like this: name=\'CPU_CLK_UNHALTED.THREAD:cmask=0x1\'.

See the linkperf:perf-list[1] man page for more parameters.

26 changes: 26 additions & 0 deletions tools/perf/Documentation/perf-script-python.txt
@@ -610,6 +610,32 @@ Various utility functions for use with perf script:
nsecs_str(nsecs) - returns printable string in the form secs.nsecs
avg(total, n) - returns average given a sum and a total number of values

SUPPORTED FIELDS
----------------

Currently supported fields:

ev_name, comm, pid, tid, cpu, ip, time, period, phys_addr, addr,
symbol, dso, time_enabled, time_running, values, callchain,
brstack, brstacksym, datasrc, datasrc_decode, iregs, uregs,
weight, transaction, raw_buf, attr.

Some fields have sub items:

brstack:
from, to, from_dsoname, to_dsoname, mispred,
predicted, in_tx, abort, cycles.

brstacksym:
items: from, to, pred, in_tx, abort (converted string)

For example,
We can use this code to print brstack "from", "to", "cycles".

if 'brstack' in dict:
for entry in dict['brstack']:
print "from %s, to %s, cycles %s" % (entry["from"], entry["to"], entry["cycles"])

SEE ALSO
--------
linkperf:perf-script[1]
40 changes: 29 additions & 11 deletions tools/perf/Documentation/perf-stat.txt
@@ -310,20 +310,38 @@ Users who wants to get the actual value can apply --no-metric-only.
EXAMPLES
--------

$ perf stat -- make -j
$ perf stat -- make

Performance counter stats for 'make -j':
Performance counter stats for 'make':

8117.370256 task clock ticks # 11.281 CPU utilization factor
678 context switches # 0.000 M/sec
133 CPU migrations # 0.000 M/sec
235724 pagefaults # 0.029 M/sec
24821162526 CPU cycles # 3057.784 M/sec
18687303457 instructions # 2302.138 M/sec
172158895 cache references # 21.209 M/sec
27075259 cache misses # 3.335 M/sec
83723.452481 task-clock:u (msec) # 1.004 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
3,228,188 page-faults:u # 0.039 M/sec
229,570,665,834 cycles:u # 2.742 GHz
313,163,853,778 instructions:u # 1.36 insn per cycle
69,704,684,856 branches:u # 832.559 M/sec
2,078,861,393 branch-misses:u # 2.98% of all branches

Wall-clock time elapsed: 719.554352 msecs
83.409183620 seconds time elapsed

74.684747000 seconds user
8.739217000 seconds sys

TIMINGS
-------
As displayed in the example above we can display 3 types of timings.
We always display the time the counters were enabled/alive:

83.409183620 seconds time elapsed

For workload sessions we also display time the workloads spent in
user/system lands:

74.684747000 seconds user
8.739217000 seconds sys

Those times are the very same as displayed by the 'time' tool.
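
For context on the TIMINGS text above: one way a tool can obtain the user and system time of a workload it launched is to collect the child's resource usage when reaping it. The sketch below assumes wait4() is available (Linux/BSD) and uses hypothetical names (`struct workload_times`, `run_workload()`); it is an illustration of the idea, not code taken from the perf sources.

```c
#define _DEFAULT_SOURCE	/* assumption: glibc, needed for wait4() in strict modes */
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result holder for the two workload timings. */
struct workload_times {
	double user_sec;
	double sys_sec;
};

static double timeval_to_sec(struct timeval tv)
{
	return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/*
 * Fork, exec argv[0], wait for it, and report the CPU time the child
 * spent in user and system mode. Returns 0 on success, -1 on failure.
 */
static int run_workload(char *const argv[], struct workload_times *out)
{
	struct rusage ru;
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);	/* exec failed */
	}
	if (wait4(pid, &status, 0, &ru) < 0)
		return -1;

	out->user_sec = timeval_to_sec(ru.ru_utime);
	out->sys_sec = timeval_to_sec(ru.ru_stime);
	return 0;
}
```

The rusage values cover only the waited-for child and its reaped descendants, which is why such numbers only make sense for workload sessions and not for system-wide or attach-to-pid runs.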

CSV FORMAT
----------
4 changes: 2 additions & 2 deletions tools/perf/arch/common.c
@@ -189,7 +189,7 @@ static int perf_env__lookup_binutils_path(struct perf_env *env,
return -1;
}

int perf_env__lookup_objdump(struct perf_env *env)
int perf_env__lookup_objdump(struct perf_env *env, const char **path)
{
/*
* For live mode, env->arch will be NULL and we can use
@@ -198,5 +198,5 @@ int perf_env__lookup_objdump(struct perf_env *env)
if (env->arch == NULL)
return 0;

return perf_env__lookup_binutils_path(env, "objdump", &objdump_path);
return perf_env__lookup_binutils_path(env, "objdump", path);
}
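
The signature change above replaces a write to the global objdump_path with an out-parameter owned by the caller. A minimal sketch of that pattern follows; `struct env`, `env__lookup_objdump()`, and the cross-objdump name are assumptions made for illustration, not perf code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct perf_env; only arch is modelled. */
struct env {
	const char *arch;
};

/*
 * On success store a path in *path and return 0.
 * When arch is NULL (the "live mode" case mentioned in the diff) leave
 * *path untouched so the caller keeps its default or user-supplied value.
 */
static int env__lookup_objdump(const struct env *env, const char **path)
{
	if (env->arch == NULL)
		return 0;

	if (strcmp(env->arch, "arm64") == 0) {
		*path = "aarch64-linux-gnu-objdump";	/* illustrative name only */
		return 0;
	}

	return -1;	/* no suitable tool known for this arch */
}
```

Because the destination is passed in, two tools (or two option structs in one tool) can each resolve their own objdump path without sharing state.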
4 changes: 1 addition & 3 deletions tools/perf/arch/common.h
@@ -4,8 +4,6 @@

#include "../util/env.h"

extern const char *objdump_path;

int perf_env__lookup_objdump(struct perf_env *env);
int perf_env__lookup_objdump(struct perf_env *env, const char **path);

#endif /* ARCH_PERF_COMMON_H */
36 changes: 18 additions & 18 deletions tools/perf/builtin-annotate.c
@@ -40,9 +40,8 @@
struct perf_annotate {
struct perf_tool tool;
struct perf_session *session;
struct annotation_options opts;
bool use_tui, use_stdio, use_stdio2, use_gtk;
bool full_paths;
bool print_line;
bool skip_missing;
bool has_br_stack;
bool group_set;
@@ -162,12 +161,12 @@ static int hist_iter__branch_callback(struct hist_entry_iter *iter,
hist__account_cycles(sample->branch_stack, al, sample, false);

bi = he->branch_info;
err = addr_map_symbol__inc_samples(&bi->from, sample, evsel->idx);
err = addr_map_symbol__inc_samples(&bi->from, sample, evsel);

if (err)
goto out;

err = addr_map_symbol__inc_samples(&bi->to, sample, evsel->idx);
err = addr_map_symbol__inc_samples(&bi->to, sample, evsel);

out:
return err;
@@ -249,7 +248,7 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
if (he == NULL)
return -ENOMEM;

ret = hist_entry__inc_addr_samples(he, sample, evsel->idx, al->addr);
ret = hist_entry__inc_addr_samples(he, sample, evsel, al->addr);
hists__inc_nr_samples(hists, true);
return ret;
}
@@ -289,10 +288,9 @@ static int hist_entry__tty_annotate(struct hist_entry *he,
struct perf_annotate *ann)
{
if (!ann->use_stdio2)
return symbol__tty_annotate(he->ms.sym, he->ms.map, evsel,
ann->print_line, ann->full_paths, 0, 0);
return symbol__tty_annotate2(he->ms.sym, he->ms.map, evsel,
ann->print_line, ann->full_paths);
return symbol__tty_annotate(he->ms.sym, he->ms.map, evsel, &ann->opts);

return symbol__tty_annotate2(he->ms.sym, he->ms.map, evsel, &ann->opts);
}

static void hists__find_annotations(struct hists *hists,
@@ -343,7 +341,7 @@ static void hists__find_annotations(struct hists *hists,
/* skip missing symbols */
nd = rb_next(nd);
} else if (use_browser == 1) {
key = hist_entry__tui_annotate(he, evsel, NULL);
key = hist_entry__tui_annotate(he, evsel, NULL, &ann->opts);

switch (key) {
case -1:
@@ -390,8 +388,9 @@ static int __cmd_annotate(struct perf_annotate *ann)
goto out;
}

if (!objdump_path) {
ret = perf_env__lookup_objdump(&session->header.env);
if (!ann->opts.objdump_path) {
ret = perf_env__lookup_objdump(&session->header.env,
&ann->opts.objdump_path);
if (ret)
goto out;
}
@@ -476,6 +475,7 @@ int cmd_annotate(int argc, const char **argv)
.ordered_events = true,
.ordering_requires_timestamps = true,
},
.opts = annotation__default_options,
};
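
The hunks in this file move per-run annotation knobs out of globals and into a `struct annotation_options` embedded in the tool's own state, seeded from a shared default instance. The following is a minimal sketch of that pattern: the struct is trimmed down, the field names follow the diff, and the default value shown plus the `perf_annotate__init()` helper are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Trimmed-down, hypothetical version of the options struct. */
struct annotation_options {
	bool annotate_src;
	bool show_asm_raw;
	bool print_lines;
	bool full_path;
	const char *objdump_path;
	const char *disassembler_style;
};

/* Assumed default for illustration; the real defaults may differ. */
static const struct annotation_options annotation__default_options = {
	.annotate_src = true,
};

/* Tool state owns its own copy of the options. */
struct perf_annotate {
	struct annotation_options opts;
	bool skip_missing;
};

/* Hypothetical helper: start from the shared defaults, zero the rest. */
static void perf_annotate__init(struct perf_annotate *ann)
{
	*ann = (struct perf_annotate){ .opts = annotation__default_options };
}
```

Each tool instance then modifies only its private copy (for example through option parsing), so a setting made for one command no longer leaks through a global into another code path.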
struct perf_data data = {
.mode = PERF_DATA_MODE_READ,
@@ -503,9 +503,9 @@ int cmd_annotate(int argc, const char **argv)
"file", "vmlinux pathname"),
OPT_BOOLEAN('m', "modules", &symbol_conf.use_modules,
"load module symbols - WARNING: use only with -k and LIVE kernel"),
OPT_BOOLEAN('l', "print-line", &annotate.print_line,
OPT_BOOLEAN('l', "print-line", &annotate.opts.print_lines,
"print matching source lines (may be slow)"),
OPT_BOOLEAN('P', "full-paths", &annotate.full_paths,
OPT_BOOLEAN('P', "full-paths", &annotate.opts.full_path,
"Don't shorten the displayed pathnames"),
OPT_BOOLEAN(0, "skip-missing", &annotate.skip_missing,
"Skip symbols that cannot be annotated"),
@@ -516,13 +516,13 @@ int cmd_annotate(int argc, const char **argv)
OPT_CALLBACK(0, "symfs", NULL, "directory",
"Look for files with symbols relative to this directory",
symbol__config_symfs),
OPT_BOOLEAN(0, "source", &symbol_conf.annotate_src,
OPT_BOOLEAN(0, "source", &annotate.opts.annotate_src,
"Interleave source code with assembly code (default)"),
OPT_BOOLEAN(0, "asm-raw", &symbol_conf.annotate_asm_raw,
OPT_BOOLEAN(0, "asm-raw", &annotate.opts.show_asm_raw,
"Display raw encoding of assembly instructions (default)"),
OPT_STRING('M', "disassembler-style", &disassembler_style, "disassembler style",
OPT_STRING('M', "disassembler-style", &annotate.opts.disassembler_style, "disassembler style",
"Specify disassembler style (e.g. -M intel for intel syntax)"),
OPT_STRING(0, "objdump", &objdump_path, "path",
OPT_STRING(0, "objdump", &annotate.opts.objdump_path, "path",
"objdump binary to use for disassembly and annotations"),
OPT_BOOLEAN(0, "group", &symbol_conf.event_group,
"Show event group information together"),
2 changes: 1 addition & 1 deletion tools/perf/builtin-c2c.c
@@ -1976,7 +1976,7 @@ static int filter_cb(struct hist_entry *he)
c2c_he = container_of(he, struct c2c_hist_entry, he);

if (c2c.show_src && !he->srcline)
he->srcline = hist_entry__get_srcline(he);
he->srcline = hist_entry__srcline(he);

calc_width(c2c_he);

2 changes: 0 additions & 2 deletions tools/perf/builtin-kvm.c
@@ -1438,8 +1438,6 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
goto out;
}

symbol_conf.nr_events = kvm->evlist->nr_entries;

if (perf_evlist__create_maps(kvm->evlist, &kvm->opts.target) < 0)
usage_with_options(live_usage, live_options);

3 changes: 1 addition & 2 deletions tools/perf/builtin-probe.c
@@ -81,8 +81,7 @@ static int parse_probe_event(const char *str)
params.target_used = true;
}

if (params.nsi)
pev->nsi = nsinfo__get(params.nsi);
pev->nsi = nsinfo__get(params.nsi);

/* Parse a perf-probe command into event */
ret = parse_perf_probe_command(str, pev);
39 changes: 19 additions & 20 deletions tools/perf/builtin-report.c
@@ -71,6 +71,7 @@ struct report {
bool group_set;
int max_stack;
struct perf_read_values show_threads_values;
struct annotation_options annotation_opts;
const char *pretty_printing_style;
const char *cpu_list;
const char *symbol_filter_str;
@@ -136,26 +137,25 @@ static int hist_iter__report_callback(struct hist_entry_iter *iter,

if (sort__mode == SORT_MODE__BRANCH) {
bi = he->branch_info;
err = addr_map_symbol__inc_samples(&bi->from, sample, evsel->idx);
err = addr_map_symbol__inc_samples(&bi->from, sample, evsel);
if (err)
goto out;

err = addr_map_symbol__inc_samples(&bi->to, sample, evsel->idx);
err = addr_map_symbol__inc_samples(&bi->to, sample, evsel);

} else if (rep->mem_mode) {
mi = he->mem_info;
err = addr_map_symbol__inc_samples(&mi->daddr, sample, evsel->idx);
err = addr_map_symbol__inc_samples(&mi->daddr, sample, evsel);
if (err)
goto out;

err = hist_entry__inc_addr_samples(he, sample, evsel->idx, al->addr);
err = hist_entry__inc_addr_samples(he, sample, evsel, al->addr);

} else if (symbol_conf.cumulate_callchain) {
if (single)
err = hist_entry__inc_addr_samples(he, sample, evsel->idx,
al->addr);
err = hist_entry__inc_addr_samples(he, sample, evsel, al->addr);
} else {
err = hist_entry__inc_addr_samples(he, sample, evsel->idx, al->addr);
err = hist_entry__inc_addr_samples(he, sample, evsel, al->addr);
}

out:
@@ -181,11 +181,11 @@ static int hist_iter__branch_callback(struct hist_entry_iter *iter,
rep->nonany_branch_mode);

bi = he->branch_info;
err = addr_map_symbol__inc_samples(&bi->from, sample, evsel->idx);
err = addr_map_symbol__inc_samples(&bi->from, sample, evsel);
if (err)
goto out;

err = addr_map_symbol__inc_samples(&bi->to, sample, evsel->idx);
err = addr_map_symbol__inc_samples(&bi->to, sample, evsel);

branch_type_count(&rep->brtype_stat, &bi->flags,
bi->from.addr, bi->to.addr);
@@ -561,7 +561,7 @@ static int report__browse_hists(struct report *rep)
ret = perf_evlist__tui_browse_hists(evlist, help, NULL,
rep->min_percent,
&session->header.env,
true);
true, &rep->annotation_opts);
/*
* Usually "ret" is the last pressed key, and we only
* care if the key notifies us to switch data file.
@@ -946,12 +946,6 @@ parse_percent_limit(const struct option *opt, const char *str,
return 0;
}

#define CALLCHAIN_DEFAULT_OPT "graph,0.5,caller,function,percent"

const char report_callchain_help[] = "Display call graph (stack chain/backtrace):\n\n"
CALLCHAIN_REPORT_HELP
"\n\t\t\t\tDefault: " CALLCHAIN_DEFAULT_OPT;

int cmd_report(int argc, const char **argv)
{
struct perf_session *session;
@@ -960,6 +954,10 @@ int cmd_report(int argc, const char **argv)
bool has_br_stack = false;
int branch_mode = -1;
bool branch_call_mode = false;
#define CALLCHAIN_DEFAULT_OPT "graph,0.5,caller,function,percent"
const char report_callchain_help[] = "Display call graph (stack chain/backtrace):\n\n"
CALLCHAIN_REPORT_HELP
"\n\t\t\t\tDefault: " CALLCHAIN_DEFAULT_OPT;
char callchain_default_opt[] = CALLCHAIN_DEFAULT_OPT;
const char * const report_usage[] = {
"perf report [<options>]",
@@ -989,6 +987,7 @@ int cmd_report(int argc, const char **argv)
.max_stack = PERF_MAX_STACK_DEPTH,
.pretty_printing_style = "normal",
.socket_filter = -1,
.annotation_opts = annotation__default_options,
};
const struct option options[] = {
OPT_STRING('i', "input", &input_name, "file",
@@ -1078,11 +1077,11 @@ int cmd_report(int argc, const char **argv)
"list of cpus to profile"),
OPT_BOOLEAN('I', "show-info", &report.show_full_info,
"Display extended information about perf.data file"),
OPT_BOOLEAN(0, "source", &symbol_conf.annotate_src,
OPT_BOOLEAN(0, "source", &report.annotation_opts.annotate_src,
"Interleave source code with assembly code (default)"),
OPT_BOOLEAN(0, "asm-raw", &symbol_conf.annotate_asm_raw,
OPT_BOOLEAN(0, "asm-raw", &report.annotation_opts.show_asm_raw,
"Display raw encoding of assembly instructions (default)"),
OPT_STRING('M', "disassembler-style", &disassembler_style, "disassembler style",
OPT_STRING('M', "disassembler-style", &report.annotation_opts.disassembler_style, "disassembler style",
"Specify disassembler style (e.g. -M intel for intel syntax)"),
OPT_BOOLEAN(0, "show-total-period", &symbol_conf.show_total_period,
"Show a column with the sum of periods"),
@@ -1093,7 +1092,7 @@ int cmd_report(int argc, const char **argv)
parse_branch_mode),
OPT_BOOLEAN(0, "branch-history", &branch_call_mode,
"add last branch records to call history"),
OPT_STRING(0, "objdump", &objdump_path, "path",
OPT_STRING(0, "objdump", &report.annotation_opts.objdump_path, "path",
"objdump binary to use for disassembly and annotations"),
OPT_BOOLEAN(0, "demangle", &symbol_conf.demangle,
"Disable symbol demangling"),