Merge tag 'perf-core-for-mingo-4.17-20180308' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

- Support to display the IPC/Cycle in 'annotate' TUI, for systems
  where this info can be obtained, like Intel's >= Skylake (Jin Yao)

- Support wildcards on PMU name in dynamic PMU events (Agustin Vega-Frias)

- Display pmu name when printing unmerged events in stat (Agustin Vega-Frias)

- Auto-merge PMU events created by prefix or glob match (Agustin Vega-Frias)

- Fix s390 'call' operations target function annotation (Thomas Richter)

- Handle s390 PC relative load and store instructions in the augmented
  'annotate' code, used so far in the TUI modes of 'perf report' and
  'perf annotate' (Thomas Richter)

- Provide libtraceevent with a kernel symbol resolver, so that
  symbols in tracepoint fields can be resolved when showing them in
  tools such as 'perf report' (Wang YanQing)

- Refactor the cgroups code to look more like other code in tools/perf,
  using cgroup__{put,get} for refcount operations instead of its
  open-coded equivalent, breaking larger functions, etc (Arnaldo Carvalho de Melo)

- Implement support for the -G/--cgroup target in 'perf trace', allowing
  strace like tracing (plus other events, backtraces, etc) for cgroups
  (Arnaldo Carvalho de Melo)

- Update thread shortname in 'perf sched map' when the thread's COMM
  changes (Changbin Du)

- Refcount 'struct mem_info' so that it can be shared by several
  users, avoiding duplicated structs and fixing crashes related to
  use after free (Jiri Olsa)

- Display perf.data version, offsets in 'perf report --header' (Jiri Olsa)

- Record the machine's memory topology information in a perf.data
  feature section, to be used by tools such as 'perf c2c' (Jiri Olsa)

- Fix output of forced groups in the header for 'perf report' --stdio
  and --tui (Jiri Olsa)

- Better support llvm, clang, cxx make tests in the build process (Jiri Olsa)

- Streamline the 'struct perf_mmap' methods, storing some info in the
  struct instead of passing it via various methods, shortening its
  signatures (Kan Liang)

- Update the quipper perf.data parser library site information (Stephane Eranian)

- Correct perf's man pages title markers for asciidoctor (Takashi Iwai)

- Intel PT fixes and refactorings paving the way for implementing
  support for AUX area sampling (Adrian Hunter)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar committed Mar 9, 2018
2 parents 1af22eb + 2427b43 commit fbf8a1e
Showing 62 changed files with 1,197 additions and 401 deletions.
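
The cgroup__{put,get} and 'struct mem_info' items in the pull message both describe the usual get/put reference counting pattern. A minimal sketch of that pattern follows, assuming a hypothetical `struct obj` and C11 atomics; the names `obj__new`, `obj__get` and `obj__put` are illustrative and this is not the code from the merged patches.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted object; not a perf structure. */
struct obj {
	atomic_int refcnt;
	int payload;
};

/* Allocate an object that starts with one reference, owned by the caller. */
static struct obj *obj__new(int payload)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (o) {
		atomic_init(&o->refcnt, 1);
		o->payload = payload;
	}
	return o;
}

/* Take an extra reference; returns the object so calls can be chained. */
static struct obj *obj__get(struct obj *o)
{
	if (o)
		atomic_fetch_add(&o->refcnt, 1);
	return o;
}

/* Drop one reference; frees the object on the last put. Returns 1 if freed. */
static int obj__put(struct obj *o)
{
	if (o && atomic_fetch_sub(&o->refcnt, 1) == 1) {
		free(o);
		return 1;
	}
	return 0;
}
```

Each user that stores the pointer calls the get helper and later the put helper, so the object is freed exactly once, after the last user is done with it.
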
6 changes: 5 additions & 1 deletion tools/build/Makefile.feature
@@ -82,7 +82,11 @@ FEATURE_TESTS_EXTRA := \
liberty-z \
libunwind-debug-frame \
libunwind-debug-frame-arm \
libunwind-debug-frame-aarch64
libunwind-debug-frame-aarch64 \
cxx \
llvm \
llvm-version \
clang

FEATURE_TESTS ?= $(FEATURE_TESTS_BASIC)

14 changes: 10 additions & 4 deletions tools/build/feature/Makefile
@@ -54,7 +54,10 @@ FILES= \
test-jvmti.bin \
test-sched_getcpu.bin \
test-setns.bin \
test-libopencsd.bin
test-libopencsd.bin \
test-clang.bin \
test-llvm.bin \
test-llvm-version.bin

FILES := $(addprefix $(OUTPUT),$(FILES))

@@ -257,11 +260,13 @@ $(OUTPUT)test-llvm.bin:
-I$(shell $(LLVM_CONFIG) --includedir) \
-L$(shell $(LLVM_CONFIG) --libdir) \
$(shell $(LLVM_CONFIG) --libs Core BPF) \
$(shell $(LLVM_CONFIG) --system-libs)
$(shell $(LLVM_CONFIG) --system-libs) \
> $(@:.bin=.make.output) 2>&1

$(OUTPUT)test-llvm-version.bin:
$(BUILDXX) -std=gnu++11 \
-I$(shell $(LLVM_CONFIG) --includedir)
-I$(shell $(LLVM_CONFIG) --includedir) \
> $(@:.bin=.make.output) 2>&1

$(OUTPUT)test-clang.bin:
$(BUILDXX) -std=gnu++11 \
@@ -271,7 +276,8 @@ $(OUTPUT)test-clang.bin:
-lclangFrontend -lclangEdit -lclangLex \
-lclangAST -Wl,--end-group \
$(shell $(LLVM_CONFIG) --libs Core option) \
$(shell $(LLVM_CONFIG) --system-libs)
$(shell $(LLVM_CONFIG) --system-libs) \
> $(@:.bin=.make.output) 2>&1

-include $(OUTPUT)*.d

2 changes: 1 addition & 1 deletion tools/include/linux/bitmap.h
@@ -98,7 +98,7 @@ static inline int test_and_set_bit(int nr, unsigned long *addr)

/**
* bitmap_alloc - Allocate bitmap
* @nr: Bit to set
* @nbits: Number of bits
*/
static inline unsigned long *bitmap_alloc(int nbits)
{
2 changes: 1 addition & 1 deletion tools/perf/Documentation/perf-data.txt
@@ -1,5 +1,5 @@
perf-data(1)
==============
============

NAME
----
2 changes: 1 addition & 1 deletion tools/perf/Documentation/perf-ftrace.txt
@@ -1,5 +1,5 @@
perf-ftrace(1)
=============
==============

NAME
----
2 changes: 1 addition & 1 deletion tools/perf/Documentation/perf-kallsyms.txt
@@ -1,5 +1,5 @@
perf-kallsyms(1)
==============
================

NAME
----
8 changes: 7 additions & 1 deletion tools/perf/Documentation/perf-list.txt
@@ -141,7 +141,13 @@ on the first memory controller on socket 0 of a Intel Xeon system

Each memory controller has its own PMU. Measuring the complete system
bandwidth would require specifying all imc PMUs (see perf list output),
and adding the values together.
and adding the values together. To simplify creation of multiple events,
prefix and glob matching is supported in the PMU name, and the prefix
'uncore_' is also ignored when performing the match. So the command above
can be expanded to all memory controllers by using the syntaxes:

perf stat -C 0 -a imc/cas_count_read/,imc/cas_count_write/ -I 1000 ...
perf stat -C 0 -a *imc*/cas_count_read/,*imc*/cas_count_write/ -I 1000 ...

This example measures the combined core power every second
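
The matching rule described in this hunk (prefix or glob match on the PMU name, with a leading 'uncore_' ignored) can be sketched with POSIX `fnmatch(3)`. The helper name `pmu_name_matches` and the fixed-size buffer are assumptions made for illustration; this is a standalone approximation, not the parser code added by the patch.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not perf's parser code: return true when the
 * user-supplied PMU pattern selects the PMU called pmu_name.
 */
static bool pmu_name_matches(const char *pattern, const char *pmu_name)
{
	char glob[256];
	int n;

	/* Ignore a leading "uncore_" unless the user typed it as well. */
	if (!strncmp(pmu_name, "uncore_", 7) && strncmp(pattern, "uncore_", 7))
		pmu_name += 7;

	/* Appending '*' makes a plain name behave as a prefix match. */
	n = snprintf(glob, sizeof(glob), "%s*", pattern);
	if (n < 0 || (size_t)n >= sizeof(glob))
		return false;

	return fnmatch(glob, pmu_name, 0) == 0;
}
```

Under these assumptions both `imc` and `*imc*` select a PMU named `uncore_imc_0`, which mirrors the two command lines shown in the documentation text.
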

2 changes: 1 addition & 1 deletion tools/perf/Documentation/perf-sched.txt
@@ -1,5 +1,5 @@
perf-sched(1)
==============
=============

NAME
----
2 changes: 1 addition & 1 deletion tools/perf/Documentation/perf-script-perl.txt
@@ -1,5 +1,5 @@
perf-script-perl(1)
==================
===================

NAME
----
17 changes: 17 additions & 0 deletions tools/perf/Documentation/perf-stat.txt
@@ -49,6 +49,13 @@ report::
parameters are defined by corresponding entries in
/sys/bus/event_source/devices/<pmu>/format/*

Note that the last two syntaxes support prefix and glob matching in
the PMU name to simplify creation of events across multiple instances
of the same type of PMU in large systems (e.g. memory controller PMUs).
Multiple PMU instances are typical for uncore PMUs, so the prefix
'uncore_' is also ignored when performing this match.


-i::
--no-inherit::
child tasks do not inherit counters
@@ -260,6 +267,16 @@ taskset.
--no-merge::
Do not merge results from same PMUs.

When multiple events are created from a single event specification,
stat will, by default, aggregate the event counts and show the result
in a single row. This option disables that behavior and shows
the individual events and counts.

Multiple events are created from a single event specification when:
1. Prefix or glob matching is used for the PMU name.
2. Aliases, which are listed immediately after the Kernel PMU events
by perf list, are used.

--smi-cost::
Measure SMI cost if msr/aperf/ and msr/smi/ events are supported.
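
The default aggregation that --no-merge turns off can be pictured as a plain sum over the events created from one specification. A small sketch, assuming a hypothetical `struct count_row` with illustrative field names; the real stat code is not shown in this diff.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-PMU result row; the field names are illustrative. */
struct count_row {
	const char *pmu;   /* PMU instance the event was opened on */
	uint64_t    count; /* value read from that instance */
};

/*
 * Merged view: add up the counts of all events that came from one
 * event specification so they can be shown as a single row. With
 * --no-merge the caller would print every row on its own instead.
 */
static uint64_t merged_count(const struct count_row *rows, size_t nr)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		sum += rows[i].count;
	return sum;
}
```
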

25 changes: 25 additions & 0 deletions tools/perf/Documentation/perf-trace.txt
@@ -63,6 +63,31 @@ filter out the startup phase of the program, which is often very different.
--uid=::
Record events in threads owned by uid. Name or number.

-G::
--cgroup::
Record events in threads in a cgroup.

Look for cgroups to set at the /sys/fs/cgroup/perf_event directory, then
remove the /sys/fs/cgroup/perf_event/ part and try:

perf trace -G A -e sched:*switch

Will set all raw_syscalls:sys_{enter,exit}, pgfault, vfs_getname, etc
_and_ sched:sched_switch to the 'A' cgroup, while:

perf trace -e sched:*switch -G A

will only set the sched:sched_switch event to the 'A' cgroup, all the
other events (raw_syscalls:sys_{enter,exit}, etc) are left "without"
a cgroup (on the root cgroup, sys wide, etc).

Multiple cgroups:

perf trace -G A -e sched:*switch -G B

the syscall ones go to the 'A' cgroup, the sched:sched_switch goes
to the 'B' cgroup.

--filter-pids=::
Filter out events for these pids and for 'trace' itself (comma separated list).
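
The ordering rule for -G described in this hunk can be modelled in a few lines. The sketch below is a toy model built only from the three documented examples: a -G given before any -e becomes the default for every event left without a cgroup, while a -G given after one or more -e options applies to those events. The names `struct ev`, `assign_cgroups` and the "syscalls" placeholder are assumptions; this is not how perf parses its options.

```c
#include <stddef.h>
#include <string.h>

#define MAX_EVENTS 8

/* Hypothetical model of one event and the cgroup it ends up in. */
struct ev {
	const char *name;
	const char *cgroup; /* NULL: no cgroup, i.e. system wide */
};

/*
 * Walk a simplified argv holding only "-G <cgroup>" and "-e <event>"
 * pairs. "syscalls" stands in for the events 'perf trace' sets up on
 * its own. evs must have room for MAX_EVENTS entries; the return
 * value is the number of entries filled in.
 */
static size_t assign_cgroups(const char **argv, size_t argc, struct ev *evs)
{
	const char *dflt = NULL; /* from a -G seen before any -e */
	size_t nr = 0, i, j;

	evs[nr++] = (struct ev){ "syscalls", NULL };

	for (i = 0; i + 1 < argc; i += 2) {
		if (!strcmp(argv[i], "-e") && nr < MAX_EVENTS) {
			evs[nr++] = (struct ev){ argv[i + 1], NULL };
		} else if (!strcmp(argv[i], "-G")) {
			if (nr == 1) {
				dflt = argv[i + 1];
				continue;
			}
			/* Applies to the -e events given so far. */
			for (j = 1; j < nr; j++) {
				if (!evs[j].cgroup)
					evs[j].cgroup = argv[i + 1];
			}
		}
	}

	/* Whatever is still unassigned inherits the leading -G, if any. */
	for (j = 0; j < nr; j++) {
		if (!evs[j].cgroup)
			evs[j].cgroup = dflt;
	}
	return nr;
}
```

Feeding it `-G A -e sched:sched_switch -G B` leaves the placeholder syscall entry in 'A' and the sched event in 'B', matching the "Multiple cgroups" example.
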

7 changes: 1 addition & 6 deletions tools/perf/Documentation/perf.data-file-format.txt
@@ -485,10 +485,5 @@ in pmu-tools parser. This allows to read perf.data from python and dump it.
quipper

The quipper C++ parser is available at
https://chromium.googlesource.com/chromiumos/platform2
http://github.com/google/perf_data_converter/tree/master/src/quipper

It is under the chromiumos-wide-profiling/ subdirectory. This library can
convert a perf data file to a protobuf and vice versa.

Unfortunately this parser tends to be many versions behind and may not be able
to parse data files generated by recent perf.
6 changes: 3 additions & 3 deletions tools/perf/Makefile.perf
@@ -708,15 +708,15 @@ TAG_FILES= ../../include/uapi/linux/perf_event.h

TAGS:
$(QUIET_GEN)$(RM) TAGS; \
$(FIND) $(TAG_FOLDERS) -name '*.[hcS]' -print | xargs etags -a $(TAG_FILES)
$(FIND) $(TAG_FOLDERS) -name '*.[hcS]' -print -o -name '*.cpp' -print | xargs etags -a $(TAG_FILES)

tags:
$(QUIET_GEN)$(RM) tags; \
$(FIND) $(TAG_FOLDERS) -name '*.[hcS]' -print | xargs ctags -a $(TAG_FILES)
$(FIND) $(TAG_FOLDERS) -name '*.[hcS]' -print -o -name '*.cpp' -print | xargs ctags -a $(TAG_FILES)

cscope:
$(QUIET_GEN)$(RM) cscope*; \
$(FIND) $(TAG_FOLDERS) -name '*.[hcS]' -print | xargs cscope -b $(TAG_FILES)
$(FIND) $(TAG_FOLDERS) -name '*.[hcS]' -print -o -name '*.cpp' -print | xargs cscope -b $(TAG_FILES)

### Testing rules

116 changes: 115 additions & 1 deletion tools/perf/arch/s390/annotate/instructions.c
@@ -1,6 +1,112 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/compiler.h>

static int s390_call__parse(struct arch *arch, struct ins_operands *ops,
struct map *map)
{
char *endptr, *tok, *name;
struct addr_map_symbol target = {
.map = map,
};

tok = strchr(ops->raw, ',');
if (!tok)
return -1;

ops->target.addr = strtoull(tok + 1, &endptr, 16);

name = strchr(endptr, '<');
if (name == NULL)
return -1;

name++;

if (arch->objdump.skip_functions_char &&
strchr(name, arch->objdump.skip_functions_char))
return -1;

tok = strchr(name, '>');
if (tok == NULL)
return -1;

*tok = '\0';
ops->target.name = strdup(name);
*tok = '>';

if (ops->target.name == NULL)
return -1;
target.addr = map__objdump_2mem(map, ops->target.addr);

if (map_groups__find_ams(&target) == 0 &&
map__rip_2objdump(target.map, map->map_ip(target.map, target.addr)) == ops->target.addr)
ops->target.sym = target.sym;

return 0;
}

static int call__scnprintf(struct ins *ins, char *bf, size_t size,
struct ins_operands *ops);

static struct ins_ops s390_call_ops = {
.parse = s390_call__parse,
.scnprintf = call__scnprintf,
};

static int s390_mov__parse(struct arch *arch __maybe_unused,
struct ins_operands *ops,
struct map *map __maybe_unused)
{
char *s = strchr(ops->raw, ','), *target, *endptr;

if (s == NULL)
return -1;

*s = '\0';
ops->source.raw = strdup(ops->raw);
*s = ',';

if (ops->source.raw == NULL)
return -1;

target = ++s;
ops->target.raw = strdup(target);
if (ops->target.raw == NULL)
goto out_free_source;

ops->target.addr = strtoull(target, &endptr, 16);
if (endptr == target)
goto out_free_target;

s = strchr(endptr, '<');
if (s == NULL)
goto out_free_target;
endptr = strchr(s + 1, '>');
if (endptr == NULL)
goto out_free_target;

*endptr = '\0';
ops->target.name = strdup(s + 1);
*endptr = '>';
if (ops->target.name == NULL)
goto out_free_target;

return 0;

out_free_target:
zfree(&ops->target.raw);
out_free_source:
zfree(&ops->source.raw);
return -1;
}

static int mov__scnprintf(struct ins *ins, char *bf, size_t size,
struct ins_operands *ops);

static struct ins_ops s390_mov_ops = {
.parse = s390_mov__parse,
.scnprintf = mov__scnprintf,
};

static struct ins_ops *s390__associate_ins_ops(struct arch *arch, const char *name)
{
struct ins_ops *ops = NULL;
@@ -14,9 +120,17 @@ static struct ins_ops *s390__associate_ins_ops(struct arch *arch, const char *na
if (!strcmp(name, "bras") ||
!strcmp(name, "brasl") ||
!strcmp(name, "basr"))
ops = &call_ops;
ops = &s390_call_ops;
if (!strcmp(name, "br"))
ops = &ret_ops;
/* override load/store relative to PC */
if (!strcmp(name, "lrl") ||
!strcmp(name, "lgrl") ||
!strcmp(name, "lgfrl") ||
!strcmp(name, "llgfrl") ||
!strcmp(name, "strl") ||
!strcmp(name, "stgrl"))
ops = &s390_mov_ops;

if (ops)
arch__associate_ins_ops(arch, name, ops);
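
The string handling in s390_call__parse() above boils down to: skip to the comma, read a hex address, then take the text between '<' and '>' as the symbol name. A standalone sketch of just that step follows; `parse_call_operand` is a hypothetical helper, the operand string in the usage note is an assumed example of objdump output, and unlike the patch the sketch copies the name without temporarily modifying the input.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not the perf function: split an operand string
 * such as "%r14,4005f0 <main>" into the target address and a
 * heap-allocated copy of the symbol name. Returns 0 on success and
 * -1 if the operand does not have that shape. The caller frees *name.
 */
static int parse_call_operand(const char *raw, unsigned long long *addr,
			      char **name)
{
	const char *tok, *lt, *gt;
	char *endptr;
	size_t len;

	tok = strchr(raw, ',');
	if (!tok)
		return -1;

	*addr = strtoull(tok + 1, &endptr, 16);
	if (endptr == tok + 1)
		return -1; /* no hex digits after the comma */

	lt = strchr(endptr, '<');
	if (!lt)
		return -1;
	gt = strchr(lt + 1, '>');
	if (!gt)
		return -1;

	/* Copy the text between '<' and '>' without touching raw. */
	len = (size_t)(gt - (lt + 1));
	*name = malloc(len + 1);
	if (!*name)
		return -1;
	memcpy(*name, lt + 1, len);
	(*name)[len] = '\0';
	return 0;
}
```

For the assumed input `"%r14,4005f0 <main>"` this yields the address 0x4005f0 and the name `main`; an operand without a comma or without a `<symbol>` part is rejected.
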
7 changes: 3 additions & 4 deletions tools/perf/arch/x86/tests/perf-time-to-tsc.c
@@ -61,7 +61,6 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, int subtest __maybe
u64 test_tsc, comm1_tsc, comm2_tsc;
u64 test_time, comm1_time = 0, comm2_time = 0;
struct perf_mmap *md;
u64 end, start;

threads = thread_map__new(-1, getpid(), UINT_MAX);
CHECK_NOT_NULL__(threads);
@@ -112,10 +111,10 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, int subtest __maybe

for (i = 0; i < evlist->nr_mmaps; i++) {
md = &evlist->mmap[i];
if (perf_mmap__read_init(md, false, &start, &end) < 0)
if (perf_mmap__read_init(md) < 0)
continue;

while ((event = perf_mmap__read_event(md, false, &start, end)) != NULL) {
while ((event = perf_mmap__read_event(md)) != NULL) {
struct perf_sample sample;

if (event->header.type != PERF_RECORD_COMM ||
@@ -134,7 +133,7 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, int subtest __maybe
comm2_time = sample.time;
}
next_event:
perf_mmap__consume(md, false);
perf_mmap__consume(md);
}
perf_mmap__read_done(md);
}
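
The hunks above show callers no longer passing `start`, `end` or the overwrite flag, which matches the pull message item about storing that state in 'struct perf_mmap'. The idea can be sketched with a toy reader that keeps its cursor inside the struct. Everything here (`struct reader`, its fields, the `reader__*` helpers, an int array standing in for the ring buffer) is hypothetical and far simpler than the real mmap handling.

```c
#include <stddef.h>

/* Hypothetical reader; the cursor fields live in the struct itself. */
struct reader {
	const int *events;   /* backing array of "events" */
	size_t     nr;       /* number of valid entries */
	size_t     start;    /* next entry to hand out */
	size_t     end;      /* one past the last readable entry */
	size_t     consumed; /* entries acknowledged by the caller */
};

/* Snapshot the readable range; fails when there is nothing to read. */
static int reader__read_init(struct reader *r)
{
	r->start = r->consumed;
	r->end = r->nr;
	return r->start < r->end ? 0 : -1;
}

/* Return the next event, or NULL at the end of the snapshot. */
static const int *reader__read_event(struct reader *r)
{
	return r->start < r->end ? &r->events[r->start++] : NULL;
}

/* Mark everything handed out so far as consumed. */
static void reader__consume(struct reader *r)
{
	r->consumed = r->start;
}
```

Because the cursor is part of the struct, the calling loop only passes the reader pointer, which is the same shape as the simplified calls in the test above.
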
14 changes: 5 additions & 9 deletions tools/perf/arch/x86/util/auxtrace.c
@@ -37,15 +37,11 @@ struct auxtrace_record *auxtrace_record__init_intel(struct perf_evlist *evlist,
intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);

if (evlist) {
evlist__for_each_entry(evlist, evsel) {
if (intel_pt_pmu &&
evsel->attr.type == intel_pt_pmu->type)
found_pt = true;
if (intel_bts_pmu &&
evsel->attr.type == intel_bts_pmu->type)
found_bts = true;
}
evlist__for_each_entry(evlist, evsel) {
if (intel_pt_pmu && evsel->attr.type == intel_pt_pmu->type)
found_pt = true;
if (intel_bts_pmu && evsel->attr.type == intel_bts_pmu->type)
found_bts = true;
}

if (found_pt && found_bts) {