Merge tag 'perf-core-for-mingo-4.13-20170630' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

Intel PT enhancements:

 - Support "ptwrite" instruction, a way to stuff 32 or 64 bit values into
   the Intel PT trace (Adrian Hunter)

 - Support power events in Intel PT to report changes to C-state (Adrian
   Hunter)

 - Synthesize Intel PT events as PERF_RECORD_SAMPLE records with a
   perf_event_attr.type (PERF_TYPE_SYNTH) just after the range used by the
   kernel, i.e. right after what is allocated for PMUs, at INT_MAX + 1U.
   attr.config identifies the synthesized event and the PERF_SAMPLE_RAW
   payload carries its fields (Adrian Hunter)

Infrastructure changes:

 - Remove warning() and error(), using instead pr_warning() and
   pr_err(), consolidating error reporting (Arnaldo Carvalho de Melo)

 - Add platform dependency to 'perf test 15' (Thomas Richter)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar committed Jul 1, 2017
2 parents e91c8d9 + 644e084 commit 23acd3e
Showing 40 changed files with 1,228 additions and 369 deletions.
2 changes: 1 addition & 1 deletion arch/x86/lib/x86-opcode-map.txt
@@ -1009,7 +1009,7 @@ GrpTable: Grp15
1: fxstor | RDGSBASE Ry (F3),(11B)
2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
-4: XSAVE
+4: XSAVE | ptwrite Ey (F3),(11B)
5: XRSTOR | lfence (11B)
6: XSAVEOPT | clwb (66) | mfence (11B)
7: clflush | clflushopt (66) | sfence (11B)
35 changes: 29 additions & 6 deletions tools/include/linux/kernel.h
@@ -5,6 +5,8 @@
#include <stddef.h>
#include <assert.h>
#include <linux/compiler.h>
+#include <endian.h>
+#include <byteswap.h>

#ifndef UINT_MAX
#define UINT_MAX (~0U)
@@ -67,12 +69,33 @@
#endif
#endif

-/*
-* Both need more care to handle endianness
-* (Don't use bitmap_copy_le() for now)
-*/
-#define cpu_to_le64(x) (x)
-#define cpu_to_le32(x) (x)
+#if __BYTE_ORDER == __BIG_ENDIAN
+#define cpu_to_le16 bswap_16
+#define cpu_to_le32 bswap_32
+#define cpu_to_le64 bswap_64
+#define le16_to_cpu bswap_16
+#define le32_to_cpu bswap_32
+#define le64_to_cpu bswap_64
+#define cpu_to_be16
+#define cpu_to_be32
+#define cpu_to_be64
+#define be16_to_cpu
+#define be32_to_cpu
+#define be64_to_cpu
+#else
+#define cpu_to_le16
+#define cpu_to_le32
+#define cpu_to_le64
+#define le16_to_cpu
+#define le32_to_cpu
+#define le64_to_cpu
+#define cpu_to_be16 bswap_16
+#define cpu_to_be32 bswap_32
+#define cpu_to_be64 bswap_64
+#define be16_to_cpu bswap_16
+#define be32_to_cpu bswap_32
+#define be64_to_cpu bswap_64
+#endif

int vscnprintf(char *buf, size_t size, const char *fmt, va_list args);
int scnprintf(char * buf, size_t size, const char * fmt, ...);
2 changes: 1 addition & 1 deletion tools/objtool/arch/x86/insn/x86-opcode-map.txt
@@ -1009,7 +1009,7 @@ GrpTable: Grp15
1: fxstor | RDGSBASE Ry (F3),(11B)
2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
-4: XSAVE
+4: XSAVE | ptwrite Ey (F3),(11B)
5: XRSTOR | lfence (11B)
6: XSAVEOPT | clwb (66) | mfence (11B)
7: clflush | clflushopt (66) | sfence (11B)
42 changes: 40 additions & 2 deletions tools/perf/Documentation/intel-pt.txt
@@ -108,6 +108,9 @@ approach is available to export the data to a postgresql database. Refer to
script export-to-postgresql.py for more details, and to script
call-graph-from-postgresql.py for an example of using the database.

+There is also script intel-pt-events.py which provides an example of how to
+unpack the raw data for power events and PTWRITE.
+
As mentioned above, it is easy to capture too much data. One way to limit the
data captured is to use 'snapshot' mode which is explained further below.
Refer to 'new snapshot option' and 'Intel PT modes of operation' further below.
@@ -710,13 +713,15 @@ Having no option is the same as

which, in turn, is the same as

---itrace=ibxe
+--itrace=ibxwpe

The letters are:

i synthesize "instructions" events
b synthesize "branches" events
x synthesize "transactions" events
+w synthesize "ptwrite" events
+p synthesize "power" events
c synthesize branches events (calls only)
r synthesize branches events (returns only)
e synthesize tracing error events
@@ -735,7 +740,40 @@ and "r" can be combined to get calls and returns.
'flags' field can be used in perf script to determine whether the event is a
tranasaction start, commit or abort.

-Error events are new. They show where the decoder lost the trace. Error events
+Note that "instructions", "branches" and "transactions" events depend on code
+flow packets which can be disabled by using the config term "branch=0". Refer
+to the config terms section above.
+
+"ptwrite" events record the payload of the ptwrite instruction and whether
+"fup_on_ptw" was used. "ptwrite" events depend on PTWRITE packets which are
+recorded only if the "ptw" config term was used. Refer to the config terms
+section above. perf script "synth" field displays "ptwrite" information like
+this: "ip: 0 payload: 0x123456789abcdef0" where "ip" is 1 if "fup_on_ptw" was
+used.
+
+"Power" events correspond to power event packets and CBR (core-to-bus ratio)
+packets. While CBR packets are always recorded when tracing is enabled, power
+event packets are recorded only if the "pwr_evt" config term was used. Refer to
+the config terms section above. The power events record information about
+C-state changes, whereas CBR is indicative of CPU frequency. perf script
+"event,synth" fields display information like this:
+cbr: cbr: 22 freq: 2189 MHz (200%)
+mwait: hints: 0x60 extensions: 0x1
+pwre: hw: 0 cstate: 2 sub-cstate: 0
+exstop: ip: 1
+pwrx: deepest cstate: 2 last cstate: 2 wake reason: 0x4
+Where:
+"cbr" includes the frequency and the percentage of maximum non-turbo
+"mwait" shows mwait hints and extensions
+"pwre" shows C-state transitions (to a C-state deeper than C0) and
+whether initiated by hardware
+"exstop" indicates execution stopped and whether the IP was recorded
+exactly,
+"pwrx" indicates return to C0
+For more details refer to the Intel 64 and IA-32 Architectures Software
+Developer Manuals.
+
+Error events show where the decoder lost the trace. Error events
are quite important. Users must know if what they are seeing is a complete
picture or not.

8 changes: 5 additions & 3 deletions tools/perf/Documentation/itrace.txt
@@ -3,13 +3,15 @@
c synthesize branches events (calls only)
r synthesize branches events (returns only)
x synthesize transactions events
+w synthesize ptwrite events
+p synthesize power events
e synthesize error events
d create a debug log
g synthesize a call chain (use with i or x)
l synthesize last branch entries (use with i or x)
s skip initial number of events

-The default is all events i.e. the same as --itrace=ibxe
+The default is all events i.e. the same as --itrace=ibxwpe

In addition, the period (default 100000) for instructions events
can be specified in units of:
@@ -26,8 +28,8 @@
Also the number of last branch entries (default 64, max. 1024) for
instructions or transactions events can be specified.

-It is also possible to skip events generated (instructions, branches, transactions)
-at the beginning. This is useful to ignore initialization code.
+It is also possible to skip events generated (instructions, branches, transactions,
+ptwrite, power) at the beginning. This is useful to ignore initialization code.

--itrace=i0nss1000000

6 changes: 5 additions & 1 deletion tools/perf/Documentation/perf-script.txt
@@ -117,7 +117,8 @@ OPTIONS
Comma separated list of fields to print. Options are:
comm, tid, pid, time, cpu, event, trace, ip, sym, dso, addr, symoff,
srcline, period, iregs, brstack, brstacksym, flags, bpf-output, brstackinsn, brstackoff,
-callindent, insn, insnlen. Field list can be prepended with the type, trace, sw or hw,
+callindent, insn, insnlen, synth.
+Field list can be prepended with the type, trace, sw or hw,
to indicate to which event type the field list applies.
e.g., -F sw:comm,tid,time,ip,sym and -F trace:time,cpu,trace

@@ -193,6 +194,9 @@ OPTIONS
instruction bytes and the instruction length of the current
instruction.

+The synth field is used by synthesized events which may be created when
+Instruction Trace decoding.
+
Finally, a user may not set fields to none for all event types.
i.e., -F "" is not allowed.

12 changes: 12 additions & 0 deletions tools/perf/arch/x86/tests/insn-x86-dat-32.c
@@ -1664,3 +1664,15 @@
"0f c7 1d 78 56 34 12 \txrstors 0x12345678",},
{{0x0f, 0xc7, 0x9c, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 8, 0, "", "",
"0f c7 9c c8 78 56 34 12 \txrstors 0x12345678(%eax,%ecx,8)",},
+{{0xf3, 0x0f, 0xae, 0x20, }, 4, 0, "", "",
+"f3 0f ae 20 \tptwritel (%eax)",},
+{{0xf3, 0x0f, 0xae, 0x25, 0x78, 0x56, 0x34, 0x12, }, 8, 0, "", "",
+"f3 0f ae 25 78 56 34 12 \tptwritel 0x12345678",},
+{{0xf3, 0x0f, 0xae, 0xa4, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 9, 0, "", "",
+"f3 0f ae a4 c8 78 56 34 12 \tptwritel 0x12345678(%eax,%ecx,8)",},
+{{0xf3, 0x0f, 0xae, 0x20, }, 4, 0, "", "",
+"f3 0f ae 20 \tptwritel (%eax)",},
+{{0xf3, 0x0f, 0xae, 0x25, 0x78, 0x56, 0x34, 0x12, }, 8, 0, "", "",
+"f3 0f ae 25 78 56 34 12 \tptwritel 0x12345678",},
+{{0xf3, 0x0f, 0xae, 0xa4, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 9, 0, "", "",
+"f3 0f ae a4 c8 78 56 34 12 \tptwritel 0x12345678(%eax,%ecx,8)",},
30 changes: 30 additions & 0 deletions tools/perf/arch/x86/tests/insn-x86-dat-64.c
@@ -1696,3 +1696,33 @@
"0f c7 9c c8 78 56 34 12 \txrstors 0x12345678(%rax,%rcx,8)",},
{{0x41, 0x0f, 0xc7, 0x9c, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 9, 0, "", "",
"41 0f c7 9c c8 78 56 34 12 \txrstors 0x12345678(%r8,%rcx,8)",},
+{{0xf3, 0x0f, 0xae, 0x20, }, 4, 0, "", "",
+"f3 0f ae 20 \tptwritel (%rax)",},
+{{0xf3, 0x41, 0x0f, 0xae, 0x20, }, 5, 0, "", "",
+"f3 41 0f ae 20 \tptwritel (%r8)",},
+{{0xf3, 0x0f, 0xae, 0x24, 0x25, 0x78, 0x56, 0x34, 0x12, }, 9, 0, "", "",
+"f3 0f ae 24 25 78 56 34 12 \tptwritel 0x12345678",},
+{{0xf3, 0x0f, 0xae, 0xa4, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 9, 0, "", "",
+"f3 0f ae a4 c8 78 56 34 12 \tptwritel 0x12345678(%rax,%rcx,8)",},
+{{0xf3, 0x41, 0x0f, 0xae, 0xa4, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"f3 41 0f ae a4 c8 78 56 34 12 \tptwritel 0x12345678(%r8,%rcx,8)",},
+{{0xf3, 0x0f, 0xae, 0x20, }, 4, 0, "", "",
+"f3 0f ae 20 \tptwritel (%rax)",},
+{{0xf3, 0x41, 0x0f, 0xae, 0x20, }, 5, 0, "", "",
+"f3 41 0f ae 20 \tptwritel (%r8)",},
+{{0xf3, 0x0f, 0xae, 0x24, 0x25, 0x78, 0x56, 0x34, 0x12, }, 9, 0, "", "",
+"f3 0f ae 24 25 78 56 34 12 \tptwritel 0x12345678",},
+{{0xf3, 0x0f, 0xae, 0xa4, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 9, 0, "", "",
+"f3 0f ae a4 c8 78 56 34 12 \tptwritel 0x12345678(%rax,%rcx,8)",},
+{{0xf3, 0x41, 0x0f, 0xae, 0xa4, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"f3 41 0f ae a4 c8 78 56 34 12 \tptwritel 0x12345678(%r8,%rcx,8)",},
+{{0xf3, 0x48, 0x0f, 0xae, 0x20, }, 5, 0, "", "",
+"f3 48 0f ae 20 \tptwriteq (%rax)",},
+{{0xf3, 0x49, 0x0f, 0xae, 0x20, }, 5, 0, "", "",
+"f3 49 0f ae 20 \tptwriteq (%r8)",},
+{{0xf3, 0x48, 0x0f, 0xae, 0x24, 0x25, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"f3 48 0f ae 24 25 78 56 34 12 \tptwriteq 0x12345678",},
+{{0xf3, 0x48, 0x0f, 0xae, 0xa4, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"f3 48 0f ae a4 c8 78 56 34 12 \tptwriteq 0x12345678(%rax,%rcx,8)",},
+{{0xf3, 0x49, 0x0f, 0xae, 0xa4, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"f3 49 0f ae a4 c8 78 56 34 12 \tptwriteq 0x12345678(%r8,%rcx,8)",},
30 changes: 30 additions & 0 deletions tools/perf/arch/x86/tests/insn-x86-dat-src.c
@@ -1343,6 +1343,26 @@ int main(void)
asm volatile("xrstors 0x12345678(%rax,%rcx,8)");
asm volatile("xrstors 0x12345678(%r8,%rcx,8)");

+/* ptwrite */
+
+asm volatile("ptwrite (%rax)");
+asm volatile("ptwrite (%r8)");
+asm volatile("ptwrite (0x12345678)");
+asm volatile("ptwrite 0x12345678(%rax,%rcx,8)");
+asm volatile("ptwrite 0x12345678(%r8,%rcx,8)");
+
+asm volatile("ptwritel (%rax)");
+asm volatile("ptwritel (%r8)");
+asm volatile("ptwritel (0x12345678)");
+asm volatile("ptwritel 0x12345678(%rax,%rcx,8)");
+asm volatile("ptwritel 0x12345678(%r8,%rcx,8)");
+
+asm volatile("ptwriteq (%rax)");
+asm volatile("ptwriteq (%r8)");
+asm volatile("ptwriteq (0x12345678)");
+asm volatile("ptwriteq 0x12345678(%rax,%rcx,8)");
+asm volatile("ptwriteq 0x12345678(%r8,%rcx,8)");
+
#else /* #ifdef __x86_64__ */

/* bound r32, mem (same op code as EVEX prefix) */
@@ -2653,6 +2673,16 @@ int main(void)
asm volatile("xrstors (0x12345678)");
asm volatile("xrstors 0x12345678(%eax,%ecx,8)");

+/* ptwrite */
+
+asm volatile("ptwrite (%eax)");
+asm volatile("ptwrite (0x12345678)");
+asm volatile("ptwrite 0x12345678(%eax,%ecx,8)");
+
+asm volatile("ptwritel (%eax)");
+asm volatile("ptwritel (0x12345678)");
+asm volatile("ptwritel 0x12345678(%eax,%ecx,8)");
+
#endif /* #ifndef __x86_64__ */

/* Following line is a marker for the awk script - do not change */
4 changes: 2 additions & 2 deletions tools/perf/builtin-c2c.c
@@ -1725,10 +1725,10 @@ static int c2c_hists__init_sort(struct perf_hpp_list *hpp_list, char *name)
tok; tok = strtok_r(NULL, ", ", &tmp)) { \
ret = _fn(hpp_list, tok); \
if (ret == -EINVAL) { \
-error("Invalid --fields key: `%s'", tok); \
+pr_err("Invalid --fields key: `%s'", tok); \
break; \
} else if (ret == -ESRCH) { \
-error("Unknown --fields key: `%s'", tok); \
+pr_err("Unknown --fields key: `%s'", tok); \
break; \
} \
} \
5 changes: 4 additions & 1 deletion tools/perf/builtin-diff.c
@@ -1302,7 +1302,10 @@ static int diff__config(const char *var, const char *value,
void *cb __maybe_unused)
{
if (!strcmp(var, "diff.order")) {
-sort_compute = perf_config_int(var, value);
+int ret;
+if (perf_config_int(&ret, var, value) < 0)
+return -1;
+sort_compute = ret;
return 0;
}
if (!strcmp(var, "diff.compute")) {