Skip to content

Commit

Permalink
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/li…
Browse files Browse the repository at this point in the history
…nux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:
 "Main perf kernel side changes:

   - uprobes updates/fixes.  (Oleg Nesterov)

   - Add PERF_RECORD_SWITCH to indicate context switches and use it in
     tooling.  (Adrian Hunter)

   - Support BPF programs attached to uprobes and first steps for BPF
     tooling support.  (Wang Nan)

   - x86 generic x86 MSR-to-perf PMU driver.  (Andy Lutomirski)

   - x86 Intel PT, LBR and BTS updates.  (Alexander Shishkin)

   - x86 Intel Skylake support.  (Andi Kleen)

   - x86 Intel Knights Landing (KNL) RAPL support.  (Dasaratharaman
     Chandramouli)

   - x86 Intel Broadwell-DE uncore support.  (Kan Liang)

   - x86 hw breakpoints robustization (Andy Lutomirski)

  Main perf tooling side changes:

   - Support Intel PT in several tools, enabling the use of the
     processor trace feature introduced in Intel Broadwell processors:
     (Adrian Hunter)

       # dmesg | grep Performance
       # [0.188477] Performance Events: PEBS fmt2+, 16-deep LBR, Broadwell events, full-width counters, Intel PMU driver.
       # perf record -e intel_pt//u -a sleep 1
       [ perf record: Woken up 1 times to write data ]
       [ perf record: Captured and wrote 0.216 MB perf.data ]
       # perf script # then navigate in the tool output to some area, like this one:
       184 1030 dl_main (/usr/lib64/ld-2.17.so) => 7f21ba661440 dl_main (/usr/lib64/ld-2.17.so)
       185 1457 dl_main (/usr/lib64/ld-2.17.so) => 7f21ba669f10 _dl_new_object (/usr/lib64/ld-2.17.so)
       186 9f37 _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba677b90 strlen (/usr/lib64/ld-2.17.so)
       187 7ba3 strlen (/usr/lib64/ld-2.17.so) => 7f21ba677c75 strlen (/usr/lib64/ld-2.17.so)
       188 7c78 strlen (/usr/lib64/ld-2.17.so) => 7f21ba669f3c _dl_new_object (/usr/lib64/ld-2.17.so)
       189 9f8a _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba65fab0 calloc@plt (/usr/lib64/ld-2.17.so)
       190 fab0 calloc@plt (/usr/lib64/ld-2.17.so) => 7f21ba675e70 calloc (/usr/lib64/ld-2.17.so)
       191 5e87 calloc (/usr/lib64/ld-2.17.so) => 7f21ba65fa90 malloc@plt (/usr/lib64/ld-2.17.so)
       192 fa90 malloc@plt (/usr/lib64/ld-2.17.so) => 7f21ba675e60 malloc (/usr/lib64/ld-2.17.so)
       193 5e68 malloc (/usr/lib64/ld-2.17.so) => 7f21ba65fa80 __libc_memalign@plt (/usr/lib64/ld-2.17.so)
       194 fa80 __libc_memalign@plt (/usr/lib64/ld-2.17.so) => 7f21ba675d50 __libc_memalign (/usr/lib64/ld-2.17.so)
       195 5d63 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675e20 __libc_memalign (/usr/lib64/ld-2.17.so)
       196 5e40 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675d73 __libc_memalign (/usr/lib64/ld-2.17.so)
       197 5d97 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675e18 __libc_memalign (/usr/lib64/ld-2.17.so)
       198 5e1e __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675df9 __libc_memalign (/usr/lib64/ld-2.17.so)
       199 5e10 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba669f8f _dl_new_object (/usr/lib64/ld-2.17.so)
       200 9fc2 _dl_new_object (/usr/lib64/ld-2.17.so) =>  7f21ba678e70 memcpy (/usr/lib64/ld-2.17.so)
       201 8e8c memcpy (/usr/lib64/ld-2.17.so) => 7f21ba678ea0 memcpy (/usr/lib64/ld-2.17.so)

   - Add support for using several Intel PT features (CYC, MTC packets),
     the relevant documentation was updated in:
         tools/perf/Documentation/intel-pt.txt
     briefly describing those packets, its purposes, how to configure
     them in the event config terms and relevant external documentation
     for further reading.  (Adrian Hunter)

   - Introduce support for probing at an absolute address, for user and
     kernel 'perf probe's, useful when one have the symbol maps on a
     developer machine but not on an embedded system.  (Wang Nan)

   - Add Intel BTS support, with a call-graph script to show it and PT
     in use in a GUI using 'perf script' python scripting with
     postgresql and Qt.  (Adrian Hunter)

   - Allow selecting the type of callchains per event, including
     disabling callchains in all but one entry in an event list, to save
     space, and also to ask for the callchains collected in one event to
     be used in other events.  (Kan Liang)

   - Beautify more syscall arguments in 'perf trace': (Arnaldo Carvalho
     de Melo)
       * A bunch more translate file/pathnames from pointers to strings.
       * Convert numbers to strings for the 'keyctl' syscall 'option'
         arg.
       * Add missing 'clockid' entries.

   - Introduce 'srcfile' sort key: (Andi Kleen)

       # perf record -F 10000 usleep 1
       # perf report --stdio --dsos '[kernel.vmlinux]' -s srcfile
       <SNIP>
       # Overhead  Source File
          26.49%  copy_page_64.S
           5.49%  signal.c
           0.51%  msr.h
       #

     It can be combined with other fields, for instance, experiment with
     '-s srcfile,symbol'.

     There are some oddities in some distros and with some specific
     DSOs, being investigated, so your mileage may vary.

   - Support per-event 'freq' term: (Namhyung Kim)

       $ perf record -e 'cpu/instructions,freq=1234/',cycles -c 1000 sleep 1
       $ perf evlist -F
       cpu/instructions,freq=1234/: sample_freq=1234
       cycles: sample_period=1000
       $

   - Deref sys_enter pointer args with contents from probe:vfs_getname,
     showing pathnames instead of pointers in many syscalls in 'perf
     trace'.  (Arnaldo Carvalho de Melo)

   - Stop collecting /proc/kallsyms in perf.data files, saving about
     4.5MB on a typical x86-64 system, use the the symbol resolution
     routines used in all the other tools (report, top, etc) now that we
     can ask libtraceevent to use perf's symbol resolution code.
     (Arnaldo Carvalho de Melo)

   - Allow filtering out of perf's PID via 'perf record --exclude-perf'.
     (Wang Nan)

   - 'perf trace' now supports syscall groups, like strace, i.e:

       $ trace -e file touch file

     Will expand 'file' into multiple, file related, syscalls.  More
     work needed to add extra groups for other syscall groups, and also
     to complement what was added for the 'file' group, included as a
     proof of concept.  (Arnaldo Carvalho de Melo)

   - Add lock_pi stresser to 'perf bench futex', to test the kernel code
     related to FUTEX_(UN)LOCK_PI.  (Davidlohr Bueso)

   - Let user have timestamps with per-thread recording in 'perf record'
     (Adrian Hunter)

   - ... and tons of other changes, see the shortlog and the Git log for
     details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (240 commits)
  perf evlist: Add backpointer for perf_env to evlist
  perf tools: Rename perf_session_env to perf_env
  perf tools: Do not change lib/api/fs/debugfs directly
  perf tools: Add tracing_path and remove unneeded functions
  perf buildid: Introduce sysfs/filename__sprintf_build_id
  perf evsel: Add a backpointer to the evlist a evsel is in
  perf trace: Add header with copyright and background info
  perf scripts python: Add new compaction-times script
  perf stat: Get correct cpu id for print_aggr
  tools lib traceeveent: Allow for negative numbers in print format
  perf script: Add --[no-]-demangle/--[no-]-demangle-kernel
  tracing/uprobes: Do not print '0x (null)' when offset is 0
  perf probe: Support probing at absolute address
  perf probe: Fix error reported when offset without function
  perf probe: Fix list result when address is zero
  perf probe: Fix list result when symbol can't be found
  tools build: Allow duplicate objects in the object list
  perf tools: Remove export.h from MANIFEST
  perf probe: Prevent segfault when reading probe point with absolute address
  perf tools: Update Intel PT documentation
  ...
  • Loading branch information
Linus Torvalds committed Sep 1, 2015
2 parents 4658000 + bac2e4a commit 41d859a
Show file tree
Hide file tree
Showing 206 changed files with 19,187 additions and 1,584 deletions.
14 changes: 14 additions & 0 deletions arch/x86/include/asm/msr-index.h
Original file line number Diff line number Diff line change
Expand Up @@ -73,20 +73,34 @@
#define MSR_LBR_CORE_FROM 0x00000040
#define MSR_LBR_CORE_TO 0x00000060

#define MSR_LBR_INFO_0 0x00000dc0 /* ... 0xddf for _31 */
#define LBR_INFO_MISPRED BIT_ULL(63)
#define LBR_INFO_IN_TX BIT_ULL(62)
#define LBR_INFO_ABORT BIT_ULL(61)
#define LBR_INFO_CYCLES 0xffff

#define MSR_IA32_PEBS_ENABLE 0x000003f1
#define MSR_IA32_DS_AREA 0x00000600
#define MSR_IA32_PERF_CAPABILITIES 0x00000345
#define MSR_PEBS_LD_LAT_THRESHOLD 0x000003f6

#define MSR_IA32_RTIT_CTL 0x00000570
#define RTIT_CTL_TRACEEN BIT(0)
#define RTIT_CTL_CYCLEACC BIT(1)
#define RTIT_CTL_OS BIT(2)
#define RTIT_CTL_USR BIT(3)
#define RTIT_CTL_CR3EN BIT(7)
#define RTIT_CTL_TOPA BIT(8)
#define RTIT_CTL_MTC_EN BIT(9)
#define RTIT_CTL_TSC_EN BIT(10)
#define RTIT_CTL_DISRETC BIT(11)
#define RTIT_CTL_BRANCH_EN BIT(13)
#define RTIT_CTL_MTC_RANGE_OFFSET 14
#define RTIT_CTL_MTC_RANGE (0x0full << RTIT_CTL_MTC_RANGE_OFFSET)
#define RTIT_CTL_CYC_THRESH_OFFSET 19
#define RTIT_CTL_CYC_THRESH (0x0full << RTIT_CTL_CYC_THRESH_OFFSET)
#define RTIT_CTL_PSB_FREQ_OFFSET 24
#define RTIT_CTL_PSB_FREQ (0x0full << RTIT_CTL_PSB_FREQ_OFFSET)
#define MSR_IA32_RTIT_STATUS 0x00000571
#define RTIT_STATUS_CONTEXTEN BIT(1)
#define RTIT_STATUS_TRIGGEREN BIT(2)
Expand Down
7 changes: 7 additions & 0 deletions arch/x86/include/asm/perf_event.h
Original file line number Diff line number Diff line change
Expand Up @@ -159,6 +159,13 @@ struct x86_pmu_capability {
*/
#define INTEL_PMC_IDX_FIXED_BTS (INTEL_PMC_IDX_FIXED + 16)

#define GLOBAL_STATUS_COND_CHG BIT_ULL(63)
#define GLOBAL_STATUS_BUFFER_OVF BIT_ULL(62)
#define GLOBAL_STATUS_UNC_OVF BIT_ULL(61)
#define GLOBAL_STATUS_ASIF BIT_ULL(60)
#define GLOBAL_STATUS_COUNTERS_FROZEN BIT_ULL(59)
#define GLOBAL_STATUS_LBRS_FROZEN BIT_ULL(58)

/*
* IBS cpuid feature detection
*/
Expand Down
1 change: 1 addition & 0 deletions arch/x86/include/asm/tsc.h
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,7 @@ extern int unsynchronized_tsc(void);
extern int check_tsc_unstable(void);
extern int check_tsc_disabled(void);
extern unsigned long native_calibrate_tsc(void);
extern unsigned long long native_sched_clock_from_tsc(u64 tsc);

extern int tsc_clocksource_reliable;

Expand Down
2 changes: 2 additions & 0 deletions arch/x86/kernel/cpu/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,8 @@ obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE) += perf_event_intel_uncore.o \
perf_event_intel_uncore_snb.o \
perf_event_intel_uncore_snbep.o \
perf_event_intel_uncore_nhmex.o
obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_msr.o
obj-$(CONFIG_CPU_SUP_AMD) += perf_event_msr.o
endif


Expand Down
39 changes: 12 additions & 27 deletions arch/x86/kernel/cpu/intel_pt.h
Original file line number Diff line number Diff line change
Expand Up @@ -25,32 +25,11 @@
*/
#define TOPA_PMI_MARGIN 512

/*
* Table of Physical Addresses bits
*/
enum topa_sz {
TOPA_4K = 0,
TOPA_8K,
TOPA_16K,
TOPA_32K,
TOPA_64K,
TOPA_128K,
TOPA_256K,
TOPA_512K,
TOPA_1MB,
TOPA_2MB,
TOPA_4MB,
TOPA_8MB,
TOPA_16MB,
TOPA_32MB,
TOPA_64MB,
TOPA_128MB,
TOPA_SZ_END,
};
#define TOPA_SHIFT 12

static inline unsigned int sizes(enum topa_sz tsz)
static inline unsigned int sizes(unsigned int tsz)
{
return 1 << (tsz + 12);
return 1 << (tsz + TOPA_SHIFT);
};

struct topa_entry {
Expand All @@ -66,20 +45,26 @@ struct topa_entry {
u64 rsvd4 : 16;
};

#define TOPA_SHIFT 12
#define PT_CPUID_LEAVES 2
#define PT_CPUID_LEAVES 2
#define PT_CPUID_REGS_NUM 4 /* number of regsters (eax, ebx, ecx, edx) */

enum pt_capabilities {
PT_CAP_max_subleaf = 0,
PT_CAP_cr3_filtering,
PT_CAP_psb_cyc,
PT_CAP_mtc,
PT_CAP_topa_output,
PT_CAP_topa_multiple_entries,
PT_CAP_single_range_output,
PT_CAP_payloads_lip,
PT_CAP_mtc_periods,
PT_CAP_cycle_thresholds,
PT_CAP_psb_periods,
};

struct pt_pmu {
struct pmu pmu;
u32 caps[4 * PT_CPUID_LEAVES];
u32 caps[PT_CPUID_REGS_NUM * PT_CPUID_LEAVES];
};

/**
Expand Down
2 changes: 1 addition & 1 deletion arch/x86/kernel/cpu/perf_event.c
Original file line number Diff line number Diff line change
Expand Up @@ -1551,7 +1551,7 @@ static void __init filter_events(struct attribute **attrs)
}

/* Merge two pointer arrays */
static __init struct attribute **merge_attr(struct attribute **a, struct attribute **b)
__init struct attribute **merge_attr(struct attribute **a, struct attribute **b)
{
struct attribute **new;
int j, i;
Expand Down
25 changes: 10 additions & 15 deletions arch/x86/kernel/cpu/perf_event.h
Original file line number Diff line number Diff line change
Expand Up @@ -165,7 +165,7 @@ struct intel_excl_cntrs {
unsigned core_id; /* per-core: core id */
};

#define MAX_LBR_ENTRIES 16
#define MAX_LBR_ENTRIES 32

enum {
X86_PERF_KFREE_SHARED = 0,
Expand Down Expand Up @@ -594,6 +594,7 @@ struct x86_pmu {
struct event_constraint *pebs_constraints;
void (*pebs_aliases)(struct perf_event *event);
int max_pebs_events;
unsigned long free_running_flags;

/*
* Intel LBR
Expand Down Expand Up @@ -624,6 +625,7 @@ struct x86_pmu {
struct x86_perf_task_context {
u64 lbr_from[MAX_LBR_ENTRIES];
u64 lbr_to[MAX_LBR_ENTRIES];
u64 lbr_info[MAX_LBR_ENTRIES];
int lbr_callstack_users;
int lbr_stack_state;
};
Expand Down Expand Up @@ -793,6 +795,8 @@ static inline void set_linear_ip(struct pt_regs *regs, unsigned long ip)
ssize_t x86_event_sysfs_show(char *page, u64 config, u64 event);
ssize_t intel_event_sysfs_show(char *page, u64 config);

struct attribute **merge_attr(struct attribute **a, struct attribute **b);

#ifdef CONFIG_CPU_SUP_AMD

int amd_pmu_init(void);
Expand All @@ -808,20 +812,6 @@ static inline int amd_pmu_init(void)

#ifdef CONFIG_CPU_SUP_INTEL

static inline bool intel_pmu_needs_lbr_smpl(struct perf_event *event)
{
/* user explicitly requested branch sampling */
if (has_branch_stack(event))
return true;

/* implicit branch sampling to correct PEBS skid */
if (x86_pmu.intel_cap.pebs_trap && event->attr.precise_ip > 1 &&
x86_pmu.intel_cap.pebs_format < 2)
return true;

return false;
}

static inline bool intel_pmu_has_bts(struct perf_event *event)
{
if (event->attr.config == PERF_COUNT_HW_BRANCH_INSTRUCTIONS &&
Expand Down Expand Up @@ -873,6 +863,8 @@ extern struct event_constraint intel_ivb_pebs_event_constraints[];

extern struct event_constraint intel_hsw_pebs_event_constraints[];

extern struct event_constraint intel_skl_pebs_event_constraints[];

struct event_constraint *intel_pebs_constraints(struct perf_event *event);

void intel_pmu_pebs_enable(struct perf_event *event);
Expand Down Expand Up @@ -911,6 +903,8 @@ void intel_pmu_lbr_init_snb(void);

void intel_pmu_lbr_init_hsw(void);

void intel_pmu_lbr_init_skl(void);

int intel_pmu_setup_lbr_filter(struct perf_event *event);

void intel_pt_interrupt(void);
Expand All @@ -934,6 +928,7 @@ static inline int is_ht_workaround_enabled(void)
{
return !!(x86_pmu.flags & PMU_FL_EXCL_ENABLED);
}

#else /* CONFIG_CPU_SUP_INTEL */

static inline void reserve_ds_buffers(void)
Expand Down
Loading

0 comments on commit 41d859a

Please sign in to comment.