Skip to content

Commit

Permalink
Merge tag 'perf-core-2020-06-01' of git://git.kernel.org/pub/scm/linu…
Browse files Browse the repository at this point in the history
…x/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:
 "Kernel side changes:

   - Add AMD Fam17h RAPL support

   - Introduce CAP_PERFMON to kernel and user space

   - Add Zhaoxin CPU support

   - Misc fixes and cleanups

  Tooling changes:

   - perf record:

     Introduce '--switch-output-event' to use arbitrary events to be
     setup and read from a side band thread and, when they take place a
     signal be sent to the main 'perf record' thread, reusing the core
     for '--switch-output' to take perf.data snapshots from the ring
     buffer used for '--overwrite', e.g.:

	# perf record --overwrite -e sched:* \
		      --switch-output-event syscalls:*connect* \
		      workload

     will take perf.data.YYYYMMDDHHMMSS snapshots up to around the
     connect syscalls.

     Add '--num-synthesize-threads' option to control degree of
     parallelism of the synthesize_mmap() code which is scanning
     /proc/PID/task/PID/maps and can be time consuming. This mimics
     pre-existing behaviour in 'perf top'.

   - perf bench:

     Add a multi-threaded synthesize benchmark and kallsyms parsing
     benchmark.

   - Intel PT support:

     Stitch LBR records from multiple samples to get deeper backtraces,
     there are caveats, see the csets for details.

     Allow using Intel PT to synthesize callchains for regular events.

     Add support for synthesizing branch stacks for regular events
     (cycles, instructions, etc) from Intel PT data.

  Misc changes:

   - Updated perf vendor events for power9 and Coresight.

   - Add flamegraph.py script via 'perf flamegraph'

   - Misc other changes, fixes and cleanups - see the Git log for details

  Also, since over the last couple of years perf tooling has matured and
  decoupled from the kernel perf changes to a large degree, going
  forward Arnaldo is going to send perf tooling changes via direct pull
  requests"

* tag 'perf-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (163 commits)
  perf/x86/rapl: Add AMD Fam17h RAPL support
  perf/x86/rapl: Make perf_probe_msr() more robust and flexible
  perf/x86/rapl: Flip logic on default events visibility
  perf/x86/rapl: Refactor to share the RAPL code between Intel and AMD CPUs
  perf/x86/rapl: Move RAPL support to common x86 code
  perf/core: Replace zero-length array with flexible-array
  perf/x86: Replace zero-length array with flexible-array
  perf/x86/intel: Add more available bits for OFFCORE_RESPONSE of Intel Tremont
  perf/x86/rapl: Add Ice Lake RAPL support
  perf flamegraph: Use /bin/bash for report and record scripts
  perf cs-etm: Move definition of 'traceid_list' global variable from header file
  libsymbols kallsyms: Move hex2u64 out of header
  libsymbols kallsyms: Parse using io api
  perf bench: Add kallsyms parsing
  perf: cs-etm: Update to build with latest opencsd version.
  perf symbol: Fix kernel symbol address display
  perf inject: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
  perf annotate: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
  perf trace: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
  perf script: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
  ...
  • Loading branch information
Linus Torvalds committed Jun 1, 2020
2 parents 69fc06f + 5cde265 commit a7092c8
Show file tree
Hide file tree
Showing 194 changed files with 5,221 additions and 1,993 deletions.
86 changes: 61 additions & 25 deletions Documentation/admin-guide/perf-security.rst
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
.. _perf_security:

Perf Events and tool security
Perf events and tool security
=============================

Overview
Expand Down Expand Up @@ -42,11 +42,11 @@ categories:
Data that belong to the fourth category can potentially contain
sensitive process data. If PMUs in some monitoring modes capture values
of execution context registers or data from process memory then access
to such monitoring capabilities requires to be ordered and secured
properly. So, perf_events/Perf performance monitoring is the subject for
security access control management [5]_ .
to such monitoring modes requires to be ordered and secured properly.
So, perf_events performance monitoring and observability operations are
the subject for security access control management [5]_ .

perf_events/Perf access control
perf_events access control
-------------------------------

To perform security checks, the Linux implementation splits processes
Expand All @@ -66,11 +66,25 @@ into distinct units, known as capabilities [6]_ , which can be
independently enabled and disabled on per-thread basis for processes and
files of unprivileged users.

Unprivileged processes with enabled CAP_SYS_ADMIN capability are treated
Unprivileged processes with enabled CAP_PERFMON capability are treated
as privileged processes with respect to perf_events performance
monitoring and bypass *scope* permissions checks in the kernel.

Unprivileged processes using perf_events system call API is also subject
monitoring and observability operations, thus, bypass *scope* permissions
checks in the kernel. CAP_PERFMON implements the principle of least
privilege [13]_ (POSIX 1003.1e: 2.2.2.39) for performance monitoring and
observability operations in the kernel and provides a secure approach to
perfomance monitoring and observability in the system.

For backward compatibility reasons the access to perf_events monitoring and
observability operations is also open for CAP_SYS_ADMIN privileged
processes but CAP_SYS_ADMIN usage for secure monitoring and observability
use cases is discouraged with respect to the CAP_PERFMON capability.
If system audit records [14]_ for a process using perf_events system call
API contain denial records of acquiring both CAP_PERFMON and CAP_SYS_ADMIN
capabilities then providing the process with CAP_PERFMON capability singly
is recommended as the preferred secure approach to resolve double access
denial logging related to usage of performance monitoring and observability.

Unprivileged processes using perf_events system call are also subject
for PTRACE_MODE_READ_REALCREDS ptrace access mode check [7]_ , whose
outcome determines whether monitoring is permitted. So unprivileged
processes provided with CAP_SYS_PTRACE capability are effectively
Expand All @@ -82,14 +96,14 @@ performance analysis of monitored processes or a system. For example,
CAP_SYSLOG capability permits reading kernel space memory addresses from
/proc/kallsyms file.

perf_events/Perf privileged users
Privileged Perf users groups
---------------------------------

Mechanisms of capabilities, privileged capability-dumb files [6]_ and
file system ACLs [10]_ can be used to create a dedicated group of
perf_events/Perf privileged users who are permitted to execute
performance monitoring without scope limits. The following steps can be
taken to create such a group of privileged Perf users.
file system ACLs [10]_ can be used to create dedicated groups of
privileged Perf users who are permitted to execute performance monitoring
and observability without scope limits. The following steps can be
taken to create such groups of privileged Perf users.

1. Create perf_users group of privileged Perf users, assign perf_users
group to Perf tool executable and limit access to the executable for
Expand All @@ -108,30 +122,51 @@ taken to create such a group of privileged Perf users.
-rwxr-x--- 2 root perf_users 11M Oct 19 15:12 perf

2. Assign the required capabilities to the Perf tool executable file and
enable members of perf_users group with performance monitoring
enable members of perf_users group with monitoring and observability
privileges [6]_ :

::

# setcap "cap_sys_admin,cap_sys_ptrace,cap_syslog=ep" perf
# setcap -v "cap_sys_admin,cap_sys_ptrace,cap_syslog=ep" perf
# setcap "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" perf
# setcap -v "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" perf
perf: OK
# getcap perf
perf = cap_sys_ptrace,cap_sys_admin,cap_syslog+ep
perf = cap_sys_ptrace,cap_syslog,cap_perfmon+ep

If the libcap installed doesn't yet support "cap_perfmon", use "38" instead,
i.e.:

::

# setcap "38,cap_ipc_lock,cap_sys_ptrace,cap_syslog=ep" perf

Note that you may need to have 'cap_ipc_lock' in the mix for tools such as
'perf top', alternatively use 'perf top -m N', to reduce the memory that
it uses for the perf ring buffer, see the memory allocation section below.

Using a libcap without support for CAP_PERFMON will make cap_get_flag(caps, 38,
CAP_EFFECTIVE, &val) fail, which will lead the default event to be 'cycles:u',
so as a workaround explicitly ask for the 'cycles' event, i.e.:

::

# perf top -e cycles

To get kernel and user samples with a perf binary with just CAP_PERFMON.

As a result, members of perf_users group are capable of conducting
performance monitoring by using functionality of the configured Perf
tool executable that, when executes, passes perf_events subsystem scope
checks.
performance monitoring and observability by using functionality of the
configured Perf tool executable that, when executes, passes perf_events
subsystem scope checks.

This specific access control management is only available to superuser
or root running processes with CAP_SETPCAP, CAP_SETFCAP [6]_
capabilities.

perf_events/Perf unprivileged users
Unprivileged users
-----------------------------------

perf_events/Perf *scope* and *access* control for unprivileged processes
perf_events *scope* and *access* control for unprivileged processes
is governed by perf_event_paranoid [2]_ setting:

-1:
Expand Down Expand Up @@ -166,7 +201,7 @@ is governed by perf_event_paranoid [2]_ setting:
perf_event_mlock_kb locking limit is imposed but ignored for
unprivileged processes with CAP_IPC_LOCK capability.

perf_events/Perf resource control
Resource control
---------------------------------

Open file descriptors
Expand Down Expand Up @@ -227,4 +262,5 @@ Bibliography
.. [10] `<http://man7.org/linux/man-pages/man5/acl.5.html>`_
.. [11] `<http://man7.org/linux/man-pages/man2/getrlimit.2.html>`_
.. [12] `<http://man7.org/linux/man-pages/man5/limits.conf.5.html>`_
.. [13] `<https://sites.google.com/site/fullycapable>`_
.. [14] `<http://man7.org/linux/man-pages/man8/auditd.8.html>`_
16 changes: 11 additions & 5 deletions Documentation/admin-guide/sysctl/kernel.rst
Original file line number Diff line number Diff line change
Expand Up @@ -721,7 +721,13 @@ perf_event_paranoid
===================

Controls use of the performance events system by unprivileged
users (without CAP_SYS_ADMIN). The default value is 2.
users (without CAP_PERFMON). The default value is 2.

For backward compatibility reasons access to system performance
monitoring and observability remains open for CAP_SYS_ADMIN
privileged processes but CAP_SYS_ADMIN usage for secure system
performance monitoring and observability operations is discouraged
with respect to CAP_PERFMON use cases.

=== ==================================================================
-1 Allow use of (almost) all events by all users.
Expand All @@ -730,13 +736,13 @@ users (without CAP_SYS_ADMIN). The default value is 2.
``CAP_IPC_LOCK``.

>=0 Disallow ftrace function tracepoint by users without
``CAP_SYS_ADMIN``.
``CAP_PERFMON``.

Disallow raw tracepoint access by users without ``CAP_SYS_ADMIN``.
Disallow raw tracepoint access by users without ``CAP_PERFMON``.

>=1 Disallow CPU event access by users without ``CAP_SYS_ADMIN``.
>=1 Disallow CPU event access by users without ``CAP_PERFMON``.

>=2 Disallow kernel profiling by users without ``CAP_SYS_ADMIN``.
>=2 Disallow kernel profiling by users without ``CAP_PERFMON``.
=== ==================================================================


Expand Down
2 changes: 1 addition & 1 deletion arch/parisc/kernel/perf.c
Original file line number Diff line number Diff line change
Expand Up @@ -300,7 +300,7 @@ static ssize_t perf_write(struct file *file, const char __user *buf,
else
return -EFAULT;

if (!capable(CAP_SYS_ADMIN))
if (!perfmon_capable())
return -EACCES;

if (count != sizeof(uint32_t))
Expand Down
4 changes: 2 additions & 2 deletions arch/powerpc/perf/imc-pmu.c
Original file line number Diff line number Diff line change
Expand Up @@ -976,7 +976,7 @@ static int thread_imc_event_init(struct perf_event *event)
if (event->attr.type != event->pmu->type)
return -ENOENT;

if (!capable(CAP_SYS_ADMIN))
if (!perfmon_capable())
return -EACCES;

/* Sampling not supported */
Expand Down Expand Up @@ -1412,7 +1412,7 @@ static int trace_imc_event_init(struct perf_event *event)
if (event->attr.type != event->pmu->type)
return -ENOENT;

if (!capable(CAP_SYS_ADMIN))
if (!perfmon_capable())
return -EACCES;

/* Return if this is a couting event */
Expand Down
6 changes: 3 additions & 3 deletions arch/x86/events/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -10,11 +10,11 @@ config PERF_EVENTS_INTEL_UNCORE
available on NehalemEX and more modern processors.

config PERF_EVENTS_INTEL_RAPL
tristate "Intel rapl performance events"
depends on PERF_EVENTS && CPU_SUP_INTEL && PCI
tristate "Intel/AMD rapl performance events"
depends on PERF_EVENTS && (CPU_SUP_INTEL || CPU_SUP_AMD) && PCI
default y
---help---
Include support for Intel rapl performance events for power
Include support for Intel and AMD rapl performance events for power
monitoring on modern processors.

config PERF_EVENTS_INTEL_CSTATE
Expand Down
3 changes: 3 additions & 0 deletions arch/x86/events/Makefile
Original file line number Diff line number Diff line change
@@ -1,5 +1,8 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-y += core.o probe.o
obj-$(PERF_EVENTS_INTEL_RAPL) += rapl.o
obj-y += amd/
obj-$(CONFIG_X86_LOCAL_APIC) += msr.o
obj-$(CONFIG_CPU_SUP_INTEL) += intel/
obj-$(CONFIG_CPU_SUP_CENTAUR) += zhaoxin/
obj-$(CONFIG_CPU_SUP_ZHAOXIN) += zhaoxin/
4 changes: 4 additions & 0 deletions arch/x86/events/core.c
Original file line number Diff line number Diff line change
Expand Up @@ -1839,6 +1839,10 @@ static int __init init_hw_perf_events(void)
err = amd_pmu_init();
x86_pmu.name = "HYGON";
break;
case X86_VENDOR_ZHAOXIN:
case X86_VENDOR_CENTAUR:
err = zhaoxin_pmu_init();
break;
default:
err = -ENOTSUPP;
}
Expand Down
2 changes: 0 additions & 2 deletions arch/x86/events/intel/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,6 @@
obj-$(CONFIG_CPU_SUP_INTEL) += core.o bts.o
obj-$(CONFIG_CPU_SUP_INTEL) += ds.o knc.o
obj-$(CONFIG_CPU_SUP_INTEL) += lbr.o p4.o p6.o pt.o
obj-$(CONFIG_PERF_EVENTS_INTEL_RAPL) += intel-rapl-perf.o
intel-rapl-perf-objs := rapl.o
obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE) += intel-uncore.o
intel-uncore-objs := uncore.o uncore_nhmex.o uncore_snb.o uncore_snbep.o
obj-$(CONFIG_PERF_EVENTS_INTEL_CSTATE) += intel-cstate.o
Expand Down
2 changes: 1 addition & 1 deletion arch/x86/events/intel/bts.c
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,7 @@ struct bts_buffer {
local_t head;
unsigned long end;
void **data_pages;
struct bts_phys buf[0];
struct bts_phys buf[];
};

static struct pmu bts_pmu;
Expand Down
4 changes: 2 additions & 2 deletions arch/x86/events/intel/core.c
Original file line number Diff line number Diff line change
Expand Up @@ -1892,8 +1892,8 @@ static __initconst const u64 tnt_hw_cache_extra_regs

static struct extra_reg intel_tnt_extra_regs[] __read_mostly = {
/* must define OFFCORE_RSP_X first, see intel_fixup_er() */
INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0xffffff9fffull, RSP_0),
INTEL_UEVENT_EXTRA_REG(0x02b7, MSR_OFFCORE_RSP_1, 0xffffff9fffull, RSP_1),
INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0x800ff0ffffff9fffull, RSP_0),
INTEL_UEVENT_EXTRA_REG(0x02b7, MSR_OFFCORE_RSP_1, 0xff0ffffff9fffull, RSP_1),
EVENT_EXTRA_END
};

Expand Down
2 changes: 0 additions & 2 deletions arch/x86/events/intel/pt.c
Original file line number Diff line number Diff line change
Expand Up @@ -226,8 +226,6 @@ static int __init pt_pmu_hw_init(void)
pt_pmu.vmx = true;
}

attrs = NULL;

for (i = 0; i < PT_CPUID_LEAVES; i++) {
cpuid_count(20, i,
&pt_pmu.caps[CPUID_EAX + i*PT_CPUID_REGS_NUM],
Expand Down
2 changes: 1 addition & 1 deletion arch/x86/events/intel/uncore.h
Original file line number Diff line number Diff line change
Expand Up @@ -130,7 +130,7 @@ struct intel_uncore_box {
struct list_head list;
struct list_head active_list;
void __iomem *io_addr;
struct intel_uncore_extra_reg shared_regs[0];
struct intel_uncore_extra_reg shared_regs[];
};

/* CFL uncore 8th cbox MSRs */
Expand Down
10 changes: 10 additions & 0 deletions arch/x86/events/perf_event.h
Original file line number Diff line number Diff line change
Expand Up @@ -618,6 +618,7 @@ struct x86_pmu {

/* PMI handler bits */
unsigned int late_ack :1,
enabled_ack :1,
counter_freezing :1;
/*
* sysfs attrs
Expand Down Expand Up @@ -1133,3 +1134,12 @@ static inline int is_ht_workaround_enabled(void)
return 0;
}
#endif /* CONFIG_CPU_SUP_INTEL */

#if ((defined CONFIG_CPU_SUP_CENTAUR) || (defined CONFIG_CPU_SUP_ZHAOXIN))
int zhaoxin_pmu_init(void);
#else
static inline int zhaoxin_pmu_init(void)
{
return 0;
}
#endif /*CONFIG_CPU_SUP_CENTAUR or CONFIG_CPU_SUP_ZHAOXIN*/
13 changes: 13 additions & 0 deletions arch/x86/events/probe.c
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,11 @@ not_visible(struct kobject *kobj, struct attribute *attr, int i)
return 0;
}

/*
* Accepts msr[] array with non populated entries as long as either
* msr[i].msr is 0 or msr[i].grp is NULL. Note that the default sysfs
* visibility is visible when group->is_visible callback is set.
*/
unsigned long
perf_msr_probe(struct perf_msr *msr, int cnt, bool zero, void *data)
{
Expand All @@ -24,8 +29,16 @@ perf_msr_probe(struct perf_msr *msr, int cnt, bool zero, void *data)
if (!msr[bit].no_check) {
struct attribute_group *grp = msr[bit].grp;

/* skip entry with no group */
if (!grp)
continue;

grp->is_visible = not_visible;

/* skip unpopulated entry */
if (!msr[bit].msr)
continue;

if (msr[bit].test && !msr[bit].test(bit, data))
continue;
/* Virt sucks; you cannot tell if a R/O MSR is present :/ */
Expand Down
Loading

0 comments on commit a7092c8

Please sign in to comment.