Skip to content

Commit

Permalink
perf: Extend per event callchain limit to branch stack
Browse files Browse the repository at this point in the history
The commit 97c79a3 ("perf core: Per event callchain limit")
introduced a per-event term to allow finer tuning of the depth of
callchains to save space.

It should be applied to the branch stack as well. For example, autoFDO
collections require maximum LBR entries. In the meantime, other
system-wide LBR users may only be interested in the latest a few number
of LBRs. A per-event LBR depth would save the perf output buffer.

The patch simply drops the uninterested branches, but HW still collects
the maximum branches. There may be a model-specific optimization that
can reduce the HW depth for some cases to reduce the overhead further.
But it isn't included in the patch set. Because it's not useful for all
cases. For example, ARCH LBR can utilize the PEBS and XSAVE to collect
LBRs. The depth should have less impact on the collecting overhead.
The model-specific optimization may be implemented later separately.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250310181536.3645382-1-kan.liang@linux.intel.com
  • Loading branch information
Kan Liang authored and Peter Zijlstra committed Mar 17, 2025
1 parent c96fff3 commit c53e14f
Show file tree
Hide file tree
Showing 2 changed files with 5 additions and 0 deletions.
3 changes: 3 additions & 0 deletions include/linux/perf_event.h
Original file line number Diff line number Diff line change
Expand Up @@ -1347,6 +1347,9 @@ static inline void perf_sample_save_brstack(struct perf_sample_data *data,

if (branch_sample_hw_index(event))
size += sizeof(u64);

brs->nr = min_t(u16, event->attr.sample_max_stack, brs->nr);

size += brs->nr * sizeof(struct perf_branch_entry);

/*
Expand Down
2 changes: 2 additions & 0 deletions include/uapi/linux/perf_event.h
Original file line number Diff line number Diff line change
Expand Up @@ -385,6 +385,8 @@ enum perf_event_read_format {
*
* @sample_max_stack: Max number of frame pointers in a callchain,
* should be < /proc/sys/kernel/perf_event_max_stack
* Max number of entries of branch stack
* should be < hardware limit
*/
struct perf_event_attr {

Expand Down

0 comments on commit c53e14f

Please sign in to comment.