perf callchain: Feed callchains into a cursor
Callchains are currently fed through a fixed-size array.
As a result, we iterate over each callchain three times:

- 1st to resolve symbols
- 2nd to filter out context boundaries
- 3rd for the insertion into the tree

This also involves a pair of memory allocations and deallocations
every time we insert a callchain: one for the filtered array of
addresses and one for the array of symbols that comes with it.

Instead, feed the callchains through a linked list with persistent
allocations. This brings several benefits:

- The 1st and 2nd iterations are merged into one. That was possible
  before, but only by allocating an array slightly larger than
  necessary, because we don't know in advance how many context
  boundaries will be filtered out.

- Far fewer allocations and deallocations. The linked list keeps
  its empty entries around for later callchains and can be extended
  at will.

- Multiple sources of callchains can more easily feed a stacktrace
  together. This is meant to pave the way for CFI-based callchains,
  in which traditional frame-pointer-based kernel stacktraces will
  precede CFI-based user ones, producing an overall callchain whose
  size is hard to predict. This requirement makes the static array
  obsolete and makes a linked-list-based iterator a much more
  flexible fit.

Basic testing on a big perf file containing callchains (~ 176 MB)
has shown a throughput gain of about 11% with perf report.
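
As a rough illustration of the persistent linked list described above, here is a minimal, self-contained sketch. It is not the code added by this commit: every name (`sketch_cursor`, `sketch_cursor_fill`, `SKETCH_CONTEXT_MAX`, ...) is hypothetical, symbol resolution is left out, and the context-boundary threshold is an assumed value chosen only for the example. The points it tries to show are that filtering happens in the same pass as insertion, and that nodes survive a reset so later callchains reuse them.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical types for illustration; not the structures from the commit. */
struct sketch_node {
	uint64_t ip;
	struct sketch_node *next;
};

struct sketch_cursor {
	uint64_t nr;                 /* entries valid for the current callchain */
	struct sketch_node *first;   /* head of the persistent list */
	struct sketch_node **last;   /* slot that receives the next entry */
};

/* Assumed threshold: addresses at or above it are context markers. */
#define SKETCH_CONTEXT_MAX ((uint64_t)-4095)

static void sketch_cursor_init(struct sketch_cursor *c)
{
	c->nr = 0;
	c->first = NULL;
	c->last = &c->first;
}

/* Start a new callchain. Nodes from earlier callchains stay allocated. */
static void sketch_cursor_reset(struct sketch_cursor *c)
{
	c->nr = 0;
	c->last = &c->first;
}

/* Append one address, allocating only when no spare node is left. */
static int sketch_cursor_append(struct sketch_cursor *c, uint64_t ip)
{
	struct sketch_node *node = *c->last;

	if (node == NULL) {
		node = calloc(1, sizeof(*node));
		if (node == NULL)
			return -1;
		*c->last = node;
	}
	node->ip = ip;
	c->nr++;
	c->last = &node->next;
	return 0;
}

/* One pass over the raw addresses: filter boundaries and insert the rest. */
static int sketch_cursor_fill(struct sketch_cursor *c,
			      const uint64_t *ips, size_t n)
{
	size_t i;

	sketch_cursor_reset(c);
	for (i = 0; i < n; i++) {
		if (ips[i] >= SKETCH_CONTEXT_MAX)
			continue;	/* context boundary, skipped */
		if (sketch_cursor_append(c, ips[i]))
			return -1;
	}
	return 0;
}
```

Because the list only ever grows, a second source of frames could keep calling `sketch_cursor_append()` after the first one finishes, without either side knowing the final length up front.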

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294977121-5700-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Frederic Weisbecker authored and Arnaldo Carvalho de Melo committed Jan 22, 2011
1 parent de5fa3a commit 1b3a0e9
Showing 7 changed files with 204 additions and 139 deletions.
25 changes: 12 additions & 13 deletions tools/perf/builtin-report.c
@@ -81,18 +81,17 @@ static int perf_session__add_hist_entry(struct perf_session *self,
 					struct addr_location *al,
 					struct sample_data *data)
 {
-	struct map_symbol *syms = NULL;
 	struct symbol *parent = NULL;
-	int err = -ENOMEM;
+	int err = 0;
 	struct hist_entry *he;
 	struct hists *hists;
 	struct perf_event_attr *attr;

 	if ((sort__has_parent || symbol_conf.use_callchain) && data->callchain) {
-		syms = perf_session__resolve_callchain(self, al->thread,
-						       data->callchain, &parent);
-		if (syms == NULL)
-			return -ENOMEM;
+		err = perf_session__resolve_callchain(self, al->thread,
+						      data->callchain, &parent);
+		if (err)
+			return err;
 	}

 	attr = perf_header__find_attr(data->id, &self->header);
@@ -101,16 +100,17 @@ static int perf_session__add_hist_entry(struct perf_session *self,
 	else
 		hists = perf_session__hists_findnew(self, data->id, 0, 0);
 	if (hists == NULL)
-		goto out_free_syms;
+		return -ENOMEM;
+
 	he = __hists__add_entry(hists, al, parent, data->period);
 	if (he == NULL)
-		goto out_free_syms;
-	err = 0;
+		return -ENOMEM;
+
 	if (symbol_conf.use_callchain) {
-		err = callchain_append(he->callchain, data->callchain, syms,
+		err = callchain_append(he->callchain, &self->callchain_cursor,
 				       data->period);
 		if (err)
-			goto out_free_syms;
+			return err;
 	}
 	/*
 	 * Only in the newt browser we are doing integrated annotation,
@@ -119,8 +119,7 @@ static int perf_session__add_hist_entry(struct perf_session *self,
 	 */
 	if (use_browser > 0)
 		err = hist_entry__inc_addr_samples(he, al->addr);
-out_free_syms:
-	free(syms);
+
 	return err;
 }
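
The caller-side effect of this hunk can be sketched in isolation. The resolver now reports failure through an integer error code and leaves its results in storage owned by someone else, so the caller has nothing to free and can return early instead of jumping to a cleanup label. The code below is a simplified stand-in under that assumption; `demo_cursor`, `demo_resolve` and `demo_add_sample` are hypothetical names and do not correspond to functions in perf.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the session-owned cursor. */
struct demo_node {
	uint64_t ip;
	struct demo_node *next;
};

struct demo_cursor {
	uint64_t nr;              /* entries valid for the current sample */
	struct demo_node *first;  /* persistent list, reused across samples */
};

/* Fills the caller's cursor and returns 0 or a negative errno value. */
static int demo_resolve(struct demo_cursor *c, const uint64_t *ips, uint64_t n)
{
	struct demo_node **slot = &c->first;
	uint64_t i;

	c->nr = 0;
	for (i = 0; i < n; i++) {
		if (*slot == NULL) {
			*slot = calloc(1, sizeof(**slot));
			if (*slot == NULL)
				return -ENOMEM;
		}
		(*slot)->ip = ips[i];
		slot = &(*slot)->next;
		c->nr++;
	}
	return 0;
}

/* Consumer: walks only the first nr nodes; no cleanup label is needed. */
static int demo_add_sample(struct demo_cursor *c, const uint64_t *ips,
			   uint64_t n, uint64_t *sum)
{
	struct demo_node *pos;
	uint64_t i;
	int err;

	err = demo_resolve(c, ips, n);
	if (err)
		return err;	/* the cursor keeps ownership of its nodes */

	for (i = 0, pos = c->first; i < c->nr; i++, pos = pos->next)
		*sum += pos->ip;
	return 0;
}
```

Bounding the walk by `nr` rather than by a NULL `next` pointer matters here, since nodes left over from a longer earlier sample remain linked after the valid entries.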
