Skip to content

Commit

Permalink
Merge tag 'trace-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/g…
Browse files Browse the repository at this point in the history
…it/trace/linux-trace

Pull tracing updates from Steven Rostedt:

 - Add new feature to have function graph tracer record the return
   value. Adds a new option: funcgraph-retval ; when set, will show the
   return value of a function in the function graph tracer.

 - Also add the option: funcgraph-retval-hex where if it is not set, and
   the return value is an error code, then it will return the decimal of
   the error code, otherwise it still reports the hex value.

 - Add the file /sys/kernel/tracing/osnoise/per_cpu/cpu<cpu>/timerlat_fd
   That when a application opens it, it becomes the task that the timer
   lat tracer traces. The application can also read this file to find
   out how it's being interrupted.

 - Add the file /sys/kernel/tracing/available_filter_functions_addrs
   that works just the same as available_filter_functions but also shows
   the addresses of the functions like kallsyms, except that it gives
   the address of where the fentry/mcount jump/nop is. This is used by
   BPF to make it easier to attach BPF programs to ftrace hooks.

 - Replace strlcpy with strscpy in the tracing boot code.

* tag 'trace-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix warnings when building htmldocs for function graph retval
  riscv: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
  tracing/boot: Replace strlcpy with strscpy
  tracing/timerlat: Add user-space interface
  tracing/osnoise: Skip running osnoise if all instances are off
  tracing/osnoise: Switch from PF_NO_SETAFFINITY to migrate_disable
  ftrace: Show all functions with addresses in available_filter_functions_addrs
  selftests/ftrace: Add funcgraph-retval test case
  LoongArch: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
  x86/ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
  arm64: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
  tracing: Add documentation for funcgraph-retval and funcgraph-retval-hex
  function_graph: Support recording and printing the return value of function
  fgraph: Add declaration of "struct fgraph_ret_regs"
  • Loading branch information
Linus Torvalds committed Jun 30, 2023
2 parents 533925c + fc30ace commit cccf0c2
Show file tree
Hide file tree
Showing 29 changed files with 1,079 additions and 62 deletions.
132 changes: 132 additions & 0 deletions Documentation/trace/ftrace.rst
Original file line number Diff line number Diff line change
Expand Up @@ -324,6 +324,12 @@ of ftrace. Here is a list of some of the key files:
"set_graph_function", or "set_graph_notrace".
(See the section "dynamic ftrace" below for more details.)

available_filter_functions_addrs:

Similar to available_filter_functions, but with address displayed
for each function. The displayed address is the patch-site address
and can differ from /proc/kallsyms address.

dyn_ftrace_total_info:

This file is for debugging purposes. The number of functions that
Expand Down Expand Up @@ -1359,6 +1365,19 @@ Options for function_graph tracer:
only a closing curly bracket "}" is displayed for
the return of a function.

funcgraph-retval
When set, the return value of each traced function
will be printed after an equal sign "=". By default
this is off.

funcgraph-retval-hex
When set, the return value will always be printed
in hexadecimal format. If the option is not set and
the return value is an error code, it will be printed
in signed decimal format; otherwise it will also be
printed in hexadecimal format. By default, this option
is off.

sleep-time
When running function graph tracer, to include
the time a task schedules out in its function.
Expand Down Expand Up @@ -2704,6 +2723,119 @@ It is default disabled.
0) 1.757 us | } /* kmem_cache_free() */
0) 2.861 us | } /* putname() */

The return value of each traced function can be displayed after
an equal sign "=". When encountering system call failures, it
can be verfy helpful to quickly locate the function that first
returns an error code.

- hide: echo nofuncgraph-retval > trace_options
- show: echo funcgraph-retval > trace_options

Example with funcgraph-retval::

1) | cgroup_migrate() {
1) 0.651 us | cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */
1) | cgroup_migrate_execute() {
1) | cpu_cgroup_can_attach() {
1) | cgroup_taskset_first() {
1) 0.732 us | cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */
1) 1.232 us | } /* cgroup_taskset_first = 0xffff93fc8fb20000 */
1) 0.380 us | sched_rt_can_attach(); /* = 0x0 */
1) 2.335 us | } /* cpu_cgroup_can_attach = -22 */
1) 4.369 us | } /* cgroup_migrate_execute = -22 */
1) 7.143 us | } /* cgroup_migrate = -22 */

The above example shows that the function cpu_cgroup_can_attach
returned the error code -22 firstly, then we can read the code
of this function to get the root cause.

When the option funcgraph-retval-hex is not set, the return value can
be displayed in a smart way. Specifically, if it is an error code,
it will be printed in signed decimal format, otherwise it will
printed in hexadecimal format.

- smart: echo nofuncgraph-retval-hex > trace_options
- hexadecimal: echo funcgraph-retval-hex > trace_options

Example with funcgraph-retval-hex::

1) | cgroup_migrate() {
1) 0.651 us | cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */
1) | cgroup_migrate_execute() {
1) | cpu_cgroup_can_attach() {
1) | cgroup_taskset_first() {
1) 0.732 us | cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */
1) 1.232 us | } /* cgroup_taskset_first = 0xffff93fc8fb20000 */
1) 0.380 us | sched_rt_can_attach(); /* = 0x0 */
1) 2.335 us | } /* cpu_cgroup_can_attach = 0xffffffea */
1) 4.369 us | } /* cgroup_migrate_execute = 0xffffffea */
1) 7.143 us | } /* cgroup_migrate = 0xffffffea */

At present, there are some limitations when using the funcgraph-retval
option, and these limitations will be eliminated in the future:

- Even if the function return type is void, a return value will still
be printed, and you can just ignore it.

- Even if return values are stored in multiple registers, only the
value contained in the first register will be recorded and printed.
To illustrate, in the x86 architecture, eax and edx are used to store
a 64-bit return value, with the lower 32 bits saved in eax and the
upper 32 bits saved in edx. However, only the value stored in eax
will be recorded and printed.

- In certain procedure call standards, such as arm64's AAPCS64, when a
type is smaller than a GPR, it is the responsibility of the consumer
to perform the narrowing, and the upper bits may contain UNKNOWN values.
Therefore, it is advisable to check the code for such cases. For instance,
when using a u8 in a 64-bit GPR, bits [63:8] may contain arbitrary values,
especially when larger types are truncated, whether explicitly or implicitly.
Here are some specific cases to illustrate this point:

**Case One**:

The function narrow_to_u8 is defined as follows::

u8 narrow_to_u8(u64 val)
{
// implicitly truncated
return val;
}

It may be compiled to::

narrow_to_u8:
< ... ftrace instrumentation ... >
RET

If you pass 0x123456789abcdef to this function and want to narrow it,
it may be recorded as 0x123456789abcdef instead of 0xef.

**Case Two**:

The function error_if_not_4g_aligned is defined as follows::

int error_if_not_4g_aligned(u64 val)
{
if (val & GENMASK(31, 0))
return -EINVAL;

return 0;
}

It could be compiled to::

error_if_not_4g_aligned:
CBNZ w0, .Lnot_aligned
RET // bits [31:0] are zero, bits
// [63:32] are UNKNOWN
.Lnot_aligned:
MOV x0, #-EINVAL
RET

When passing 0x2_0000_0000 to it, the return value may be recorded as
0x2_0000_0000 instead of 0.

You can put some comments on specific functions by using
trace_printk() For example, if you want to put a comment inside
the __might_sleep() function, you just have to include
Expand Down
78 changes: 78 additions & 0 deletions Documentation/trace/timerlat-tracer.rst
Original file line number Diff line number Diff line change
Expand Up @@ -180,3 +180,81 @@ dummy_load_1ms_pd_init, which had the following code (on purpose)::
return 0;

}

User-space interface
---------------------------

Timerlat allows user-space threads to use timerlat infra-structure to
measure scheduling latency. This interface is accessible via a per-CPU
file descriptor inside $tracing_dir/osnoise/per_cpu/cpu$ID/timerlat_fd.

This interface is accessible under the following conditions:

- timerlat tracer is enable
- osnoise workload option is set to NO_OSNOISE_WORKLOAD
- The user-space thread is affined to a single processor
- The thread opens the file associated with its single processor
- Only one thread can access the file at a time

The open() syscall will fail if any of these conditions are not met.
After opening the file descriptor, the user space can read from it.

The read() system call will run a timerlat code that will arm the
timer in the future and wait for it as the regular kernel thread does.

When the timer IRQ fires, the timerlat IRQ will execute, report the
IRQ latency and wake up the thread waiting in the read. The thread will be
scheduled and report the thread latency via tracer - as for the kernel
thread.

The difference from the in-kernel timerlat is that, instead of re-arming
the timer, timerlat will return to the read() system call. At this point,
the user can run any code.

If the application rereads the file timerlat file descriptor, the tracer
will report the return from user-space latency, which is the total
latency. If this is the end of the work, it can be interpreted as the
response time for the request.

After reporting the total latency, timerlat will restart the cycle, arm
a timer, and go to sleep for the following activation.

If at any time one of the conditions is broken, e.g., the thread migrates
while in user space, or the timerlat tracer is disabled, the SIG_KILL
signal will be sent to the user-space thread.

Here is an basic example of user-space code for timerlat::

int main(void)
{
char buffer[1024];
int timerlat_fd;
int retval;
long cpu = 0; /* place in CPU 0 */
cpu_set_t set;

CPU_ZERO(&set);
CPU_SET(cpu, &set);

if (sched_setaffinity(gettid(), sizeof(set), &set) == -1)
return 1;

snprintf(buffer, sizeof(buffer),
"/sys/kernel/tracing/osnoise/per_cpu/cpu%ld/timerlat_fd",
cpu);

timerlat_fd = open(buffer, O_RDONLY);
if (timerlat_fd < 0) {
printf("error opening %s: %s\n", buffer, strerror(errno));
exit(1);
}

for (;;) {
retval = read(timerlat_fd, buffer, 1024);
if (retval < 0)
break;
}

close(timerlat_fd);
exit(0);
}
1 change: 1 addition & 0 deletions arch/arm64/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -202,6 +202,7 @@ config ARM64
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_TRACER
select HAVE_FUNCTION_ERROR_INJECTION
select HAVE_FUNCTION_GRAPH_RETVAL if HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_GCC_PLUGINS
select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && \
Expand Down
22 changes: 22 additions & 0 deletions arch/arm64/include/asm/ftrace.h
Original file line number Diff line number Diff line change
Expand Up @@ -192,4 +192,26 @@ static inline bool arch_syscall_match_sym_name(const char *sym,
}
#endif /* ifndef __ASSEMBLY__ */

#ifndef __ASSEMBLY__
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
struct fgraph_ret_regs {
/* x0 - x7 */
unsigned long regs[8];

unsigned long fp;
unsigned long __unused;
};

static inline unsigned long fgraph_ret_regs_return_value(struct fgraph_ret_regs *ret_regs)
{
return ret_regs->regs[0];
}

static inline unsigned long fgraph_ret_regs_frame_pointer(struct fgraph_ret_regs *ret_regs)
{
return ret_regs->fp;
}
#endif /* ifdef CONFIG_FUNCTION_GRAPH_TRACER */
#endif

#endif /* __ASM_FTRACE_H */
13 changes: 13 additions & 0 deletions arch/arm64/kernel/asm-offsets.c
Original file line number Diff line number Diff line change
Expand Up @@ -200,6 +200,19 @@ int main(void)
#endif
#ifdef CONFIG_FUNCTION_TRACER
DEFINE(FTRACE_OPS_FUNC, offsetof(struct ftrace_ops, func));
#endif
BLANK();
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
DEFINE(FGRET_REGS_X0, offsetof(struct fgraph_ret_regs, regs[0]));
DEFINE(FGRET_REGS_X1, offsetof(struct fgraph_ret_regs, regs[1]));
DEFINE(FGRET_REGS_X2, offsetof(struct fgraph_ret_regs, regs[2]));
DEFINE(FGRET_REGS_X3, offsetof(struct fgraph_ret_regs, regs[3]));
DEFINE(FGRET_REGS_X4, offsetof(struct fgraph_ret_regs, regs[4]));
DEFINE(FGRET_REGS_X5, offsetof(struct fgraph_ret_regs, regs[5]));
DEFINE(FGRET_REGS_X6, offsetof(struct fgraph_ret_regs, regs[6]));
DEFINE(FGRET_REGS_X7, offsetof(struct fgraph_ret_regs, regs[7]));
DEFINE(FGRET_REGS_FP, offsetof(struct fgraph_ret_regs, fp));
DEFINE(FGRET_REGS_SIZE, sizeof(struct fgraph_ret_regs));
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
DEFINE(FTRACE_OPS_DIRECT_CALL, offsetof(struct ftrace_ops, direct_call));
#endif
Expand Down
27 changes: 14 additions & 13 deletions arch/arm64/kernel/entry-ftrace.S
Original file line number Diff line number Diff line change
Expand Up @@ -330,22 +330,23 @@ SYM_FUNC_END(ftrace_stub_graph)
*/
SYM_CODE_START(return_to_handler)
/* save return value regs */
sub sp, sp, #64
stp x0, x1, [sp]
stp x2, x3, [sp, #16]
stp x4, x5, [sp, #32]
stp x6, x7, [sp, #48]
sub sp, sp, #FGRET_REGS_SIZE
stp x0, x1, [sp, #FGRET_REGS_X0]
stp x2, x3, [sp, #FGRET_REGS_X2]
stp x4, x5, [sp, #FGRET_REGS_X4]
stp x6, x7, [sp, #FGRET_REGS_X6]
str x29, [sp, #FGRET_REGS_FP] // parent's fp

mov x0, x29 // parent's fp
bl ftrace_return_to_handler// addr = ftrace_return_to_hander(fp);
mov x30, x0 // restore the original return address
mov x0, sp
bl ftrace_return_to_handler // addr = ftrace_return_to_hander(regs);
mov x30, x0 // restore the original return address

/* restore return value regs */
ldp x0, x1, [sp]
ldp x2, x3, [sp, #16]
ldp x4, x5, [sp, #32]
ldp x6, x7, [sp, #48]
add sp, sp, #64
ldp x0, x1, [sp, #FGRET_REGS_X0]
ldp x2, x3, [sp, #FGRET_REGS_X2]
ldp x4, x5, [sp, #FGRET_REGS_X4]
ldp x6, x7, [sp, #FGRET_REGS_X6]
add sp, sp, #FGRET_REGS_SIZE

ret
SYM_CODE_END(return_to_handler)
Expand Down
1 change: 1 addition & 0 deletions arch/loongarch/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -111,6 +111,7 @@ config LOONGARCH
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_ARG_ACCESS_API
select HAVE_FUNCTION_ERROR_INJECTION
select HAVE_FUNCTION_GRAPH_RETVAL if HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_TRACER
select HAVE_GENERIC_VDSO
Expand Down
22 changes: 22 additions & 0 deletions arch/loongarch/include/asm/ftrace.h
Original file line number Diff line number Diff line change
Expand Up @@ -100,4 +100,26 @@ __arch_ftrace_set_direct_caller(struct pt_regs *regs, unsigned long addr)

#endif /* CONFIG_FUNCTION_TRACER */

#ifndef __ASSEMBLY__
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
struct fgraph_ret_regs {
/* a0 - a1 */
unsigned long regs[2];

unsigned long fp;
unsigned long __unused;
};

static inline unsigned long fgraph_ret_regs_return_value(struct fgraph_ret_regs *ret_regs)
{
return ret_regs->regs[0];
}

static inline unsigned long fgraph_ret_regs_frame_pointer(struct fgraph_ret_regs *ret_regs)
{
return ret_regs->fp;
}
#endif /* ifdef CONFIG_FUNCTION_GRAPH_TRACER */
#endif

#endif /* _ASM_LOONGARCH_FTRACE_H */
15 changes: 14 additions & 1 deletion arch/loongarch/kernel/asm-offsets.c
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@
#include <asm/cpu-info.h>
#include <asm/ptrace.h>
#include <asm/processor.h>
#include <asm/ftrace.h>

void output_ptreg_defines(void)
{
Expand Down Expand Up @@ -264,11 +265,23 @@ void output_smpboot_defines(void)
#ifdef CONFIG_HIBERNATION
void output_pbe_defines(void)
{
COMMENT(" Linux struct pbe offsets. ");
COMMENT("Linux struct pbe offsets.");
OFFSET(PBE_ADDRESS, pbe, address);
OFFSET(PBE_ORIG_ADDRESS, pbe, orig_address);
OFFSET(PBE_NEXT, pbe, next);
DEFINE(PBE_SIZE, sizeof(struct pbe));
BLANK();
}
#endif

#ifdef CONFIG_FUNCTION_GRAPH_TRACER
void output_fgraph_ret_regs_defines(void)
{
COMMENT("LoongArch fgraph_ret_regs offsets.");
OFFSET(FGRET_REGS_A0, fgraph_ret_regs, regs[0]);
OFFSET(FGRET_REGS_A1, fgraph_ret_regs, regs[1]);
OFFSET(FGRET_REGS_FP, fgraph_ret_regs, fp);
DEFINE(FGRET_REGS_SIZE, sizeof(struct fgraph_ret_regs));
BLANK();
}
#endif
Loading

0 comments on commit cccf0c2

Please sign in to comment.