From f7ba52f302fdc392e0047f38e50841483d997144 Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Tue, 18 Apr 2023 14:49:25 -0700 Subject: [PATCH 1/2] vmlinux.lds.h: Discard .note.gnu.property section MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When tooling reads ELF notes, it assumes each note entry is aligned to the value listed in the .note section header's sh_addralign field. The kernel-created ELF notes in the .note.Linux and .note.Xen sections are aligned to 4 bytes. This causes the toolchain to set those sections' sh_addralign values to 4. On the other hand, the GCC-created .note.gnu.property section has an sh_addralign value of 8 for some reason, despite being based on struct Elf32_Nhdr which only needs 4-byte alignment. When the mismatched input sections get linked together into the vmlinux .notes output section, the higher alignment "wins", resulting in an sh_addralign of 8, which confuses tooling. For example: $ readelf -n .tmp_vmlinux.btf ... readelf: .tmp_vmlinux.btf: Warning: note with invalid namesz and/or descsz found at offset 0x170 readelf: .tmp_vmlinux.btf: Warning: type: 0x4, namesize: 0x006e6558, descsize: 0x00008801, alignment: 8 In this case readelf thinks there's alignment padding where there is none, so it starts reading an ELF note in the middle. With newer toolchains (e.g., latest Fedora Rawhide), a similar mismatch triggers a build failure when combined with CONFIG_X86_KERNEL_IBT: btf_encoder__encode: btf__dedup failed! Failed to encode BTF libbpf: failed to find '.BTF' ELF section in vmlinux FAILED: load BTF from vmlinux: No data available make[1]: *** [scripts/Makefile.vmlinux:35: vmlinux] Error 255 This latter error was caused by pahole crashing when it encountered the corrupt .notes section. This crash has been fixed in dwarves version 1.25. As Tianyi Liu describes: "Pahole reads .notes to look for LINUX_ELFNOTE_BUILD_LTO. When LTO is enabled, pahole needs to call cus__merge_and_process_cu to merge compile units, at which point there should only be one unspecified type (used to represent some compilation information) in the global context. However, when the kernel is compiled without LTO, if pahole calls cus__merge_and_process_cu due to alignment issues with notes, multiple unspecified types may appear after merging the cus, and older versions of pahole only support up to one. This is why pahole 1.24 crashes, while newer versions support multiple. However, the latest version of pahole still does not solve the problem of incorrect LTO recognition, so compiling the kernel may be slower than normal." Even with the newer pahole, the note section misaligment issue still exists and pahole is misinterpreting the LTO note. Fix it by discarding the .note.gnu.property section. While GNU properties are important for user space (and VDSO), they don't seem to have any use for vmlinux. (In fact, they're already getting (inadvertently) stripped from vmlinux when CONFIG_DEBUG_INFO_BTF is enabled. The BTF data is extracted from vmlinux.o with "objcopy --only-section=.BTF" into .btf.vmlinux.bin.o. That file doesn't have .note.gnu.property, so when it gets modified and linked back into the main object, the linker automatically strips it (see "How GNU properties are merged" in the ld man page).) Reported-by: Daniel Xu Link: https://lkml.kernel.org/bpf/57830c30-cd77-40cf-9cd1-3bb608aa602e@app.fastmail.com Debugged-by: Tianyi Liu Suggested-by: Joan Bruguera Micó Link: https://lore.kernel.org/r/20230418214925.ay3jpf2zhw75kgmd@treble Signed-off-by: Josh Poimboeuf --- include/asm-generic/vmlinux.lds.h | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index d1f57e4868ed3..cebdf1ca415de 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -891,9 +891,16 @@ /* * Discard .note.GNU-stack, which is emitted as PROGBITS by the compiler. * Otherwise, the type of .notes section would become PROGBITS instead of NOTES. + * + * Also, discard .note.gnu.property, otherwise it forces the notes section to + * be 8-byte aligned which causes alignment mismatches with the kernel's custom + * 4-byte aligned notes. */ #define NOTES \ - /DISCARD/ : { *(.note.GNU-stack) } \ + /DISCARD/ : { \ + *(.note.GNU-stack) \ + *(.note.gnu.property) \ + } \ .notes : AT(ADDR(.notes) - LOAD_OFFSET) { \ BOUNDED_SECTION_BY(.note.*, _notes) \ } NOTES_HEADERS \ From 2e4be0d011f21593c6b316806779ba1eba2cd7e0 Mon Sep 17 00:00:00 2001 From: Vernon Lovejoy Date: Fri, 12 May 2023 12:42:32 +0200 Subject: [PATCH 2/2] x86/show_trace_log_lvl: Ensure stack pointer is aligned, again The commit e335bb51cc15 ("x86/unwind: Ensure stack pointer is aligned") tried to align the stack pointer in show_trace_log_lvl(), otherwise the "stack < stack_info.end" check can't guarantee that the last read does not go past the end of the stack. However, we have the same problem with the initial value of the stack pointer, it can also be unaligned. So without this patch this trivial kernel module #include static int init(void) { asm volatile("sub $0x4,%rsp"); dump_stack(); asm volatile("add $0x4,%rsp"); return -EAGAIN; } module_init(init); MODULE_LICENSE("GPL"); crashes the kernel. Fixes: e335bb51cc15 ("x86/unwind: Ensure stack pointer is aligned") Signed-off-by: Vernon Lovejoy Signed-off-by: Oleg Nesterov Link: https://lore.kernel.org/r/20230512104232.GA10227@redhat.com Signed-off-by: Josh Poimboeuf --- arch/x86/kernel/dumpstack.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c index 0bf6779187dda..f18ca44c904b7 100644 --- a/arch/x86/kernel/dumpstack.c +++ b/arch/x86/kernel/dumpstack.c @@ -195,7 +195,6 @@ static void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, printk("%sCall Trace:\n", log_lvl); unwind_start(&state, task, regs, stack); - stack = stack ? : get_stack_pointer(task, regs); regs = unwind_get_entry_regs(&state, &partial); /* @@ -214,9 +213,13 @@ static void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, * - hardirq stack * - entry stack */ - for ( ; stack; stack = PTR_ALIGN(stack_info.next_sp, sizeof(long))) { + for (stack = stack ?: get_stack_pointer(task, regs); + stack; + stack = stack_info.next_sp) { const char *stack_name; + stack = PTR_ALIGN(stack, sizeof(long)); + if (get_stack_info(stack, task, &stack_info, &visit_mask)) { /* * We weren't on a valid stack. It's possible that