Skip to content

Kernel: Dmesg overflow at startup #1849

Closed
thomas opened this issue Jul 22, 2020 · 12 comments
Closed

Kernel: Dmesg overflow at startup #1849

thomas opened this issue Jul 22, 2020 · 12 comments

Comments

@thomas
Copy link
Collaborator

thomas commented Jul 22, 2020

Startup messages are lost when booting a new server. Caused by a too small kernel ring buffer.

Current CONFIG_LOG_BUF_SHIFT=17 should be changed to CONFIG_LOG_BUF_SHIFT=19.

@pmenzel
Copy link
Collaborator

pmenzel commented Jul 22, 2020

Startup messages are lost when booting a new server. Caused by a too small kernel ring buffer.

Are you referring to the problems on sang and mouches?

Current CONFIG_LOG_BUF_SHIFT=17 should be changed to CONFIG_LOG_BUF_SHIFT=19.

What size in KB will 19 be?

The first logs were missing on sang even with log_buf_len=2M (can also be verified in /var/log/messages and journalctl -k).

@thomas
Copy link
Collaborator Author

thomas commented Jul 22, 2020

Google 'CONFIG_LOG_BUF_SHIFT' ?

@pmenzel
Copy link
Collaborator

pmenzel commented Jul 22, 2020

But you already knew the value, so why let all other team members search to find out it’s 2^19 = 512 KiB?

Why 512 KiB, and not 2 MiB?

As it’s run-time configurable, the Linux configuration shouldn’t be changed in my opinion, and our GRUB configuration should be adapted to use log_buf_len=2M (or some higher value). (And as written above, it didn’t fix the problem with mouches and sang. Plugging in a serial cable on sang helped.)

@donald
Copy link
Collaborator

donald commented Jul 31, 2020

it didn’t fix the problem with mouches and sang.

@thomas, is there a machine where log messages are lost with default settings but not with CONFIG_LOG_BUF_SHIFT=19 ?

Is there a strong argument to put it either into the kernel config or into grub.cfg? I can only find very minor arguments.

@thomas
Copy link
Collaborator Author

thomas commented Jul 31, 2020

At least mouches dmesg started once around second 2.0 after a regular boot, so yes, messages where lost. To me this means the buffer has become too small in the meantime, even if it's not reproducible (and like it or not, look at other distros). Also we often have had the situation where the buffer was stuffed with output from a 'spammy' service/driver, a larger buffer would offer some more chances to catch the cause.

I strongly opt against further cramming the kernel command line (s.b.). A manually choosen value is also stupid, since within the kernel are further mechanics at work that adjust the buffer size; example: each core adds a certain amount of bufferspace. Using the bitshifted-value * some_const_value frees us from fiddling with a somehow guessed buffersize when some_const_value changes for reasons beyond our interest - or?

N.B. in an ideal world the command line would boil down to 'root=801' -- ok, not everything is sda1 :)

@donald
Copy link
Collaborator

donald commented Aug 3, 2020

I've put that into /src/mariux/beeroot/bee-files.issues for the next config.

@pmenzel
Copy link
Collaborator

pmenzel commented Aug 10, 2020

At least mouches dmesg started once around second 2.0 after a regular boot, so yes, messages where lost. To me this means the buffer has become too small in the meantime, even if it's not reproducible

As written, it has nothing to do with this. Currently, the buffer is at least 128 KB big, and currently on mouches the actual messages use up around 70 KB.

(and like it or not, look at other distros).

Debian uses:

$ grep BUF_SHIFT /boot/config-5.7.0-2-amd64
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13

Also we often have had the situation where the buffer was stuffed with output from a 'spammy' service/driver, a larger buffer would offer some more chances to catch the cause.

After a long run-time we are going to run into that situation. So, why choose 512 KB as the base?

I strongly opt against further cramming the kernel command line (s.b.).

What is your solution for older kernels, which were build with the old limit?

A manually choosen value is also stupid, since within the kernel are further mechanics at work that adjust the buffer size; example: each core adds a certain amount of bufferspace. Using the bitshifted-value * some_const_value frees us from fiddling with a somehow guessed buffersize when some_const_value changes for reasons beyond our interest - or?

(Please try to not use words like stupid in the future, as, especially in written context, it might be insulting.)

4 KB for each CPU, we’d need 1500 CPUs to reach the 2 MB limit.

I am not against increasing the buffer size, but the issue title is about the overflow at start-up, and that is not solved by the proposed solution. To not speak past each other, I’d welcome, if the purpose of this issue is clarified.

@thomas
Copy link
Collaborator Author

thomas commented Aug 10, 2020

Again, booting with CONFIG_LOG_BUF_SHIFT=17 lets the remaining dmesg output start at second 4+, booting with CONFIG_LOG_BUF_SHIFT=19 lets one see the whole startup.
Looks like a solution to me...

Recently used by others:
Ubuntu 20.04 LTS -> CONFIG_LOG_BUF_SHIFT=18
SLACKWARE-current -> CONFIG_LOG_BUF_SHIFT=18

As for the 'older kernels', hopefully we don't need to boot new machines with an ancient kernel, so the problem will grow out with time.

@donald
Copy link
Collaborator

donald commented Aug 10, 2020

As written, it has nothing to do with this.

It does. I've tested it:

--> rebuild kernel with LOG_BUF_SHIFT=   17 --> 19

root@mouches:/scratch/local/linux-5.4.39-334.x86_64/build# cp arch/x86/boot/bzImage /boot/bzImage-5.4.39.mx64.334
root@mouches:/scratch/local/linux-5.4.39-334.x86_64/build# shutdown -r now

buczek@mouches:~$ dmesg|head
[    0.000000] Linux version 5.4.39.mx64.334 (root@mouches.molgen.mpg.de) (gcc version 7.5.0 (GCC)) #3 SMP Mon Aug 10 14:07:05 CEST 2020
[    0.000000] Command line: BOOT_IMAGE=/boot/bzImage-5.4.39.mx64.334 root=LABEL=root ro crashkernel=256M console=ttyS1,115200n8 console=tty0 init=/bin/systemd audit=0 random.trust_cpu=on
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
root@mouches:~# zgrep LOG_BUF_SHIFT /proc/config.gz 
CONFIG_LOG_BUF_SHIFT=19
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13

--> rebuild kernel with LOG_BUG_SHIFT   19 --> 17

root@mouches:/scratch/local/linux-5.4.39-334.x86_64/build# cp arch/x86/boot/bzImage /boot/bzImage-5.4.39.mx64.334
root@mouches:/scratch/local/linux-5.4.39-334.x86_64/build# shutdown -r now

buczek@mouches:~$ dmesg|head
[    4.885874] pci 0000:5e:00.0: reg 0x30: [mem 0xffe00000-0xffffffff pref]
[    4.892892] pci 0000:5e:00.0: supports D1
[    4.896867] pci 0000:5e:00.0: PME# supported from D0 D1 D3hot
[    4.902911] pci 0000:5d:00.0: PCI bridge to [bus 5e]
[    4.907868] pci 0000:5d:00.0:   bridge window [io  0x8000-0x8fff]
[    4.914868] pci 0000:5d:00.0:   bridge window [mem 0xbb100000-0xbb1fffff]
[    4.921896] pci 0000:5f:00.0: [8086:1572] type 00 class 0x020000
[    4.927882] pci 0000:5f:00.0: reg 0x10: [mem 0xba000000-0xbaffffff 64bit pref]
[    4.935878] pci 0000:5f:00.0: reg 0x1c: [mem 0xbb008000-0xbb00ffff 64bit pref]
[    4.942875] pci 0000:5f:00.0: reg 0x30: [mem 0xfff80000-0xffffffff pref]
buczek@mouches:~$ zgrep LOG_BUF_SHIFT /proc/config.gz 
CONFIG_LOG_BUF_SHIFT=17
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13

This was referenced Aug 11, 2020
@pmenzel
Copy link
Collaborator

pmenzel commented Aug 11, 2020

I looked more into it, and the problem is, that these two socket Intel Skylake-E systems have over 280 PCI devices, compared to 50 on other 16 CPU systems, resulting in a lot more messages.

@mouches:~$ lspci | wc -l
281

That log_buf_len=2M didn’t work, was probably an error of editing the wrong GRUB entry (and can’t be checked from the logs anymore).

@pmenzel
Copy link
Collaborator

pmenzel commented Aug 11, 2020

I still think, using the command line parameter, is the better solution, as it will improve all systems. Changing the build configuration only fixes the Linux kernels built from now on.

@donald
Copy link
Collaborator

donald commented Aug 11, 2020

I still think, using the command line parameter, is the better solution, as it will improve all systems.

I would agree to both solutions but didn't want to wait until there is an consensus (which will be never), because I'd like to switch the default kernel to latest and greatest asap and @thomas asked for the proposed change to be in the default kernel. I've included the config in #1855, so we can switch the default kernel without delay because of this minor issue. The kernel option doesn't hurt much and an additional command line option can still be proposed and discussed.

Sign in to join this conversation on GitHub.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants