This driver is needed to read out temperature sensors. Resolves: #1819
The Nvidia drivers still need to be built.
Sorry for not setting the WIP label. You can work around it by manually loading the module.
My plan is to build it into the Linux kernel again. Too much time wasted again thanks to the Nvidia driver.
The IPMI drivers are not needed on all systems, and we try to avoid that interface. This also resolves a conflict with other watchdog timers.

    handsomejack:~$ dmesg --level=err
    [ 11.618887] watchdog: iTCO_wdt: cannot register miscdev on minor=130 (err=-16).
    [ 11.627956] watchdog: iTCO_wdt: a legacy watchdog module is probably present.
    handsomejack:~$ dmesg | grep -e iTCO -e watchdog
    [ 11.603138] iTCO_wdt: Intel TCO WatchDog Timer Driver v1.11
    [ 11.609888] iTCO_wdt: Found a Wellsburg TCO device (Version=2, TCOBASE=0x0460)
    [ 11.618887] watchdog: iTCO_wdt: cannot register miscdev on minor=130 (err=-16).
    [ 11.627956] watchdog: iTCO_wdt: a legacy watchdog module is probably present.
    [ 11.636462] iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
    [ 11.643679] iTCO_vendor_support: vendor-support=0

The Linux error when shutting down *sympathyforthedevil* (not in the logs, only on the monitor or the serial console) is also gone now, as the drivers are not automatically loaded.

    [ 189.063113] reboot: Power down
    [ 189.068549] IPMI poweroff: Powering down via IPMI chassis control command
    [ 189.075498] ------------[ cut here ]------------
    [ 189.080259] sched: Unexpected reschedule of offline CPU#8!
    [ 189.085898] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/apic/ipi.c:67 native_smp_send_reschedule+0x34/0x40
    [ 189.095605] Modules linked in: 8021q garp stp mrp llc amd64_edac_mod edac_mce_amd kvm_amd kvm input_leds led_class irqbypass ixgbe crc32c_intel acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables unix ipv6 nf_defrag_ipv6 autofs4
    [ 189.118332] CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 5.4.39.mx64.334 #1
    [ 189.125774] Hardware name: Supermicro Super Server/H11DSU-iN, BIOS 1.3 01/30/2020
    [ 189.133482] RIP: 0010:native_smp_send_reschedule+0x34/0x40
    [ 189.139114] Code: 05 31 9c 52 01 73 15 48 8b 05 a8 7f 2d 01 be fd 00 00 00 48 8b 40 30 e9 6a 8b db 00 89 fe 48 c7 c7 20 9e 21 82 e8 5c 1d 02 00 <0f> 0b c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 05 74 7f
    [ 189.158198] RSP: 0018:ffffc9001892fbc8 EFLAGS: 00010086
    [ 189.163571] RAX: 0000000000000000 RBX: ffff889faa6f5200 RCX: ffffffff82454348
    [ 189.170858] RDX: 0000000000000001 RSI: 0000000000000092 RDI: ffffffff82b2cbec
    [ 189.178139] RBP: 0000000000028b00 R08: 0000000000000796 R09: 0000000000000000
    [ 189.185420] R10: ffffc9001892fbb8 R11: 00000000000000f0 R12: 0000000000000008
    [ 189.192706] R13: 0000000000000000 R14: ffff889faa6f589c R15: 0000000000000046
    [ 189.199988] FS: 00007f7a26e6f800(0000) GS:ffff889faec00000(0000) knlGS:0000000000000000
    [ 189.208299] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 189.214192] CR2: 00007f950950c8a0 CR3: 000000ff9f3b4000 CR4: 00000000003406f0
    [ 189.221473] Call Trace:
    [ 189.224068]  try_to_wake_up+0x3bd/0x5a0
    [ 189.228045]  check_start_timer_thread.part.12+0x2a/0x50
    [ 189.233418]  sender+0x65/0x70
    [ 189.236527]  i_ipmi_request+0x2de/0x9d0
    [ 189.240507]  ipmi_request_supply_msgs+0x102/0x130
    [ 189.245358]  ipmi_request_in_rc_mode+0x2f/0x80
    [ 189.249944]  ipmi_poweroff_chassis+0xa0/0x110
    [ 189.254452]  __do_sys_reboot+0x150/0x1e0
    [ 189.258517]  ? do_writev+0xd8/0x120
    [ 189.262146]  ? do_writev+0xd8/0x120
    [ 189.265779]  do_syscall_64+0x48/0x130
    [ 189.269586]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 189.274782] RIP: 0033:0x7f7a2662a2a3
    [ 189.278501] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 89 fa be 69 19 12 28 bf ad de e1 fe b8 a9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 f3 c3 0f 1f 00 48 8b 15 b1 4b 2c 00 f7 d8
    [ 189.297584] RSP: 002b:00007ffed7660078 EFLAGS: 00000206 ORIG_RAX: 00000000000000a9
    [ 189.305376] RAX: ffffffffffffffda RBX: 000000004321fedc RCX: 00007f7a2662a2a3
    [ 189.312663] RDX: 000000004321fedc RSI: 0000000028121969 RDI: 00000000fee1dead
    [ 189.319944] RBP: 0000000000000000 R08: 0000000000000040 R09: 0000000000000005
    [ 189.327224] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
    [ 189.334512] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    [ 189.341794] ---[ end trace 4c38720b40d3b851 ]---
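A watchdog conflict of the kind reported by `iTCO_wdt` can also be detected mechanically, for example when checking many hosts. The following is a minimal sketch and not part of this pull request: the function name `find_watchdog_conflicts` and the regular expression are illustrative assumptions, modelled only on the two error lines quoted in the commit message.

```python
import errno
import re

# Assumed pattern, derived from the message format quoted in the commit message:
# "watchdog: iTCO_wdt: cannot register miscdev on minor=130 (err=-16)."
REGISTER_FAILED = re.compile(
    r"watchdog: (?P<driver>\S+): cannot register miscdev "
    r"on minor=(?P<minor>\d+) \(err=-(?P<err>\d+)\)"
)


def find_watchdog_conflicts(lines):
    """Return (driver, minor, errno_name) for each failed miscdev registration."""
    conflicts = []
    for line in lines:
        match = REGISTER_FAILED.search(line)
        if match is None:
            continue
        code = int(match.group("err"))
        # Translate the numeric error into its symbolic name where known.
        name = errno.errorcode.get(code, str(code))
        conflicts.append((match.group("driver"), int(match.group("minor")), name))
    return conflicts


if __name__ == "__main__":
    sample = [
        "[ 11.609888] iTCO_wdt: Found a Wellsburg TCO device (Version=2, TCOBASE=0x0460)",
        "[ 11.618887] watchdog: iTCO_wdt: cannot register miscdev on minor=130 (err=-16).",
        "[ 11.627956] watchdog: iTCO_wdt: a legacy watchdog module is probably present.",
    ]
    for driver, minor, name in find_watchdog_conflicts(sample):
        print(f"{driver}: minor {minor} already taken ({name})")
```

Error 16 is `EBUSY` on Linux, which fits the kernel's own hint that another (legacy) watchdog module had already claimed the device.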
Building them into the Linux kernel causes resource conflicts. Resolves: #1821
> linux CONFIG_BLK_DEV_NBD should be "m" not "y"
> This option enables access to the in-kernel headers that are generated during
> the build process. These can be used to build eBPF tracing programs,
> or similar programs. If you build the headers as a module, a module called
> kheaders.ko is built which can be loaded on-demand to get access to headers.
On several AMD server and desktop systems, we observe NMI stalls, which sometimes even require a reboot. Add a patch by the Linux maintainer to print more information in these cases.
1. `CONFIG_SENSORS_K8TEMP=m`

   > If you say yes here you get support for the temperature sensor(s) inside
   > your CPU. Supported is whole AMD K8 microarchitecture. Please note that
   > you will need at least lm-sensors 2.10.1 for proper userspace support.
   >
   > This driver can also be built as a module. If so, the module will be
   > called k8temp.

2. `CONFIG_SENSORS_K10TEMP=m`

   > If you say yes here you get support for the temperature sensor(s)
   > inside your CPU. Supported are later revisions of the AMD Family 10h and
   > all revisions of the AMD Family 11h, 12h (Llano), 14h (Brazos), 15h
   > (Bulldozer/Trinity/Kaveri/Carrizo) and 16h (Kabini/Mullins)
   > microarchitectures.
   >
   > This driver can also be built as a module. If so, the module will be
   > called k10temp.

3. `CONFIG_SENSORS_FAM15H_POWER=m`

   > If you say yes here you get support for processor power information
   > of your AMD family 15h CPU.
   >
   > This driver can also be built as a module. If so, the module will be
   > called fam15h_power.
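Whether a given kernel configuration actually sets these three options to `m` can be verified with a small script. This is a hedged sketch rather than tooling from this repository: the helper names `parse_kernel_config` and `check_expected` are hypothetical, and the parser only assumes the usual `.config` conventions (`CONFIG_FOO=value` and `# CONFIG_FOO is not set`).

```python
NOT_SET_SUFFIX = " is not set"


def parse_kernel_config(text):
    """Map CONFIG_* names to their values; options marked 'is not set' map to 'n'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET_SUFFIX):
            # "# CONFIG_FOO is not set" -> "CONFIG_FOO"
            options[line[2:-len(NOT_SET_SUFFIX)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def check_expected(options, expected):
    """Return {name: (expected, actual)} for every option that differs."""
    mismatches = {}
    for name, want in expected.items():
        actual = options.get(name, "n")  # absent options are treated as unset
        if actual != want:
            mismatches[name] = (want, actual)
    return mismatches


if __name__ == "__main__":
    # Expected values taken from the list in the commit message.
    expected = {
        "CONFIG_SENSORS_K8TEMP": "m",
        "CONFIG_SENSORS_K10TEMP": "m",
        "CONFIG_SENSORS_FAM15H_POWER": "m",
    }
    # Made-up config fragment for illustration only.
    sample = "CONFIG_SENSORS_K8TEMP=m\nCONFIG_SENSORS_K10TEMP=y\n"
    for name, (want, got) in check_expected(parse_kernel_config(sample), expected).items():
        print(f'{name} should be "{want}" not "{got}"')
```

The same check would apply to any other option that is expected to be modular rather than built in.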
Building ipmi_msghandler as a module causes, as always, problems with the proprietary Nvidia driver. For whatever reason, it depends on functions from the module and is unable to load the module itself, probably because of our mxgfx indirection.

    2020-06-17T13:56:09.272068+02:00 sigchld kernel: [ 0.000000] Linux version 5.4.46.mx64.337 (root@invidia.molgen.mpg.de) (gcc version 7.5.0 (GCC)) #1 SMP Tue Jun 16 23:32:15 CEST 2020
    […]
    2020-06-17T13:56:09.322119+02:00 sigchld kernel: [ 3.907200] nvidia: loading out-of-tree module taints kernel.
    2020-06-17T13:56:09.322140+02:00 sigchld kernel: [ 3.911716] nvidia: module license 'NVIDIA' taints kernel.
    2020-06-17T13:56:09.333611+02:00 sigchld kernel: [ 3.923028] nvidia: module verification failed: signature and/or required key missing - tainting kernel
    2020-06-17T13:56:09.333620+02:00 sigchld kernel: [ 3.926029] nvidia: Unknown symbol ipmi_create_user (err -2)
    2020-06-17T13:56:09.335472+02:00 sigchld kernel: [ 3.927879] nvidia: Unknown symbol ipmi_destroy_user (err -2)
    2020-06-17T13:56:09.337338+02:00 sigchld kernel: [ 3.929720] nvidia: Unknown symbol ipmi_validate_addr (err -2)
    2020-06-17T13:56:09.337342+02:00 sigchld kernel: [ 3.931552] nvidia: Unknown symbol ipmi_free_recv_msg (err -2)
    2020-06-17T13:56:09.339180+02:00 sigchld kernel: [ 3.933377] nvidia: Unknown symbol ipmi_set_my_address (err -2)
    2020-06-17T13:56:09.341000+02:00 sigchld kernel: [ 3.935221] nvidia: Unknown symbol ipmi_request_settime (err -2)
    2020-06-17T13:56:09.342899+02:00 sigchld kernel: [ 3.937102] nvidia: Unknown symbol ipmi_set_gets_events (err -2)
    2020-06-17T13:56:09.385602+02:00 sigchld kernel: [ 3.975577] nvidia_uvm: Unknown symbol nvUvmInterfaceDisableAccessCntr (err -2)
    2020-06-17T13:56:09.385614+02:00 sigchld kernel: [ 3.977740] nvidia_uvm: Unknown symbol nvUvmInterfaceChannelDestroy (err -2)
    2020-06-17T13:56:09.385615+02:00 sigchld kernel: [ 3.979796] nvidia_uvm: Unknown symbol nvUvmInterfaceQueryCaps (err -2)
    2020-06-17T13:56:09.387549+02:00 sigchld kernel: [ 3.981756] nvidia_uvm: Unknown symbol nvUvmInterfaceUnsetPageDirectory (err -2)
    2020-06-17T13:56:09.389361+02:00 sigchld kernel: [ 3.983558] nvidia_uvm: Unknown symbol nvUvmInterfaceInitAccessCntrInfo (err -2)
    2020-06-17T13:56:09.391153+02:00 sigchld kernel: [ 3.985352] nvidia_uvm: Unknown symbol nvUvmInterfaceReleaseChannel (err -2)
    2020-06-17T13:56:09.392781+02:00 sigchld kernel: [ 3.986986] nvidia_uvm: Unknown symbol nvUvmInterfaceMemoryAllocSys (err -2)
    2020-06-17T13:56:09.394816+02:00 sigchld kernel: [ 3.989018] nvidia_uvm: Unknown symbol nvUvmInterfaceMemoryCpuMap (err -2)
    2020-06-17T13:56:09.398324+02:00 sigchld kernel: [ 3.992539] nvidia_uvm: Unknown symbol nvUvmInterfaceRetainChannelResources (err -2)
    2020-06-17T13:56:09.403240+02:00 sigchld kernel: [ 3.997423] nvidia_uvm: Unknown symbol nvUvmInterfacePmaFreePages (err -2)
    […]

So partly revert commit 32c9443 (linux-5.4.46: Build IPMI drivers as modules), and build ipmi_msghandler into the Linux kernel.
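To see at a glance which modules fail to load and which symbols they are missing, the `Unknown symbol` lines can be grouped per module. The snippet below is an illustrative sketch only: `missing_symbols` is a hypothetical helper, and the regular expression assumes the message format visible in the log excerpt above.

```python
import re
from collections import defaultdict

# Assumed format: "<module>: Unknown symbol <symbol> (err <code>)"
UNKNOWN_SYMBOL = re.compile(
    r"(?P<module>\w+): Unknown symbol (?P<symbol>\w+) \(err (?P<err>-?\d+)\)"
)


def missing_symbols(lines):
    """Group unresolved symbols by the module that failed to resolve them."""
    missing = defaultdict(list)
    for line in lines:
        match = UNKNOWN_SYMBOL.search(line)
        if match:
            missing[match.group("module")].append(match.group("symbol"))
    return dict(missing)


if __name__ == "__main__":
    sample = [
        "kernel: [ 3.926029] nvidia: Unknown symbol ipmi_create_user (err -2)",
        "kernel: [ 3.927879] nvidia: Unknown symbol ipmi_destroy_user (err -2)",
        "kernel: [ 3.979796] nvidia_uvm: Unknown symbol nvUvmInterfaceQueryCaps (err -2)",
    ]
    for module, symbols in missing_symbols(sample).items():
        print(module, "->", ", ".join(symbols))
```

In the excerpt, every symbol missing from `nvidia` starts with `ipmi_`, which points at the IPMI message handler as the unmet dependency; the `nvidia_uvm` failures are plausibly a follow-on effect of `nvidia` itself not loading.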
Fix a cosmetic issue: two lines that belong together were printed as separate log messages. They are now printed on one line.

1. old:

       [ 0.979142] pci 0000:00:00.2: AMD-Vi: Extended features (0xf77ef22294ada):
       [ 0.979546]  PPR NX GT IA GA PC GA_vAPIC

2. new:

       [ 0.979142] pci 0000:00:00.2: AMD-Vi: Extended features (0xf77ef22294ada): PPR NX GT IA GA PC GA_vAPIC
This simplifies the interpretation of the values, as it is a bitmask.
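As an illustration of reading such a bitmask, the AMD-Vi extended features value from the log line, `0xf77ef22294ada`, can be decoded into feature names. This is a sketch under stated assumptions: the bit positions in `FEATURE_BITS` are recalled from the Linux AMD IOMMU driver's feature flags, are not taken from this pull request, cover only a subset of the register, and should be checked against the AMD IOMMU specification before being relied on.

```python
# ASSUMPTION: bit positions as recalled from the Linux AMD IOMMU driver;
# only a subset of the extended feature register is covered here.
FEATURE_BITS = {
    0: "PreF",
    1: "PPR",
    2: "X2APIC",
    3: "NX",
    4: "GT",
    6: "IA",
    7: "GA",
    8: "HE",
    9: "PC",
    21: "GA_vAPIC",
}


def decode_features(mask):
    """Return the names of all known feature bits set in mask, lowest bit first."""
    return [name for bit, name in sorted(FEATURE_BITS.items()) if mask & (1 << bit)]


if __name__ == "__main__":
    # Value taken from the AMD-Vi log line quoted in the commit message.
    print(" ".join(decode_features(0xF77EF22294ADA)))
```

With this table, the value from the log decodes to the same seven names the kernel prints (PPR NX GT IA GA PC GA_vAPIC), which is a useful sanity check on the assumed bit positions.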
fead469 to bf04e46
Tested on hypnotoad, sigchld, and sigfpe.