Merge branch 'netconsole-cpu-population'
Breno Leitao says:

====================
netconsole: Add support for CPU population

The current implementation of netconsole sends all log messages in
parallel, which can lead to an intermixed and interleaved output on the
receiving side. This makes it challenging to demultiplex the messages
and attribute them to their originating CPUs.

As a result, users and developers often struggle to effectively analyze
and debug the parallel log output received through netconsole.

Example of a message received from production hosts:

	------------[ cut here ]------------
	------------[ cut here ]------------
	refcount_t: saturated; leaking memory.
	WARNING: CPU: 2 PID: 1613668 at lib/refcount.c:22 refcount_warn_saturate+0x5e/0xe0
	refcount_t: addition on 0; use-after-free.
	WARNING: CPU: 26 PID: 4139916 at lib/refcount.c:25 refcount_warn_saturate+0x7d/0xe0
	Modules linked in: bpf_preload(E) vhost_net(E) tun(E) vhost(E)

This series of patches introduces a new feature to the netconsole
subsystem that allows the automatic population of the CPU number in the
userdata field for each log message. This enhancement provides several
benefits:

* Improved demultiplexing of parallel log output: When multiple CPUs are
  sending messages concurrently, the added CPU number in the userdata
  makes it easier to differentiate and attribute the messages to their
  originating CPUs.

* Better visibility into message sources: The CPU number information
  gives users and developers more insight into which specific CPU a
  particular log message came from, which can be valuable for debugging
  and analysis.

The patches in this series are as follows:

Patch "consolidate send buffers into netconsole_target struct"
=================================================

Move the static buffers to netconsole target, from static declaration
in send_msg_no_fragmentation() and send_msg_fragmented().

Patch "netconsole: Rename userdata to extradata"
=================================================
Create the concept of extradata, which encompasses the concept of
userdata and the upcoming sysdata.

Sysdata is a new concept being added: fields that are populated by the
kernel. At this time it covers only the CPU number, but there is a
desire to add the current task name, kernel release version, etc.

Patch "netconsole: Helper to count number of used entries"
===========================================================
Create a simple helper to count the number of entries in extradata. I am
separating this into a function since it will need to count both userdata
and sysdata. For instance, when the user adds a new userdata entry, we
need to check whether there is space, counting the existing entries (from
userdata and CPU data).

Patch "Introduce configfs helpers for sysdata features"
======================================================
Create the concept of a sysdata feature in the netconsole target, and
create the configfs helpers to enable the bit in nt->sysdata.

Patch "Include sysdata in extradata entry count"
================================================
Take sysdata into account when counting the available space in the
buffer. This prevents users from creating new userdata/sysdata entries
if there is no more space.

Patch "netconsole: add support for sysdata and CPU population"
===============================================================
This is the core patch. It adds a new option to enable automatic CPU
number population in the netconsole userdata, and provides a new "cpu_nr"
configfs attribute to control this feature.

Patch "netconsole: selftest: test CPU number auto-population"
=============================================================
Expands the existing netconsole selftest to verify the CPU number
auto-population functionality. Ensures the received netconsole messages
contain the expected "cpu=<CPU>" entry in the message. Tests different
permutations with userdata.

Patch "netconsole: docs: Add documentation for CPU number auto-population"
=============================================================================
Updates the netconsole documentation to explain the new CPU number
auto-population feature. Provides instructions on how to enable and use
the feature.

I believe these changes will be a valuable addition to the netconsole
subsystem, enhancing its usefulness for kernel developers and users.

PS: This patchset is on top of the patch that created
netcons_fragmented_msg selftest:

https://lore.kernel.org/all/20250203-netcons_frag_msgs-v1-1-5bc6bedf2ac0@debian.org/

---
Changes in v5:
- Fixed a kernel-doc syntax error (Simon)
- Link to v4: https://lore.kernel.org/r/20250204-netcon_cpu-v4-0-9480266ef556@debian.org

Changes in v4:
- Fixed Kernel doc for netconsole_target (Simon)
- Fixed a typo in disable_sysdata_feature (Simon)
- Improved sysdata_cpu_nr_show() to return !! in a bit-wise operation
- Link to v3: https://lore.kernel.org/r/20250124-netcon_cpu-v3-0-12a0d286ba1d@debian.org

Changes in v3:
- Moved the buffer into netconsole_target, avoiding static functions in
  the send path (Jakub).
- Fix a documentation error (Randy Dunlap)
- Created a function that handles all the extradata, consolidating it in
  a single place (Jakub)
- Split the patch even more, trying to simplify the review.
- Link to v2: https://lore.kernel.org/r/20250115-netcon_cpu-v2-0-95971b44dc56@debian.org

Changes in v2:
- Create the concept of extradata and sysdata. This will make the design
  easier to understand, and the code easier to read.
  * Basically extradata encompasses userdata and the new sysdata.
    Userdata originates from user, and sysdata originates in kernel.
- Improved the test to send from a very specific CPU, which can be
  checked to be correct on the other side, as suggested by Jakub.
- Fixed a bug where CPU # was populated at the wrong place
- Link to v1: https://lore.kernel.org/r/20241113-netcon_cpu-v1-0-d187bf7c0321@debian.org
====================

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller committed Feb 10, 2025
2 parents acdefab + a7aec70 commit 34c84b3
Showing 5 changed files with 427 additions and 64 deletions.
45 changes: 45 additions & 0 deletions Documentation/networking/netconsole.rst
@@ -17,6 +17,8 @@ Release prepend support by Breno Leitao <leitao@debian.org>, Jul 7 2023

Userdata append support by Matthew Wood <thepacketgeek@gmail.com>, Jan 22 2024

Sysdata append support by Breno Leitao <leitao@debian.org>, Jan 15 2025

Please send bug reports to Matt Mackall <mpm@selenic.com>
Satyam Sharma <satyam.sharma@gmail.com>, and Cong Wang <xiyou.wangcong@gmail.com>

@@ -238,6 +240,49 @@ Delete `userdata` entries with `rmdir`::

It is recommended to not write user data values with newlines.

CPU number auto population in userdata
--------------------------------------

Inside the netconsole configfs hierarchy, there is a file called
`cpu_nr` under the `userdata` directory. This file is used to enable or disable
the automatic CPU number population feature. When enabled, each message
carries the number of the CPU that sent it.

To enable the CPU number auto-population::

  echo 1 > /sys/kernel/config/netconsole/target1/userdata/cpu_nr
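
As a minimal sketch, the enable step can be wrapped in a small shell function
that also reads the attribute back. The function name `enable_cpu_nr` is
hypothetical and not part of netconsole; it assumes the target directory
already exists, that the caller has permission to write to it, and that
reading `cpu_nr` prints the current setting.

```shell
# Hypothetical helper, not part of netconsole: turn on cpu_nr for one
# target directory and print the value read back from the attribute.
enable_cpu_nr() {
    target_dir=$1
    attr="$target_dir/userdata/cpu_nr"

    # Fail early if the attribute is missing (older kernel or wrong path).
    [ -e "$attr" ] || { echo "missing: $attr" >&2; return 1; }

    # Write 1 to enable the feature, then print what the file now reports.
    echo 1 > "$attr" || return 1
    cat "$attr"
}
```

A call such as `enable_cpu_nr /sys/kernel/config/netconsole/target1` would be
expected to print `1` when the feature is available.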

When this option is enabled, the netconsole messages will include an additional
line in the userdata field with the format `cpu=<cpu_number>`. This allows the
receiver of the netconsole messages to easily differentiate and demultiplex
messages originating from different CPUs, which is particularly useful when
dealing with parallel log output.

Example::

  echo "This is a message" > /dev/kmsg
  12,607,22085407756,-;This is a message
  cpu=42

In this example, the message was sent by CPU 42.
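
On the receiving side, the `cpu=` line can be used to tag each message with
its originating CPU. The following is only a sketch: the function name
`tag_by_cpu` is hypothetical, and it assumes the receiver's capture is plain
text on stdin in the format shown above, with a
`<prio>,<seq>,<timestamp>,<flags>;<text>` header line followed by `key=value`
lines. If several `cpu=` lines follow one header, the last one is kept.

```shell
# Hypothetical receiver-side helper: prefix every message with the cpu=
# value that follows it in the captured netconsole output.
tag_by_cpu() {
    awk '
        # Print the buffered message, if any, with the CPU recorded for it.
        function emit() {
            if (have) printf "cpu=%s %s\n", cpu, msg
            have = 0
        }
        # Header line: "<prio>,<seq>,<timestamp>,<flags>;<text>".
        /^[0-9]+,[0-9]+,[0-9]+,[^;]*;/ {
            emit()
            msg = $0
            sub(/^[^;]*;/, "", msg)   # keep only the message text
            cpu = "?"                 # placeholder until a cpu= line is seen
            have = 1
            next
        }
        # cpu= line after the header; a later entry overrides an earlier one.
        /^[ \t]*cpu=/ {
            val = $0
            sub(/^[ \t]*cpu=/, "", val)
            sub(/[ \t].*$/, "", val)  # drop anything after the number
            cpu = val
            next
        }
        # All other lines are ignored in this sketch.
        END { emit() }
    '
}
```

Piping the captured output through `tag_by_cpu | sort` then groups messages
by CPU, which makes interleaved warnings easier to separate.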

.. note::

   If the user has set a conflicting `cpu` key in the userdata dictionary,
   both keys will be reported, with the kernel-populated entry appearing after
   the user one. For example::

     # User-defined CPU entry
     mkdir -p /sys/kernel/config/netconsole/target1/userdata/cpu
     echo "1" > /sys/kernel/config/netconsole/target1/userdata/cpu/value

   Output might look like::

     12,607,22085407756,-;This is a message
     cpu=1
     cpu=42 # kernel-populated value


Extended console:
=================
