Skip to content

Commit

Permalink
PCI: Allow numa_node override via sysfs
Browse files Browse the repository at this point in the history
NUMA systems with ACPI normally describe the physical topology via _PXM
methods.  But many BIOSes don't implement _PXM, which leaves the kernel
with no way to discover the device topology, which reduces performance
because we can't put memory and processes close to the device.

The NUMA node of a PCI device is already exported in the sysfs "numa_node"
file.  Make that file writable so users can workaround the lack of _PXM
methods in the BIOS.  For example:

  echo 3 > /sys/devices/pci0000:ff/0000:03:1f.3/numa_node

sets the node for PCI device 0000:03:1f.3.

Writing the file emits a FW_BUG warning to encourage users to request
firmware updates.  It also taints the kernel with TAINT_FIRMWARE_WORKAROUND
because overriding the node incorrectly can cause performance issues.

[bhelgaas: changelog, documentation text]
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Myron Stowe <mstowe@redhat.com>
CC: Alexander Ducyk <alexander.h.duyck@redhat.com>
CC: Jiang Liu <jiang.liu@linux.intel.com>
  • Loading branch information
Prarit Bhargava authored and Bjorn Helgaas committed Nov 6, 2014
1 parent f114040 commit 63692df
Show file tree
Hide file tree
Showing 2 changed files with 39 additions and 1 deletion.
13 changes: 13 additions & 0 deletions Documentation/ABI/testing/sysfs-bus-pci
Original file line number Diff line number Diff line change
Expand Up @@ -281,3 +281,16 @@ Description:
opt-out of driver binding using a driver_override name such as
"none". Only a single driver may be specified in the override,
there is no support for parsing delimiters.

What: /sys/bus/pci/devices/.../numa_node
Date: Oct 2014
Contact: Prarit Bhargava <prarit@redhat.com>
Description:
This file contains the NUMA node to which the PCI device is
attached, or -1 if the node is unknown. The initial value
comes from an ACPI _PXM method or a similar firmware
source. If that is missing or incorrect, this file can be
written to override the node. In that case, please report
a firmware bug to the system vendor. Writing to this file
taints the kernel with TAINT_FIRMWARE_WORKAROUND, which
reduces the supportability of your system.
27 changes: 26 additions & 1 deletion drivers/pci/pci-sysfs.c
Original file line number Diff line number Diff line change
Expand Up @@ -221,12 +221,37 @@ static ssize_t enabled_show(struct device *dev, struct device_attribute *attr,
static DEVICE_ATTR_RW(enabled);

#ifdef CONFIG_NUMA
static ssize_t numa_node_store(struct device *dev,
struct device_attribute *attr, const char *buf,
size_t count)
{
struct pci_dev *pdev = to_pci_dev(dev);
int node, ret;

if (!capable(CAP_SYS_ADMIN))
return -EPERM;

ret = kstrtoint(buf, 0, &node);
if (ret)
return ret;

if (!node_online(node))
return -EINVAL;

add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
dev_alert(&pdev->dev, FW_BUG "Overriding NUMA node to %d. Contact your vendor for updates.",
node);

dev->numa_node = node;
return count;
}

static ssize_t numa_node_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
return sprintf(buf, "%d\n", dev->numa_node);
}
static DEVICE_ATTR_RO(numa_node);
static DEVICE_ATTR_RW(numa_node);
#endif

static ssize_t dma_mask_bits_show(struct device *dev,
Expand Down

0 comments on commit 63692df

Please sign in to comment.