Merge tag 'x86-apic-2024-03-10' of git://git.kernel.org/pub/scm/linux…
…/kernel/git/tip/tip

Pull x86 APIC updates from Thomas Gleixner:
 "Rework of APIC enumeration and topology evaluation.

  The current implementation has a couple of shortcomings:

   - It fails to handle hybrid systems correctly.

   - The APIC registration code which handles CPU number assignments is
     in the middle of the APIC code and detached from the topology
     evaluation.

   - The various mechanisms which enumerate APICs, ACPI, MPPARSE and
     guest specific ones, tweak global variables as they see fit or in
     case of XENPV just hack around the generic mechanisms completely.

   - The CPUID topology evaluation code is sprinkled all over the vendor
     code and reevaluates global variables on every hotplug operation.

   - There is no way to analyze topology on the boot CPU before bringing
     up the APs. This causes problems for infrastructure like PERF which
     needs to size certain aspects upfront or could be simplified if
     that would be possible.

   - The APIC admission and CPU number association logic is
     incomprehensible and overly complex and needs to be kept around
     after boot instead of completing this right after the APIC
     enumeration.

  This update addresses these shortcomings with the following changes:

   - Rework the CPUID evaluation code so it is common for all vendors
     and provides information about the APIC ID segments in a uniform
     way independent of the number of segments (Thread, Core, Module,
     ..., Die, Package) so that this information can be computed instead
     of rewriting global variables of dubious value over and over.

   - A few cleanups and simplifications of the APIC, IO/APIC and related
     interfaces to prepare for the topology evaluation changes.

   - Separation of the parser stages so the early evaluation which tries
     to find the APIC address can be separately overridden from the late
     evaluation which enumerates and registers the local APIC as further
     preparation for sanitizing the topology evaluation.

   - A new registration and admission logic which

       - encapsulates the inner workings so that parsers and guest logic
         can no longer fiddle with it

       - uses the APIC ID segments to build topology bitmaps at
         registration time

       - provides a sane admission logic

       - allows detecting the crash kernel case, where CPU0 does not run
         on the real BSP, automatically. This is required to prevent
         sending INIT/SIPI sequences to the real BSP which would reset
         the whole machine. This was so far handled by a tedious command
         line parameter, which does not even work in nested crash
         scenarios.

       - associates CPU numbers after the enumeration has completed and
         prevents the late registration of APICs, which was somehow
         tolerated before.

   - Converting all parsers and guest enumeration mechanisms over to the
     new interfaces.

     This allows getting rid of all global variable tweaking from the
     parsers and enumeration mechanisms and sanitizes the XEN[PV]
     handling so it can use CPUID evaluation for the first time.

   - Mopping up existing sins by taking the information from the APIC ID
     segment bitmaps.

     This evaluates hybrid systems correctly on the boot CPU and allows
     for cleanups and fixes in the related drivers, e.g. PERF.

  The series has been extensively tested and the minimal late fallout
  due to a broken ACPI/MADT table has been addressed by tightening the
  admission logic further"
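
To make the "APIC ID segments" idea from the message above concrete, here is a minimal userspace sketch in C. It assumes a hypothetical layout in which every topology level occupies a contiguous bit range of the APIC ID and is described by a cumulative shift value. The names `topo_layout`, `topo_id_at` and `topo_pkg_id` are invented for illustration and are not the kernel's actual interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical topology levels, lowest first (illustrative subset). */
enum topo_level { TOPO_THREAD, TOPO_CORE, TOPO_DIE, TOPO_LEVELS };

/*
 * shift[level] is the number of low APIC ID bits that cover this level
 * and all levels below it. The bits above the last shift form the
 * package ID.
 */
struct topo_layout {
	unsigned int shift[TOPO_LEVELS];
};

/* ID of the unit at 'level', relative to its parent. */
static uint32_t topo_id_at(const struct topo_layout *l, uint32_t apic_id,
			   enum topo_level level)
{
	unsigned int lo = (level == TOPO_THREAD) ? 0 : l->shift[level - 1];
	unsigned int hi = l->shift[level];

	assert(lo <= hi && hi < 32);
	return (apic_id >> lo) & ((1u << (hi - lo)) - 1);
}

/* Everything above the highest described level is the package ID. */
static uint32_t topo_pkg_id(const struct topo_layout *l, uint32_t apic_id)
{
	return apic_id >> l->shift[TOPO_LEVELS - 1];
}
```

With shifts {1, 4, 5}, that is one thread bit, three core bits and one die bit, APIC ID 0x2B decomposes into thread 1, core 5, die 0 and package 1. Because the layout is data rather than code, the same two functions work for any number of levels, which is the property the message describes as "independent of the number of segments".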

* tag 'x86-apic-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (76 commits)
  x86/topology: Ignore non-present APIC IDs in a present package
  x86/apic: Build the x86 topology enumeration functions on UP APIC builds too
  smp: Provide 'setup_max_cpus' definition on UP too
  smp: Avoid 'setup_max_cpus' namespace collision/shadowing
  x86/bugs: Use fixed addressing for VERW operand
  x86/cpu/topology: Get rid of cpuinfo::x86_max_cores
  x86/cpu/topology: Provide __num_[cores|threads]_per_package
  x86/cpu/topology: Rename topology_max_die_per_package()
  x86/cpu/topology: Rename smp_num_siblings
  x86/cpu/topology: Retrieve cores per package from topology bitmaps
  x86/cpu/topology: Use topology logical mapping mechanism
  x86/cpu/topology: Provide logical pkg/die mapping
  x86/cpu/topology: Simplify cpu_mark_primary_thread()
  x86/cpu/topology: Mop up primary thread mask handling
  x86/cpu/topology: Use topology bitmaps for sizing
  x86/cpu/topology: Let XEN/PV use topology from CPUID/MADT
  x86/xen/smp_pv: Count number of vCPUs early
  x86/cpu/topology: Assign hotpluggable CPUIDs during init
  x86/cpu/topology: Reject unknown APIC IDs on ACPI hotplug
  x86/topology: Add a mechanism to track topology via APIC IDs
  ...
Linus Torvalds committed Mar 11, 2024
2 parents d08c407 + f0551af commit ca7e917
Showing 86 changed files with 1,555 additions and 1,569 deletions.
7 changes: 2 additions & 5 deletions Documentation/admin-guide/kdump/kdump.rst
@@ -191,9 +191,7 @@ Dump-capture kernel config options (Arch Dependent, i386 and x86_64)
CPU is enough for kdump kernel to dump vmcore on most of systems.

However, you can also specify nr_cpus=X to enable multiple processors
- in kdump kernel. In this case, "disable_cpu_apicid=" is needed to
- tell kdump kernel which cpu is 1st kernel's BSP. Please refer to
- admin-guide/kernel-parameters.txt for more details.
+ in kdump kernel.

With CONFIG_SMP=n, the above things are not related.

@@ -454,8 +452,7 @@ Notes on loading the dump-capture kernel:
to use multi-thread programs with it, such as parallel dump feature of
makedumpfile. Otherwise, the multi-thread program may have a great
performance degradation. To enable multi-cpu support, you should bring up an
- SMP dump-capture kernel and specify maxcpus/nr_cpus, disable_cpu_apicid=[X]
- options while loading it.
+ SMP dump-capture kernel and specify maxcpus/nr_cpus options while loading it.

* For s390x there are two kdump modes: If a ELF header is specified with
the elfcorehdr= kernel parameter, it is used by the kdump kernel as it
9 changes: 0 additions & 9 deletions Documentation/admin-guide/kernel-parameters.txt
@@ -1095,15 +1095,6 @@
Disable TLBIE instruction. Currently does not work
with KVM, with HASH MMU, or with coherent accelerators.

- disable_cpu_apicid= [X86,APIC,SMP]
- Format: <int>
- The number of initial APIC ID for the
- corresponding CPU to be disabled at boot,
- mostly used for the kdump 2nd kernel to
- disable BSP to wake up multiple CPUs without
- causing system reset or hang due to sending
- INIT from AP to BSP.
-
disable_ddw [PPC/PSERIES,EARLY]
Disable Dynamic DMA Window support. Use this
to workaround buggy firmware.
24 changes: 9 additions & 15 deletions Documentation/arch/x86/topology.rst
@@ -47,17 +47,21 @@ AMD nomenclature for package is 'Node'.

  Package-related topology information in the kernel:

- - cpuinfo_x86.x86_max_cores:
+ - topology_num_threads_per_package()

- The number of cores in a package. This information is retrieved via CPUID.
+ The number of threads in a package.

- - cpuinfo_x86.x86_max_dies:
+ - topology_num_cores_per_package()

- The number of dies in a package. This information is retrieved via CPUID.
+ The number of cores in a package.

+ - topology_max_dies_per_package()
+
+ The maximum number of dies in a package.
+
  - cpuinfo_x86.topo.die_id:

- The physical ID of the die. This information is retrieved via CPUID.
+ The physical ID of the die.

  - cpuinfo_x86.topo.pkg_id:

@@ -96,16 +100,6 @@ are SMT- or CMT-type threads.
AMDs nomenclature for a CMT core is "Compute Unit". The kernel always uses
"core".

- Core-related topology information in the kernel:
-
- - smp_num_siblings:
-
- The number of threads in a core. The number of threads in a package can be
- calculated by::
-
- threads_per_package = cpuinfo_x86.x86_max_cores * smp_num_siblings
-
-
Threads
=======
A thread is a single scheduling unit. It's the equivalent to a logical Linux
2 changes: 1 addition & 1 deletion arch/x86/events/amd/core.c
@@ -579,7 +579,7 @@ static void amd_pmu_cpu_starting(int cpu)
if (!x86_pmu.amd_nb_constraints)
return;

- nb_id = topology_die_id(cpu);
+ nb_id = topology_amd_node_id(cpu);
WARN_ON_ONCE(nb_id == BAD_APICID);

for_each_online_cpu(i) {
2 changes: 1 addition & 1 deletion arch/x86/events/intel/cstate.c
@@ -834,7 +834,7 @@ static int __init cstate_init(void)
}

if (has_cstate_pkg) {
- if (topology_max_die_per_package() > 1) {
+ if (topology_max_dies_per_package() > 1) {
err = perf_pmu_register(&cstate_pkg_pmu,
"cstate_die", -1);
} else {
2 changes: 1 addition & 1 deletion arch/x86/events/intel/uncore.c
@@ -1893,7 +1893,7 @@ static int __init intel_uncore_init(void)
return -ENODEV;

__uncore_max_dies =
- topology_max_packages() * topology_max_die_per_package();
+ topology_max_packages() * topology_max_dies_per_package();

id = x86_match_cpu(intel_uncore_match);
if (!id) {
4 changes: 2 additions & 2 deletions arch/x86/events/intel/uncore_nhmex.c
@@ -1221,8 +1221,8 @@ void nhmex_uncore_cpu_init(void)
uncore_nhmex = true;
else
nhmex_uncore_mbox.event_descs = wsmex_uncore_mbox_events;
- if (nhmex_uncore_cbox.num_boxes > boot_cpu_data.x86_max_cores)
- nhmex_uncore_cbox.num_boxes = boot_cpu_data.x86_max_cores;
+ if (nhmex_uncore_cbox.num_boxes > topology_num_cores_per_package())
+ nhmex_uncore_cbox.num_boxes = topology_num_cores_per_package();
uncore_msr_uncores = nhmex_msr_uncores;
}
/* end of Nehalem-EX uncore support */
8 changes: 4 additions & 4 deletions arch/x86/events/intel/uncore_snb.c
@@ -364,8 +364,8 @@ static struct intel_uncore_type *snb_msr_uncores[] = {
void snb_uncore_cpu_init(void)
{
uncore_msr_uncores = snb_msr_uncores;
- if (snb_uncore_cbox.num_boxes > boot_cpu_data.x86_max_cores)
- snb_uncore_cbox.num_boxes = boot_cpu_data.x86_max_cores;
+ if (snb_uncore_cbox.num_boxes > topology_num_cores_per_package())
+ snb_uncore_cbox.num_boxes = topology_num_cores_per_package();
}

static void skl_uncore_msr_init_box(struct intel_uncore_box *box)
@@ -428,8 +428,8 @@ static struct intel_uncore_type *skl_msr_uncores[] = {
void skl_uncore_cpu_init(void)
{
uncore_msr_uncores = skl_msr_uncores;
- if (skl_uncore_cbox.num_boxes > boot_cpu_data.x86_max_cores)
- skl_uncore_cbox.num_boxes = boot_cpu_data.x86_max_cores;
+ if (skl_uncore_cbox.num_boxes > topology_num_cores_per_package())
+ skl_uncore_cbox.num_boxes = topology_num_cores_per_package();
snb_uncore_arb.ops = &skl_uncore_msr_ops;
}

18 changes: 9 additions & 9 deletions arch/x86/events/intel/uncore_snbep.c
@@ -1172,8 +1172,8 @@ static struct intel_uncore_type *snbep_msr_uncores[] = {

void snbep_uncore_cpu_init(void)
{
- if (snbep_uncore_cbox.num_boxes > boot_cpu_data.x86_max_cores)
- snbep_uncore_cbox.num_boxes = boot_cpu_data.x86_max_cores;
+ if (snbep_uncore_cbox.num_boxes > topology_num_cores_per_package())
+ snbep_uncore_cbox.num_boxes = topology_num_cores_per_package();
uncore_msr_uncores = snbep_msr_uncores;
}

@@ -1406,7 +1406,7 @@ static int topology_gidnid_map(int nodeid, u32 gidnid)
*/
for (i = 0; i < 8; i++) {
if (nodeid == GIDNIDMAP(gidnid, i)) {
- if (topology_max_die_per_package() > 1)
+ if (topology_max_dies_per_package() > 1)
die_id = i;
else
die_id = topology_phys_to_logical_pkg(i);
@@ -1845,8 +1845,8 @@ static struct intel_uncore_type *ivbep_msr_uncores[] = {

void ivbep_uncore_cpu_init(void)
{
- if (ivbep_uncore_cbox.num_boxes > boot_cpu_data.x86_max_cores)
- ivbep_uncore_cbox.num_boxes = boot_cpu_data.x86_max_cores;
+ if (ivbep_uncore_cbox.num_boxes > topology_num_cores_per_package())
+ ivbep_uncore_cbox.num_boxes = topology_num_cores_per_package();
uncore_msr_uncores = ivbep_msr_uncores;
}

@@ -2917,8 +2917,8 @@ static bool hswep_has_limit_sbox(unsigned int device)

void hswep_uncore_cpu_init(void)
{
- if (hswep_uncore_cbox.num_boxes > boot_cpu_data.x86_max_cores)
- hswep_uncore_cbox.num_boxes = boot_cpu_data.x86_max_cores;
+ if (hswep_uncore_cbox.num_boxes > topology_num_cores_per_package())
+ hswep_uncore_cbox.num_boxes = topology_num_cores_per_package();

/* Detect 6-8 core systems with only two SBOXes */
if (hswep_has_limit_sbox(HSWEP_PCU_DID))
@@ -3280,8 +3280,8 @@ static struct event_constraint bdx_uncore_pcu_constraints[] = {

void bdx_uncore_cpu_init(void)
{
- if (bdx_uncore_cbox.num_boxes > boot_cpu_data.x86_max_cores)
- bdx_uncore_cbox.num_boxes = boot_cpu_data.x86_max_cores;
+ if (bdx_uncore_cbox.num_boxes > topology_num_cores_per_package())
+ bdx_uncore_cbox.num_boxes = topology_num_cores_per_package();
uncore_msr_uncores = bdx_msr_uncores;

/* Detect systems with no SBOXes */
2 changes: 1 addition & 1 deletion arch/x86/events/rapl.c
@@ -674,7 +674,7 @@ static const struct attribute_group *rapl_attr_update[] = {

static int __init init_rapl_pmus(void)
{
- int maxdie = topology_max_packages() * topology_max_die_per_package();
+ int maxdie = topology_max_packages() * topology_max_dies_per_package();
size_t size;

size = sizeof(*rapl_pmus) + maxdie * sizeof(struct rapl_pmu *);
5 changes: 3 additions & 2 deletions arch/x86/hyperv/hv_vtl.c
@@ -31,8 +31,9 @@ void __init hv_vtl_init_platform(void)
x86_init.timers.timer_init = x86_init_noop;

/* Avoid searching for BIOS MP tables */
- x86_init.mpparse.find_smp_config = x86_init_noop;
- x86_init.mpparse.get_smp_config = x86_init_uint_noop;
+ x86_init.mpparse.find_mptable = x86_init_noop;
+ x86_init.mpparse.early_parse_smp_cfg = x86_init_noop;
+ x86_init.mpparse.parse_smp_cfg = x86_init_noop;

x86_platform.get_wallclock = get_rtc_noop;
x86_platform.set_wallclock = set_rtc_noop;
22 changes: 14 additions & 8 deletions arch/x86/include/asm/apic.h
@@ -46,6 +46,10 @@ extern void x86_32_probe_apic(void);
static inline void x86_32_probe_apic(void) { }
#endif

+ extern u32 cpuid_to_apicid[];
+
+ #define CPU_ACPIID_INVALID U32_MAX
+
#ifdef CONFIG_X86_LOCAL_APIC

extern int apic_verbosity;
@@ -54,8 +58,6 @@ extern int local_apic_timer_c2_ok;
extern bool apic_is_disabled;
extern unsigned int lapic_timer_period;

- extern u32 cpuid_to_apicid[];
-
extern enum apic_intr_mode_id apic_intr_mode;
enum apic_intr_mode_id {
APIC_PIC,
@@ -169,6 +171,14 @@ extern bool apic_needs_pit(void);

extern void apic_send_IPI_allbutself(unsigned int vector);

+ extern void topology_register_apic(u32 apic_id, u32 acpi_id, bool present);
+ extern void topology_register_boot_apic(u32 apic_id);
+ extern int topology_hotplug_apic(u32 apic_id, u32 acpi_id);
+ extern void topology_hotunplug_apic(unsigned int cpu);
+ extern void topology_apply_cmdline_limits_early(void);
+ extern void topology_init_possible_cpus(void);
+ extern void topology_reset_possible_cpus_up(void);
+
#else /* !CONFIG_X86_LOCAL_APIC */
static inline void lapic_shutdown(void) { }
#define local_apic_timer_c2_ok 1
@@ -183,6 +193,8 @@ static inline void apic_intr_mode_init(void) { }
static inline void lapic_assign_system_vectors(void) { }
static inline void lapic_assign_legacy_vector(unsigned int i, bool r) { }
static inline bool apic_needs_pit(void) { return true; }
+ static inline void topology_apply_cmdline_limits_early(void) { }
+ static inline void topology_init_possible_cpus(void) { }
#endif /* !CONFIG_X86_LOCAL_APIC */

#ifdef CONFIG_X86_X2APIC
@@ -289,16 +301,11 @@ struct apic {
/* Probe, setup and smpboot functions */
int (*probe)(void);
int (*acpi_madt_oem_check)(char *oem_id, char *oem_table_id);
- bool (*apic_id_registered)(void);

- bool (*check_apicid_used)(physid_mask_t *map, u32 apicid);
void (*init_apic_ldr)(void);
- void (*ioapic_phys_id_map)(physid_mask_t *phys_map, physid_mask_t *retmap);
u32 (*cpu_present_to_apicid)(int mps_cpu);
- u32 (*phys_pkg_id)(u32 cpuid_apic, int index_msb);

u32 (*get_apic_id)(u32 id);
- u32 (*set_apic_id)(u32 apicid);

/* wakeup_secondary_cpu */
int (*wakeup_secondary_cpu)(u32 apicid, unsigned long start_eip);
@@ -527,7 +534,6 @@ extern int default_apic_id_valid(u32 apicid);
extern u32 apic_default_calc_apicid(unsigned int cpu);
extern u32 apic_flat_calc_apicid(unsigned int cpu);

- extern void default_ioapic_phys_id_map(physid_mask_t *phys_map, physid_mask_t *retmap);
extern u32 default_cpu_present_to_apicid(int mps_cpu);

void apic_send_nmi_to_offline_cpu(unsigned int cpu);
10 changes: 1 addition & 9 deletions arch/x86/include/asm/cpu.h
@@ -9,18 +9,10 @@
#include <linux/percpu.h>
#include <asm/ibt.h>

- #ifdef CONFIG_SMP
-
- extern void prefill_possible_map(void);
-
- #else /* CONFIG_SMP */
-
- static inline void prefill_possible_map(void) {}
-
+ #ifndef CONFIG_SMP
#define cpu_physical_id(cpu) boot_cpu_physical_apicid
#define cpu_acpi_id(cpu) 0
#define safe_smp_processor_id() 0
-
#endif /* CONFIG_SMP */

#ifdef CONFIG_HOTPLUG_CPU
36 changes: 36 additions & 0 deletions arch/x86/include/asm/cpuid.h
@@ -127,6 +127,42 @@ static inline unsigned int cpuid_edx(unsigned int op)
return edx;
}

+ static inline void __cpuid_read(unsigned int leaf, unsigned int subleaf, u32 *regs)
+ {
+ regs[CPUID_EAX] = leaf;
+ regs[CPUID_ECX] = subleaf;
+ __cpuid(regs + CPUID_EAX, regs + CPUID_EBX, regs + CPUID_ECX, regs + CPUID_EDX);
+ }
+
+ #define cpuid_subleaf(leaf, subleaf, regs) { \
+ static_assert(sizeof(*(regs)) == 16); \
+ __cpuid_read(leaf, subleaf, (u32 *)(regs)); \
+ }
+
+ #define cpuid_leaf(leaf, regs) { \
+ static_assert(sizeof(*(regs)) == 16); \
+ __cpuid_read(leaf, 0, (u32 *)(regs)); \
+ }
+
+ static inline void __cpuid_read_reg(unsigned int leaf, unsigned int subleaf,
+ enum cpuid_regs_idx regidx, u32 *reg)
+ {
+ u32 regs[4];
+
+ __cpuid_read(leaf, subleaf, regs);
+ *reg = regs[regidx];
+ }
+
+ #define cpuid_subleaf_reg(leaf, subleaf, regidx, reg) { \
+ static_assert(sizeof(*(reg)) == 4); \
+ __cpuid_read_reg(leaf, subleaf, regidx, (u32 *)(reg)); \
+ }
+
+ #define cpuid_leaf_reg(leaf, regidx, reg) { \
+ static_assert(sizeof(*(reg)) == 4); \
+ __cpuid_read_reg(leaf, 0, regidx, (u32 *)(reg)); \
+ }
+
static __always_inline bool cpuid_function_is_indexed(u32 function)
{
switch (function) {
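
The helpers added in this hunk check at compile time that the output object has the size CPUID produces: 16 bytes for a full register set, 4 bytes for a single register. The userspace sketch below mirrors that pattern under stated assumptions. `struct cpuid_regs`, `fake_cpuid()`, `read_subleaf()` and the two decoders are hypothetical names, and `fake_cpuid()` returns made-up values in place of the real instruction so the example runs on any machine. The field positions for leaf 0xb (shift in EAX[4:0], level type in ECX[15:8]) follow the Intel SDM.

```c
#include <assert.h>
#include <stdint.h>

/* Four 32-bit registers in CPUID order: EAX, EBX, ECX, EDX. */
struct cpuid_regs {
	uint32_t eax, ebx, ecx, edx;
};

/*
 * Canned stand-in for the CPUID instruction (assumption, not real
 * hardware data): leaf 0xb, subleaf 0 reports an SMT level with a
 * shift of 1 and 2 logical processors. Everything else reads as zero.
 */
static void fake_cpuid(uint32_t leaf, uint32_t subleaf, uint32_t *regs)
{
	regs[0] = regs[1] = regs[2] = regs[3] = 0;
	if (leaf == 0xb && subleaf == 0) {
		regs[0] = 1;			/* EAX[4:0]: shift to next level */
		regs[1] = 2;			/* EBX[15:0]: logical processors */
		regs[2] = (1u << 8) | subleaf;	/* ECX[15:8]: type 1 (SMT) */
	}
}

/* Refuse, at compile time, any output object that is not 16 bytes. */
#define read_subleaf(leaf, subleaf, out) do {				\
	_Static_assert(sizeof(*(out)) == 16, "need EAX..EDX");		\
	fake_cpuid((leaf), (subleaf), (uint32_t *)(out));		\
} while (0)

/* Decoders for the two leaf 0xb fields used in this sketch. */
static unsigned int leaf_b_shift(const struct cpuid_regs *r)
{
	return r->eax & 0x1f;
}

static unsigned int leaf_b_type(const struct cpuid_regs *r)
{
	return (r->ecx >> 8) & 0xff;
}
```

Calling `read_subleaf(0xb, 0, &r)` with `struct cpuid_regs r` fills all four registers. Passing a pointer to a plain `uint32_t`, or to any struct that is not 16 bytes, fails to compile, which is the property the kernel macros obtain from `static_assert`.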
1 change: 0 additions & 1 deletion arch/x86/include/asm/io_apic.h
@@ -140,7 +140,6 @@ extern void mask_ioapic_entries(void);
extern int restore_ioapic_entries(void);

extern void setup_ioapic_ids_from_mpc(void);
- extern void setup_ioapic_ids_from_mpc_nocheck(void);

extern int mp_find_ioapic(u32 gsi);
extern int mp_find_ioapic_pin(int ioapic, u32 gsi);