Skip to content

Commit

Permalink
tools/power turbostat: disable "cpuidle" invocation counters, by default
Browse files Browse the repository at this point in the history
Create "pct_idle" counter group, the sofware notion of residency
so it can now be singled out, independent of other counter groups.

Create "cpuidle" group, the cpuidle invocation counts.
Disable "cpuidle", by default.

Create "swidle" = "cpuidle" + "pct_idle".
Undocument "sysfs", the old name for "swidle", but keep it working
for backwards compatibilty.

Create "hwidle", all the HW idle counters

Modify "idle", enabled by default
"idle" = "hwidle" + "pct_idle" (and now excludes "cpuidle")

Signed-off-by: Len Brown <len.brown@intel.com>
  • Loading branch information
Len Brown committed Apr 6, 2025
1 parent 9946338 commit ec4acd3
Show file tree
Hide file tree
Showing 2 changed files with 33 additions and 13 deletions.
12 changes: 6 additions & 6 deletions tools/power/x86/turbostat/turbostat.8
Original file line number Diff line number Diff line change
Expand Up @@ -100,7 +100,7 @@ The column name "all" can be used to enable all disabled-by-default built-in cou
.PP
\fB--show column\fP show only the specified built-in columns. May be invoked multiple times, or with a comma-separated list of column names.
.PP
\fB--show CATEGORY --hide CATEGORY\fP Show and hide also accept a single CATEGORY of columns: "all", "topology", "idle", "frequency", "power", "sysfs", "other".
\fB--show CATEGORY --hide CATEGORY\fP Show and hide also accept a single CATEGORY of columns: "all", "topology", "idle", "frequency", "power", "cpuidle", "hwidle", "swidle", "other". "idle" (enabled by default), includes "hwidle" and "idle_pct". "cpuidle" (default disabled) includes cpuidle software invocation counters. "swidle" includes "cpuidle" plus "idle_pct". "hwidle" includes only hardware based idle residency counters. Older versions of turbostat used the term "sysfs" for what is now "swidle".
.PP
\fB--Dump\fP displays the raw counter values.
.PP
Expand Down Expand Up @@ -158,15 +158,15 @@ The system configuration dump (if --quiet is not used) is followed by statistics
.PP
\fBSMI\fP The number of System Management Interrupts serviced CPU during the measurement interval. While this counter is actually per-CPU, SMI are triggered on all processors, so the number should be the same for all CPUs.
.PP
\fBC1, C2, C3...\fP The number times Linux requested the C1, C2, C3 idle state during the measurement interval. The system summary line shows the sum for all CPUs. These are C-state names as exported in /sys/devices/system/cpu/cpu*/cpuidle/state*/name. While their names are generic, their attributes are processor specific. They the system description section of output shows what MWAIT sub-states they are mapped to on each system.
\fBC1, C2, C3...\fP The number times Linux requested the C1, C2, C3 idle state during the measurement interval. The system summary line shows the sum for all CPUs. These are C-state names as exported in /sys/devices/system/cpu/cpu*/cpuidle/state*/name. While their names are generic, their attributes are processor specific. They the system description section of output shows what MWAIT sub-states they are mapped to on each system. These counters are in the "cpuidle" group, which is disabled, by default.
.PP
\fBC1+, C2+, C3+...\fP The idle governor idle state misprediction statistics. Inidcates the number times Linux requested the C1, C2, C3 idle state during the measurement interval, but should have requested a deeper idle state (if it exists and enabled). These statistics come from the /sys/devices/system/cpu/cpu*/cpuidle/state*/below file.
\fBC1+, C2+, C3+...\fP The idle governor idle state misprediction statistics. Inidcates the number times Linux requested the C1, C2, C3 idle state during the measurement interval, but should have requested a deeper idle state (if it exists and enabled). These statistics come from the /sys/devices/system/cpu/cpu*/cpuidle/state*/below file. These counters are in the "cpuidle" group, which is disabled, by default.
.PP
\fBC1-, C2-, C3-...\fP The idle governor idle state misprediction statistics. Inidcates the number times Linux requested the C1, C2, C3 idle state during the measurement interval, but should have requested a shallower idle state (if it exists and enabled). These statistics come from the /sys/devices/system/cpu/cpu*/cpuidle/state*/above file.
\fBC1-, C2-, C3-...\fP The idle governor idle state misprediction statistics. Inidcates the number times Linux requested the C1, C2, C3 idle state during the measurement interval, but should have requested a shallower idle state (if it exists and enabled). These statistics come from the /sys/devices/system/cpu/cpu*/cpuidle/state*/above file. These counters are in the "cpuidle" group, which is disabled, by default.
.PP
\fBC1%, C2%, C3%\fP The residency percentage that Linux requested C1, C2, C3.... The system summary is the average of all CPUs in the system. Note that these are software, reflecting what was requested. The hardware counters reflect what was actually achieved.
\fBC1%, C2%, C3%\fP The residency percentage that Linux requested C1, C2, C3.... The system summary is the average of all CPUs in the system. Note that these are software, reflecting what was requested. The hardware counters reflect what was actually achieved. These counters are in the "pct_idle" group, which is enabled by default.
.PP
\fBCPU%c1, CPU%c3, CPU%c6, CPU%c7\fP show the percentage residency in hardware core idle states. These numbers are from hardware residency counters.
\fBCPU%c1, CPU%c3, CPU%c6, CPU%c7\fP show the percentage residency in hardware core idle states. These numbers are from hardware residency counters and are in the "hwidle" group, which is enabled, by default.
.PP
\fBCoreTmp\fP Degrees Celsius reported by the per-core Digital Thermal Sensor.
.PP
Expand Down
34 changes: 27 additions & 7 deletions tools/power/x86/turbostat/turbostat.c
Original file line number Diff line number Diff line change
Expand Up @@ -153,7 +153,7 @@ struct msr_counter bic[] = {
{ 0x0, "TSC_MHz", NULL, 0, 0, 0, NULL, 0 },
{ 0x0, "IRQ", NULL, 0, 0, 0, NULL, 0 },
{ 0x0, "SMI", NULL, 32, 0, FORMAT_DELTA, NULL, 0 },
{ 0x0, "sysfs", NULL, 0, 0, 0, NULL, 0 },
{ 0x0, "cpuidle", NULL, 0, 0, 0, NULL, 0 },
{ 0x0, "CPU%c1", NULL, 0, 0, 0, NULL, 0 },
{ 0x0, "CPU%c3", NULL, 0, 0, 0, NULL, 0 },
{ 0x0, "CPU%c6", NULL, 0, 0, 0, NULL, 0 },
Expand Down Expand Up @@ -206,6 +206,7 @@ struct msr_counter bic[] = {
{ 0x0, "Sys_J", NULL, 0, 0, 0, NULL, 0 },
{ 0x0, "NMI", NULL, 0, 0, 0, NULL, 0 },
{ 0x0, "CPU%c1e", NULL, 0, 0, 0, NULL, 0 },
{ 0x0, "pct_idle", NULL, 0, 0, 0, NULL, 0 },
};

#define MAX_BIC (sizeof(bic) / sizeof(struct msr_counter))
Expand All @@ -219,7 +220,7 @@ struct msr_counter bic[] = {
#define BIC_TSC_MHz (1ULL << 7)
#define BIC_IRQ (1ULL << 8)
#define BIC_SMI (1ULL << 9)
#define BIC_sysfs (1ULL << 10)
#define BIC_cpuidle (1ULL << 10)
#define BIC_CPU_c1 (1ULL << 11)
#define BIC_CPU_c3 (1ULL << 12)
#define BIC_CPU_c6 (1ULL << 13)
Expand Down Expand Up @@ -272,17 +273,20 @@ struct msr_counter bic[] = {
#define BIC_Sys_J (1ULL << 60)
#define BIC_NMI (1ULL << 61)
#define BIC_CPU_c1e (1ULL << 62)
#define BIC_pct_idle (1ULL << 63)

#define BIC_GROUP_TOPOLOGY (BIC_Package | BIC_Node | BIC_CoreCnt | BIC_PkgCnt | BIC_Core | BIC_CPU | BIC_Die)
#define BIC_GROUP_THERMAL_PWR (BIC_CoreTmp | BIC_PkgTmp | BIC_PkgWatt | BIC_CorWatt | BIC_GFXWatt | BIC_RAMWatt | BIC_PKG__ | BIC_RAM__ | BIC_SysWatt)
#define BIC_GROUP_FREQUENCY (BIC_Avg_MHz | BIC_Busy | BIC_Bzy_MHz | BIC_TSC_MHz | BIC_GFXMHz | BIC_GFXACTMHz | BIC_SAMMHz | BIC_SAMACTMHz | BIC_UNCORE_MHZ)
#define BIC_GROUP_IDLE (BIC_Busy | BIC_sysfs | BIC_CPU_c1 | BIC_CPU_c3 | BIC_CPU_c6 | BIC_CPU_c7 | BIC_GFX_rc6 | BIC_Pkgpc2 | BIC_Pkgpc3 | BIC_Pkgpc6 | BIC_Pkgpc7 | BIC_Pkgpc8 | BIC_Pkgpc9 | BIC_Pkgpc10 | BIC_CPU_LPI | BIC_SYS_LPI | BIC_Mod_c6 | BIC_Totl_c0 | BIC_Any_c0 | BIC_GFX_c0 | BIC_CPUGFX | BIC_SAM_mc6 | BIC_Diec6)
#define BIC_GROUP_HW_IDLE (BIC_Busy | BIC_CPU_c1 | BIC_CPU_c3 | BIC_CPU_c6 | BIC_CPU_c7 | BIC_GFX_rc6 | BIC_Pkgpc2 | BIC_Pkgpc3 | BIC_Pkgpc6 | BIC_Pkgpc7 | BIC_Pkgpc8 | BIC_Pkgpc9 | BIC_Pkgpc10 | BIC_CPU_LPI | BIC_SYS_LPI | BIC_Mod_c6 | BIC_Totl_c0 | BIC_Any_c0 | BIC_GFX_c0 | BIC_CPUGFX | BIC_SAM_mc6 | BIC_Diec6)
#define BIC_GROUP_SW_IDLE (BIC_Busy | BIC_cpuidle | BIC_pct_idle )
#define BIC_GROUP_IDLE (BIC_GROUP_HW_IDLE | BIC_pct_idle)
#define BIC_OTHER (BIC_IRQ | BIC_NMI | BIC_SMI | BIC_ThreadC | BIC_CoreTmp | BIC_IPC)

#define BIC_DISABLED_BY_DEFAULT (BIC_USEC | BIC_TOD | BIC_APIC | BIC_X2APIC)
#define BIC_DISABLED_BY_DEFAULT (BIC_USEC | BIC_TOD | BIC_APIC | BIC_X2APIC | BIC_cpuidle)

unsigned long long bic_enabled = (0xFFFFFFFFFFFFFFFFULL & ~BIC_DISABLED_BY_DEFAULT);
unsigned long long bic_present = BIC_USEC | BIC_TOD | BIC_sysfs | BIC_APIC | BIC_X2APIC;
unsigned long long bic_present = BIC_USEC | BIC_TOD | BIC_cpuidle | BIC_pct_idle | BIC_APIC | BIC_X2APIC;

#define DO_BIC(COUNTER_NAME) (bic_enabled & bic_present & COUNTER_NAME)
#define DO_BIC_READ(COUNTER_NAME) (bic_present & COUNTER_NAME)
Expand Down Expand Up @@ -2362,6 +2366,15 @@ unsigned long long bic_lookup(char *name_list, enum show_hide_mode mode)
} else if (!strcmp(name_list, "idle")) {
retval |= BIC_GROUP_IDLE;
break;
} else if (!strcmp(name_list, "swidle")) {
retval |= BIC_GROUP_SW_IDLE;
break;
} else if (!strcmp(name_list, "sysfs")) { /* legacy compatibility */
retval |= BIC_GROUP_SW_IDLE;
break;
} else if (!strcmp(name_list, "hwidle")) {
retval |= BIC_GROUP_HW_IDLE;
break;
} else if (!strcmp(name_list, "frequency")) {
retval |= BIC_GROUP_FREQUENCY;
break;
Expand All @@ -2372,6 +2385,7 @@ unsigned long long bic_lookup(char *name_list, enum show_hide_mode mode)

}
if (i == MAX_BIC) {
fprintf(stderr, "deferred %s\n", name_list);
if (mode == SHOW_LIST) {
deferred_add_names[deferred_add_index++] = name_list;
if (deferred_add_index >= MAX_DEFERRED) {
Expand Down Expand Up @@ -10269,6 +10283,9 @@ void probe_cpuidle_residency(void)
int min_state = 1024, max_state = 0;
char *sp;

if (!DO_BIC(BIC_pct_idle))
return;

for (state = 10; state >= 0; --state) {

sprintf(path, "/sys/devices/system/cpu/cpu%d/cpuidle/state%d/name", base_cpu, state);
Expand All @@ -10291,7 +10308,7 @@ void probe_cpuidle_residency(void)

sprintf(path, "cpuidle/state%d/time", state);

if (!DO_BIC(BIC_sysfs) && !is_deferred_add(name_buf))
if (!DO_BIC(BIC_pct_idle) && !is_deferred_add(name_buf))
continue;

if (is_deferred_skip(name_buf))
Expand All @@ -10315,6 +10332,9 @@ void probe_cpuidle_counts(void)
int min_state = 1024, max_state = 0;
char *sp;

if (!DO_BIC(BIC_cpuidle))
return;

for (state = 10; state >= 0; --state) {

sprintf(path, "/sys/devices/system/cpu/cpu%d/cpuidle/state%d/name", base_cpu, state);
Expand All @@ -10327,7 +10347,7 @@ void probe_cpuidle_counts(void)

remove_underbar(name_buf);

if (!DO_BIC(BIC_sysfs) && !is_deferred_add(name_buf))
if (!DO_BIC(BIC_cpuidle) && !is_deferred_add(name_buf))
continue;

if (is_deferred_skip(name_buf))
Expand Down

0 comments on commit ec4acd3

Please sign in to comment.