Merge branches 'pm-em' and 'pm-cpuidle'
Merge Energy Model support updates and cpuidle updates for 5.19-rc1:

 - Update the Energy Model support code to allow the Energy Model to be
   artificial, which means that the power values may not be on a uniform
   scale with other devices providing power information, and update the
   cpufreq_cooling and devfreq_cooling thermal drivers to support
   artificial Energy Models (Lukasz Luba).

 - Make DTPM check the Energy Model type (Lukasz Luba).

 - Fix policy counter decrementation in cpufreq if Energy Model is in
   use (Pierre Gondois).

 - Add AlderLake processor support to the intel_idle driver (Zhang Rui).

 - Fix regression leading to no genpd governor in the PSCI cpuidle
   driver and fix the riscv-sbi cpuidle driver to allow a genpd
   governor to be used (Ulf Hansson).

* pm-em:
  PM: EM: Decrement policy counter
  powercap: DTPM: Check for Energy Model type
  thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling
  Documentation: EM: Add artificial EM registration description
  PM: EM: Remove old debugfs files and print all 'flags'
  PM: EM: Change the order of arguments in the .active_power() callback
  PM: EM: Use the new .get_cost() callback while registering EM
  PM: EM: Add artificial EM flag
  PM: EM: Add .get_cost() callback

* pm-cpuidle:
  cpuidle: riscv-sbi: Fix code to allow a genpd governor to be used
  cpuidle: psci: Fix regression leading to no genpd governor
  intel_idle: Add AlderLake support
Rafael J. Wysocki committed May 23, 2022
3 parents 95f2ce5 + c9d8923 + a6653fb commit 16a23f3
Showing 12 changed files with 240 additions and 51 deletions.
24 changes: 22 additions & 2 deletions Documentation/power/energy-model.rst
@@ -123,6 +123,26 @@ allows a platform to register EM power values which are reflecting total power
(static + dynamic). These power values might be coming directly from
experiments and measurements.

Registration of 'artificial' EM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is an option to provide a custom callback for drivers missing detailed
knowledge about the power value for each performance state. The optional
.get_cost() callback provides the 'cost' values used by EAS. This is useful
for platforms that only provide information on the relative efficiency
between CPU types, where that information could be used to create an
abstract power model. But even an abstract power model can sometimes be
hard to fit in, given the input power value size restrictions. The
.get_cost() callback makes it possible to provide 'cost' values that
reflect the efficiency of the CPUs and follow a different relation than
the one the EM internal formulas for calculating 'cost' values would
enforce. To register an EM for such a platform, the driver must set the
'milliwatts' flag to 0 and provide both the .get_power() and .get_cost()
callbacks. The EM framework then handles such a platform properly during
registration and sets the EM_PERF_DOMAIN_ARTIFICIAL flag for it. Other
frameworks using the EM should take special care to test for this flag and
treat it properly.

Registration of 'simple' EM
~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -181,8 +201,8 @@ EM framework::

-> drivers/cpufreq/foo_cpufreq.c

-  01	static int est_power(unsigned long *mW, unsigned long *KHz,
-  02			struct device *dev)
+  01	static int est_power(struct device *dev, unsigned long *mW,
+  02			unsigned long *KHz)
03 {
04 long freq, power;
05
4 changes: 2 additions & 2 deletions drivers/cpufreq/mediatek-cpufreq-hw.c
@@ -51,8 +51,8 @@ static const u16 cpufreq_mtk_offsets[REG_ARRAY_SIZE] = {
};

static int __maybe_unused
-mtk_cpufreq_get_cpu_power(unsigned long *mW,
-			  unsigned long *KHz, struct device *cpu_dev)
+mtk_cpufreq_get_cpu_power(struct device *cpu_dev, unsigned long *mW,
+			  unsigned long *KHz)
{
struct mtk_cpufreq_data *data;
struct cpufreq_policy *policy;
4 changes: 2 additions & 2 deletions drivers/cpufreq/scmi-cpufreq.c
@@ -96,8 +96,8 @@ scmi_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpumask)
}

static int __maybe_unused
-scmi_get_cpu_power(unsigned long *power, unsigned long *KHz,
-		   struct device *cpu_dev)
+scmi_get_cpu_power(struct device *cpu_dev, unsigned long *power,
+		   unsigned long *KHz)
{
unsigned long Hz;
int ret, domain;
4 changes: 2 additions & 2 deletions drivers/cpuidle/cpuidle-psci-domain.c
@@ -52,7 +52,7 @@ static int psci_pd_init(struct device_node *np, bool use_osi)
struct generic_pm_domain *pd;
struct psci_pd_provider *pd_provider;
struct dev_power_governor *pd_gov;
-	int ret = -ENOMEM, state_count = 0;
+	int ret = -ENOMEM;

pd = dt_idle_pd_alloc(np, psci_dt_parse_state_node);
if (!pd)
@@ -71,7 +71,7 @@ static int psci_pd_init(struct device_node *np, bool use_osi)
pd->flags |= GENPD_FLAG_ALWAYS_ON;

/* Use governor for CPU PM domains if it has some states to manage. */
-	pd_gov = state_count > 0 ? &pm_domain_cpu_gov : NULL;
+	pd_gov = pd->states ? &pm_domain_cpu_gov : NULL;

ret = pm_genpd_init(pd, pd_gov, false);
if (ret)
4 changes: 2 additions & 2 deletions drivers/cpuidle/cpuidle-riscv-sbi.c
@@ -414,7 +414,7 @@ static int sbi_pd_init(struct device_node *np)
struct generic_pm_domain *pd;
struct sbi_pd_provider *pd_provider;
struct dev_power_governor *pd_gov;
-	int ret = -ENOMEM, state_count = 0;
+	int ret = -ENOMEM;

pd = dt_idle_pd_alloc(np, sbi_dt_parse_state_node);
if (!pd)
@@ -433,7 +433,7 @@ static int sbi_pd_init(struct device_node *np)
pd->flags |= GENPD_FLAG_ALWAYS_ON;

/* Use governor for CPU PM domains if it has some states to manage. */
-	pd_gov = state_count > 0 ? &pm_domain_cpu_gov : NULL;
+	pd_gov = pd->states ? &pm_domain_cpu_gov : NULL;

ret = pm_genpd_init(pd, pd_gov, false);
if (ret)
133 changes: 133 additions & 0 deletions drivers/idle/intel_idle.c
@@ -764,6 +764,106 @@ static struct cpuidle_state icx_cstates[] __initdata = {
.enter = NULL }
};

/*
* On AlderLake C1 has to be disabled if C1E is enabled, and vice versa.
* C1E is enabled only if "C1E promotion" bit is set in MSR_IA32_POWER_CTL.
* But in this case there is effectively no C1, because C1 requests are
* promoted to C1E. If the "C1E promotion" bit is cleared, then both C1
* and C1E requests end up with C1, so there is effectively no C1E.
*
* By default we enable C1E and disable C1 by marking it with
* 'CPUIDLE_FLAG_UNUSABLE'.
*/
static struct cpuidle_state adl_cstates[] __initdata = {
{
.name = "C1",
.desc = "MWAIT 0x00",
.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_UNUSABLE,
.exit_latency = 1,
.target_residency = 1,
.enter = &intel_idle,
.enter_s2idle = intel_idle_s2idle, },
{
.name = "C1E",
.desc = "MWAIT 0x01",
.flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_ALWAYS_ENABLE,
.exit_latency = 2,
.target_residency = 4,
.enter = &intel_idle,
.enter_s2idle = intel_idle_s2idle, },
{
.name = "C6",
.desc = "MWAIT 0x20",
.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED,
.exit_latency = 220,
.target_residency = 600,
.enter = &intel_idle,
.enter_s2idle = intel_idle_s2idle, },
{
.name = "C8",
.desc = "MWAIT 0x40",
.flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TLB_FLUSHED,
.exit_latency = 280,
.target_residency = 800,
.enter = &intel_idle,
.enter_s2idle = intel_idle_s2idle, },
{
.name = "C10",
.desc = "MWAIT 0x60",
.flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TLB_FLUSHED,
.exit_latency = 680,
.target_residency = 2000,
.enter = &intel_idle,
.enter_s2idle = intel_idle_s2idle, },
{
.enter = NULL }
};

static struct cpuidle_state adl_l_cstates[] __initdata = {
{
.name = "C1",
.desc = "MWAIT 0x00",
.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_UNUSABLE,
.exit_latency = 1,
.target_residency = 1,
.enter = &intel_idle,
.enter_s2idle = intel_idle_s2idle, },
{
.name = "C1E",
.desc = "MWAIT 0x01",
.flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_ALWAYS_ENABLE,
.exit_latency = 2,
.target_residency = 4,
.enter = &intel_idle,
.enter_s2idle = intel_idle_s2idle, },
{
.name = "C6",
.desc = "MWAIT 0x20",
.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED,
.exit_latency = 170,
.target_residency = 500,
.enter = &intel_idle,
.enter_s2idle = intel_idle_s2idle, },
{
.name = "C8",
.desc = "MWAIT 0x40",
.flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TLB_FLUSHED,
.exit_latency = 200,
.target_residency = 600,
.enter = &intel_idle,
.enter_s2idle = intel_idle_s2idle, },
{
.name = "C10",
.desc = "MWAIT 0x60",
.flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TLB_FLUSHED,
.exit_latency = 230,
.target_residency = 700,
.enter = &intel_idle,
.enter_s2idle = intel_idle_s2idle, },
{
.enter = NULL }
};

/*
* On Sapphire Rapids Xeon C1 has to be disabled if C1E is enabled, and vice
* versa. On SPR C1E is enabled only if "C1E promotion" bit is set in
@@ -1147,6 +1247,14 @@ static const struct idle_cpu idle_cpu_icx __initconst = {
.use_acpi = true,
};

static const struct idle_cpu idle_cpu_adl __initconst = {
.state_table = adl_cstates,
};

static const struct idle_cpu idle_cpu_adl_l __initconst = {
.state_table = adl_l_cstates,
};

static const struct idle_cpu idle_cpu_spr __initconst = {
.state_table = spr_cstates,
.disable_promotion_to_c1e = true,
@@ -1215,6 +1323,8 @@ static const struct x86_cpu_id intel_idle_ids[] __initconst = {
X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_X, &idle_cpu_skx),
X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, &idle_cpu_icx),
X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_D, &idle_cpu_icx),
X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE, &idle_cpu_adl),
X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, &idle_cpu_adl_l),
X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, &idle_cpu_spr),
X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNL, &idle_cpu_knl),
X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNM, &idle_cpu_knl),
@@ -1573,6 +1683,25 @@ static void __init skx_idle_state_table_update(void)
}
}

/**
* adl_idle_state_table_update - Adjust AlderLake idle states table.
*/
static void __init adl_idle_state_table_update(void)
{
/* Check if user prefers C1 over C1E. */
if (preferred_states_mask & BIT(1) && !(preferred_states_mask & BIT(2))) {
cpuidle_state_table[0].flags &= ~CPUIDLE_FLAG_UNUSABLE;
cpuidle_state_table[1].flags |= CPUIDLE_FLAG_UNUSABLE;

/* Disable C1E by clearing the "C1E promotion" bit. */
c1e_promotion = C1E_PROMOTION_DISABLE;
return;
}

/* Make sure C1E is enabled by default */
c1e_promotion = C1E_PROMOTION_ENABLE;
}

/**
* spr_idle_state_table_update - Adjust Sapphire Rapids idle states table.
*/
@@ -1642,6 +1771,10 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
case INTEL_FAM6_SAPPHIRERAPIDS_X:
spr_idle_state_table_update();
break;
case INTEL_FAM6_ALDERLAKE:
case INTEL_FAM6_ALDERLAKE_L:
adl_idle_state_table_update();
break;
}

for (cstate = 0; cstate < CPUIDLE_STATE_MAX; ++cstate) {
6 changes: 3 additions & 3 deletions drivers/opp/of.c
@@ -1448,7 +1448,7 @@ EXPORT_SYMBOL_GPL(dev_pm_opp_get_of_node);
* Returns 0 on success or a proper -EINVAL value in case of error.
*/
static int __maybe_unused
-_get_dt_power(unsigned long *mW, unsigned long *kHz, struct device *dev)
+_get_dt_power(struct device *dev, unsigned long *mW, unsigned long *kHz)
{
struct dev_pm_opp *opp;
unsigned long opp_freq, opp_power;
@@ -1482,8 +1482,8 @@ _get_dt_power(unsigned long *mW, unsigned long *kHz, struct device *dev)
* Returns -EINVAL if the power calculation failed because of missing
* parameters, 0 otherwise.
*/
-static int __maybe_unused _get_power(unsigned long *mW, unsigned long *kHz,
-				     struct device *dev)
+static int __maybe_unused _get_power(struct device *dev, unsigned long *mW,
+				     unsigned long *kHz)
{
struct dev_pm_opp *opp;
struct device_node *np;
2 changes: 1 addition & 1 deletion drivers/powercap/dtpm_cpu.c
@@ -211,7 +211,7 @@ static int __dtpm_cpu_setup(int cpu, struct dtpm *parent)
return 0;

pd = em_cpu_get(cpu);
-	if (!pd)
+	if (!pd || em_is_artificial(pd))
return -EINVAL;

dtpm_cpu = kzalloc(sizeof(*dtpm_cpu), GFP_KERNEL);
2 changes: 1 addition & 1 deletion drivers/thermal/cpufreq_cooling.c
@@ -328,7 +328,7 @@ static inline bool em_is_sane(struct cpufreq_cooling_device *cpufreq_cdev,
struct cpufreq_policy *policy;
unsigned int nr_levels;

-	if (!em)
+	if (!em || em_is_artificial(em))
return false;

policy = cpufreq_cdev->policy;
8 changes: 5 additions & 3 deletions drivers/thermal/devfreq_cooling.c
@@ -358,6 +358,7 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,
struct thermal_cooling_device *cdev;
struct device *dev = df->dev.parent;
struct devfreq_cooling_device *dfc;
+	struct em_perf_domain *em;
char *name;
int err, num_opps;

@@ -367,8 +368,9 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,

dfc->devfreq = df;

-	dfc->em_pd = em_pd_get(dev);
-	if (dfc->em_pd) {
+	em = em_pd_get(dev);
+	if (em && !em_is_artificial(em)) {
+		dfc->em_pd = em;
devfreq_cooling_ops.get_requested_power =
devfreq_cooling_get_requested_power;
devfreq_cooling_ops.state2power = devfreq_cooling_state2power;
Expand All @@ -379,7 +381,7 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,
num_opps = em_pd_nr_perf_states(dfc->em_pd);
} else {
/* Backward compatibility for drivers which do not use IPA */
-		dev_dbg(dev, "missing EM for cooling device\n");
+		dev_dbg(dev, "missing proper EM for cooling device\n");

num_opps = dev_pm_opp_get_opp_count(dev);

