Skip to content

Commit

Permalink
---
Browse files Browse the repository at this point in the history
yaml
---
r: 186470
b: refs/heads/master
c: 2ddb3b1
h: refs/heads/master
v: v3
  • Loading branch information
Linus Torvalds committed Mar 7, 2010
1 parent 95eb178 commit 0eb3f5e
Show file tree
Hide file tree
Showing 18 changed files with 1,286 additions and 93 deletions.
2 changes: 1 addition & 1 deletion [refs]
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
---
refs/heads/master: 6ee77658ce387ad6c85dcbda4a68bc33efd8de39
refs/heads/master: 2ddb3b15f1b46836c61cfac5b00d8f08a24236e6
207 changes: 207 additions & 0 deletions trunk/Documentation/cpu-freq/pcc-cpufreq.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,207 @@
/*
* pcc-cpufreq.txt - PCC interface documentation
*
* Copyright (C) 2009 Red Hat, Matthew Garrett <mjg@redhat.com>
* Copyright (C) 2009 Hewlett-Packard Development Company, L.P.
* Nagananda Chumbalkar <nagananda.chumbalkar@hp.com>
*
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; version 2 of the License.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or NON
* INFRINGEMENT. See the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License along
* with this program; if not, write to the Free Software Foundation, Inc.,
* 675 Mass Ave, Cambridge, MA 02139, USA.
*
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/


Processor Clocking Control Driver
---------------------------------

Contents:
---------
1. Introduction
1.1 PCC interface
1.1.1 Get Average Frequency
1.1.2 Set Desired Frequency
1.2 Platforms affected
2. Driver and /sys details
2.1 scaling_available_frequencies
2.2 cpuinfo_transition_latency
2.3 cpuinfo_cur_freq
2.4 related_cpus
3. Caveats

1. Introduction:
----------------
Processor Clocking Control (PCC) is an interface between the platform
firmware and OSPM. It is a mechanism for coordinating processor
performance (ie: frequency) between the platform firmware and the OS.

The PCC driver (pcc-cpufreq) allows OSPM to take advantage of the PCC
interface.

OS utilizes the PCC interface to inform platform firmware what frequency the
OS wants for a logical processor. The platform firmware attempts to achieve
the requested frequency. If the request for the target frequency could not be
satisfied by platform firmware, then it usually means that power budget
conditions are in place, and "power capping" is taking place.

1.1 PCC interface:
------------------
The complete PCC specification is available here:
http://www.acpica.org/download/Processor-Clocking-Control-v1p0.pdf

PCC relies on a shared memory region that provides a channel for communication
between the OS and platform firmware. PCC also implements a "doorbell" that
is used by the OS to inform the platform firmware that a command has been
sent.

The ACPI PCCH() method is used to discover the location of the PCC shared
memory region. The shared memory region header contains the "command" and
"status" interface. PCCH() also contains details on how to access the platform
doorbell.

The following commands are supported by the PCC interface:
* Get Average Frequency
* Set Desired Frequency

The ACPI PCCP() method is implemented for each logical processor and is
used to discover the offsets for the input and output buffers in the shared
memory region.

When PCC mode is enabled, the platform will not expose processor performance
or throttle states (_PSS, _TSS and related ACPI objects) to OSPM. Therefore,
the native P-state driver (such as acpi-cpufreq for Intel, powernow-k8 for
AMD) will not load.

However, OSPM remains in control of policy. The governor (eg: "ondemand")
computes the required performance for each processor based on server workload.
The PCC driver fills in the command interface, and the input buffer and
communicates the request to the platform firmware. The platform firmware is
responsible for delivering the requested performance.

Each PCC command is "global" in scope and can affect all the logical CPUs in
the system. Therefore, PCC is capable of performing "group" updates. With PCC
the OS is capable of getting/setting the frequency of all the logical CPUs in
the system with a single call to the BIOS.

1.1.1 Get Average Frequency:
----------------------------
This command is used by the OSPM to query the running frequency of the
processor since the last time this command was completed. The output buffer
indicates the average unhalted frequency of the logical processor expressed as
a percentage of the nominal (ie: maximum) CPU frequency. The output buffer
also signifies if the CPU frequency is limited by a power budget condition.

1.1.2 Set Desired Frequency:
----------------------------
This command is used by the OSPM to communicate to the platform firmware the
desired frequency for a logical processor. The output buffer is currently
ignored by OSPM. The next invocation of "Get Average Frequency" will inform
OSPM if the desired frequency was achieved or not.

1.2 Platforms affected:
-----------------------
The PCC driver will load on any system where the platform firmware:
* supports the PCC interface, and the associated PCCH() and PCCP() methods
* assumes responsibility for managing the hardware clocking controls in order
to deliver the requested processor performance

Currently, certain HP ProLiant platforms implement the PCC interface. On those
platforms PCC is the "default" choice.

However, it is possible to disable this interface via a BIOS setting. In
such an instance, as is also the case on platforms where the PCC interface
is not implemented, the PCC driver will fail to load silently.

2. Driver and /sys details:
---------------------------
When the driver loads, it merely prints the lowest and the highest CPU
frequencies supported by the platform firmware.

The PCC driver loads with a message such as:
pcc-cpufreq: (v1.00.00) driver loaded with frequency limits: 1600 MHz, 2933
MHz

This means that the OPSM can request the CPU to run at any frequency in
between the limits (1600 MHz, and 2933 MHz) specified in the message.

Internally, there is no need for the driver to convert the "target" frequency
to a corresponding P-state.

The VERSION number for the driver will be of the format v.xy.ab.
eg: 1.00.02
----- --
| |
| -- this will increase with bug fixes/enhancements to the driver
|-- this is the version of the PCC specification the driver adheres to


The following is a brief discussion on some of the fields exported via the
/sys filesystem and how their values are affected by the PCC driver:

2.1 scaling_available_frequencies:
----------------------------------
scaling_available_frequencies is not created in /sys. No intermediate
frequencies need to be listed because the BIOS will try to achieve any
frequency, within limits, requested by the governor. A frequency does not have
to be strictly associated with a P-state.

2.2 cpuinfo_transition_latency:
-------------------------------
The cpuinfo_transition_latency field is 0. The PCC specification does
not include a field to expose this value currently.

2.3 cpuinfo_cur_freq:
---------------------
A) Often cpuinfo_cur_freq will show a value different than what is declared
in the scaling_available_frequencies or scaling_cur_freq, or scaling_max_freq.
This is due to "turbo boost" available on recent Intel processors. If certain
conditions are met the BIOS can achieve a slightly higher speed than requested
by OSPM. An example:

scaling_cur_freq : 2933000
cpuinfo_cur_freq : 3196000

B) There is a round-off error associated with the cpuinfo_cur_freq value.
Since the driver obtains the current frequency as a "percentage" (%) of the
nominal frequency from the BIOS, sometimes, the values displayed by
scaling_cur_freq and cpuinfo_cur_freq may not match. An example:

scaling_cur_freq : 1600000
cpuinfo_cur_freq : 1583000

In this example, the nominal frequency is 2933 MHz. The driver obtains the
current frequency, cpuinfo_cur_freq, as 54% of the nominal frequency:

54% of 2933 MHz = 1583 MHz

Nominal frequency is the maximum frequency of the processor, and it usually
corresponds to the frequency of the P0 P-state.

2.4 related_cpus:
-----------------
The related_cpus field is identical to affected_cpus.

affected_cpus : 4
related_cpus : 4

Currently, the PCC driver does not evaluate _PSD. The platforms that support
PCC do not implement SW_ALL. So OSPM doesn't need to perform any coordination
to ensure that the same frequency is requested of all dependent CPUs.

3. Caveats:
-----------
The "cpufreq_stats" module in its present form cannot be loaded and
expected to work with the PCC driver. Since the "cpufreq_stats" module
provides information wrt each P-state, it is not applicable to the PCC driver.
93 changes: 93 additions & 0 deletions trunk/Documentation/power/runtime_pm.txt
Original file line number Diff line number Diff line change
Expand Up @@ -224,6 +224,12 @@ defined in include/linux/pm.h:
RPM_SUSPENDED, which means that each device is initially regarded by the
PM core as 'suspended', regardless of its real hardware status

unsigned int runtime_auto;
- if set, indicates that the user space has allowed the device driver to
power manage the device at run time via the /sys/devices/.../power/control
interface; it may only be modified with the help of the pm_runtime_allow()
and pm_runtime_forbid() helper functions

All of the above fields are members of the 'power' member of 'struct device'.

4. Run-time PM Device Helper Functions
Expand Down Expand Up @@ -329,6 +335,20 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
'power.runtime_error' is set or 'power.disable_depth' is greater than
zero)

bool pm_runtime_suspended(struct device *dev);
- return true if the device's runtime PM status is 'suspended', or false
otherwise

void pm_runtime_allow(struct device *dev);
- set the power.runtime_auto flag for the device and decrease its usage
counter (used by the /sys/devices/.../power/control interface to
effectively allow the device to be power managed at run time)

void pm_runtime_forbid(struct device *dev);
- unset the power.runtime_auto flag for the device and increase its usage
counter (used by the /sys/devices/.../power/control interface to
effectively prevent the device from being power managed at run time)

It is safe to execute the following helper functions from interrupt context:

pm_request_idle()
Expand Down Expand Up @@ -382,6 +402,18 @@ may be desirable to suspend the device as soon as ->probe() or ->remove() has
finished, so the PM core uses pm_runtime_idle_sync() to invoke the
subsystem-level idle callback for the device at that time.

The user space can effectively disallow the driver of the device to power manage
it at run time by changing the value of its /sys/devices/.../power/control
attribute to "on", which causes pm_runtime_forbid() to be called. In principle,
this mechanism may also be used by the driver to effectively turn off the
run-time power management of the device until the user space turns it on.
Namely, during the initialization the driver can make sure that the run-time PM
status of the device is 'active' and call pm_runtime_forbid(). It should be
noted, however, that if the user space has already intentionally changed the
value of /sys/devices/.../power/control to "auto" to allow the driver to power
manage the device at run time, the driver may confuse it by using
pm_runtime_forbid() this way.

6. Run-time PM and System Sleep

Run-time PM and system sleep (i.e., system suspend and hibernation, also known
Expand Down Expand Up @@ -431,3 +463,64 @@ The PM core always increments the run-time usage counter before calling the
->prepare() callback and decrements it after calling the ->complete() callback.
Hence disabling run-time PM temporarily like this will not cause any run-time
suspend callbacks to be lost.

7. Generic subsystem callbacks

Subsystems may wish to conserve code space by using the set of generic power
management callbacks provided by the PM core, defined in
driver/base/power/generic_ops.c:

int pm_generic_runtime_idle(struct device *dev);
- invoke the ->runtime_idle() callback provided by the driver of this
device, if defined, and call pm_runtime_suspend() for this device if the
return value is 0 or the callback is not defined

int pm_generic_runtime_suspend(struct device *dev);
- invoke the ->runtime_suspend() callback provided by the driver of this
device and return its result, or return -EINVAL if not defined

int pm_generic_runtime_resume(struct device *dev);
- invoke the ->runtime_resume() callback provided by the driver of this
device and return its result, or return -EINVAL if not defined

int pm_generic_suspend(struct device *dev);
- if the device has not been suspended at run time, invoke the ->suspend()
callback provided by its driver and return its result, or return 0 if not
defined

int pm_generic_resume(struct device *dev);
- invoke the ->resume() callback provided by the driver of this device and,
if successful, change the device's runtime PM status to 'active'

int pm_generic_freeze(struct device *dev);
- if the device has not been suspended at run time, invoke the ->freeze()
callback provided by its driver and return its result, or return 0 if not
defined

int pm_generic_thaw(struct device *dev);
- if the device has not been suspended at run time, invoke the ->thaw()
callback provided by its driver and return its result, or return 0 if not
defined

int pm_generic_poweroff(struct device *dev);
- if the device has not been suspended at run time, invoke the ->poweroff()
callback provided by its driver and return its result, or return 0 if not
defined

int pm_generic_restore(struct device *dev);
- invoke the ->restore() callback provided by the driver of this device and,
if successful, change the device's runtime PM status to 'active'

These functions can be assigned to the ->runtime_idle(), ->runtime_suspend(),
->runtime_resume(), ->suspend(), ->resume(), ->freeze(), ->thaw(), ->poweroff(),
or ->restore() callback pointers in the subsystem-level dev_pm_ops structures.

If a subsystem wishes to use all of them at the same time, it can simply assign
the GENERIC_SUBSYS_PM_OPS macro, defined in include/linux/pm.h, to its
dev_pm_ops structure pointer.

Device drivers that wish to use the same function as a system suspend, freeze,
poweroff and run-time suspend callback, and similarly for system resume, thaw,
restore, and run-time resume, can achieve this with the help of the
UNIVERSAL_DEV_PM_OPS macro defined in include/linux/pm.h (possibly setting its
last argument to NULL).
14 changes: 14 additions & 0 deletions trunk/arch/x86/kernel/cpu/cpufreq/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,20 @@ if CPU_FREQ

comment "CPUFreq processor drivers"

config X86_PCC_CPUFREQ
tristate "Processor Clocking Control interface driver"
depends on ACPI && ACPI_PROCESSOR
help
This driver adds support for the PCC interface.

For details, take a look at:
<file:Documentation/cpu-freq/pcc-cpufreq.txt>.

To compile this driver as a module, choose M here: the
module will be called pcc-cpufreq.

If in doubt, say N.

config X86_ACPI_CPUFREQ
tristate "ACPI Processor P-States driver"
select CPU_FREQ_TABLE
Expand Down
1 change: 1 addition & 0 deletions trunk/arch/x86/kernel/cpu/cpufreq/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@

obj-$(CONFIG_X86_POWERNOW_K8) += powernow-k8.o
obj-$(CONFIG_X86_ACPI_CPUFREQ) += acpi-cpufreq.o
obj-$(CONFIG_X86_PCC_CPUFREQ) += pcc-cpufreq.o
obj-$(CONFIG_X86_POWERNOW_K6) += powernow-k6.o
obj-$(CONFIG_X86_POWERNOW_K7) += powernow-k7.o
obj-$(CONFIG_X86_LONGHAUL) += longhaul.o
Expand Down
Loading

0 comments on commit 0eb3f5e

Please sign in to comment.