powerpc: Provide initial documentation for PAPR hcalls
This doc patch provides an initial description of the hcall op-codes that are used by the Linux kernel running as a guest (LPAR) on top of PowerVM or any other sPAPR compliant hypervisor (e.g. qemu). Apart from documenting the hcalls, the doc patch also provides a rudimentary overview of the hcall ABI, how hcalls are issued within the Linux kernel and how information/control flows between the guest and hypervisor.

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add SPDX tag, add it to index.rst]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190828082729.16695-1-vaibhav@linux.ibm.com
Vaibhav Jain authored and Michael Ellerman committed Jan 29, 2020
1 parent 9933819, commit 58b278f
Showing 3 changed files with 254 additions and 16 deletions.
@@ -22,6 +22,7 @@ powerpc
    isa-versions
    kaslr-booke32
    mpc52xx
    papr_hcalls
    pci_iov_resource_on_powernv
    pmu-ebb
    ptrace
@@ -0,0 +1,250 @@
.. SPDX-License-Identifier: GPL-2.0

===========================
Hypercall Op-codes (hcalls)
===========================

Overview
========

Virtualization on 64-bit Power Book3S platforms is based on the PAPR
specification [1]_, which describes the run-time environment for a guest
operating system and how it should interact with the hypervisor for
privileged operations. Currently there are two PAPR compliant hypervisors:

- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
  IBM-i and Linux as guests (termed Logical Partitions or LPARs). It
  supports the full PAPR specification.

- **Qemu/KVM**: Supports PPC64 Linux guests running on a PPC64 Linux host.
  It implements only a subset of the PAPR specification, called LoPAPR [2]_.

On the PPC64 arch, a guest kernel running on top of a PAPR hypervisor is
called a *pSeries guest*. A pSeries guest runs in supervisor mode (HV=0) and
must issue hypercalls to the hypervisor whenever it needs to perform an action
that is hypervisor privileged [3]_ or to use other services managed by the
hypervisor.

Hence a hypercall (hcall) is essentially a request by the pSeries guest
asking the hypervisor to perform a privileged operation on its behalf. The
guest issues a hcall with the necessary input operands. After performing
the privileged operation, the hypervisor returns a status code and output
operands to the guest.

HCALL ABI
=========

The ABI specification for a hcall between a pSeries guest and a PAPR hypervisor
is covered in section 14.5.3 of ref [2]_. The switch to the hypervisor context is
done via the **HVSC** instruction, which expects the opcode for the hcall to be
set in *r3* and any in-arguments for the hcall to be provided in registers
*r4-r12*. If values have to be passed through a memory buffer, the data stored
in that buffer should be in big-endian byte order.

Once control returns to the guest after the hypervisor has serviced the
**HVSC** instruction, the return value of the hcall is available in *r3* and any
out values are returned in registers *r4-r12*. As with in-arguments, any out
values stored in a memory buffer will be in big-endian byte order.
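The register flow described above can be modelled in plain user-space C. This is only a sketch: ``model_hcall``, ``toy_hypervisor``, ``struct hcall_regs``, the opcode ``0x1234`` and the status values are all made up for illustration, and a function pointer stands in for the real instruction that traps into the hypervisor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Registers r3..r12 as seen across one hcall (index 0 is r3). */
#define HCALL_NREGS 10

struct hcall_regs {
	uint64_t r3_to_r12[HCALL_NREGS];
};

/* Hypothetical stand-in for the hypervisor: reads inputs, writes outputs in place. */
typedef void (*fake_hypervisor_fn)(struct hcall_regs *regs);

/*
 * Marshal a hcall as the text describes: opcode in r3, in-arguments in
 * r4-r12. On return r3 carries the status and r4-r12 carry out values,
 * which are copied to 'outs' (9 slots) when it is non-NULL.
 */
static long model_hcall(fake_hypervisor_fn hv, uint64_t opcode,
			const uint64_t *ins, size_t nr_ins, uint64_t *outs)
{
	struct hcall_regs regs = { { 0 } };
	size_t i;

	assert(hv != NULL);
	if (nr_ins > HCALL_NREGS - 1)
		return -1;	/* model-only error: too many in-arguments */

	regs.r3_to_r12[0] = opcode;
	for (i = 0; i < nr_ins; i++)
		regs.r3_to_r12[1 + i] = ins[i];

	hv(&regs);	/* stands in for the switch to hypervisor context */

	if (outs)
		for (i = 0; i < HCALL_NREGS - 1; i++)
			outs[i] = regs.r3_to_r12[1 + i];

	return (long)regs.r3_to_r12[0];
}

/* Toy hypervisor: for the made-up opcode 0x1234, return r4 + r5 in r4. */
static void toy_hypervisor(struct hcall_regs *regs)
{
	if (regs->r3_to_r12[0] != 0x1234) {
		regs->r3_to_r12[0] = (uint64_t)-2;	/* made-up failure status */
		return;
	}
	regs->r3_to_r12[1] = regs->r3_to_r12[1] + regs->r3_to_r12[2];
	regs->r3_to_r12[0] = 0;	/* made-up success status */
}
```

Calling ``model_hcall(toy_hypervisor, 0x1234, ins, 2, outs)`` with ``ins = {40, 2}`` leaves the sum in ``outs[0]`` and returns the toy success status. A real guest does not build such a structure by hand; the wrappers in the arch-specific header take care of register placement.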

Powerpc arch code provides convenient wrappers named **plpar_hcall_xxx**, defined
in an arch-specific header [4]_, to issue hcalls from the Linux kernel
running as a pSeries guest.

Register Conventions
====================

Any hcall should follow the same register conventions as described in section
2.2.1.1 of "64-Bit ELF V2 ABI Specification: Power Architecture" [5]_. The table
below summarizes these conventions:

+----------+----------+-------------------------------------------+
| Register |Volatile  |  Purpose                                  |
| Range    |(Y/N)     |                                           |
+==========+==========+===========================================+
| r0       |    Y     |  Optional-usage                           |
+----------+----------+-------------------------------------------+
| r1       |    N     |  Stack Pointer                            |
+----------+----------+-------------------------------------------+
| r2       |    N     |  TOC                                      |
+----------+----------+-------------------------------------------+
| r3       |    Y     |  hcall opcode/return value                |
+----------+----------+-------------------------------------------+
| r4-r10   |    Y     |  in and out values                        |
+----------+----------+-------------------------------------------+
| r11      |    Y     |  Optional-usage/Environmental pointer     |
+----------+----------+-------------------------------------------+
| r12      |    Y     |  Optional-usage/Function entry address at |
|          |          |  global entry point                       |
+----------+----------+-------------------------------------------+
| r13      |    N     |  Thread-Pointer                           |
+----------+----------+-------------------------------------------+
| r14-r31  |    N     |  Local Variables                          |
+----------+----------+-------------------------------------------+
| LR       |    Y     |  Link Register                            |
+----------+----------+-------------------------------------------+
| CTR      |    Y     |  Loop Counter                             |
+----------+----------+-------------------------------------------+
| XER      |    Y     |  Fixed-point exception register.          |
+----------+----------+-------------------------------------------+
| CR0-1    |    Y     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
| CR2-4    |    N     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
| CR5-7    |    Y     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
| Others   |    N     |                                           |
+----------+----------+-------------------------------------------+

DRC & DRC Indexes
=================
::

        DR1                                  Guest
        +--+        +------------+         +---------+
        |  | <----> |            |         |  User   |
        +--+  DRC1  |            |   DRC   |  Space  |
                    |    PAPR    |  Index  +---------+
        DR2         | Hypervisor |         |         |
        +--+        |            | <-----> |  Kernel |
        |  | <----> |            |  Hcall  |         |
        +--+  DRC2  +------------+         +---------+

A PAPR hypervisor refers to shared hardware resources, such as PCI devices and
NVDIMMs, that are available for use by LPARs as Dynamic Resources (DR). When a
DR is allocated to an LPAR, PHYP creates a data structure called a Dynamic
Resource Connector (DRC) to manage LPAR access. An LPAR refers to a DRC via an
opaque 32-bit number called the DRC-Index. The DRC-Index value is provided to
the LPAR via the device tree, where it is present as an attribute in the
device-tree node associated with the DR.
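As a rough illustration of how a guest might obtain the DRC-Index, the sketch below decodes a single 32-bit device-tree cell into a host-order value. It assumes the attribute is stored as one big-endian cell, as device-tree property values normally are; the helper name ``drc_index_from_prop`` and the sample bytes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one 32-bit device-tree cell (big-endian) into a host-order value.
 * Returns 0 on success, -1 if the property is not exactly one cell long.
 */
static int drc_index_from_prop(const uint8_t *prop, size_t len,
			       uint32_t *drc_index)
{
	assert(drc_index != NULL);

	if (prop == NULL || len != sizeof(uint32_t))
		return -1;

	/* Most significant byte first, independent of host endianness. */
	*drc_index = ((uint32_t)prop[0] << 24) | ((uint32_t)prop[1] << 16) |
		     ((uint32_t)prop[2] << 8)  |  (uint32_t)prop[3];
	return 0;
}
```

Inside the kernel this conversion is normally left to the device-tree accessors (for example ``of_property_read_u32()``) rather than open-coded. The resulting value stays opaque to the guest and is only passed back to the hypervisor as a hcall argument.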

HCALL Return-values
===================

After servicing the hcall, the hypervisor sets the return value in *r3* to
indicate success or failure of the hcall. In case of a failure, an error code
indicates the cause of the error. These codes are defined and documented in
the arch-specific header [4]_.

In some cases a hcall can take a long time and needs to be issued
multiple times in order to be completely serviced. These hcalls usually
accept an opaque value, *continue-token*, within their argument list, and a
return value of *H_CONTINUE* indicates that the hypervisor has not yet finished
servicing the hcall.

To make such hcalls the guest needs to set *continue-token == 0* for the
initial call and use the hypervisor-returned value of *continue-token*
for each subsequent hcall until the hypervisor returns a value other than
*H_CONTINUE*.

HCALL Op-codes
==============

Below is a partial list of hcalls that are supported by PHYP. For the
corresponding opcode values, see the arch-specific header [4]_:

**H_SCM_READ_METADATA**

| Input: *drcIndex, offset, buffer-address, numBytesToRead*
| Out: *numBytesRead*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_Hardware*

Given a DRC Index of an NVDIMM, read N bytes from the metadata area
associated with it, at the specified offset, and copy them to the provided
buffer. The metadata area stores configuration information such as label
information, bad blocks etc. The metadata area is located out-of-band of the
NVDIMM storage area, hence separate access semantics are provided.

**H_SCM_WRITE_METADATA**

| Input: *drcIndex, offset, data, numBytesToWrite*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P4, H_Hardware*

Given a DRC Index of an NVDIMM, write N bytes from the provided buffer to the
metadata area associated with it, at the specified offset.

**H_SCM_BIND_MEM**

| Input: *drcIndex, startingScmBlockIndex, numScmBlocksToBind,*
| *targetLogicalMemoryAddress, continue-token*
| Out: *continue-token, targetLogicalMemoryAddress, numScmBlocksToBound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_P4, H_Overlap,*
| *H_Too_Big, H_P5, H_Busy*

Given a DRC-Index of an NVDIMM, map a contiguous range of SCM blocks
*(startingScmBlockIndex, startingScmBlockIndex+numScmBlocksToBind)* to the guest
at *targetLogicalMemoryAddress* within the guest physical address space. If
*targetLogicalMemoryAddress == 0xFFFFFFFF_FFFFFFFF*, the hypervisor
assigns a target address to the guest. The hcall can fail if the guest has
an active PTE entry to the SCM block being bound.

**H_SCM_UNBIND_MEM**

| Input: *drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind*
| Out: *numScmBlocksUnbound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Overlap,*
| *H_Busy, H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Given a DRC-Index of an NVDIMM, unmap *numScmBlocksToUnbind* SCM blocks starting
at *startingScmLogicalMemoryAddress* from the guest physical address space. The
hcall can fail if the guest has an active PTE entry to the SCM block being
unbound.
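Several of the hcalls in this list can return *H_Busy* or one of the *H_LongBusyOrder* codes. A caller usually treats these as "try again", waiting roughly the indicated time for the long-busy variants. The sketch below shows one way to turn a return value into a retry hint. The helper names and the numeric ``SKETCH_*`` values are assumptions for illustration; the real definitions are in the header [4]_.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed numeric values, for illustration only; check hvcall.h. */
#define SKETCH_H_SUCCESS			0
#define SKETCH_H_BUSY				1
#define SKETCH_H_LONG_BUSY_ORDER_1_MSEC		9900
#define SKETCH_H_LONG_BUSY_ORDER_10_MSEC	9901

/*
 * Map a return value to a retry hint:
 *   > 0  retry after that many milliseconds
 *   0    retry immediately (plain busy)
 *   -1   do not retry; the hcall finished (successfully or not)
 */
static int retry_delay_msecs(long rc)
{
	switch (rc) {
	case SKETCH_H_LONG_BUSY_ORDER_1_MSEC:
		return 1;
	case SKETCH_H_LONG_BUSY_ORDER_10_MSEC:
		return 10;
	case SKETCH_H_BUSY:
		return 0;
	default:
		return -1;
	}
}

/* Sum the delays a caller would sleep for a scripted sequence of return values. */
static long total_wait_msecs(const long *rcs, int nr, long *final_rc)
{
	long waited = 0;
	int i;

	assert(rcs != NULL && nr > 0 && final_rc != NULL);
	for (i = 0; i < nr; i++) {
		int delay = retry_delay_msecs(rcs[i]);

		*final_rc = rcs[i];
		if (delay < 0)
			break;	/* terminal status: stop retrying */
		waited += delay;
	}
	return waited;
}
```

The scripted sequence only replaces a real hypervisor for the purpose of the example; in a guest each entry would be the return value of a freshly re-issued hcall.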

**H_SCM_QUERY_BLOCK_MEM_BINDING**

| Input: *drcIndex, scmBlockIndex*
| Out: *Guest-Physical-Address*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a DRC-Index and an SCM block index, return the guest physical address to
which the SCM block is mapped.

**H_SCM_QUERY_LOGICAL_MEM_BINDING**

| Input: *Guest-Physical-Address*
| Out: *drcIndex, scmBlockIndex*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a guest physical address, return the DRC Index and SCM block that are
mapped to that address.

**H_SCM_UNBIND_ALL**

| Input: *scmTargetScope, drcIndex*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Busy,*
| *H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Depending on the target scope, unmap from the LPAR memory either all SCM blocks
belonging to all NVDIMMs or all SCM blocks belonging to a single NVDIMM
identified by its drcIndex.

**H_SCM_HEALTH**

| Input: *drcIndex*
| Out: *health-bitmap, health-bit-valid-bitmap*
| Return Value: *H_Success, H_Parameter, H_Hardware*

Given a DRC Index, return information on predictive failure and overall health
of the NVDIMM. Each asserted bit in the health-bitmap indicates a single
predictive failure, and the health-bit-valid-bitmap indicates which bits in the
health-bitmap are valid.
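Since only the bits flagged in the health-bit-valid-bitmap are meaningful, a caller would normally mask the two bitmaps together before testing individual bits. A minimal sketch follows; the helper names and the bit position ``SKETCH_HEALTH_BIT_EXAMPLE`` are hypothetical, and the actual bit assignments are not described in this document.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the health bits that the hypervisor marked as valid. */
static uint64_t valid_health_bits(uint64_t health_bitmap, uint64_t valid_bitmap)
{
	return health_bitmap & valid_bitmap;
}

/* Hypothetical bit position, for illustration only. */
#define SKETCH_HEALTH_BIT_EXAMPLE	(1ULL << 0)

/* Returns 1 if 'bit' is both valid and asserted, 0 otherwise. */
static int health_bit_asserted(uint64_t health_bitmap, uint64_t valid_bitmap,
			       uint64_t bit)
{
	assert(bit != 0);
	return (valid_health_bits(health_bitmap, valid_bitmap) & bit) != 0;
}
```

Masking first means that a bit which happens to be set in the health-bitmap but is not marked valid is never reported as a failure.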

**H_SCM_PERFORMANCE_STATS**

| Input: *drcIndex, resultBuffer Addr*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_Unsupported, H_Hardware, H_Authority, H_Privilege*

Given a DRC Index, collect the performance statistics for the NVDIMM and copy
them to the resultBuffer.

References
==========
.. [1] "Power Architecture Platform Reference"
       https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
.. [2] "Linux on Power Architecture Platform Reference"
       https://members.openpowerfoundation.org/document/dl/469
.. [3] "Definitions and Notation" Book III-Section 14.5.3
       https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
.. [4] arch/powerpc/include/asm/hvcall.h
.. [5] "64-Bit ELF V2 ABI Specification: Power Architecture"
       https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture