Skip to content

Commit

Permalink
Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/ker…
Browse files Browse the repository at this point in the history
…nel/git/paulus/powerpc into HEAD

The big feature this time is support for POWER9 using the radix-tree
MMU for host and guest.  This required some changes to arch/powerpc
code, so I talked with Michael Ellerman and he created a topic branch
with this patchset, which I merged into kvm-ppc-next and which Michael
will pull into his tree.  Michael also put in some patches from Nick
Piggin which fix bugs in the interrupt vector code in relocatable
kernels when coming from a KVM guest.

Other notable changes include:

* Add the ability to change the size of the hashed page table,
  from David Gibson.

* XICS (interrupt controller) emulation fixes and improvements,
  from Li Zhong.

* Bug fixes from myself and Thomas Huth.

These patches define some new KVM capabilities and ioctls, but there
should be no conflicts with anything else currently upstream, as far
as I am aware.
  • Loading branch information
Paolo Bonzini committed Feb 7, 2017
2 parents 55dd00a + 050f233 commit d5b798c
Show file tree
Hide file tree
Showing 36 changed files with 2,608 additions and 563 deletions.
194 changes: 187 additions & 7 deletions Documentation/virtual/kvm/api.txt
Original file line number Diff line number Diff line change
Expand Up @@ -2443,18 +2443,20 @@ are, it will do nothing and return an EBUSY error.
The parameter is a pointer to a 32-bit unsigned integer variable
containing the order (log base 2) of the desired size of the hash
table, which must be between 18 and 46. On successful return from the
ioctl, it will have been updated with the order of the hash table that
was allocated.
ioctl, the value will not be changed by the kernel.

If no hash table has been allocated when any vcpu is asked to run
(with the KVM_RUN ioctl), the host kernel will allocate a
default-sized hash table (16 MB).

If this ioctl is called when a hash table has already been allocated,
the kernel will clear out the existing hash table (zero all HPTEs) and
return the hash table order in the parameter. (If the guest is using
the virtualized real-mode area (VRMA) facility, the kernel will
re-create the VMRA HPTEs on the next KVM_RUN of any vcpu.)
with a different order from the existing hash table, the existing hash
table will be freed and a new one allocated. If this is ioctl is
called when a hash table has already been allocated of the same order
as specified, the kernel will clear out the existing hash table (zero
all HPTEs). In either case, if the guest is using the virtualized
real-mode area (VRMA) facility, the kernel will re-create the VMRA
HPTEs on the next KVM_RUN of any vcpu.

4.77 KVM_S390_INTERRUPT

Expand Down Expand Up @@ -3177,7 +3179,7 @@ of IOMMU pages.

The rest of functionality is identical to KVM_CREATE_SPAPR_TCE.

4.98 KVM_REINJECT_CONTROL
4.99 KVM_REINJECT_CONTROL

Capability: KVM_CAP_REINJECT_CONTROL
Architectures: x86
Expand All @@ -3201,6 +3203,166 @@ struct kvm_reinject_control {
pit_reinject = 0 (!reinject mode) is recommended, unless running an old
operating system that uses the PIT for timing (e.g. Linux 2.4.x).

4.100 KVM_PPC_CONFIGURE_V3_MMU

Capability: KVM_CAP_PPC_RADIX_MMU or KVM_CAP_PPC_HASH_MMU_V3
Architectures: ppc
Type: vm ioctl
Parameters: struct kvm_ppc_mmuv3_cfg (in)
Returns: 0 on success,
-EFAULT if struct kvm_ppc_mmuv3_cfg cannot be read,
-EINVAL if the configuration is invalid

This ioctl controls whether the guest will use radix or HPT (hashed
page table) translation, and sets the pointer to the process table for
the guest.

struct kvm_ppc_mmuv3_cfg {
__u64 flags;
__u64 process_table;
};

There are two bits that can be set in flags; KVM_PPC_MMUV3_RADIX and
KVM_PPC_MMUV3_GTSE. KVM_PPC_MMUV3_RADIX, if set, configures the guest
to use radix tree translation, and if clear, to use HPT translation.
KVM_PPC_MMUV3_GTSE, if set and if KVM permits it, configures the guest
to be able to use the global TLB and SLB invalidation instructions;
if clear, the guest may not use these instructions.

The process_table field specifies the address and size of the guest
process table, which is in the guest's space. This field is formatted
as the second doubleword of the partition table entry, as defined in
the Power ISA V3.00, Book III section 5.7.6.1.

4.101 KVM_PPC_GET_RMMU_INFO

Capability: KVM_CAP_PPC_RADIX_MMU
Architectures: ppc
Type: vm ioctl
Parameters: struct kvm_ppc_rmmu_info (out)
Returns: 0 on success,
-EFAULT if struct kvm_ppc_rmmu_info cannot be written,
-EINVAL if no useful information can be returned

This ioctl returns a structure containing two things: (a) a list
containing supported radix tree geometries, and (b) a list that maps
page sizes to put in the "AP" (actual page size) field for the tlbie
(TLB invalidate entry) instruction.

struct kvm_ppc_rmmu_info {
struct kvm_ppc_radix_geom {
__u8 page_shift;
__u8 level_bits[4];
__u8 pad[3];
} geometries[8];
__u32 ap_encodings[8];
};

The geometries[] field gives up to 8 supported geometries for the
radix page table, in terms of the log base 2 of the smallest page
size, and the number of bits indexed at each level of the tree, from
the PTE level up to the PGD level in that order. Any unused entries
will have 0 in the page_shift field.

The ap_encodings gives the supported page sizes and their AP field
encodings, encoded with the AP value in the top 3 bits and the log
base 2 of the page size in the bottom 6 bits.

4.102 KVM_PPC_RESIZE_HPT_PREPARE

Capability: KVM_CAP_SPAPR_RESIZE_HPT
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_ppc_resize_hpt (in)
Returns: 0 on successful completion,
>0 if a new HPT is being prepared, the value is an estimated
number of milliseconds until preparation is complete
-EFAULT if struct kvm_reinject_control cannot be read,
-EINVAL if the supplied shift or flags are invalid
-ENOMEM if unable to allocate the new HPT
-ENOSPC if there was a hash collision when moving existing
HPT entries to the new HPT
-EIO on other error conditions

Used to implement the PAPR extension for runtime resizing of a guest's
Hashed Page Table (HPT). Specifically this starts, stops or monitors
the preparation of a new potential HPT for the guest, essentially
implementing the H_RESIZE_HPT_PREPARE hypercall.

If called with shift > 0 when there is no pending HPT for the guest,
this begins preparation of a new pending HPT of size 2^(shift) bytes.
It then returns a positive integer with the estimated number of
milliseconds until preparation is complete.

If called when there is a pending HPT whose size does not match that
requested in the parameters, discards the existing pending HPT and
creates a new one as above.

If called when there is a pending HPT of the size requested, will:
* If preparation of the pending HPT is already complete, return 0
* If preparation of the pending HPT has failed, return an error
code, then discard the pending HPT.
* If preparation of the pending HPT is still in progress, return an
estimated number of milliseconds until preparation is complete.

If called with shift == 0, discards any currently pending HPT and
returns 0 (i.e. cancels any in-progress preparation).

flags is reserved for future expansion, currently setting any bits in
flags will result in an -EINVAL.

Normally this will be called repeatedly with the same parameters until
it returns <= 0. The first call will initiate preparation, subsequent
ones will monitor preparation until it completes or fails.

struct kvm_ppc_resize_hpt {
__u64 flags;
__u32 shift;
__u32 pad;
};

4.103 KVM_PPC_RESIZE_HPT_COMMIT

Capability: KVM_CAP_SPAPR_RESIZE_HPT
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_ppc_resize_hpt (in)
Returns: 0 on successful completion,
-EFAULT if struct kvm_reinject_control cannot be read,
-EINVAL if the supplied shift or flags are invalid
-ENXIO is there is no pending HPT, or the pending HPT doesn't
have the requested size
-EBUSY if the pending HPT is not fully prepared
-ENOSPC if there was a hash collision when moving existing
HPT entries to the new HPT
-EIO on other error conditions

Used to implement the PAPR extension for runtime resizing of a guest's
Hashed Page Table (HPT). Specifically this requests that the guest be
transferred to working with the new HPT, essentially implementing the
H_RESIZE_HPT_COMMIT hypercall.

This should only be called after KVM_PPC_RESIZE_HPT_PREPARE has
returned 0 with the same parameters. In other cases
KVM_PPC_RESIZE_HPT_COMMIT will return an error (usually -ENXIO or
-EBUSY, though others may be possible if the preparation was started,
but failed).

This will have undefined effects on the guest if it has not already
placed itself in a quiescent state where no vcpu will make MMU enabled
memory accesses.

On succsful completion, the pending HPT will become the guest's active
HPT and the previous HPT will be discarded.

On failure, the guest will still be operating on its previous HPT.

struct kvm_ppc_resize_hpt {
__u64 flags;
__u32 shift;
__u32 pad;
};

5. The kvm_run structure
------------------------

Expand Down Expand Up @@ -3942,3 +4104,21 @@ In order to use SynIC, it has to be activated by setting this
capability via KVM_ENABLE_CAP ioctl on the vcpu fd. Note that this
will disable the use of APIC hardware virtualization even if supported
by the CPU, as it's incompatible with SynIC auto-EOI behavior.

8.3 KVM_CAP_PPC_RADIX_MMU

Architectures: ppc

This capability, if KVM_CHECK_EXTENSION indicates that it is
available, means that that the kernel can support guests using the
radix MMU defined in Power ISA V3.00 (as implemented in the POWER9
processor).

8.4 KVM_CAP_PPC_HASH_MMU_V3

Architectures: ppc

This capability, if KVM_CHECK_EXTENSION indicates that it is
available, means that that the kernel can support guests using the
hashed page table MMU defined in Power ISA V3.00 (as implemented in
the POWER9 processor), including in-memory segment tables.
18 changes: 17 additions & 1 deletion arch/powerpc/include/asm/book3s/64/mmu.h
Original file line number Diff line number Diff line change
Expand Up @@ -44,10 +44,20 @@ struct patb_entry {
};
extern struct patb_entry *partition_tb;

/* Bits in patb0 field */
#define PATB_HR (1UL << 63)
#define PATB_GR (1UL << 63)
#define RPDB_MASK 0x0ffffffffffff00fUL
#define RPDB_SHIFT (1UL << 8)
#define RTS1_SHIFT 61 /* top 2 bits of radix tree size */
#define RTS1_MASK (3UL << RTS1_SHIFT)
#define RTS2_SHIFT 5 /* bottom 3 bits of radix tree size */
#define RTS2_MASK (7UL << RTS2_SHIFT)
#define RPDS_MASK 0x1f /* root page dir. size field */

/* Bits in patb1 field */
#define PATB_GR (1UL << 63) /* guest uses radix; must match HR */
#define PRTS_MASK 0x1f /* process table size field */

/*
* Limit process table to PAGE_SIZE table. This
* also limit the max pid we can support.
Expand Down Expand Up @@ -138,5 +148,11 @@ static inline void setup_initial_memory_limit(phys_addr_t first_memblock_base,
extern int (*register_process_table)(unsigned long base, unsigned long page_size,
unsigned long tbl_size);

#ifdef CONFIG_PPC_PSERIES
extern void radix_init_pseries(void);
#else
static inline void radix_init_pseries(void) { };
#endif

#endif /* __ASSEMBLY__ */
#endif /* _ASM_POWERPC_BOOK3S_64_MMU_H_ */
75 changes: 55 additions & 20 deletions arch/powerpc/include/asm/exception-64s.h
Original file line number Diff line number Diff line change
Expand Up @@ -97,6 +97,15 @@
ld reg,PACAKBASE(r13); \
ori reg,reg,(ABS_ADDR(label))@l;

/*
* Branches from unrelocated code (e.g., interrupts) to labels outside
* head-y require >64K offsets.
*/
#define __LOAD_FAR_HANDLER(reg, label) \
ld reg,PACAKBASE(r13); \
ori reg,reg,(ABS_ADDR(label))@l; \
addis reg,reg,(ABS_ADDR(label))@h;

/* Exception register prefixes */
#define EXC_HV H
#define EXC_STD
Expand Down Expand Up @@ -227,13 +236,41 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
mtctr reg; \
bctr

/*
* KVM requires __LOAD_FAR_HANDLER.
*
* __BRANCH_TO_KVM_EXIT branches are also a special case because they
* explicitly use r9 then reload it from PACA before branching. Hence
* the double-underscore.
*/
#define __BRANCH_TO_KVM_EXIT(area, label) \
mfctr r9; \
std r9,HSTATE_SCRATCH1(r13); \
__LOAD_FAR_HANDLER(r9, label); \
mtctr r9; \
ld r9,area+EX_R9(r13); \
bctr

#define BRANCH_TO_KVM(reg, label) \
__LOAD_FAR_HANDLER(reg, label); \
mtctr reg; \
bctr

#else
#define BRANCH_TO_COMMON(reg, label) \
b label

#define BRANCH_TO_KVM(reg, label) \
b label

#define __BRANCH_TO_KVM_EXIT(area, label) \
ld r9,area+EX_R9(r13); \
b label

#endif

#define __KVM_HANDLER_PROLOG(area, n) \

#define __KVM_HANDLER(area, h, n) \
BEGIN_FTR_SECTION_NESTED(947) \
ld r10,area+EX_CFAR(r13); \
std r10,HSTATE_CFAR(r13); \
Expand All @@ -243,30 +280,28 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
std r10,HSTATE_PPR(r13); \
END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948); \
ld r10,area+EX_R10(r13); \
stw r9,HSTATE_SCRATCH1(r13); \
ld r9,area+EX_R9(r13); \
std r12,HSTATE_SCRATCH0(r13); \

#define __KVM_HANDLER(area, h, n) \
__KVM_HANDLER_PROLOG(area, n) \
li r12,n; \
b kvmppc_interrupt
sldi r12,r9,32; \
ori r12,r12,(n); \
/* This reloads r9 before branching to kvmppc_interrupt */ \
__BRANCH_TO_KVM_EXIT(area, kvmppc_interrupt)

#define __KVM_HANDLER_SKIP(area, h, n) \
cmpwi r10,KVM_GUEST_MODE_SKIP; \
ld r10,area+EX_R10(r13); \
beq 89f; \
stw r9,HSTATE_SCRATCH1(r13); \
BEGIN_FTR_SECTION_NESTED(948) \
ld r9,area+EX_PPR(r13); \
std r9,HSTATE_PPR(r13); \
ld r10,area+EX_PPR(r13); \
std r10,HSTATE_PPR(r13); \
END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948); \
ld r9,area+EX_R9(r13); \
ld r10,area+EX_R10(r13); \
std r12,HSTATE_SCRATCH0(r13); \
li r12,n; \
b kvmppc_interrupt; \
sldi r12,r9,32; \
ori r12,r12,(n); \
/* This reloads r9 before branching to kvmppc_interrupt */ \
__BRANCH_TO_KVM_EXIT(area, kvmppc_interrupt); \
89: mtocrf 0x80,r9; \
ld r9,area+EX_R9(r13); \
ld r10,area+EX_R10(r13); \
b kvmppc_skip_##h##interrupt

#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
Expand Down Expand Up @@ -393,12 +428,12 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
EXCEPTION_RELON_PROLOG_PSERIES_1(label, EXC_STD)

#define STD_RELON_EXCEPTION_HV(loc, vec, label) \
/* No guest interrupts come through here */ \
SET_SCRATCH0(r13); /* save r13 */ \
EXCEPTION_RELON_PROLOG_PSERIES(PACA_EXGEN, label, EXC_HV, NOTEST, vec);
EXCEPTION_RELON_PROLOG_PSERIES(PACA_EXGEN, label, \
EXC_HV, KVMTEST_HV, vec);

#define STD_RELON_EXCEPTION_HV_OOL(vec, label) \
EXCEPTION_PROLOG_1(PACA_EXGEN, NOTEST, vec); \
EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST_HV, vec); \
EXCEPTION_RELON_PROLOG_PSERIES_1(label, EXC_HV)

/* This associate vector numbers with bits in paca->irq_happened */
Expand Down Expand Up @@ -475,10 +510,10 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)

#define MASKABLE_RELON_EXCEPTION_HV(loc, vec, label) \
_MASKABLE_RELON_EXCEPTION_PSERIES(vec, label, \
EXC_HV, SOFTEN_NOTEST_HV)
EXC_HV, SOFTEN_TEST_HV)

#define MASKABLE_RELON_EXCEPTION_HV_OOL(vec, label) \
EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_NOTEST_HV, vec); \
EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_TEST_HV, vec); \
EXCEPTION_PROLOG_PSERIES_1(label, EXC_HV)

/*
Expand Down
2 changes: 1 addition & 1 deletion arch/powerpc/include/asm/head-64.h
Original file line number Diff line number Diff line change
Expand Up @@ -218,7 +218,7 @@ end_##sname:

#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
#define TRAMP_KVM_BEGIN(name) \
TRAMP_REAL_BEGIN(name)
TRAMP_VIRT_BEGIN(name)
#else
#define TRAMP_KVM_BEGIN(name)
#endif
Expand Down
Loading

0 comments on commit d5b798c

Please sign in to comment.