KVM: PPC: Allow book3s_hv guests to use SMT processor modes

This lifts the restriction that book3s_hv guests can only run one
hardware thread per core, and allows them to use up to 4 threads
per core on POWER7.  The host still has to run single-threaded.

This capability is advertised to qemu through a new KVM_CAP_PPC_SMT
capability.  The return value of the ioctl querying this capability
is the number of vcpus per virtual CPU core (vcore), currently 4.

To use this, the host kernel should be booted with all threads
active, and then all the secondary threads should be offlined.
This will put the secondary threads into nap mode.  KVM will then
wake them from nap mode and use them for running guest code (while
they are still offline).  To wake the secondary threads, we send
them an IPI using a new xics_wake_cpu() function, implemented in
arch/powerpc/sysdev/xics/icp-native.c.  In other words, at this stage
we assume that the platform has a XICS interrupt controller and
we are using icp-native.c to drive it.  Since the woken thread will
need to acknowledge and clear the IPI, we also export the base
physical address of the XICS registers using kvmppc_set_xics_phys()
for use in the low-level KVM book3s code.

When a vcpu is created, it is assigned to a virtual CPU core.
The vcore number is obtained by dividing the vcpu number by the
number of threads per core in the host.  This number is exported
to userspace via the KVM_CAP_PPC_SMT capability.  If qemu wishes
to run the guest in single-threaded mode, it should make all vcpu
numbers be multiples of the number of threads per core.
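
The mapping described above can be sketched in a few lines of C. This is
only an illustration: vcore_id() and single_threaded_vcpu_id() are
hypothetical helper names, not functions from this patch, and
threads_per_vcore stands for the value returned when querying
KVM_CAP_PPC_SMT.

```c
#include <assert.h>

/* Hypothetical helper: index of the vcore that a vcpu id falls into. */
static inline int vcore_id(int vcpu_id, int threads_per_vcore)
{
	assert(vcpu_id >= 0 && threads_per_vcore > 0);
	return vcpu_id / threads_per_vcore;
}

/*
 * Hypothetical helper: vcpu id to give the n-th vcpu when userspace wants
 * a single-threaded guest, i.e. exactly one vcpu in each vcore.
 */
static inline int single_threaded_vcpu_id(int n, int threads_per_vcore)
{
	assert(n >= 0 && threads_per_vcore > 0);
	return n * threads_per_vcore;
}
```

With 4 threads per vcore, vcpu ids 0, 4, 8, ... each land in their own
vcore, while ids 4 to 7 share vcore 1.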

We distinguish three states of a vcpu: runnable (i.e., ready to execute
the guest), blocked (that is, idle), and busy in host.  We currently
implement a policy that the vcore can run only when all its threads
are runnable or blocked.  This way, if a vcpu needs to execute elsewhere
in the kernel or in qemu, it can do so without being starved of CPU
by the other vcpus.

When a vcore starts to run, it executes in the context of one of the
vcpu threads.  The other vcpu threads all go to sleep and stay asleep
until something happens requiring the vcpu thread to return to qemu,
or to wake up to run the vcore (this can happen when another vcpu
thread goes from busy in host state to blocked).

It can happen that a vcpu goes from blocked to runnable state (e.g.
because of an interrupt), and the vcore it belongs to is already
running.  In that case it can start to run immediately as long as
none of the vcpus in the vcore have started to exit the guest.
We send the next free thread in the vcore an IPI to get it to start
to execute the guest.  It synchronizes with the other threads via
the vcore->entry_exit_count field to make sure that it doesn't go
into the guest if the other vcpus are exiting by the time that it
is ready to actually enter the guest.
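
The entry/exit handshake can be illustrated in portable C. The sketch
below uses C11 atomics as a stand-in and is not the low-level code the
patch uses. try_enter_guest() and begin_exit() are hypothetical names;
the field layout (entry count in the bottom 8 bits, exit count in the
next 8) follows the comment on struct kvmppc_vcore.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define ENTRY_COUNT(v)	((v) & 0xff)	/* bottom 8 bits */
#define EXIT_COUNT(v)	((v) >> 8)	/* next 8 bits */

/*
 * Hypothetical helper: bump the entry count, but only if no thread has
 * started to exit. Returns false if the caller must not enter the guest.
 */
static inline bool try_enter_guest(atomic_int *entry_exit_count)
{
	int old = atomic_load(entry_exit_count);

	do {
		if (EXIT_COUNT(old) != 0)
			return false;	/* someone is already exiting */
		/* on failure, old is reloaded and the check is repeated */
	} while (!atomic_compare_exchange_weak(entry_exit_count, &old, old + 1));
	return true;
}

/* Hypothetical helper: record that one thread has begun leaving the guest. */
static inline void begin_exit(atomic_int *entry_exit_count)
{
	atomic_fetch_add(entry_exit_count, 0x100);
}
```

Because both counts live in one word, a single compare-and-swap covers
the test of the exit count and the increment of the entry count, so no
lock is needed on this path.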

Note that there is no fixed relationship between the hardware thread
number and the vcpu number.  Hardware threads are assigned to vcpus
as they become runnable, so we will always use the lower-numbered
hardware threads in preference to higher-numbered threads if not all
the vcpus in the vcore are runnable, regardless of which vcpus are
runnable.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Paul Mackerras authored and Avi Kivity committed Jul 12, 2011
1 parent 54738c0 commit 371fefd
Showing 13 changed files with 567 additions and 45 deletions.
13 changes: 13 additions & 0 deletions Documentation/virtual/kvm/api.txt
@@ -180,6 +180,19 @@ KVM_CHECK_EXTENSION ioctl() to determine the value for max_vcpus at run-time.
 If the KVM_CAP_NR_VCPUS does not exist, you should assume that max_vcpus is 4
 cpus max.

+On powerpc using book3s_hv mode, the vcpus are mapped onto virtual
+threads in one or more virtual CPU cores. (This is because the
+hardware requires all the hardware threads in a CPU core to be in the
+same partition.) The KVM_CAP_PPC_SMT capability indicates the number
+of vcpus per virtual core (vcore). The vcore id is obtained by
+dividing the vcpu id by the number of vcpus per vcore. The vcpus in a
+given vcore will always be in the same physical core as each other
+(though that might be a different physical core from time to time).
+Userspace can control the threading (SMT) mode of the guest by its
+allocation of vcpu ids. For example, if userspace wants
+single-threaded guest vcpus, it should make all vcpu ids be a multiple
+of the number of vcpus per vcore.
+
 4.8 KVM_GET_DIRTY_LOG (vm ioctl)

 Capability: basic
1 change: 1 addition & 0 deletions arch/powerpc/include/asm/kvm.h
@@ -24,6 +24,7 @@

 /* Select powerpc specific features in <linux/kvm.h> */
 #define __KVM_HAVE_SPAPR_TCE
+#define __KVM_HAVE_PPC_SMT

 struct kvm_regs {
 	__u64 pc;
2 changes: 2 additions & 0 deletions arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -78,6 +78,8 @@ struct kvmppc_host_state {

 #ifdef CONFIG_KVM_BOOK3S_64_HV
 	struct kvm_vcpu *kvm_vcpu;
+	struct kvmppc_vcore *kvm_vcore;
+	unsigned long xics_phys;
 	u64 dabr;
 	u64 host_mmcr[3];
 	u32 host_pmc[6];
46 changes: 45 additions & 1 deletion arch/powerpc/include/asm/kvm_host.h
@@ -25,10 +25,14 @@
 #include <linux/interrupt.h>
 #include <linux/types.h>
 #include <linux/kvm_types.h>
+#include <linux/threads.h>
+#include <linux/spinlock.h>
 #include <linux/kvm_para.h>
 #include <asm/kvm_asm.h>
+#include <asm/processor.h>

-#define KVM_MAX_VCPUS 1
+#define KVM_MAX_VCPUS NR_CPUS
+#define KVM_MAX_VCORES NR_CPUS
 #define KVM_MEMORY_SLOTS 32
 /* memory slots that does not exposed to userspace */
 #define KVM_PRIVATE_MEM_SLOTS 4
@@ -167,9 +171,34 @@ struct kvm_arch {
 	int tlbie_lock;
 	struct list_head spapr_tce_tables;
 	unsigned short last_vcpu[NR_CPUS];
+	struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
 #endif /* CONFIG_KVM_BOOK3S_64_HV */
 };

+/*
+ * Struct for a virtual core.
+ * Note: entry_exit_count combines an entry count in the bottom 8 bits
+ * and an exit count in the next 8 bits. This is so that we can
+ * atomically increment the entry count iff the exit count is 0
+ * without taking the lock.
+ */
+struct kvmppc_vcore {
+	int n_runnable;
+	int n_blocked;
+	int num_threads;
+	int entry_exit_count;
+	int n_woken;
+	int nap_count;
+	u16 pcpu;
+	u8 vcore_running;
+	u8 in_guest;
+	struct list_head runnable_threads;
+	spinlock_t lock;
+};
+
+#define VCORE_ENTRY_COUNT(vc) ((vc)->entry_exit_count & 0xff)
+#define VCORE_EXIT_COUNT(vc) ((vc)->entry_exit_count >> 8)
+
 struct kvmppc_pte {
 	ulong eaddr;
 	u64 vpage;
@@ -365,14 +394,29 @@ struct kvm_vcpu_arch {
 	struct slb_shadow *slb_shadow;
 	struct dtl *dtl;
 	struct dtl *dtl_end;
+
+	struct kvmppc_vcore *vcore;
+	int ret;
 	int trap;
+	int state;
+	int ptid;
+	wait_queue_head_t cpu_run;
+
 	struct kvm_vcpu_arch_shared *shared;
 	unsigned long magic_page_pa; /* phys addr to map the magic page to */
 	unsigned long magic_page_ea; /* effect. addr to map the magic page to */

 #ifdef CONFIG_KVM_BOOK3S_64_HV
 	struct kvm_vcpu_arch_shared shregs;
+
+	struct list_head run_list;
+	struct task_struct *run_task;
+	struct kvm_run *kvm_run;
 #endif
 };

+#define KVMPPC_VCPU_BUSY_IN_HOST 0
+#define KVMPPC_VCPU_BLOCKED 1
+#define KVMPPC_VCPU_RUNNABLE 2
+
 #endif /* __POWERPC_KVM_HOST_H__ */
13 changes: 13 additions & 0 deletions arch/powerpc/include/asm/kvm_ppc.h
@@ -33,6 +33,9 @@
 #else
 #include <asm/kvm_booke.h>
 #endif
+#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
+#include <asm/paca.h>
+#endif

 enum emulation_result {
 	EMULATE_DONE, /* no further processing */
@@ -169,4 +172,14 @@ int kvmppc_set_sregs_ivor(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs);

 void kvmppc_set_pid(struct kvm_vcpu *vcpu, u32 pid);

+#ifdef CONFIG_KVM_BOOK3S_64_HV
+static inline void kvmppc_set_xics_phys(int cpu, unsigned long addr)
+{
+	paca[cpu].kvm_hstate.xics_phys = addr;
+}
+#else
+static inline void kvmppc_set_xics_phys(int cpu, unsigned long addr)
+{}
+#endif
+
 #endif /* __POWERPC_KVM_PPC_H__ */
6 changes: 6 additions & 0 deletions arch/powerpc/kernel/asm-offsets.c
@@ -471,6 +471,10 @@ int main(void)
 	DEFINE(VCPU_FAULT_DAR, offsetof(struct kvm_vcpu, arch.fault_dar));
 	DEFINE(VCPU_LAST_INST, offsetof(struct kvm_vcpu, arch.last_inst));
 	DEFINE(VCPU_TRAP, offsetof(struct kvm_vcpu, arch.trap));
+	DEFINE(VCPU_PTID, offsetof(struct kvm_vcpu, arch.ptid));
+	DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_count));
+	DEFINE(VCORE_NAP_COUNT, offsetof(struct kvmppc_vcore, nap_count));
+	DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
 	DEFINE(VCPU_SVCPU, offsetof(struct kvmppc_vcpu_book3s, shadow_vcpu) -
 			   offsetof(struct kvmppc_vcpu_book3s, vcpu));
 	DEFINE(VCPU_SLB_E, offsetof(struct kvmppc_slb, orige));
@@ -530,6 +534,8 @@ int main(void)

 #ifdef CONFIG_KVM_BOOK3S_64_HV
 	HSTATE_FIELD(HSTATE_KVM_VCPU, kvm_vcpu);
+	HSTATE_FIELD(HSTATE_KVM_VCORE, kvm_vcore);
+	HSTATE_FIELD(HSTATE_XICS_PHYS, xics_phys);
 	HSTATE_FIELD(HSTATE_MMCR, host_mmcr);
 	HSTATE_FIELD(HSTATE_PMC, host_pmc);
 	HSTATE_FIELD(HSTATE_PURR, host_purr);
31 changes: 22 additions & 9 deletions arch/powerpc/kernel/exceptions-64s.S
@@ -49,19 +49,32 @@ BEGIN_FTR_SECTION
 	 * state loss at this time.
 	 */
 	mfspr	r13,SPRN_SRR1
-	rlwinm	r13,r13,47-31,30,31
-	cmpwi	cr0,r13,1
-	bne	1f
-	b	.power7_wakeup_noloss
-1:	cmpwi	cr0,r13,2
-	bne	1f
-	b	.power7_wakeup_loss
+	rlwinm.	r13,r13,47-31,30,31
+	beq	9f
+
+	/* waking up from powersave (nap) state */
+	cmpwi	cr1,r13,2
 	/* Total loss of HV state is fatal, we could try to use the
 	 * PIR to locate a PACA, then use an emergency stack etc...
 	 * but for now, let's just stay stuck here
 	 */
-1:	cmpwi	cr0,r13,3
-	beq	.
+	bgt	cr1,.
+	GET_PACA(r13)
+
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+	lbz	r0,PACAPROCSTART(r13)
+	cmpwi	r0,0x80
+	bne	1f
+	li	r0,0
+	stb	r0,PACAPROCSTART(r13)
+	b	kvm_start_guest
+1:
+#endif
+
+	beq	cr1,2f
+	b	.power7_wakeup_noloss
+2:	b	.power7_wakeup_loss
+9:
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE_206)
 #endif /* CONFIG_PPC_P7_NAP */
 	EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, system_reset_common, EXC_STD,
2 changes: 0 additions & 2 deletions arch/powerpc/kernel/idle_power7.S
@@ -73,7 +73,6 @@ _GLOBAL(power7_idle)
 	b	.

 _GLOBAL(power7_wakeup_loss)
-	GET_PACA(r13)
 	ld	r1,PACAR1(r13)
 	REST_NVGPRS(r1)
 	REST_GPR(2, r1)
Expand All @@ -87,7 +86,6 @@ _GLOBAL(power7_wakeup_loss)
rfid

_GLOBAL(power7_wakeup_noloss)
GET_PACA(r13)
ld r1,PACAR1(r13)
ld r4,_MSR(r1)
ld r5,_NIP(r1)