Skip to content

Commit

Permalink
KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9
Browse files Browse the repository at this point in the history
POWER9 has hardware bugs relating to transactional memory and thread
reconfiguration (changes to hardware SMT mode).  Specifically, the core
does not have enough storage to store a complete checkpoint of all the
architected state for all four threads.  The DD2.2 version of POWER9
includes hardware modifications designed to allow hypervisor software
to implement workarounds for these problems.  This patch implements
those workarounds in KVM code so that KVM guests see a full, working
transactional memory implementation.

The problems center around the use of TM suspended state, where the
CPU has a checkpointed state but execution is not transactional.  The
workaround is to implement a "fake suspend" state, which looks to the
guest like suspended state but the CPU does not store a checkpoint.
In this state, any instruction that would cause a transition to
transactional state (rfid, rfebb, mtmsrd, tresume) or would use the
checkpointed state (treclaim) causes a "soft patch" interrupt (vector
0x1500) to the hypervisor so that it can be emulated.  The trechkpt
instruction also causes a soft patch interrupt.

On POWER9 DD2.2, we avoid returning to the guest in any state which
would require a checkpoint to be present.  The trechkpt in the guest
entry path which would normally create that checkpoint is replaced by
either a transition to fake suspend state, if the guest is in suspend
state, or a rollback to the pre-transactional state if the guest is in
transactional state.  Fake suspend state is indicated by a flag in the
PACA plus a new bit in the PSSCR.  The new PSSCR bit is write-only and
reads back as 0.

On exit from the guest, if the guest is in fake suspend state, we still
do the treclaim instruction as we would in real suspend state, in order
to get into non-transactional state, but we do not save the resulting
register state since there was no checkpoint.

Emulation of the instructions that cause a softpatch interrupt is
handled in two paths.  If the guest is in real suspend mode, we call
kvmhv_p9_tm_emulation_early() to handle the cases where the guest is
transitioning to transactional state.  This is called before we do the
treclaim in the guest exit path; because we haven't done treclaim, we
can get back to the guest with the transaction still active.  If the
instruction is a case that kvmhv_p9_tm_emulation_early() doesn't
handle, or if the guest is in fake suspend state, then we proceed to
do the complete guest exit path and subsequently call
kvmhv_p9_tm_emulation() in host context with the MMU on.  This handles
all the cases including the cases that generate program interrupts
(illegal instruction or TM Bad Thing) and facility unavailable
interrupts.

The emulation is reasonably straightforward and is mostly concerned
with checking for exception conditions and updating the state of
registers such as MSR and CR0.  The treclaim emulation takes care to
ensure that the TEXASR register gets updated as if it were the guest
treclaim instruction that had done failure recording, not the treclaim
done in hypervisor state in the guest exit path.

With this, the KVM_CAP_PPC_HTM capability returns true (1) even if
transactional memory is not available to host userspace.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  • Loading branch information
Paul Mackerras authored and Michael Ellerman committed Mar 23, 2018
1 parent 7672691 commit 4bb3c7a
Show file tree
Hide file tree
Showing 16 changed files with 557 additions and 10 deletions.
2 changes: 2 additions & 0 deletions arch/powerpc/include/asm/kvm_asm.h
Original file line number Diff line number Diff line change
Expand Up @@ -108,6 +108,8 @@

/* book3s_hv */

#define BOOK3S_INTERRUPT_HV_SOFTPATCH 0x1500

/*
* Special trap used to indicate to host that this is a
* passthrough interrupt that could not be handled
Expand Down
4 changes: 4 additions & 0 deletions arch/powerpc/include/asm/kvm_book3s.h
Original file line number Diff line number Diff line change
Expand Up @@ -241,6 +241,10 @@ extern void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr,
unsigned long mask);
extern void kvmppc_set_fscr(struct kvm_vcpu *vcpu, u64 fscr);

extern int kvmhv_p9_tm_emulation_early(struct kvm_vcpu *vcpu);
extern int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu);
extern void kvmhv_emulate_tm_rollback(struct kvm_vcpu *vcpu);

extern void kvmppc_entry_trampoline(void);
extern void kvmppc_hv_entry_trampoline(void);
extern u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst);
Expand Down
43 changes: 43 additions & 0 deletions arch/powerpc/include/asm/kvm_book3s_64.h
Original file line number Diff line number Diff line change
Expand Up @@ -472,6 +472,49 @@ static inline void set_dirty_bits_atomic(unsigned long *map, unsigned long i,
set_bit_le(i, map);
}

static inline u64 sanitize_msr(u64 msr)
{
msr &= ~MSR_HV;
msr |= MSR_ME;
return msr;
}

#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
static inline void copy_from_checkpoint(struct kvm_vcpu *vcpu)
{
vcpu->arch.cr = vcpu->arch.cr_tm;
vcpu->arch.xer = vcpu->arch.xer_tm;
vcpu->arch.lr = vcpu->arch.lr_tm;
vcpu->arch.ctr = vcpu->arch.ctr_tm;
vcpu->arch.amr = vcpu->arch.amr_tm;
vcpu->arch.ppr = vcpu->arch.ppr_tm;
vcpu->arch.dscr = vcpu->arch.dscr_tm;
vcpu->arch.tar = vcpu->arch.tar_tm;
memcpy(vcpu->arch.gpr, vcpu->arch.gpr_tm,
sizeof(vcpu->arch.gpr));
vcpu->arch.fp = vcpu->arch.fp_tm;
vcpu->arch.vr = vcpu->arch.vr_tm;
vcpu->arch.vrsave = vcpu->arch.vrsave_tm;
}

static inline void copy_to_checkpoint(struct kvm_vcpu *vcpu)
{
vcpu->arch.cr_tm = vcpu->arch.cr;
vcpu->arch.xer_tm = vcpu->arch.xer;
vcpu->arch.lr_tm = vcpu->arch.lr;
vcpu->arch.ctr_tm = vcpu->arch.ctr;
vcpu->arch.amr_tm = vcpu->arch.amr;
vcpu->arch.ppr_tm = vcpu->arch.ppr;
vcpu->arch.dscr_tm = vcpu->arch.dscr;
vcpu->arch.tar_tm = vcpu->arch.tar;
memcpy(vcpu->arch.gpr_tm, vcpu->arch.gpr,
sizeof(vcpu->arch.gpr));
vcpu->arch.fp_tm = vcpu->arch.fp;
vcpu->arch.vr_tm = vcpu->arch.vr;
vcpu->arch.vrsave_tm = vcpu->arch.vrsave;
}
#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */

#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */

#endif /* __ASM_KVM_BOOK3S_64_H__ */
1 change: 1 addition & 0 deletions arch/powerpc/include/asm/kvm_book3s_asm.h
Original file line number Diff line number Diff line change
Expand Up @@ -119,6 +119,7 @@ struct kvmppc_host_state {
u8 host_ipi;
u8 ptid; /* thread number within subcore when split */
u8 tid; /* thread number within whole core */
u8 fake_suspend;
struct kvm_vcpu *kvm_vcpu;
struct kvmppc_vcore *kvm_vcore;
void __iomem *xics_phys;
Expand Down
1 change: 1 addition & 0 deletions arch/powerpc/include/asm/kvm_host.h
Original file line number Diff line number Diff line change
Expand Up @@ -610,6 +610,7 @@ struct kvm_vcpu_arch {
u64 tfhar;
u64 texasr;
u64 tfiar;
u64 orig_texasr;

u32 cr_tm;
u64 xer_tm;
Expand Down
4 changes: 4 additions & 0 deletions arch/powerpc/include/asm/ppc-opcode.h
Original file line number Diff line number Diff line change
Expand Up @@ -232,15 +232,18 @@
#define PPC_INST_MSGSYNC 0x7c0006ec
#define PPC_INST_MSGSNDP 0x7c00011c
#define PPC_INST_MSGCLRP 0x7c00015c
#define PPC_INST_MTMSRD 0x7c000164
#define PPC_INST_MTTMR 0x7c0003dc
#define PPC_INST_NOP 0x60000000
#define PPC_INST_PASTE 0x7c20070d
#define PPC_INST_POPCNTB 0x7c0000f4
#define PPC_INST_POPCNTB_MASK 0xfc0007fe
#define PPC_INST_POPCNTD 0x7c0003f4
#define PPC_INST_POPCNTW 0x7c0002f4
#define PPC_INST_RFEBB 0x4c000124
#define PPC_INST_RFCI 0x4c000066
#define PPC_INST_RFDI 0x4c00004e
#define PPC_INST_RFID 0x4c000024
#define PPC_INST_RFMCI 0x4c00004c
#define PPC_INST_MFSPR 0x7c0002a6
#define PPC_INST_MFSPR_DSCR 0x7c1102a6
Expand Down Expand Up @@ -277,6 +280,7 @@
#define PPC_INST_TRECHKPT 0x7c0007dd
#define PPC_INST_TRECLAIM 0x7c00075d
#define PPC_INST_TABORT 0x7c00071d
#define PPC_INST_TSR 0x7c0005dd

#define PPC_INST_NAP 0x4c000364
#define PPC_INST_SLEEP 0x4c0003a4
Expand Down
7 changes: 7 additions & 0 deletions arch/powerpc/include/asm/reg.h
Original file line number Diff line number Diff line change
Expand Up @@ -156,6 +156,8 @@
#define PSSCR_SD 0x00400000 /* Status Disable */
#define PSSCR_PLS 0xf000000000000000 /* Power-saving Level Status */
#define PSSCR_GUEST_VIS 0xf0000000000003ff /* Guest-visible PSSCR fields */
#define PSSCR_FAKE_SUSPEND 0x00000400 /* Fake-suspend bit (P9 DD2.2) */
#define PSSCR_FAKE_SUSPEND_LG 10 /* Fake-suspend bit position */

/* Floating Point Status and Control Register (FPSCR) Fields */
#define FPSCR_FX 0x80000000 /* FPU exception summary */
Expand Down Expand Up @@ -237,7 +239,12 @@
#define SPRN_TFIAR 0x81 /* Transaction Failure Inst Addr */
#define SPRN_TEXASR 0x82 /* Transaction EXception & Summary */
#define SPRN_TEXASRU 0x83 /* '' '' '' Upper 32 */
#define TEXASR_ABORT __MASK(63-31) /* terminated by tabort or treclaim */
#define TEXASR_SUSP __MASK(63-32) /* tx failed in suspended state */
#define TEXASR_HV __MASK(63-34) /* MSR[HV] when failure occurred */
#define TEXASR_PR __MASK(63-35) /* MSR[PR] when failure occurred */
#define TEXASR_FS __MASK(63-36) /* TEXASR Failure Summary */
#define TEXASR_EXACT __MASK(63-37) /* TFIAR value is exact */
#define SPRN_TFHAR 0x80 /* Transaction Failure Handler Addr */
#define SPRN_TIDR 144 /* Thread ID register */
#define SPRN_CTRLF 0x088
Expand Down
2 changes: 2 additions & 0 deletions arch/powerpc/kernel/asm-offsets.c
Original file line number Diff line number Diff line change
Expand Up @@ -568,6 +568,7 @@ int main(void)
OFFSET(VCPU_TFHAR, kvm_vcpu, arch.tfhar);
OFFSET(VCPU_TFIAR, kvm_vcpu, arch.tfiar);
OFFSET(VCPU_TEXASR, kvm_vcpu, arch.texasr);
OFFSET(VCPU_ORIG_TEXASR, kvm_vcpu, arch.orig_texasr);
OFFSET(VCPU_GPR_TM, kvm_vcpu, arch.gpr_tm);
OFFSET(VCPU_FPRS_TM, kvm_vcpu, arch.fp_tm.fpr);
OFFSET(VCPU_VRS_TM, kvm_vcpu, arch.vr_tm.vr);
Expand Down Expand Up @@ -650,6 +651,7 @@ int main(void)
HSTATE_FIELD(HSTATE_HOST_IPI, host_ipi);
HSTATE_FIELD(HSTATE_PTID, ptid);
HSTATE_FIELD(HSTATE_TID, tid);
HSTATE_FIELD(HSTATE_FAKE_SUSPEND, fake_suspend);
HSTATE_FIELD(HSTATE_MMCR0, host_mmcr[0]);
HSTATE_FIELD(HSTATE_MMCR1, host_mmcr[1]);
HSTATE_FIELD(HSTATE_MMCRA, host_mmcr[2]);
Expand Down
1 change: 0 additions & 1 deletion arch/powerpc/kernel/cputable.c
Original file line number Diff line number Diff line change
Expand Up @@ -569,7 +569,6 @@ static struct cpu_spec __initdata cpu_specs[] = {
.oprofile_type = PPC_OPROFILE_INVALID,
.cpu_setup = __setup_cpu_power9,
.cpu_restore = __restore_cpu_power9,
.flush_tlb = __flush_tlb_power9,
.machine_check_early = __machine_check_early_realmode_p9,
.platform = "power9",
},
Expand Down
4 changes: 2 additions & 2 deletions arch/powerpc/kernel/exceptions-64s.S
Original file line number Diff line number Diff line change
Expand Up @@ -1273,7 +1273,7 @@ EXC_REAL_BEGIN(denorm_exception_hv, 0x1500, 0x100)
bne+ denorm_assist
#endif

KVMTEST_PR(0x1500)
KVMTEST_HV(0x1500)
EXCEPTION_PROLOG_PSERIES_1(denorm_common, EXC_HV)
EXC_REAL_END(denorm_exception_hv, 0x1500, 0x100)

Expand All @@ -1285,7 +1285,7 @@ EXC_VIRT_END(denorm_exception, 0x5500, 0x100)
EXC_VIRT_NONE(0x5500, 0x100)
#endif

TRAMP_KVM_SKIP(PACA_EXGEN, 0x1500)
TRAMP_KVM_HV(PACA_EXGEN, 0x1500)

#ifdef CONFIG_PPC_DENORMALISATION
TRAMP_REAL_BEGIN(denorm_assist)
Expand Down
7 changes: 7 additions & 0 deletions arch/powerpc/kvm/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -74,16 +74,23 @@ kvm-hv-y += \
book3s_64_mmu_hv.o \
book3s_64_mmu_radix.o

kvm-hv-$(CONFIG_PPC_TRANSACTIONAL_MEM) += \
book3s_hv_tm.o

kvm-book3s_64-builtin-xics-objs-$(CONFIG_KVM_XICS) := \
book3s_hv_rm_xics.o book3s_hv_rm_xive.o

kvm-book3s_64-builtin-tm-objs-$(CONFIG_PPC_TRANSACTIONAL_MEM) += \
book3s_hv_tm_builtin.o

ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \
book3s_hv_hmi.o \
book3s_hv_rmhandlers.o \
book3s_hv_rm_mmu.o \
book3s_hv_ras.o \
book3s_hv_builtin.o \
$(kvm-book3s_64-builtin-tm-objs-y) \
$(kvm-book3s_64-builtin-xics-objs-y)
endif

Expand Down
18 changes: 17 additions & 1 deletion arch/powerpc/kvm/book3s_hv.c
Original file line number Diff line number Diff line change
Expand Up @@ -1206,6 +1206,19 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
r = RESUME_GUEST;
}
break;

#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
case BOOK3S_INTERRUPT_HV_SOFTPATCH:
/*
* This occurs for various TM-related instructions that
* we need to emulate on POWER9 DD2.2. We have already
* handled the cases where the guest was in real-suspend
* mode and was transitioning to transactional state.
*/
r = kvmhv_p9_tm_emulation(vcpu);
break;
#endif

case BOOK3S_INTERRUPT_HV_RM_HARD:
r = RESUME_PASSTHROUGH;
break;
Expand Down Expand Up @@ -1978,7 +1991,9 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
* turn off the HFSCR bit, which causes those instructions to trap.
*/
vcpu->arch.hfscr = mfspr(SPRN_HFSCR);
if (!cpu_has_feature(CPU_FTR_TM))
if (cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
vcpu->arch.hfscr |= HFSCR_TM;
else if (!cpu_has_feature(CPU_FTR_TM_COMP))
vcpu->arch.hfscr &= ~HFSCR_TM;
if (cpu_has_feature(CPU_FTR_ARCH_300))
vcpu->arch.hfscr &= ~HFSCR_MSGP;
Expand Down Expand Up @@ -2242,6 +2257,7 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu, struct kvmppc_vcore *vc)
tpaca = &paca[cpu];
tpaca->kvm_hstate.kvm_vcpu = vcpu;
tpaca->kvm_hstate.ptid = cpu - vc->pcpu;
tpaca->kvm_hstate.fake_suspend = 0;
/* Order stores to hstate.kvm_vcpu etc. before store to kvm_vcore */
smp_wmb();
tpaca->kvm_hstate.kvm_vcore = vc;
Expand Down
Loading

0 comments on commit 4bb3c7a

Please sign in to comment.