Merge branch 'kvm-tdx-interrupts' into HEAD
Introduces support for interrupt handling for TDX guests, including
virtual interrupt injection and VM-Exits caused by vectored events.

Injection
=========

TDX supports non-NMI interrupt injection only via posted interrupts. Posted
interrupt descriptors (PIDs) are allocated in shared memory, so KVM
can update them directly. To post pending interrupts in the PID, KVM
can send a self-IPI with the notification vector prior to TD entry.
Because TDX guest state is protected, KVM can't read the interrupt
status of a TDX guest; for now, assume interrupts are always allowed. A
later patch set will let TDX guests call TDVMCALL with HLT, which passes
an interrupt-blocked flag, so that whether an interrupt is allowed in
HLT can be checked against that flag.
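As a sketch of that flow (hedged: tdx_deliver_interrupt() and the
pi_desc placement are illustrative assumptions; __vmx_deliver_posted_interrupt()
is the helper added to arch/x86/kvm/vmx/common.h in this merge), delivery
reduces to writing the shared PID and notifying the target vCPU:

/*
 * Sketch: deliver a non-NMI interrupt to a TDX vCPU purely via the
 * posted-interrupt descriptor.  The PID is in shared memory, so KVM
 * writes it directly; __vmx_deliver_posted_interrupt() sets
 * PID.PIR[vector] and PID.ON, then sends the notification vector.
 */
static void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
				  int trig_mode, int vector)
{
	struct kvm_vcpu *vcpu = apic->vcpu;

	/* TDX supports only posted interrupts; no vAPIC emulation. */
	__vmx_deliver_posted_interrupt(vcpu, &to_tdx(vcpu)->vt.pi_desc,
				       vector);
}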

For NMIs, KVM can request the TDX module to inject an NMI into a TDX vCPU
by setting the PEND_NMI TDVPS field to 1. Following that, KVM can call
TDH.VP.ENTER to run the vCPU, and the TDX module will attempt to inject
the NMI as soon as possible.  PEND_NMI is a 1-bit field, i.e. KVM can
pend only one NMI in the TDX module, and TDX doesn't allow KVM to
request an NMI-window exit directly. Therefore, when one NMI is already
pending in the TDX module, i.e. it has not yet been delivered to the
TDX guest, and another NMI becomes pending in KVM, collapse the
KVM-pending NMI into the one pending in the TDX module.  Such collapsing
is OK considering that on x86 bare metal multiple NMIs can collapse into
one, e.g. while NMIs are blocked by an SMI.  It's the OS's responsibility
to poll all NMI sources in the NMI handler to avoid missing NMI
events. More details can be found in the changelog of the patch "KVM:
TDX: Implement methods to inject NMI".
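A minimal sketch of the injection side, assuming a td_management_write8()
wrapper for the TDVPS write (the actual SEAMCALL plumbing is not shown;
td_management_write8() and TD_VCPU_PEND_NMI are assumed names):

/*
 * Sketch: pend an NMI for the TDX vCPU by setting the 1-bit PEND_NMI
 * TDVPS field; the TDX module injects it on a subsequent TDH.VP.ENTER.
 */
static void tdx_inject_nmi(struct kvm_vcpu *vcpu)
{
	++vcpu->stat.nmi_injections;
	td_management_write8(to_tdx(vcpu), TD_VCPU_PEND_NMI, 1);
}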

TDX doesn't support system-management mode (SMM) or system-management
interrupts (SMIs) in guest TDs, because the TDX module provides no way
for the VMM to inject an SMI into a guest TD or to switch a guest vCPU
into SMM.  SMI requests return -ENOTTY, as with CONFIG_KVM_SMM=n.  Likewise,
INIT and SIPI events are not used and are blocked for TDX guests;
TDX defines its own vCPU creation and initialization sequence, which
is done on the host via SEAMCALLs at TD build time.
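The -ENOTTY behavior falls out of the existing SMM plumbing; a sketch,
assuming the TDX implementation of ->has_emulated_msr() simply reports
MSR_IA32_SMBASE as unsupported (see the arch/x86/kvm/smm.h hunk below
for the caller side; tdx_has_emulated_msr() is an assumed name):

static bool tdx_has_emulated_msr(struct kvm *kvm, u32 index)
{
	switch (index) {
	case MSR_IA32_SMBASE:
		return false;	/* no SMM in TDX guests => SMI gets -ENOTTY */
	default:
		return true;	/* illustrative; other MSRs not shown */
	}
}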

VM-exit for external events
===========================

As in the VMX case, external events are handled with interrupts off:
in the .handle_exit_irqoff() callback for external interrupts and in
the noinstr region for NMIs.  Just like VMX, NMIs remain blocked after
an NMI-induced exit from a TDX guest.
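A sketch of the irqoff dispatch for TDX exits; the accessor and helper
names (vt_exit_reason(), vt_exit_intr_info(),
vmx_handle_external_interrupt_irqoff()) are assumptions mirroring the
VMX path:

static void tdx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
{
	/* Invoke the host IRQ handler directly, as VMX does. */
	if (vt_exit_reason(vcpu) == EXIT_REASON_EXTERNAL_INTERRUPT)
		vmx_handle_external_interrupt_irqoff(vcpu,
						     vt_exit_intr_info(vcpu));
}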

Machine check, which is handled in the .handle_exit_irqoff() callback, is
the only exception type KVM handles for TDX guests. Other exceptions
can't be intercepted because TDX guest state is protected, and the TDX
VMM isn't supposed to handle them; KVM exits to userspace with
KVM_EXIT_EXCEPTION if an unexpected exception occurs.
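A sketch of that userspace exit, using the existing KVM_EXIT_EXCEPTION
ABI; tdx_handle_exception() and the intr_info plumbing are assumptions:

static int tdx_handle_exception(struct kvm_vcpu *vcpu, u32 intr_info)
{
	/* Guest state is protected; KVM can only report, not fix up. */
	vcpu->run->exit_reason = KVM_EXIT_EXCEPTION;
	vcpu->run->ex.exception = intr_info & INTR_INFO_VECTOR_MASK;
	vcpu->run->ex.error_code = 0;	/* not retrievable from a TD */
	return 0;	/* return to userspace */
}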

Host SMIs also cause an exit to KVM.  This is needed because in SEAM
root mode (the TDX module) all interrupts are blocked.  An SMI can be an
"I/O SMI" or an "other SMI".  For TDX there are no I/O SMIs, because I/O
instructions inside a TDX guest trigger #VE and the guest must use
TDVMCALL to request I/O emulation from the VMM.  The only case of
interest for "other SMI" is an #MC occurring in the guest while MCE-SMI
morphing is enabled in the host firmware.  Such an "MSMI" is marked by
bit 0 being set in the exit qualification; MSMI exits are fatal for the
TD and are eventually handled by the kernel machine check handler
(commit 7911f14, "x86/mce: Implement recovery for errors in TDX/SEAM
non-root mode"), which marks the page as poisoned.  It is not currently
possible to pass machine check exceptions to the guest.
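A sketch of the resulting handling for EXIT_REASON_OTHER_SMI;
vt_exit_qualification() and the use of kvm_machine_check() here are
assumptions:

static int tdx_handle_other_smi(struct kvm_vcpu *vcpu)
{
	if (vt_exit_qualification(vcpu) & BIT(0)) {
		/* MSMI: an #MC morphed into an SMI; fatal for the TD. */
		kvm_machine_check();
		return -EIO;
	}

	/* A plain host SMI was already handled on leaving SEAM root mode. */
	return 1;	/* resume the vCPU */
}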

SMIs other than machine-check SMIs are handled simply by leaving SEAM
root mode; KVM doesn't need to do anything.
Paolo Bonzini committed Mar 14, 2025
2 parents 4d2dc9a + 6c441e4 commit 9913212
Showing 19 changed files with 522 additions and 120 deletions.
1 change: 1 addition & 0 deletions arch/x86/include/asm/kvm-x86-ops.h
@@ -116,6 +116,7 @@ KVM_X86_OP_OPTIONAL(pi_start_assignment)
KVM_X86_OP_OPTIONAL(apicv_pre_state_restore)
KVM_X86_OP_OPTIONAL(apicv_post_state_restore)
KVM_X86_OP_OPTIONAL_RET0(dy_apicv_has_pending_interrupt)
KVM_X86_OP_OPTIONAL(protected_apic_has_interrupt)
KVM_X86_OP_OPTIONAL(set_hv_timer)
KVM_X86_OP_OPTIONAL(cancel_hv_timer)
KVM_X86_OP(setup_mce)
1 change: 1 addition & 0 deletions arch/x86/include/asm/kvm_host.h
@@ -1842,6 +1842,7 @@ struct kvm_x86_ops {
void (*apicv_pre_state_restore)(struct kvm_vcpu *vcpu);
void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu);
bool (*dy_apicv_has_pending_interrupt)(struct kvm_vcpu *vcpu);
bool (*protected_apic_has_interrupt)(struct kvm_vcpu *vcpu);

int (*set_hv_timer)(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
bool *expired);
5 changes: 5 additions & 0 deletions arch/x86/include/asm/posted_intr.h
@@ -81,6 +81,11 @@ static inline bool pi_test_sn(struct pi_desc *pi_desc)
return test_bit(POSTED_INTR_SN, (unsigned long *)&pi_desc->control);
}

static inline bool pi_test_pir(int vector, struct pi_desc *pi_desc)
{
return test_bit(vector, (unsigned long *)pi_desc->pir);
}

/* Non-atomic helpers */
static inline void __pi_set_sn(struct pi_desc *pi_desc)
{
1 change: 1 addition & 0 deletions arch/x86/include/uapi/asm/vmx.h
@@ -34,6 +34,7 @@
#define EXIT_REASON_TRIPLE_FAULT 2
#define EXIT_REASON_INIT_SIGNAL 3
#define EXIT_REASON_SIPI_SIGNAL 4
#define EXIT_REASON_OTHER_SMI 6

#define EXIT_REASON_INTERRUPT_WINDOW 7
#define EXIT_REASON_NMI_WINDOW 8
3 changes: 3 additions & 0 deletions arch/x86/kvm/irq.c
@@ -100,6 +100,9 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
if (kvm_cpu_has_extint(v))
return 1;

if (lapic_in_kernel(v) && v->arch.apic->guest_apic_protected)
return kvm_x86_call(protected_apic_has_interrupt)(v);

return kvm_apic_has_interrupt(v) != -1; /* LAPIC */
}
EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt);
14 changes: 13 additions & 1 deletion arch/x86/kvm/lapic.c
@@ -1797,8 +1797,17 @@ static void apic_update_lvtt(struct kvm_lapic *apic)
static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu)
{
struct kvm_lapic *apic = vcpu->arch.apic;
u32 reg = kvm_lapic_get_reg(apic, APIC_LVTT);
u32 reg;

/*
* Assume a timer IRQ was "injected" if the APIC is protected. KVM's
* copy of the vIRR is bogus, it's the responsibility of the caller to
* precisely check whether or not a timer IRQ is pending.
*/
if (apic->guest_apic_protected)
return true;

reg = kvm_lapic_get_reg(apic, APIC_LVTT);
if (kvm_apic_hw_enabled(apic)) {
int vec = reg & APIC_VECTOR_MASK;
void *bitmap = apic->regs + APIC_ISR;
@@ -2967,6 +2976,9 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
if (!kvm_apic_present(vcpu))
return -1;

if (apic->guest_apic_protected)
return -1;

__apic_update_ppr(apic, &ppr);
return apic_has_interrupt_for_ppr(apic, ppr);
}
2 changes: 2 additions & 0 deletions arch/x86/kvm/lapic.h
@@ -65,6 +65,8 @@ struct kvm_lapic {
bool sw_enabled;
bool irr_pending;
bool lvt0_in_nmi_mode;
/* Select registers in the vAPIC cannot be read/written. */
bool guest_apic_protected;
/* Number of bits set in ISR. */
s16 isr_count;
/* The highest vector set in ISR; if -1 - invalid, must scan ISR. */
3 changes: 3 additions & 0 deletions arch/x86/kvm/smm.h
@@ -142,6 +142,9 @@ union kvm_smram {

static inline int kvm_inject_smi(struct kvm_vcpu *vcpu)
{
if (!kvm_x86_call(has_emulated_msr)(vcpu->kvm, MSR_IA32_SMBASE))
return -ENOTTY;

kvm_make_request(KVM_REQ_SMI, vcpu);
return 0;
}
70 changes: 70 additions & 0 deletions arch/x86/kvm/vmx/common.h
@@ -48,6 +48,7 @@ struct vcpu_vt {
* hardware.
*/
bool guest_state_loaded;
bool emulation_required;

#ifdef CONFIG_X86_64
u64 msr_host_kernel_gs_base;
@@ -109,4 +110,73 @@ static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
}

static inline void kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu,
int pi_vec)
{
#ifdef CONFIG_SMP
if (vcpu->mode == IN_GUEST_MODE) {
/*
* The vector of the virtual interrupt has already been set in the PIR.
* Send a notification event to deliver the virtual interrupt
* unless the vCPU is the currently running vCPU, i.e. the
* event is being sent from a fastpath VM-Exit handler, in
* which case the PIR will be synced to the vIRR before
* re-entering the guest.
*
* When the target is not the running vCPU, the following
* possibilities emerge:
*
* Case 1: vCPU stays in non-root mode. Sending a notification
* event posts the interrupt to the vCPU.
*
* Case 2: vCPU exits to root mode and is still runnable. The
* PIR will be synced to the vIRR before re-entering the guest.
* Sending a notification event is ok as the host IRQ handler
* will ignore the spurious event.
*
* Case 3: vCPU exits to root mode and is blocked. vcpu_block()
* has already synced PIR to vIRR and never blocks the vCPU if
* the vIRR is not empty. Therefore, a blocked vCPU here does
* not wait for any requested interrupts in PIR, and sending a
* notification event also results in a benign, spurious event.
*/

if (vcpu != kvm_get_running_vcpu())
__apic_send_IPI_mask(get_cpu_mask(vcpu->cpu), pi_vec);
return;
}
#endif
/*
* The vCPU isn't in the guest; wake the vCPU in case it is blocking,
* otherwise do nothing as KVM will grab the highest priority pending
* IRQ via ->sync_pir_to_irr() in vcpu_enter_guest().
*/
kvm_vcpu_wake_up(vcpu);
}

/*
* Post an interrupt to a vCPU's PIR and trigger the vCPU to process the
* interrupt if necessary.
*/
static inline void __vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu,
struct pi_desc *pi_desc, int vector)
{
if (pi_test_and_set_pir(vector, pi_desc))
return;

/* If a previous notification has sent the IPI, nothing to do. */
if (pi_test_and_set_on(pi_desc))
return;

/*
* The implied barrier in pi_test_and_set_on() pairs with the smp_mb_*()
* after setting vcpu->mode in vcpu_enter_guest(), thus the vCPU is
* guaranteed to see PID.ON=1 and sync the PIR to IRR if triggering a
* posted interrupt "fails" because vcpu->mode != IN_GUEST_MODE.
*/
kvm_vcpu_trigger_posted_interrupt(vcpu, POSTED_INTR_VECTOR);
}

noinstr void vmx_handle_nmi(struct kvm_vcpu *vcpu);

#endif /* __KVM_X86_VMX_COMMON_H */