Skip to content

Commit

Permalink
x86/tdx: Fix arch_safe_halt() execution for TDX VMs
Browse files Browse the repository at this point in the history
commit 9f98a4f upstream.

Direct HLT instruction execution causes #VEs for TDX VMs which is routed
to hypervisor via TDCALL. If HLT is executed in STI-shadow, resulting #VE
handler will enable interrupts before TDCALL is routed to hypervisor
leading to missed wakeup events, as current TDX spec doesn't expose
interruptibility state information to allow #VE handler to selectively
enable interrupts.

Commit bfe6ed0 ("x86/tdx: Add HLT support for TDX guests")
prevented the idle routines from executing HLT instruction in STI-shadow.
But it missed the paravirt routine which can be reached via this path
as an example:

	kvm_wait()       =>
        safe_halt()      =>
        raw_safe_halt()  =>
        arch_safe_halt() =>
        irq.safe_halt()  =>
        pv_native_safe_halt()

To reliably handle arch_safe_halt() for TDX VMs, introduce explicit
dependency on CONFIG_PARAVIRT and override paravirt halt()/safe_halt()
routines with TDX-safe versions that execute direct TDCALL and needed
interrupt flag updates. Executing direct TDCALL brings in additional
benefit of avoiding HLT related #VEs altogether.

As tested by Ryan Afranji:

  "Tested with the specjbb2015 benchmark. It has heavy lock contention which leads
   to many halt calls. TDX VMs suffered a poor score before this patchset.

   Verified the major performance improvement with this patchset applied."

Fixes: bfe6ed0 ("x86/tdx: Add HLT support for TDX guests")
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Ryan Afranji <afranji@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250228014416.3925664-3-vannapurve@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  • Loading branch information
Vishal Annapurve authored and Greg Kroah-Hartman committed Apr 25, 2025
1 parent e5f0581 commit 29f040d
Show file tree
Hide file tree
Showing 4 changed files with 29 additions and 4 deletions.
1 change: 1 addition & 0 deletions arch/x86/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -881,6 +881,7 @@ config INTEL_TDX_GUEST
depends on X86_64 && CPU_SUP_INTEL
depends on X86_X2APIC
depends on EFI_STUB
depends on PARAVIRT
select ARCH_HAS_CC_PLATFORM
select X86_MEM_ENCRYPT
select X86_MCE
Expand Down
26 changes: 25 additions & 1 deletion arch/x86/coco/tdx/tdx.c
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
#include <asm/ia32.h>
#include <asm/insn.h>
#include <asm/insn-eval.h>
#include <asm/paravirt_types.h>
#include <asm/pgtable.h>
#include <asm/traps.h>

Expand Down Expand Up @@ -334,7 +335,7 @@ static int handle_halt(struct ve_info *ve)
return ve_instr_len(ve);
}

void __cpuidle tdx_safe_halt(void)
void __cpuidle tdx_halt(void)
{
const bool irq_disabled = false;

Expand All @@ -345,6 +346,16 @@ void __cpuidle tdx_safe_halt(void)
WARN_ONCE(1, "HLT instruction emulation failed\n");
}

static void __cpuidle tdx_safe_halt(void)
{
tdx_halt();
/*
* "__cpuidle" section doesn't support instrumentation, so stick
* with raw_* variant that avoids tracing hooks.
*/
raw_local_irq_enable();
}

static int read_msr(struct pt_regs *regs, struct ve_info *ve)
{
struct tdx_hypercall_args args = {
Expand Down Expand Up @@ -888,6 +899,19 @@ void __init tdx_early_init(void)
x86_platform.guest.enc_cache_flush_required = tdx_cache_flush_required;
x86_platform.guest.enc_tlb_flush_required = tdx_tlb_flush_required;

/*
* Avoid "sti;hlt" execution in TDX guests as HLT induces a #VE that
* will enable interrupts before HLT TDCALL invocation if executed
* in STI-shadow, possibly resulting in missed wakeup events.
*
* Modify all possible HLT execution paths to use TDX specific routines
* that directly execute TDCALL and toggle the interrupt state as
* needed after TDCALL completion. This also reduces HLT related #VEs
* in addition to having a reliable halt logic execution.
*/
pv_ops.irq.safe_halt = tdx_safe_halt;
pv_ops.irq.halt = tdx_halt;

/*
* TDX intercepts the RDMSR to read the X2APIC ID in the parallel
* bringup low level code. That raises #VE which cannot be handled
Expand Down
4 changes: 2 additions & 2 deletions arch/x86/include/asm/tdx.h
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@ void tdx_get_ve_info(struct ve_info *ve);

bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve);

void tdx_safe_halt(void);
void tdx_halt(void);

bool tdx_early_handle_ve(struct pt_regs *regs);

Expand All @@ -55,7 +55,7 @@ int tdx_mcall_get_report0(u8 *reportdata, u8 *tdreport);
#else

static inline void tdx_early_init(void) { };
static inline void tdx_safe_halt(void) { };
static inline void tdx_halt(void) { };

static inline bool tdx_early_handle_ve(struct pt_regs *regs) { return false; }

Expand Down
2 changes: 1 addition & 1 deletion arch/x86/kernel/process.c
Original file line number Diff line number Diff line change
Expand Up @@ -955,7 +955,7 @@ void select_idle_routine(const struct cpuinfo_x86 *c)
static_call_update(x86_idle, mwait_idle);
} else if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
pr_info("using TDX aware idle routine\n");
static_call_update(x86_idle, tdx_safe_halt);
static_call_update(x86_idle, tdx_halt);
} else
static_call_update(x86_idle, default_idle);
}
Expand Down

0 comments on commit 29f040d

Please sign in to comment.