Merge branch 'kvm-tdx-finish-initial' into HEAD
This patch ties the remaining loose ends and finally enables TDX guests to
run inside KVM.  It implements handling of EPT violation/misconfig and of
several TDVMCALL leaves that are handled in the kernel (CPUID, HLT, RDMSR/WRMSR,
GetTdVmCallInfo); it also adds a bunch of wrappers in vmx/main.c to
ignore operations not supported by TDX guests(*)

Finally, it introduces documentation for the new APIs that have been
added along the way.

(*) access to CPU state, VMX preemption timer, accesses to TSC offset or
    multiplier, LMCE enable/disable, hypercall patching.
Paolo Bonzini committed Mar 14, 2025
2 parents 9913212 + 52f52ea commit 7bcf724
Showing 21 changed files with 1,204 additions and 107 deletions.
35 changes: 31 additions & 4 deletions Documentation/virt/kvm/api.rst
@@ -1407,6 +1407,9 @@ the memory region are automatically reflected into the guest. For example, an
mmap() that affects the region will be made visible immediately. Another
example is madvise(MADV_DROP).

For TDX guest, deleting/moving memory region loses guest memory contents.
Read only region isn't supported. Only as-id 0 is supported.

Note: On arm64, a write generated by the page-table walker (to update
the Access and Dirty flags, for example) never results in a
KVM_EXIT_MMIO exit when the slot has the KVM_MEM_READONLY flag. This
@@ -4764,17 +4767,19 @@ H_GET_CPU_CHARACTERISTICS hypercall.

:Capability: basic
:Architectures: x86
-:Type: vm
+:Type: vm ioctl, vcpu ioctl
:Parameters: an opaque platform specific structure (in/out)
:Returns: 0 on success; -1 on error

If the platform supports creating encrypted VMs then this ioctl can be used
for issuing platform-specific memory encryption commands to manage those
encrypted VMs.

-Currently, this ioctl is used for issuing Secure Encrypted Virtualization
-(SEV) commands on AMD Processors. The SEV commands are defined in
-Documentation/virt/kvm/x86/amd-memory-encryption.rst.
+Currently, this ioctl is used for issuing both Secure Encrypted Virtualization
+(SEV) commands on AMD Processors and Trusted Domain Extensions (TDX) commands
+on Intel Processors. The detailed commands are defined in
+Documentation/virt/kvm/x86/amd-memory-encryption.rst and
+Documentation/virt/kvm/x86/intel-tdx.rst.

4.111 KVM_MEMORY_ENCRYPT_REG_REGION
-----------------------------------
@@ -8160,6 +8165,28 @@ KVM_X86_QUIRK_STUFF_FEATURE_MSRS By default, at vCPU creation, KVM sets the
and 0x489), as KVM does now allow them to
be set by userspace (KVM sets them based on
guest CPUID, for safety purposes).

KVM_X86_QUIRK_IGNORE_GUEST_PAT By default, on Intel platforms, KVM ignores
guest PAT and forces the effective memory
type to WB in EPT. The quirk is not available
on Intel platforms which are incapable of
safely honoring guest PAT (i.e., without CPU
self-snoop, KVM always ignores guest PAT and
forces effective memory type to WB). It is
also ignored on AMD platforms or, on Intel,
when a VM has non-coherent DMA devices
assigned; KVM always honors guest PAT in
such case. The quirk is needed to avoid
slowdowns on certain Intel Xeon platforms
(e.g. ICX, SPR) where self-snoop feature is
supported but UC is slow enough to cause
issues with some older guests that use
UC instead of WC to map the video RAM.
Userspace can disable the quirk to honor
guest PAT if it knows that there is no such
guest software, for example if it does not
expose a bochs graphics device (which is
known to have had a buggy driver).
=================================== ============================================
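
As a rough illustration of the new quirk bit, userspace might first test whether the quirk is offered for disabling on the current host. The sketch below assumes the bitmask comes from KVM_CHECK_EXTENSION(KVM_CAP_DISABLE_QUIRKS2); the helper name and the SKETCH_ macro are hypothetical, and the bit value is copied from the uapi hunk in this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value copied from arch/x86/include/uapi/asm/kvm.h in this commit. */
#define SKETCH_QUIRK_IGNORE_GUEST_PAT (1u << 9)

/*
 * Hypothetical helper. 'disableable' is assumed to be the bitmask returned by
 * KVM_CHECK_EXTENSION(KVM_CAP_DISABLE_QUIRKS2). Returns true when the
 * IGNORE_GUEST_PAT quirk can be disabled, i.e. when guest PAT can be honored.
 */
static bool sketch_can_honor_guest_pat(uint64_t disableable)
{
	return (disableable & SKETCH_QUIRK_IGNORE_GUEST_PAT) != 0;
}
```

A VMM would only pass the bit to KVM_ENABLE_CAP(KVM_CAP_DISABLE_QUIRKS2) when this check succeeds.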

7.32 KVM_CAP_MAX_VCPU_ID
1 change: 1 addition & 0 deletions Documentation/virt/kvm/x86/index.rst
@@ -11,6 +11,7 @@ KVM for x86 systems
cpuid
errata
hypercalls
intel-tdx
mmu
msr
nested-vmx
255 changes: 255 additions & 0 deletions Documentation/virt/kvm/x86/intel-tdx.rst
@@ -0,0 +1,255 @@
.. SPDX-License-Identifier: GPL-2.0
===================================
Intel Trust Domain Extensions (TDX)
===================================

Overview
========
Intel's Trust Domain Extensions (TDX) protect confidential guest VMs from the
host and physical attacks. A CPU-attested software module called 'the TDX
module' runs inside a new CPU isolated range to provide the functionalities to
manage and run protected VMs, a.k.a, TDX guests or TDs.

Please refer to [1] for the whitepaper, specifications and other resources.

This documentation describes TDX-specific KVM ABIs. The TDX module needs to be
initialized before it can be used by KVM to run any TDX guests. The host
core-kernel provides the support of initializing the TDX module, which is
described in the Documentation/arch/x86/tdx.rst.

API description
===============

KVM_MEMORY_ENCRYPT_OP
---------------------
:Type: vm ioctl, vcpu ioctl

For TDX operations, KVM_MEMORY_ENCRYPT_OP is re-purposed to be generic
ioctl with TDX specific sub-ioctl() commands.

::

/* Trust Domain Extensions sub-ioctl() commands. */
enum kvm_tdx_cmd_id {
KVM_TDX_CAPABILITIES = 0,
KVM_TDX_INIT_VM,
KVM_TDX_INIT_VCPU,
KVM_TDX_INIT_MEM_REGION,
KVM_TDX_FINALIZE_VM,
KVM_TDX_GET_CPUID,

KVM_TDX_CMD_NR_MAX,
};

struct kvm_tdx_cmd {
/* enum kvm_tdx_cmd_id */
__u32 id;
/* flags for sub-command. If sub-command doesn't use this, set zero. */
__u32 flags;
/*
* data for each sub-command. An immediate or a pointer to the actual
* data in process virtual address. If sub-command doesn't use it,
* set zero.
*/
__u64 data;
/*
* Auxiliary error code. The sub-command may return TDX SEAMCALL
* status code in addition to -Exxx.
*/
__u64 hw_error;
};
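
A minimal sketch of how a VMM might wrap the sub-ioctl mechanism described above. The sketch_ names are hypothetical; the structure is a local mirror of the documented struct kvm_tdx_cmd, kept under a different name so that it does not clash with kernel headers that already ship the real definition. Only KVM_MEMORY_ENCRYPT_OP is taken from <linux/kvm.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>	/* KVM_MEMORY_ENCRYPT_OP */

/* Local mirror of the documented struct kvm_tdx_cmd (hypothetical name). */
struct sketch_tdx_cmd {
	uint32_t id;		/* enum kvm_tdx_cmd_id */
	uint32_t flags;		/* zero unless the sub-command defines flags */
	uint64_t data;		/* immediate or userspace pointer */
	uint64_t hw_error;	/* auxiliary SEAMCALL status, written by KVM */
};

/* Same order as the documented enum kvm_tdx_cmd_id. */
enum sketch_tdx_cmd_id {
	SKETCH_TDX_CAPABILITIES = 0,
	SKETCH_TDX_INIT_VM,
	SKETCH_TDX_INIT_VCPU,
	SKETCH_TDX_INIT_MEM_REGION,
	SKETCH_TDX_FINALIZE_VM,
	SKETCH_TDX_GET_CPUID,
};

/*
 * Issue one TDX sub-command on a VM or vCPU file descriptor.
 * Returns 0 on success or a negative errno value; the auxiliary status is
 * copied to *hw_error when the caller supplies a pointer.
 */
static int sketch_tdx_ioctl(int fd, uint32_t id, uint32_t flags,
			    uint64_t data, uint64_t *hw_error)
{
	struct sketch_tdx_cmd cmd;
	int ret;

	memset(&cmd, 0, sizeof(cmd));	/* hw_error must be 0 on entry */
	cmd.id = id;
	cmd.flags = flags;
	cmd.data = data;

	ret = ioctl(fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
	if (hw_error)
		*hw_error = cmd.hw_error;
	return ret < 0 ? -errno : 0;
}
```

For example, sketch_tdx_ioctl(vm_fd, SKETCH_TDX_FINALIZE_VM, 0, 0, &hw) would correspond to the KVM_TDX_FINALIZE_VM call documented below, where vm_fd is the descriptor returned by KVM_CREATE_VM.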

KVM_TDX_CAPABILITIES
--------------------
:Type: vm ioctl
:Returns: 0 on success, <0 on error

Return the TDX capabilities that current KVM supports with the specific TDX
module loaded in the system. It reports what features/capabilities are allowed
to be configured to the TDX guest.

- id: KVM_TDX_CAPABILITIES
- flags: must be 0
- data: pointer to struct kvm_tdx_capabilities
- hw_error: must be 0

::

struct kvm_tdx_capabilities {
__u64 supported_attrs;
__u64 supported_xfam;
__u64 reserved[254];

/* Configurable CPUID bits for userspace */
struct kvm_cpuid2 cpuid;
};
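
The reported masks can be used to validate a requested configuration before calling KVM_TDX_INIT_VM. The following is a sketch only: the helper names are hypothetical, and the buffer-size arithmetic is derived purely from the structure layouts shown in this document (40-byte CPUID entries, 8-byte struct kvm_cpuid2 header).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes derived from the structures documented in this file. */
#define SKETCH_CPUID_ENTRY_SIZE	40u			/* 10 x __u32 */
#define SKETCH_CPUID2_HDR_SIZE	8u			/* nent + padding */
#define SKETCH_CAPS_FIXED_SIZE	((2u + 254u) * 8u)	/* attrs, xfam, reserved[254] */

/* Bytes needed for struct kvm_tdx_capabilities with room for 'nent' entries. */
static size_t sketch_caps_buf_size(uint32_t nent)
{
	return SKETCH_CAPS_FIXED_SIZE + SKETCH_CPUID2_HDR_SIZE +
	       (size_t)nent * SKETCH_CPUID_ENTRY_SIZE;
}

/* True when every requested bit is also set in the supported mask. */
static bool sketch_mask_is_supported(uint64_t requested, uint64_t supported)
{
	return (requested & ~supported) == 0;
}
```

The same mask check applies to both pairs: requested attributes against supported_attrs, and requested xfam against supported_xfam.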


KVM_TDX_INIT_VM
---------------
:Type: vm ioctl
:Returns: 0 on success, <0 on error

Perform TDX specific VM initialization. This needs to be called after
KVM_CREATE_VM and before creating any VCPUs.

- id: KVM_TDX_INIT_VM
- flags: must be 0
- data: pointer to struct kvm_tdx_init_vm
- hw_error: must be 0

::

struct kvm_tdx_init_vm {
__u64 attributes;
__u64 xfam;
__u64 mrconfigid[6]; /* sha384 digest */
__u64 mrowner[6]; /* sha384 digest */
__u64 mrownerconfig[6]; /* sha384 digest */

/* The total space for TD_PARAMS before the CPUIDs is 256 bytes */
__u64 reserved[12];

/*
* Call KVM_TDX_INIT_VM before vcpu creation, thus before
* KVM_SET_CPUID2.
* This configuration supersedes KVM_SET_CPUID2s for VCPUs because the
* TDX module directly virtualizes those CPUIDs without VMM. The user
* space VMM, e.g. qemu, should make KVM_SET_CPUID2 consistent with
* those values. If it doesn't, KVM may have wrong idea of vCPUIDs of
* the guest, and KVM may wrongly emulate CPUIDs or MSRs that the TDX
* module doesn't virtualize.
*/
struct kvm_cpuid2 cpuid;
};
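
The "256 bytes" remark in the structure comment can be checked at compile time. The sketch below mirrors only the fields that precede cpuid (the sketch_ name is hypothetical) and works through the arithmetic 8 + 8 + 3 * 48 + 12 * 8 = 256.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of only the fields that precede 'cpuid' in struct kvm_tdx_init_vm. */
struct sketch_tdx_init_vm_head {
	uint64_t attributes;		/*   8 bytes */
	uint64_t xfam;			/*   8 bytes */
	uint64_t mrconfigid[6];		/*  48 bytes, SHA-384 digest */
	uint64_t mrowner[6];		/*  48 bytes, SHA-384 digest */
	uint64_t mrownerconfig[6];	/*  48 bytes, SHA-384 digest */
	uint64_t reserved[12];		/*  96 bytes */
};

/* 8 + 8 + 3 * 48 + 96 = 256, so 'cpuid' would start at offset 256. */
_Static_assert(sizeof(struct sketch_tdx_init_vm_head) == 256,
	       "fields before cpuid must occupy 256 bytes");
_Static_assert(sizeof(((struct sketch_tdx_init_vm_head *)0)->mrowner) == 48,
	       "a SHA-384 digest is 48 bytes");
```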


KVM_TDX_INIT_VCPU
-----------------
:Type: vcpu ioctl
:Returns: 0 on success, <0 on error

Perform TDX specific VCPU initialization.

- id: KVM_TDX_INIT_VCPU
- flags: must be 0
- data: initial value of the guest TD VCPU RCX
- hw_error: must be 0

KVM_TDX_INIT_MEM_REGION
-----------------------
:Type: vcpu ioctl
:Returns: 0 on success, <0 on error

Initialize @nr_pages TDX guest private memory starting from @gpa with userspace
provided data from @source_addr.

Note, before calling this sub command, memory attribute of the range
[gpa, gpa + nr_pages] needs to be private. Userspace can use
KVM_SET_MEMORY_ATTRIBUTES to set the attribute.

If KVM_TDX_MEASURE_MEMORY_REGION flag is specified, it also extends measurement.

- id: KVM_TDX_INIT_MEM_REGION
- flags: currently only KVM_TDX_MEASURE_MEMORY_REGION is defined
- data: pointer to struct kvm_tdx_init_mem_region
- hw_error: must be 0

::

#define KVM_TDX_MEASURE_MEMORY_REGION (1UL << 0)

struct kvm_tdx_init_mem_region {
__u64 source_addr;
__u64 gpa;
__u64 nr_pages;
};
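
A sketch of preparing the argument for this sub-command. The structure is a local mirror with a hypothetical name. The page-alignment, non-zero and wrap-around checks are defensive assumptions of this sketch, as is the 4 KiB page size; the text above does not state which checks KVM itself performs.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ull	/* assumption: 4 KiB pages */

/* Local mirror of the documented struct kvm_tdx_init_mem_region. */
struct sketch_tdx_init_mem_region {
	uint64_t source_addr;
	uint64_t gpa;
	uint64_t nr_pages;
};

/*
 * Fill 'r' for a region of 'nr_pages' pages at guest physical address 'gpa',
 * populated from the userspace buffer 'src'. Returns 0 or -EINVAL.
 */
static int sketch_fill_region(struct sketch_tdx_init_mem_region *r,
			      const void *src, uint64_t gpa, uint64_t nr_pages)
{
	uint64_t addr = (uint64_t)(uintptr_t)src;

	/* Assumed requirements: non-empty, page-aligned source and GPA. */
	if (nr_pages == 0 || addr % SKETCH_PAGE_SIZE || gpa % SKETCH_PAGE_SIZE)
		return -EINVAL;
	/* Reject ranges whose end would wrap around the 64-bit GPA space. */
	if (nr_pages > (UINT64_MAX - gpa) / SKETCH_PAGE_SIZE)
		return -EINVAL;

	r->source_addr = addr;
	r->gpa = gpa;
	r->nr_pages = nr_pages;
	return 0;
}
```

The filled structure would then be passed by pointer in the data field, with flags set to KVM_TDX_MEASURE_MEMORY_REGION when the pages should extend the measurement.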


KVM_TDX_FINALIZE_VM
-------------------
:Type: vm ioctl
:Returns: 0 on success, <0 on error

Complete measurement of the initial TD contents and mark it ready to run.

- id: KVM_TDX_FINALIZE_VM
- flags: must be 0
- data: must be 0
- hw_error: must be 0


KVM_TDX_GET_CPUID
-----------------
:Type: vcpu ioctl
:Returns: 0 on success, <0 on error

Get the CPUID values that the TDX module virtualizes for the TD guest.
When it returns -E2BIG, the user space should allocate a larger buffer and
retry. The minimum buffer size is updated in the nent field of the
struct kvm_cpuid2.

- id: KVM_TDX_GET_CPUID
- flags: must be 0
- data: pointer to struct kvm_cpuid2 (in/out)
- hw_error: must be 0 (out)

::

struct kvm_cpuid2 {
__u32 nent;
__u32 padding;
struct kvm_cpuid_entry2 entries[0];
};

struct kvm_cpuid_entry2 {
__u32 function;
__u32 index;
__u32 flags;
__u32 eax;
__u32 ebx;
__u32 ecx;
__u32 edx;
__u32 padding[3];
};
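
The -E2BIG retry described above can be sketched as a small loop. To keep the example self-contained, the ioctl is replaced by a hypothetical callback that returns 0 or a negative errno value and, on -E2BIG, stores the required entry count in nent. All sketch_ names are assumptions, and sketch_fake_get is only a stand-in that pretends the TDX module reports three entries.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirrors of the documented structures (hypothetical names). */
struct sketch_cpuid_entry2 {
	uint32_t function, index, flags;
	uint32_t eax, ebx, ecx, edx;
	uint32_t padding[3];
};

struct sketch_cpuid2 {
	uint32_t nent;
	uint32_t padding;
	struct sketch_cpuid_entry2 entries[];
};

/* Stand-in for the KVM_TDX_GET_CPUID call: returns 0 or -errno. */
typedef int (*sketch_get_cpuid_fn)(struct sketch_cpuid2 *cpuid, void *opaque);

/* Allocate, call, and grow the buffer while the callee reports -E2BIG. */
static struct sketch_cpuid2 *sketch_get_cpuid(sketch_get_cpuid_fn get,
					      void *opaque, uint32_t nent,
					      int *err)
{
	for (;;) {
		struct sketch_cpuid2 *c;
		uint32_t needed;
		int ret;

		c = calloc(1, sizeof(*c) + (size_t)nent * sizeof(c->entries[0]));
		if (!c) {
			*err = -ENOMEM;
			return NULL;
		}
		c->nent = nent;		/* capacity on input */

		ret = get(c, opaque);
		if (ret == 0) {
			*err = 0;
			return c;	/* caller frees */
		}

		needed = c->nent;	/* minimum size reported on -E2BIG */
		free(c);
		if (ret != -E2BIG || needed <= nent) {
			*err = ret;	/* real error, or no progress possible */
			return NULL;
		}
		nent = needed;
	}
}

/* Fake backend for demonstration: behaves as if three entries exist. */
static int sketch_fake_get(struct sketch_cpuid2 *cpuid, void *opaque)
{
	uint32_t i;

	(void)opaque;
	if (cpuid->nent < 3) {
		cpuid->nent = 3;
		return -E2BIG;
	}
	cpuid->nent = 3;
	for (i = 0; i < 3; i++)
		cpuid->entries[i].function = i;
	return 0;
}
```

In a real VMM the callback would wrap the vcpu ioctl and translate a -1 return plus errno into the negative value used here.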

KVM TDX creation flow
=====================
In addition to the standard KVM flow, new TDX ioctls need to be called. The
control flow is as follows:

#. Check system wide capability

* KVM_CAP_VM_TYPES: Check if VM type is supported and if KVM_X86_TDX_VM
is supported.

#. Create VM

* KVM_CREATE_VM
* KVM_TDX_CAPABILITIES: Query TDX capabilities for creating TDX guests.
* KVM_CHECK_EXTENSION(KVM_CAP_MAX_VCPUS): Query maximum VCPUs the TD can
support at VM level (TDX has its own limitation on this).
* KVM_SET_TSC_KHZ: Configure TD's TSC frequency if a different TSC frequency
than host is desired. This is Optional.
* KVM_TDX_INIT_VM: Pass TDX specific VM parameters.

#. Create VCPU

* KVM_CREATE_VCPU
* KVM_TDX_INIT_VCPU: Pass TDX specific VCPU parameters.
* KVM_SET_CPUID2: Configure TD's CPUIDs.
* KVM_SET_MSRS: Configure TD's MSRs.

#. Initialize initial guest memory

* Prepare content of initial guest memory.
* KVM_TDX_INIT_MEM_REGION: Add initial guest memory.
* KVM_TDX_FINALIZE_VM: Finalize the measurement of the TDX guest.

#. Run VCPU
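
The ordering above can be captured as a simple phase check, for instance as a debugging aid in a VMM. This sketch encodes only the order in which the steps are listed; the phase names are hypothetical, and KVM itself may accept or reject interleavings that this check does not model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Phases in the order of the list above (hypothetical names). */
enum sketch_td_phase {
	SKETCH_PHASE_VM_SETUP,		/* KVM_CREATE_VM ... KVM_TDX_INIT_VM */
	SKETCH_PHASE_VCPU_SETUP,	/* KVM_CREATE_VCPU, KVM_TDX_INIT_VCPU, ... */
	SKETCH_PHASE_MEM_INIT,		/* KVM_TDX_INIT_MEM_REGION */
	SKETCH_PHASE_FINALIZE,		/* KVM_TDX_FINALIZE_VM */
	SKETCH_PHASE_RUN,		/* KVM_RUN */
};

/*
 * True when the recorded phases never move backwards. Repeating a phase is
 * allowed, e.g. several vCPUs or several memory regions.
 */
static bool sketch_flow_in_order(const enum sketch_td_phase *steps, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++)
		if (steps[i] < steps[i - 1])
			return false;
	return true;
}
```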

References
==========

.. [1] https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/documentation.html
7 changes: 6 additions & 1 deletion arch/x86/include/asm/kvm_host.h
@@ -2420,7 +2420,12 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
KVM_X86_QUIRK_FIX_HYPERCALL_INSN | \
KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS | \
KVM_X86_QUIRK_SLOT_ZAP_ALL | \
-KVM_X86_QUIRK_STUFF_FEATURE_MSRS)
+KVM_X86_QUIRK_STUFF_FEATURE_MSRS | \
+KVM_X86_QUIRK_IGNORE_GUEST_PAT)
+
+#define KVM_X86_CONDITIONAL_QUIRKS \
+(KVM_X86_QUIRK_CD_NW_CLEARED | \
+KVM_X86_QUIRK_IGNORE_GUEST_PAT)

/*
* KVM previously used a u32 field in kvm_run to indicate the hypercall was
1 change: 1 addition & 0 deletions arch/x86/include/asm/shared/tdx.h
@@ -67,6 +67,7 @@
#define TD_CTLS_LOCK BIT_ULL(TD_CTLS_LOCK_BIT)

/* TDX hypercall Leaf IDs */
#define TDVMCALL_GET_TD_VM_CALL_INFO 0x10000
#define TDVMCALL_MAP_GPA 0x10001
#define TDVMCALL_GET_QUOTE 0x10002
#define TDVMCALL_REPORT_FATAL_ERROR 0x10003
2 changes: 2 additions & 0 deletions arch/x86/include/asm/vmx.h
@@ -585,12 +585,14 @@ enum vm_entry_failure_code {
#define EPT_VIOLATION_ACC_WRITE_BIT 1
#define EPT_VIOLATION_ACC_INSTR_BIT 2
#define EPT_VIOLATION_RWX_SHIFT 3
#define EPT_VIOLATION_EXEC_R3_LIN_BIT 6
#define EPT_VIOLATION_GVA_IS_VALID_BIT 7
#define EPT_VIOLATION_GVA_TRANSLATED_BIT 8
#define EPT_VIOLATION_ACC_READ (1 << EPT_VIOLATION_ACC_READ_BIT)
#define EPT_VIOLATION_ACC_WRITE (1 << EPT_VIOLATION_ACC_WRITE_BIT)
#define EPT_VIOLATION_ACC_INSTR (1 << EPT_VIOLATION_ACC_INSTR_BIT)
#define EPT_VIOLATION_RWX_MASK (VMX_EPT_RWX_MASK << EPT_VIOLATION_RWX_SHIFT)
#define EPT_VIOLATION_EXEC_FOR_RING3_LIN (1 << EPT_VIOLATION_EXEC_R3_LIN_BIT)
#define EPT_VIOLATION_GVA_IS_VALID (1 << EPT_VIOLATION_GVA_IS_VALID_BIT)
#define EPT_VIOLATION_GVA_TRANSLATED (1 << EPT_VIOLATION_GVA_TRANSLATED_BIT)
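
For orientation, the flags in this hunk are plain bit tests on the EPT violation exit qualification. The sketch below copies the bit positions shown above into SKETCH_ macros and unpacks them into a hypothetical structure; it does not reproduce how KVM itself consumes these bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the hunk above. */
#define SKETCH_EPT_ACC_WRITE		(1u << 1)
#define SKETCH_EPT_ACC_INSTR		(1u << 2)
#define SKETCH_EPT_EXEC_FOR_RING3_LIN	(1u << 6)	/* added by this commit */
#define SKETCH_EPT_GVA_IS_VALID		(1u << 7)
#define SKETCH_EPT_GVA_TRANSLATED	(1u << 8)

/* Hypothetical decoded view of an exit qualification. */
struct sketch_ept_info {
	bool write;
	bool fetch;
	bool exec_for_ring3_lin;
	bool gva_valid;
	bool gva_translated;
};

static struct sketch_ept_info sketch_decode_ept(uint64_t qual)
{
	struct sketch_ept_info info = {
		.write			= (qual & SKETCH_EPT_ACC_WRITE) != 0,
		.fetch			= (qual & SKETCH_EPT_ACC_INSTR) != 0,
		.exec_for_ring3_lin	= (qual & SKETCH_EPT_EXEC_FOR_RING3_LIN) != 0,
		.gva_valid		= (qual & SKETCH_EPT_GVA_IS_VALID) != 0,
		.gva_translated		= (qual & SKETCH_EPT_GVA_TRANSLATED) != 0,
	};

	return info;
}
```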

1 change: 1 addition & 0 deletions arch/x86/include/uapi/asm/kvm.h
@@ -441,6 +441,7 @@ struct kvm_sync_regs {
#define KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS (1 << 6)
#define KVM_X86_QUIRK_SLOT_ZAP_ALL (1 << 7)
#define KVM_X86_QUIRK_STUFF_FEATURE_MSRS (1 << 8)
#define KVM_X86_QUIRK_IGNORE_GUEST_PAT (1 << 9)

#define KVM_STATE_NESTED_FORMAT_VMX 0
#define KVM_STATE_NESTED_FORMAT_SVM 1
2 changes: 1 addition & 1 deletion arch/x86/kvm/mmu.h
@@ -232,7 +232,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
return -(u32)fault & errcode;
}

-bool kvm_mmu_may_ignore_guest_pat(void);
+bool kvm_mmu_may_ignore_guest_pat(struct kvm *kvm);

int kvm_mmu_post_init_vm(struct kvm *kvm);
void kvm_mmu_pre_destroy_vm(struct kvm *kvm);