Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
 "ARM:

   - More progress on the protected VM front, now with the full fixed
     feature set as well as the limitation of some hypercalls after
     initialisation.

   - Cleanup of the RAZ/WI sysreg handling, which was pointlessly
     complicated

   - Fixes for the vgic placement in the IPA space, together with a
     bunch of selftests

   - More memcg accounting of the memory allocated on behalf of a guest

   - Timer and vgic selftests

   - Workarounds for the Apple M1 broken vgic implementation

   - KConfig cleanups

   - New kvmarm.mode=none option, for those who really dislike us

  RISC-V:

   - New KVM port.

  x86:

   - New API to control TSC offset from userspace

   - TSC scaling for nested hypervisors on SVM

   - Switch masterclock protection from raw_spin_lock to seqcount

   - Clean up function prototypes in the page fault code and avoid
     repeated memslot lookups

   - Convey the exit reason to userspace on emulation failure

   - Configure time between NX page recovery iterations

   - Expose Predictive Store Forwarding Disable CPUID leaf

   - Allocate page tracking data structures lazily (if the i915 KVM-GT
     functionality is not compiled in)

   - Cleanups, fixes and optimizations for the shadow MMU code

  s390:

   - SIGP Fixes

   - initial preparations for lazy destroy of secure VMs

   - storage key improvements/fixes

   - Log the guest CPNC

  Starting from this release, KVM-PPC patches will come from Michael
  Ellerman's PPC tree"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
  RISC-V: KVM: fix boolreturn.cocci warnings
  RISC-V: KVM: remove unneeded semicolon
  RISC-V: KVM: Fix GPA passed to __kvm_riscv_hfence_gvma_xyz() functions
  RISC-V: KVM: Factor-out FP virtualization into separate sources
  KVM: s390: add debug statement for diag 318 CPNC data
  KVM: s390: pv: properly handle page flags for protected guests
  KVM: s390: Fix handle_sske page fault handling
  KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol
  KVM: x86: On emulation failure, convey the exit reason, etc. to userspace
  KVM: x86: Get exit_reason as part of kvm_x86_ops.get_exit_info
  KVM: x86: Clarify the kvm_run.emulation_failure structure layout
  KVM: s390: Add a routine for setting userspace CPU state
  KVM: s390: Simplify SIGP Set Arch handling
  KVM: s390: pv: avoid stalls when making pages secure
  KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm
  KVM: s390: pv: avoid double free of sida page
  KVM: s390: pv: add macros for UVC CC values
  s390/mm: optimize reset_guest_reference_bit()
  s390/mm: optimize set_guest_storage_key()
  s390/mm: no need for pte_alloc_map_lock() if we know the pmd is present
  ...
Linus Torvalds committed Nov 2, 2021
2 parents 44261f8 + 52cf891 commit d7e0a79
Showing 152 changed files with 11,646 additions and 1,752 deletions.
15 changes: 13 additions & 2 deletions Documentation/admin-guide/kernel-parameters.txt
@@ -2353,7 +2353,14 @@
[KVM] Controls how many 4KiB pages are periodically zapped
back to huge pages. 0 disables the recovery, otherwise if
the value is N KVM will zap 1/Nth of the 4KiB pages every
minute. The default is 60.
period (see below). The default is 60.

kvm.nx_huge_pages_recovery_period_ms=
[KVM] Controls the time period at which KVM zaps 4KiB pages
back to huge pages. If the value is a non-zero N, KVM will
zap a portion (see ratio above) of the pages every N msecs.
If the value is 0 (the default), KVM will pick a period based
on the ratio, such that a page is zapped after 1 hour on average.
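The interaction between the ratio and the period can be sketched as follows. This is a minimal illustration of the rule described above, not the kernel's implementation; the helper name `nx_recovery_period_ms` and its return convention (0 meaning "recovery disabled") are assumptions made for the example.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): returns the effective
 * recovery period in milliseconds, or 0 when recovery is disabled.
 */
static uint64_t nx_recovery_period_ms(uint64_t period_ms, uint64_t ratio)
{
	const uint64_t one_hour_ms = 60ULL * 60ULL * 1000ULL;

	if (ratio == 0)
		return 0;		/* a ratio of 0 disables the recovery */
	if (period_ms != 0)
		return period_ms;	/* an explicit non-zero period is used as-is */

	/* Zapping 1/ratio of the pages per period covers them in about an hour. */
	return one_hour_ms / ratio;
}
```

Under these assumptions, the default ratio of 60 with the default period of 0 yields a period of 60000 ms, which matches the previous "every minute" behaviour.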

kvm-amd.nested= [KVM,AMD] Allow nested virtualization in KVM/SVM.
Default is 1 (enabled)
@@ -2365,14 +2372,18 @@
kvm-arm.mode=
[KVM,ARM] Select one of KVM/arm64's modes of operation.

none: Forcefully disable KVM.

nvhe: Standard nVHE-based mode, without support for
protected guests.

protected: nVHE-based mode with support for guests whose
state is kept private from the host.
Not valid if the kernel is running in EL2.

Defaults to VHE/nVHE based on hardware support.
Defaults to VHE/nVHE based on hardware support. Setting
mode to "protected" will disable kexec and hibernation
for the host.

kvm-arm.vgic_v3_group0_trap=
[KVM,ARM] Trap guest accesses to GICv3 group-0
241 changes: 223 additions & 18 deletions Documentation/virt/kvm/api.rst
@@ -532,7 +532,7 @@ translation mode.
------------------

:Capability: basic
:Architectures: x86, ppc, mips
:Architectures: x86, ppc, mips, riscv
:Type: vcpu ioctl
:Parameters: struct kvm_interrupt (in)
:Returns: 0 on success, negative on failure.
@@ -601,6 +601,23 @@ interrupt number dequeues the interrupt.

This is an asynchronous vcpu ioctl and can be invoked from any thread.

RISC-V:
^^^^^^^

Queues an external interrupt to be injected into the virtual CPU. This ioctl
is overloaded with 2 different irq values:

a) KVM_INTERRUPT_SET

This sets an external interrupt for a virtual CPU, which will receive it
once it is ready.

b) KVM_INTERRUPT_UNSET

This clears a pending external interrupt for a virtual CPU.

This is an asynchronous vcpu ioctl and can be invoked from any thread.


4.17 KVM_DEBUG_GUEST
--------------------
@@ -993,20 +1010,37 @@ such as migration.
When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the
set of bits that KVM can return in struct kvm_clock_data's flag member.

The only flag defined now is KVM_CLOCK_TSC_STABLE. If set, the returned
value is the exact kvmclock value seen by all VCPUs at the instant
when KVM_GET_CLOCK was called. If clear, the returned value is simply
CLOCK_MONOTONIC plus a constant offset; the offset can be modified
with KVM_SET_CLOCK. KVM will try to make all VCPUs follow this clock,
but the exact value read by each VCPU could differ, because the host
TSC is not stable.
The following flags are defined:

KVM_CLOCK_TSC_STABLE
If set, the returned value is the exact kvmclock
value seen by all VCPUs at the instant when KVM_GET_CLOCK was called.
If clear, the returned value is simply CLOCK_MONOTONIC plus a constant
offset; the offset can be modified with KVM_SET_CLOCK. KVM will try
to make all VCPUs follow this clock, but the exact value read by each
VCPU could differ, because the host TSC is not stable.

KVM_CLOCK_REALTIME
If set, the `realtime` field in the kvm_clock_data
structure is populated with the value of the host's real time
clocksource at the instant when KVM_GET_CLOCK was called. If clear,
the `realtime` field does not contain a value.

KVM_CLOCK_HOST_TSC
If set, the `host_tsc` field in the kvm_clock_data
structure is populated with the value of the host's timestamp counter (TSC)
at the instant when KVM_GET_CLOCK was called. If clear, the `host_tsc` field
does not contain a value.

::

struct kvm_clock_data {
__u64 clock; /* kvmclock current value */
__u32 flags;
__u32 pad[9];
__u32 pad0;
__u64 realtime;
__u64 host_tsc;
__u32 pad[4];
};
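The new fields are carved out of the old padding, so the size of the structure should not change. The check below is a sketch of that arithmetic using local mirror structures; the names `clock_data_old` and `clock_data_new` are hypothetical and only reproduce the two layouts shown above with fixed-width types.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the layout before this change (hypothetical name). */
struct clock_data_old {
	uint64_t clock;
	uint32_t flags;
	uint32_t pad[9];
};

/* Local mirror of the layout after this change (hypothetical name). */
struct clock_data_new {
	uint64_t clock;
	uint32_t flags;
	uint32_t pad0;
	uint64_t realtime;
	uint64_t host_tsc;
	uint32_t pad[4];
};

/* 8 + 4 + 9*4 = 48 bytes before; 8 + 4 + 4 + 8 + 8 + 4*4 = 48 bytes after. */
_Static_assert(sizeof(struct clock_data_old) == 48, "old layout is 48 bytes");
_Static_assert(sizeof(struct clock_data_new) == sizeof(struct clock_data_old),
	       "new fields reuse the old padding");
_Static_assert(offsetof(struct clock_data_new, realtime) == 16,
	       "pad0 keeps realtime 8-byte aligned");
```

The explicit `pad0` member is what keeps `realtime` on an 8-byte boundary without relying on compiler-inserted padding.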


@@ -1023,12 +1057,25 @@ Sets the current timestamp of kvmclock to the value specified in its parameter.
In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios
such as migration.

The following flags can be passed:

KVM_CLOCK_REALTIME
If set, KVM will compare the value of the `realtime` field
with the value of the host's real time clocksource at the instant when
KVM_SET_CLOCK was called. The difference in elapsed time is added to the final
kvmclock value that will be provided to guests.

Other flags returned by ``KVM_GET_CLOCK`` are accepted but ignored.
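The KVM_CLOCK_REALTIME adjustment can be sketched as plain arithmetic. This is an illustration only, assuming that both values are in nanoseconds and that the clock is never stepped backwards when the host's real time is behind the saved value; neither assumption is stated in the text above, and the function name `restored_kvmclock_ns` is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: compute the kvmclock value a guest would see after
 * KVM_SET_CLOCK, given the values saved by an earlier KVM_GET_CLOCK.
 */
static uint64_t restored_kvmclock_ns(uint64_t saved_clock_ns,
				     uint64_t saved_realtime_ns,
				     uint64_t now_realtime_ns,
				     int realtime_flag_set)
{
	if (!realtime_flag_set)
		return saved_clock_ns;	/* no realtime correction requested */

	/* Assumption: a host clock that appears to run backwards is ignored. */
	if (now_realtime_ns <= saved_realtime_ns)
		return saved_clock_ns;

	/* Add the wall-clock time that elapsed between get and set. */
	return saved_clock_ns + (now_realtime_ns - saved_realtime_ns);
}
```

In a migration scenario, this means the time spent transferring the guest is accounted for in the guest's clock rather than silently lost.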

::

struct kvm_clock_data {
__u64 clock; /* kvmclock current value */
__u32 flags;
__u32 pad[9];
__u32 pad0;
__u64 realtime;
__u64 host_tsc;
__u32 pad[4];
};


@@ -1399,7 +1446,7 @@ for vm-wide capabilities.
---------------------

:Capability: KVM_CAP_MP_STATE
:Architectures: x86, s390, arm, arm64
:Architectures: x86, s390, arm, arm64, riscv
:Type: vcpu ioctl
:Parameters: struct kvm_mp_state (out)
:Returns: 0 on success; -1 on error
@@ -1416,7 +1463,8 @@ uniprocessor guests).
Possible values are:

========================== ===============================================
KVM_MP_STATE_RUNNABLE the vcpu is currently running [x86,arm/arm64]
KVM_MP_STATE_RUNNABLE the vcpu is currently running
[x86,arm/arm64,riscv]
KVM_MP_STATE_UNINITIALIZED the vcpu is an application processor (AP)
which has not yet received an INIT signal [x86]
KVM_MP_STATE_INIT_RECEIVED the vcpu has received an INIT signal, and is
@@ -1425,7 +1473,7 @@ Possible values are:
is waiting for an interrupt [x86]
KVM_MP_STATE_SIPI_RECEIVED the vcpu has just received a SIPI (vector
accessible via KVM_GET_VCPU_EVENTS) [x86]
KVM_MP_STATE_STOPPED the vcpu is stopped [s390,arm/arm64]
KVM_MP_STATE_STOPPED the vcpu is stopped [s390,arm/arm64,riscv]
KVM_MP_STATE_CHECK_STOP the vcpu is in a special error state [s390]
KVM_MP_STATE_OPERATING the vcpu is operating (running or halted)
[s390]
@@ -1437,8 +1485,8 @@ On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
in-kernel irqchip, the multiprocessing state must be maintained by userspace on
these architectures.

For arm/arm64:
^^^^^^^^^^^^^^
For arm/arm64/riscv:
^^^^^^^^^^^^^^^^^^^^

The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
@@ -1447,7 +1495,7 @@ KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
---------------------

:Capability: KVM_CAP_MP_STATE
:Architectures: x86, s390, arm, arm64
:Architectures: x86, s390, arm, arm64, riscv
:Type: vcpu ioctl
:Parameters: struct kvm_mp_state (in)
:Returns: 0 on success; -1 on error
@@ -1459,8 +1507,8 @@ On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
in-kernel irqchip, the multiprocessing state must be maintained by userspace on
these architectures.

For arm/arm64:
^^^^^^^^^^^^^^
For arm/arm64/riscv:
^^^^^^^^^^^^^^^^^^^^

The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not.
@@ -2577,6 +2625,144 @@ following id bit patterns::

0x7020 0000 0003 02 <0:3> <reg:5>

RISC-V registers are mapped using the lower 32 bits. The upper 8 bits of
those 32 bits hold the register group type.

RISC-V config registers are meant for configuring a Guest VCPU and have
the following id bit patterns::

0x8020 0000 01 <index into the kvm_riscv_config struct:24> (32bit Host)
0x8030 0000 01 <index into the kvm_riscv_config struct:24> (64bit Host)

Following are the RISC-V config registers:

======================= ========= =============================================
Encoding Register Description
======================= ========= =============================================
0x80x0 0000 0100 0000 isa ISA feature bitmap of Guest VCPU
======================= ========= =============================================

The isa config register can be read anytime but can only be written before
a Guest VCPU runs. By default, its ISA feature bits match those of the
underlying host.

RISC-V core registers represent the general execution state of a Guest VCPU
and have the following id bit patterns::

0x8020 0000 02 <index into the kvm_riscv_core struct:24> (32bit Host)
0x8030 0000 02 <index into the kvm_riscv_core struct:24> (64bit Host)

Following are the RISC-V core registers:

======================= ========= =============================================
Encoding Register Description
======================= ========= =============================================
0x80x0 0000 0200 0000 regs.pc Program counter
0x80x0 0000 0200 0001 regs.ra Return address
0x80x0 0000 0200 0002 regs.sp Stack pointer
0x80x0 0000 0200 0003 regs.gp Global pointer
0x80x0 0000 0200 0004 regs.tp Task pointer
0x80x0 0000 0200 0005 regs.t0 Caller saved register 0
0x80x0 0000 0200 0006 regs.t1 Caller saved register 1
0x80x0 0000 0200 0007 regs.t2 Caller saved register 2
0x80x0 0000 0200 0008 regs.s0 Callee saved register 0
0x80x0 0000 0200 0009 regs.s1 Callee saved register 1
0x80x0 0000 0200 000a regs.a0 Function argument (or return value) 0
0x80x0 0000 0200 000b regs.a1 Function argument (or return value) 1
0x80x0 0000 0200 000c regs.a2 Function argument 2
0x80x0 0000 0200 000d regs.a3 Function argument 3
0x80x0 0000 0200 000e regs.a4 Function argument 4
0x80x0 0000 0200 000f regs.a5 Function argument 5
0x80x0 0000 0200 0010 regs.a6 Function argument 6
0x80x0 0000 0200 0011 regs.a7 Function argument 7
0x80x0 0000 0200 0012 regs.s2 Callee saved register 2
0x80x0 0000 0200 0013 regs.s3 Callee saved register 3
0x80x0 0000 0200 0014 regs.s4 Callee saved register 4
0x80x0 0000 0200 0015 regs.s5 Callee saved register 5
0x80x0 0000 0200 0016 regs.s6 Callee saved register 6
0x80x0 0000 0200 0017 regs.s7 Callee saved register 7
0x80x0 0000 0200 0018 regs.s8 Callee saved register 8
0x80x0 0000 0200 0019 regs.s9 Callee saved register 9
0x80x0 0000 0200 001a regs.s10 Callee saved register 10
0x80x0 0000 0200 001b regs.s11 Callee saved register 11
0x80x0 0000 0200 001c regs.t3 Caller saved register 3
0x80x0 0000 0200 001d regs.t4 Caller saved register 4
0x80x0 0000 0200 001e regs.t5 Caller saved register 5
0x80x0 0000 0200 001f regs.t6 Caller saved register 6
0x80x0 0000 0200 0020 mode Privilege mode (1 = S-mode or 0 = U-mode)
======================= ========= =============================================
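The encodings in the tables above follow one pattern: an architecture prefix, a size field, the register group type in the upper 8 bits of the lower 32 bits, and a 24-bit index. The sketch below composes an id from those parts. The `EX_` constants are local to this example; their values are read off the bit patterns shown above rather than taken from a header, and the helper name `riscv_reg_id` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Example-local constants, derived from the bit patterns in the tables. */
#define EX_REG_RISCV	0x8000000000000000ULL	/* leading 0x80 */
#define EX_REG_SIZE_U32	0x0020000000000000ULL	/* "0x8020 ..." ids */
#define EX_REG_SIZE_U64	0x0030000000000000ULL	/* "0x8030 ..." ids */
#define EX_TYPE_CONFIG	0x01
#define EX_TYPE_CORE	0x02
#define EX_TYPE_CSR	0x03

/* Hypothetical helper: build a register id from size, group type and index. */
static uint64_t riscv_reg_id(uint64_t size, uint8_t type, uint32_t index)
{
	assert(index < (1u << 24));	/* the index field is 24 bits wide */
	return EX_REG_RISCV | size | ((uint64_t)type << 24) | index;
}
```

For instance, on a 64-bit host `riscv_reg_id(EX_REG_SIZE_U64, EX_TYPE_CORE, 2)` gives 0x8030000002000002, the id listed for regs.sp.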

RISC-V csr registers represent the supervisor mode control/status registers
of a Guest VCPU and have the following id bit patterns::

0x8020 0000 03 <index into the kvm_riscv_csr struct:24> (32bit Host)
0x8030 0000 03 <index into the kvm_riscv_csr struct:24> (64bit Host)

Following are the RISC-V csr registers:

======================= ========= =============================================
Encoding Register Description
======================= ========= =============================================
0x80x0 0000 0300 0000 sstatus Supervisor status
0x80x0 0000 0300 0001 sie Supervisor interrupt enable
0x80x0 0000 0300 0002 stvec Supervisor trap vector base
0x80x0 0000 0300 0003 sscratch Supervisor scratch register
0x80x0 0000 0300 0004 sepc Supervisor exception program counter
0x80x0 0000 0300 0005 scause Supervisor trap cause
0x80x0 0000 0300 0006 stval Supervisor bad address or instruction
0x80x0 0000 0300 0007 sip Supervisor interrupt pending
0x80x0 0000 0300 0008 satp Supervisor address translation and protection
======================= ========= =============================================

RISC-V timer registers represent the timer state of a Guest VCPU and have
the following id bit patterns::

0x8030 0000 04 <index into the kvm_riscv_timer struct:24>

Following are the RISC-V timer registers:

======================= ========= =============================================
Encoding Register Description
======================= ========= =============================================
0x8030 0000 0400 0000 frequency Time base frequency (read-only)
0x8030 0000 0400 0001 time Time value visible to Guest
0x8030 0000 0400 0002 compare Time compare programmed by Guest
0x8030 0000 0400 0003 state Time compare state (1 = ON or 0 = OFF)
======================= ========= =============================================

RISC-V F-extension registers represent the single precision floating point
state of a Guest VCPU and have the following id bit patterns::

0x8020 0000 05 <index into the __riscv_f_ext_state struct:24>

Following are the RISC-V F-extension registers:

======================= ========= =============================================
Encoding Register Description
======================= ========= =============================================
0x8020 0000 0500 0000 f[0] Floating point register 0
...
0x8020 0000 0500 001f f[31] Floating point register 31
0x8020 0000 0500 0020 fcsr Floating point control and status register
======================= ========= =============================================

RISC-V D-extension registers represent the double precision floating point
state of a Guest VCPU and have the following id bit patterns::

0x8020 0000 06 <index into the __riscv_d_ext_state struct:24> (fcsr)
0x8030 0000 06 <index into the __riscv_d_ext_state struct:24> (non-fcsr)

Following are the RISC-V D-extension registers:

======================= ========= =============================================
Encoding Register Description
======================= ========= =============================================
0x8030 0000 0600 0000 f[0] Floating point register 0
...
0x8030 0000 0600 001f f[31] Floating point register 31
0x8020 0000 0600 0020 fcsr Floating point control and status register
======================= ========= =============================================
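Unlike the other groups, the D-extension group mixes two sizes: f[0] to f[31] use the 64-bit size field while fcsr uses the 32-bit one. A minimal sketch of that selection, with constants read off the table above and a hypothetical helper name `riscv_d_reg_id`:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: id of a D-extension register, index 0..31 = f[n], 32 = fcsr. */
static uint64_t riscv_d_reg_id(uint32_t index)
{
	const uint64_t arch = 0x8000000000000000ULL;	/* leading 0x80 */
	const uint64_t size_u32 = 0x0020000000000000ULL;
	const uint64_t size_u64 = 0x0030000000000000ULL;
	const uint64_t type_d = 0x06ULL << 24;		/* group type 0x06 */

	assert(index <= 32);
	/* fcsr (index 0x20) is a 32-bit register; f[n] are 64-bit. */
	return arch | (index == 32 ? size_u32 : size_u64) | type_d | index;
}
```

A caller therefore has to size its buffer per register rather than per group when accessing D-extension state.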


4.69 KVM_GET_ONE_REG
--------------------
@@ -5848,6 +6034,25 @@ Valid values for 'type' are:
Userspace is expected to place the hypercall result into the appropriate
field before invoking KVM_RUN again.

::

/* KVM_EXIT_RISCV_SBI */
struct {
unsigned long extension_id;
unsigned long function_id;
unsigned long args[6];
unsigned long ret[2];
} riscv_sbi;

If the exit reason is KVM_EXIT_RISCV_SBI, the VCPU has made an SBI call that is
not handled by the KVM RISC-V kernel module. The details of the SBI call are
available in the 'riscv_sbi' member of the kvm_run structure. The 'extension_id'
field of 'riscv_sbi' holds the SBI extension ID, and the 'function_id' field
holds the function ID within that SBI extension. The 'args' array holds the
parameters of the SBI call, and the 'ret' array holds its return values.
Userspace should update the return values of the SBI call before resuming the
VCPU. For more details, refer to the RISC-V SBI specification at
https://github.com/riscv/riscv-sbi-doc.
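A userspace handler for this exit might look like the sketch below. It uses a local mirror of the 'riscv_sbi' member so that it stands alone. The assumptions are that ret[0] carries the SBI error code and ret[1] the SBI value (following the SBI calling convention), and that -2 is the SBI "not supported" error; the handler name and the `EX_` constant are hypothetical.

```c
/* Local mirror of the 'riscv_sbi' member of struct kvm_run (example only). */
struct riscv_sbi_exit {
	unsigned long extension_id;
	unsigned long function_id;
	unsigned long args[6];
	unsigned long ret[2];
};

/* Assumption: "not supported" error code from the SBI specification. */
#define EX_SBI_ERR_NOT_SUPPORTED	(-2L)

/*
 * Hypothetical handler: a real VMM would dispatch on extension_id and
 * function_id here; this sketch rejects every call as unsupported.
 */
static void handle_sbi_exit(struct riscv_sbi_exit *sbi)
{
	sbi->ret[0] = (unsigned long)EX_SBI_ERR_NOT_SUPPORTED;	/* error code */
	sbi->ret[1] = 0;					/* no value */
}
```

After the handler fills in 'ret', the VMM would invoke KVM_RUN again so the guest sees the result of its SBI call.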

::

/* Fix the size of the union. */