Skip to content

Commit

Permalink
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/g…
Browse files Browse the repository at this point in the history
…it/aegl/linux-2.6

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: (25 commits)
  mmtimer: Push BKL down into the ioctl handler
  [IA64] Remove experimental status of kdump
  [IA64] Update ia64 mmr list for SGI uv
  [IA64] Avoid overflowing ia64_cpu_to_sapicid in acpi_map_lsapic()
  [IA64] adding parameter check to module_free()
  [IA64] improper printk format in acpi-cpufreq
  [IA64] pv_ops: move some functions in ivt.S to avoid lack of space.
  [IA64] pvops: documentation on ia64/pv_ops
  [IA64] pvops: add to hooks, pv_time_ops, for steal time accounting.
  [IA64] pvops: add hooks, pv_irq_ops, to paravirtualized irq related operations.
  [IA64] pvops: add hooks, pv_iosapic_ops, to paravirtualize iosapic.
  [IA64] pvops: define initialization hooks, pv_init_ops, for paravirtualized environment.
  [IA64] pvops: paravirtualize NR_IRQS
  [IA64] pvops: paravirtualize entry.S
  [IA64] pvops: paravirtualize ivt.S
  [IA64] pvops: paravirtualize minstate.h.
  [IA64] pvops: define paravirtualized instructions for native.
  [IA64] pvops: preparation for paravirtulization of hand written assembly code.
  [IA64] pvops: introduce pv_cpu_ops to paravirtualize privileged instructions.
  [IA64] pvops: add an early setup hook for pv_ops.
  ...
  • Loading branch information
Linus Torvalds committed Jul 21, 2008
2 parents 807677f + 4cddb88 commit eb4225b
Show file tree
Hide file tree
Showing 37 changed files with 2,251 additions and 387 deletions.
137 changes: 137 additions & 0 deletions Documentation/ia64/paravirt_ops.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,137 @@
Paravirt_ops on IA64
====================
21 May 2008, Isaku Yamahata <yamahata@valinux.co.jp>


Introduction
------------
The aim of this documentation is to help with maintainability and/or to
encourage people to use paravirt_ops/IA64.

paravirt_ops (pv_ops in short) is a way for virtualization support of
Linux kernel on x86. Several ways for virtualization support were
proposed, paravirt_ops is the winner.
On the other hand, now there are also several IA64 virtualization
technologies like kvm/IA64, xen/IA64 and many other academic IA64
hypervisors so that it is good to add generic virtualization
infrastructure on Linux/IA64.


What is paravirt_ops?
---------------------
It has been developed on x86 as virtualization support via API, not ABI.
It allows each hypervisor to override operations which are important for
hypervisors at API level. And it allows a single kernel binary to run on
all supported execution environments including native machine.
Essentially paravirt_ops is a set of function pointers which represent
operations corresponding to low level sensitive instructions and high
level functionalities in various area. But one significant difference
from usual function pointer table is that it allows optimization with
binary patch. It is because some of these operations are very
performance sensitive and indirect call overhead is not negligible.
With binary patch, indirect C function call can be transformed into
direct C function call or in-place execution to eliminate the overhead.

Thus, operations of paravirt_ops are classified into three categories.
- simple indirect call
These operations correspond to high level functionality so that the
overhead of indirect call isn't very important.

- indirect call which allows optimization with binary patch
Usually these operations correspond to low level instructions. They
are called frequently and performance critical. So the overhead is
very important.

- a set of macros for hand written assembly code
Hand written assembly codes (.S files) also need paravirtualization
because they include sensitive instructions or some of code paths in
them are very performance critical.


The relation to the IA64 machine vector
---------------------------------------
Linux/IA64 has the IA64 machine vector functionality which allows the
kernel to switch implementations (e.g. initialization, ipi, dma api...)
depending on executing platform.
We can replace some implementations very easily defining a new machine
vector. Thus another approach for virtualization support would be
enhancing the machine vector functionality.
But paravirt_ops approach was taken because
- virtualization support needs wider support than machine vector does.
e.g. low level instruction paravirtualization. It must be
initialized very early before platform detection.

- virtualization support needs more functionality like binary patch.
Probably the calling overhead might not be very large compared to the
emulation overhead of virtualization. However in the native case, the
overhead should be eliminated completely.
A single kernel binary should run on each environment including native,
and the overhead of paravirt_ops on native environment should be as
small as possible.

- for full virtualization technology, e.g. KVM/IA64 or
Xen/IA64 HVM domain, the result would be
(the emulated platform machine vector. probably dig) + (pv_ops).
This means that the virtualization support layer should be under
the machine vector layer.

Possibly it might be better to move some function pointers from
paravirt_ops to machine vector. In fact, Xen domU case utilizes both
pv_ops and machine vector.


IA64 paravirt_ops
-----------------
In this section, the concrete paravirt_ops will be discussed.
Because of the architecture difference between ia64 and x86, the
resulting set of functions is very different from x86 pv_ops.

- C function pointer tables
They are not very performance critical so that simple C indirect
function call is acceptable. The following structures are defined at
this moment. For details see linux/include/asm-ia64/paravirt.h
- struct pv_info
This structure describes the execution environment.
- struct pv_init_ops
This structure describes the various initialization hooks.
- struct pv_iosapic_ops
This structure describes hooks to iosapic operations.
- struct pv_irq_ops
This structure describes hooks to irq related operations
- struct pv_time_op
This structure describes hooks to steal time accounting.

- a set of indirect calls which need optimization
Currently this class of functions correspond to a subset of IA64
intrinsics. At this moment the optimization with binary patch isn't
implemented yet.
struct pv_cpu_op is defined. For details see
linux/include/asm-ia64/paravirt_privop.h
Mostly they correspond to ia64 intrinsics 1-to-1.
Caveat: Now they are defined as C indirect function pointers, but in
order to support binary patch optimization, they will be changed
using GCC extended inline assembly code.

- a set of macros for hand written assembly code (.S files)
For maintenance purpose, the taken approach for .S files is single
source code and compile multiple times with different macros definitions.
Each pv_ops instance must define those macros to compile.
The important thing here is that sensitive, but non-privileged
instructions must be paravirtualized and that some privileged
instructions also need paravirtualization for reasonable performance.
Developers who modify .S files must be aware of that. At this moment
an easy checker is implemented to detect paravirtualization breakage.
But it doesn't cover all the cases.

Sometimes this set of macros is called pv_cpu_asm_op. But there is no
corresponding structure in the source code.
Those macros mostly 1:1 correspond to a subset of privileged
instructions. See linux/include/asm-ia64/native/inst.h.
And some functions written in assembly also need to be overrided so
that each pv_ops instance have to define some macros. Again see
linux/include/asm-ia64/native/inst.h.


Those structures must be initialized very early before start_kernel.
Probably initialized in head.S using multi entry point or some other trick.
For native case implementation see linux/arch/ia64/kernel/paravirt.c.
4 changes: 2 additions & 2 deletions arch/ia64/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -540,8 +540,8 @@ config KEXEC
strongly in flux, so no good recommendation can be made.

config CRASH_DUMP
bool "kernel crash dumps (EXPERIMENTAL)"
depends on EXPERIMENTAL && IA64_MCA_RECOVERY && !IA64_HP_SIM && (!SMP || HOTPLUG_CPU)
bool "kernel crash dumps"
depends on IA64_MCA_RECOVERY && !IA64_HP_SIM && (!SMP || HOTPLUG_CPU)
help
Generate crash dump after being started by kexec.

Expand Down
6 changes: 6 additions & 0 deletions arch/ia64/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -100,3 +100,9 @@ define archhelp
echo ' boot - Build vmlinux and bootloader for Ski simulator'
echo '* unwcheck - Check vmlinux for invalid unwind info'
endef

archprepare: make_nr_irqs_h FORCE
PHONY += make_nr_irqs_h FORCE

make_nr_irqs_h: FORCE
$(Q)$(MAKE) $(build)=arch/ia64/kernel include/asm-ia64/nr-irqs.h
44 changes: 44 additions & 0 deletions arch/ia64/kernel/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,8 @@ obj-$(CONFIG_PCI_MSI) += msi_ia64.o
mca_recovery-y += mca_drv.o mca_drv_asm.o
obj-$(CONFIG_IA64_MC_ERR_INJECT)+= err_inject.o

obj-$(CONFIG_PARAVIRT) += paravirt.o paravirtentry.o

obj-$(CONFIG_IA64_ESI) += esi.o
ifneq ($(CONFIG_IA64_ESI),)
obj-y += esi_stub.o # must be in kernel proper
Expand Down Expand Up @@ -70,3 +72,45 @@ $(obj)/gate-syms.o: $(obj)/gate.lds $(obj)/gate.o FORCE
# We must build gate.so before we can assemble it.
# Note: kbuild does not track this dependency due to usage of .incbin
$(obj)/gate-data.o: $(obj)/gate.so

# Calculate NR_IRQ = max(IA64_NATIVE_NR_IRQS, XEN_NR_IRQS, ...) based on config
define sed-y
"/^->/{s:^->\([^ ]*\) [\$$#]*\([^ ]*\) \(.*\):#define \1 \2 /* \3 */:; s:->::; p;}"
endef
quiet_cmd_nr_irqs = GEN $@
define cmd_nr_irqs
(set -e; \
echo "#ifndef __ASM_NR_IRQS_H__"; \
echo "#define __ASM_NR_IRQS_H__"; \
echo "/*"; \
echo " * DO NOT MODIFY."; \
echo " *"; \
echo " * This file was generated by Kbuild"; \
echo " *"; \
echo " */"; \
echo ""; \
sed -ne $(sed-y) $<; \
echo ""; \
echo "#endif" ) > $@
endef

# We use internal kbuild rules to avoid the "is up to date" message from make
arch/$(SRCARCH)/kernel/nr-irqs.s: $(srctree)/arch/$(SRCARCH)/kernel/nr-irqs.c \
$(wildcard $(srctree)/include/asm-ia64/*/irq.h)
$(Q)mkdir -p $(dir $@)
$(call if_changed_dep,cc_s_c)

include/asm-ia64/nr-irqs.h: arch/$(SRCARCH)/kernel/nr-irqs.s
$(Q)mkdir -p $(dir $@)
$(call cmd,nr_irqs)

clean-files += $(objtree)/include/asm-ia64/nr-irqs.h

#
# native ivt.S and entry.S
#
ASM_PARAVIRT_OBJS = ivt.o entry.o
define paravirtualized_native
AFLAGS_$(1) += -D__IA64_ASM_PARAVIRTUALIZED_NATIVE
endef
$(foreach obj,$(ASM_PARAVIRT_OBJS),$(eval $(call paravirtualized_native,$(obj))))
5 changes: 2 additions & 3 deletions arch/ia64/kernel/acpi.c
Original file line number Diff line number Diff line change
Expand Up @@ -774,7 +774,7 @@ int acpi_gsi_to_irq(u32 gsi, unsigned int *irq)
*/
#ifdef CONFIG_ACPI_HOTPLUG_CPU
static
int acpi_map_cpu2node(acpi_handle handle, int cpu, long physid)
int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
{
#ifdef CONFIG_ACPI_NUMA
int pxm_id;
Expand Down Expand Up @@ -854,8 +854,7 @@ int acpi_map_lsapic(acpi_handle handle, int *pcpu)
union acpi_object *obj;
struct acpi_madt_local_sapic *lsapic;
cpumask_t tmp_map;
long physid;
int cpu;
int cpu, physid;

if (ACPI_FAILURE(acpi_evaluate_object(handle, "_MAT", NULL, &buffer)))
return -EINVAL;
Expand Down
4 changes: 2 additions & 2 deletions arch/ia64/kernel/cpufreq/acpi-cpufreq.c
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,7 @@ processor_set_pstate (
retval = ia64_pal_set_pstate((u64)value);

if (retval) {
dprintk("Failed to set freq to 0x%x, with error 0x%x\n",
dprintk("Failed to set freq to 0x%x, with error 0x%lx\n",
value, retval);
return -ENODEV;
}
Expand All @@ -74,7 +74,7 @@ processor_get_pstate (

if (retval)
dprintk("Failed to get current freq with "
"error 0x%x, idx 0x%x\n", retval, *value);
"error 0x%lx, idx 0x%x\n", retval, *value);

return (int)retval;
}
Expand Down
Loading

0 comments on commit eb4225b

Please sign in to comment.