Commit graph

737480 commits

Author SHA1 Message Date
Andrey Ryabinin
0d39e2669d x86/kasan: Panic if there is not enough memory to boot
Currently KASAN doesn't panic in case it don't have enough memory
to boot. Instead, it crashes in some random place:

 kernel BUG at arch/x86/mm/physaddr.c:27!

 RIP: 0010:__phys_addr+0x268/0x276
 Call Trace:
  kasan_populate_shadow+0x3f2/0x497
  kasan_init+0x12e/0x2b2
  setup_arch+0x2825/0x2a2c
  start_kernel+0xc8/0x15f4
  x86_64_start_reservations+0x2a/0x2c
  x86_64_start_kernel+0x72/0x75
  secondary_startup_64+0xa5/0xb0

Use memblock_virt_alloc_try_nid() for allocations without failure
fallback. It will panic with an out of memory message.

Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Cc: kasan-dev@googlegroups.com
Cc: Alexander Potapenko <glider@google.com>
Cc: lkp@01.org
Link: https://lkml.kernel.org/r/20180110153602.18919-1-aryabinin@virtuozzo.com
2018-01-15 00:32:35 +01:00
Linus Torvalds
a8750ddca9 Linux 4.15-rc8 2018-01-14 15:32:30 -08:00
Linus Torvalds
aaae98a802 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixlet from Thomas Gleixner.

Remove a warning about lack of compiler support for retpoline that most
people can't do anything about, so it just annoys them needlessly.

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/retpoline: Remove compile time warning
2018-01-14 15:30:02 -08:00
Linus Torvalds
6bb821193b powerpc fixes for 4.15 #7
One fix for an oops at boot if we take a hotplug interrupt before we are ready
 to handle it.
 
 The bulk is patches to implement mitigation for Meltdown, see the change logs
 for more details.
 
 Thanks to:
   Nicholas Piggin, Michael Neuling, Oliver O'Halloran, Jon Masters, Jose Ricardo
   Ziviani, David Gibson.
 -----BEGIN PGP SIGNATURE-----
 
 iQIwBAABCAAaBQJaWXZ+ExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYBe
 ew/+KMp80VAIGSyFeYHLg8oClnQjKcEDMzlNoHo8WU6NwwNzk2XXBXNaGT7Osg7B
 Ny69R3/VRrGmi/2uule9OzF2eZt8BUmRn3cHDcUB5OwFjMysv4M2jc+OK0STQxLU
 F769rX+ZlkyAvQn4UxtNCUob+JDfVbE5Lurrx475ZGL1YHk6QqBGhlN/niXKtkBz
 upeBEpu4JgwsLLD9H7c4QQha9PkhxY51jtvxL24cgwDuKJKhTdxLahPlA3QmLafu
 FJxR9R0YzTj9Ivuae5yM7xRKr5wROTPtTRbG9sN7XOgFNrQ+LSgemXGrlV/fzdhV
 6iPn2hHs4bu45SOIM+Hhf7OwHTtnMetXmKo6NldlKHVCw1HwBLigRohnOTPjiHfZ
 P6brR4T/cglakBiE29+8T0UFqOPizEJiQRHnsoWzwIa/aTqCGEE3m3rgTxU6ovt5
 3JggC51ZRDFsjGvj+JFnxGxCjYC4tDbfBI28M+GuWZZPBKLkFnEk8PlSzYyHPW2p
 5uVA/UN4KHVGYUfJVUC+LMP+ZIOSEyEISKk7RuK0C0mwHSXcvxbeEefXbc18Xjai
 J/pl/c9/4CRnEbdPm305xBxKKmU6gWxWRK3KbRFsoTI/wp+Ryx5TjPpgAa5GnaLv
 nrtrfKs8skWfGMoDWH7+KILt0fgRR2x+91GTSxc5aaHtYBI=
 =UZHv
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.15-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "One fix for an oops at boot if we take a hotplug interrupt before we
  are ready to handle it.

  The bulk is patches to implement mitigation for Meltdown, see the
  change logs for more details.

  Thanks to: Nicholas Piggin, Michael Neuling, Oliver O'Halloran, Jon
  Masters, Jose Ricardo Ziviani, David Gibson"

* tag 'powerpc-4.15-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/powernv: Check device-tree for RFI flush settings
  powerpc/pseries: Query hypervisor for RFI flush settings
  powerpc/64s: Support disabling RFI flush with no_rfi_flush and nopti
  powerpc/64s: Add support for RFI flush of L1-D cache
  powerpc/64s: Convert slb_miss_common to use RFI_TO_USER/KERNEL
  powerpc/64: Convert fast_exception_return to use RFI_TO_USER/KERNEL
  powerpc/64: Convert the syscall exit path to use RFI_TO_USER/KERNEL
  powerpc/64s: Simple RFI macro conversions
  powerpc/64: Add macros for annotating the destination of rfid/hrfid
  powerpc/pseries: Add H_GET_CPU_CHARACTERISTICS flags & wrapper
  powerpc/pseries: Make RAS IRQ explicitly dependent on DLPAR WQ
2018-01-14 15:03:17 -08:00
Daniel Borkmann
e3b073c7ca Merge branch 'bpf-nfp-map-offload'
Jakub Kicinski says:

====================
This set adds support for creating maps on networking devices.  BPF is
programs+maps, the pure program offload has been around for quite some
time, this patchset adds the map part of the equation.

Maps are allocated on the target device from the start.  There is no
host copy when map is created on the device.  Device maps are represented
by struct bpf_offloaded_map, regardless of type.  Host programs can't
access such maps, access is only possible from a program also loaded
to the same device and/or via the BPF syscall.

Offloaded programs are currently only allowed to perform lookups,
control plane is responsible for populating the maps.

For brevity only infrastructure and basic NFP patches are included.
Target device reporting, netdevsim and tests will follow up as well as
some further optimizations to the NFP code.

v2:
 - leave out the array maps, we will add them trivially later to avoid
   merge conflicts with ongoing spectere&meltdown mitigations.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-14 23:36:32 +01:00
Jakub Kicinski
1bba4c413a nfp: bpf: implement bpf map offload
Plug in to the stack's map offload callbacks for BPF map offload.
Get next call needs some special handling on the FW side, since
we can't send a NULL pointer to the FW there is a get first entry
FW command.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-14 23:36:31 +01:00
Jakub Kicinski
3dd43c3319 nfp: bpf: add support for reading map memory
Map memory needs to use 40 bit addressing.  Add handling of such
accesses.  Since 40 bit addresses are formed by using both 32 bit
operands we need to pre-calculate the actual address instead of
adding in the offset inside the instruction, like we did in 32 bit
mode.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-14 23:36:30 +01:00
Jakub Kicinski
77a3d3113b nfp: bpf: add verification and codegen for map lookups
Verify our current constraints on the location of the key are
met and generate the code for calling map lookup on the datapath.

New relocation types have to be added - for helpers and return
addresses.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-14 23:36:30 +01:00
Jakub Kicinski
ce4ebfd859 nfp: bpf: add helpers for updating immediate instructions
Immediate loads are used to load the return address of a helper.
We need to be able to update those loads for relocations.
Immediate loads can be slightly more complex and spread over
two instructions in general, but here we only care about simple
loads of small (< 65k) constants, so complex cases are not handled.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-14 23:36:30 +01:00
Jakub Kicinski
9d080d5da9 nfp: bpf: parse function call and map capabilities
Parse helper function and supported map FW TLV capabilities.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-14 23:36:30 +01:00
Jakub Kicinski
ff3d43f756 nfp: bpf: implement helpers for FW map ops
Implement calls for FW map communication.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-14 23:36:30 +01:00
Jakub Kicinski
d48ae231c5 nfp: bpf: add basic control channel communication
For map support we will need to send and receive control messages.
Add basic support for sending a message to FW, and waiting for a
reply.

Control messages are tagged with a 16 bit ID.  Add a simple ID
allocator and make sure we don't allow too many messages in flight,
to avoid request <> reply mismatches.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-14 23:36:30 +01:00
Jakub Kicinski
4da98eea79 nfp: bpf: add map data structure
To be able to split code into reasonable chunks we need to add
the map data structures already.  Later patches will add code
piece by piece.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-14 23:36:30 +01:00
Jakub Kicinski
a38845729e bpf: offload: add map offload infrastructure
BPF map offload follow similar path to program offload.  At creation
time users may specify ifindex of the device on which they want to
create the map.  Map will be validated by the kernel's
.map_alloc_check callback and device driver will be called for the
actual allocation.  Map will have an empty set of operations
associated with it (save for alloc and free callbacks).  The real
device callbacks are kept in map->offload->dev_ops because they
have slightly different signatures.  Map operations are called in
process context so the driver may communicate with HW freely,
msleep(), wait() etc.

Map alloc and free callbacks are muxed via existing .ndo_bpf, and
are always called with rtnl lock held.  Maps and programs are
guaranteed to be destroyed before .ndo_uninit (i.e. before
unregister_netdev() returns).  Map callbacks are invoked with
bpf_devs_lock *read* locked, drivers must take care of exclusive
locking if necessary.

All offload-specific branches are marked with unlikely() (through
bpf_map_is_dev_bound()), given that branch penalty will be
negligible compared to IO anyway, and we don't want to penalize
SW path unnecessarily.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-14 23:36:30 +01:00
Jakub Kicinski
5bc2d55c6a bpf: offload: factor out netdev checking at allocation time
Add a helper to check if netdev could be found and whether it
has .ndo_bpf callback.  There is no need to check the callback
every time it's invoked, ndos can't reasonably be swapped for
a set without .ndp_bpf while program is loaded.

bpf_dev_offload_check() will also be used by map offload.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-14 23:36:29 +01:00
Jakub Kicinski
0a9c1991f2 bpf: rename bpf_dev_offload -> bpf_prog_offload
With map offload coming, we need to call program offload structure
something less ambiguous.  Pure rename, no functional changes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-14 23:36:29 +01:00
Jakub Kicinski
bd475643d7 bpf: add helper for copying attrs to struct bpf_map
All map types reimplement the field-by-field copy of union bpf_attr
members into struct bpf_map.  Add a helper to perform this operation.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-14 23:36:29 +01:00
Jakub Kicinski
9328e0d1bc bpf: hashtab: move checks out of alloc function
Use the new callback to perform allocation checks for hash maps.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-14 23:36:29 +01:00
Jakub Kicinski
daffc5a2e6 bpf: hashtab: move attribute validation before allocation
Number of attribute checks are currently performed after hashtab
is already allocated.  Move them to be able to split them out to
the check function later on.  Checks have to now be performed on
the attr union directly instead of the members of bpf_map, since
bpf_map will be allocated later.  No functional changes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-14 23:36:29 +01:00
Jakub Kicinski
1110f3a9bc bpf: add map_alloc_check callback
.map_alloc callbacks contain a number of checks validating user-
-provided map attributes against constraints of a particular map
type.  For offloaded maps we will need to check map attributes
without actually allocating any memory on the host.  Add a new
callback for validating attributes before any memory is allocated.
This callback can be selectively implemented by map types for
sharing code with offloads, or simply to separate the logical
steps of validation and allocation.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-14 23:36:29 +01:00
Thomas Gleixner
ed4bbf7910 timers: Unconditionally check deferrable base
When the timer base is checked for expired timers then the deferrable base
must be checked as well. This was missed when making the deferrable base
independent of base::nohz_active.

Fixes: ced6d5c11d ("timers: Use deferrable base independent of base::nohz_active")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Cc: rt@linutronix.de
2018-01-14 23:25:33 +01:00
Alexei Starovoitov
68fda450a7 bpf: fix 32-bit divide by zero
due to some JITs doing if (src_reg == 0) check in 64-bit mode
for div/mod operations mask upper 32-bits of src register
before doing the check

Fixes: 622582786c ("net: filter: x86: internal BPF JIT")
Fixes: 7a12b5031c ("sparc64: Add eBPF JIT.")
Reported-by: syzbot+48340bb518e88849e2e3@syzkaller.appspotmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-14 23:05:33 +01:00
Thomas Gleixner
b8b9ce4b5a x86/retpoline: Remove compile time warning
Remove the compile time warning when CONFIG_RETPOLINE=y and the compiler
does not have retpoline support. Linus rationale for this is:

  It's wrong because it will just make people turn off RETPOLINE, and the
  asm updates - and return stack clearing - that are independent of the
  compiler are likely the most important parts because they are likely the
  ones easiest to target.

  And it's annoying because most people won't be able to do anything about
  it. The number of people building their own compiler? Very small. So if
  their distro hasn't got a compiler yet (and pretty much nobody does), the
  warning is just annoying crap.

  It is already properly reported as part of the sysfs interface. The
  compile-time warning only encourages bad things.

Fixes: 76b043848f ("x86/retpoline: Add initial retpoline support")
Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Link: https://lkml.kernel.org/r/CA+55aFzWgquv4i6Mab6bASqYXg3ErV3XDFEYf=GEcCDQg5uAtw@mail.gmail.com
2018-01-14 22:29:36 +01:00
Jan Kiszka
a0c01e4bb9 x86/jailhouse: Initialize PCI support
With this change, PCI devices can be detected and used inside a non-root
cell.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/e8d19494b96b68a749bcac514795d864ad9c28c3.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:57 +01:00
Jan Kiszka
cf878e169d x86/jailhouse: Wire up IOAPIC for legacy UART ports
The typical I/O interrupts in non-root cells are MSI-based. However, the
platform UARTs do not support MSI. In order to run a non-root cell that
shall use one of them, the standard IOAPIC must be registered and 1:1
routing for IRQ 3 and 4 set up.

If an IOAPIC is not available, the boot loader clears standard_ioapic in
the setup data, so registration is skipped. If the guest is not allowed to
to use one of those pins, Jailhouse will simply ignore the access.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/90d942dda9d48a8046e00bb3c1bb6757c83227be.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:57 +01:00
Jan Kiszka
fd49807682 x86/jailhouse: Halt instead of failing to restart
Jailhouse provides no guest-initiated restart. So, do not even try to.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/ef8a0ef95c2b17c21066e5f28ea56b58bf7eaa82.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:57 +01:00
Jan Kiszka
5ae4443010 x86/jailhouse: Silence ACPI warning
Jailhouse support does not depend on ACPI, and does not even use it. But
if it should be enabled, avoid warning about its absence in the
platform.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/939687007cbd7643b02fd330e8616e7e5944063f.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:56 +01:00
Jan Kiszka
0d7c1e2218 x86/jailhouse: Avoid access of unsupported platform resources
Non-root cells do not have CMOS access, thus the warm reset cannot be
enabled. There is no RTC, thus also no wall clock. Furthermore, there
are no ISA IRQs and no PIC.

Also disable probing of i8042 devices that are typically blocked for
non-root cells. In theory, access could also be granted to a non-root
cell, provided the root cell is not using the devices. But there is no
concrete scenario in sight, and disabling probing over Jailhouse allows
to build generic kernels that keep CONFIG_SERIO enabled for use in
normal systems.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/39b68cc2c496501c9d95e6f40e5d76e3053c3908.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:56 +01:00
Jan Kiszka
e85eb632f6 x86/jailhouse: Set up timekeeping
Get the precalibrated frequencies for the TSC and the APIC timer from
the Jailhouse platform info and set the kernel values accordingly.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/b2557426332fc337a74d3141cb920f7dce9ad601.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:56 +01:00
Jan Kiszka
87e65d05bb x86/jailhouse: Enable PMTIMER
Jailhouse exposes the PMTIMER as only reference clock to all cells. Pick
up its address from the setup data. Allow to enable the Linux support of
it by relaxing its strict dependency on ACPI.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/6d5c3fadd801eb3fba9510e2d3db14a9c404a1a0.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:55 +01:00
Jan Kiszka
11c8dc419b x86/jailhouse: Enable APIC and SMP support
Register the APIC which Jailhouse always exposes at 0xfee00000 if in
xAPIC mode or via MSRs as x2APIC. The latter is only available if it was
already activated because there is no support for switching its mode
during runtime.

Jailhouse requires the APIC to be operated in phys-flat mode. Ensure
that this mode is selected by Linux.

The available CPUs are taken from the setup data structure that the
loader filled and registered with the kernel.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/8b2255da0a9856c530293a67aa9d6addfe102a2b.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:55 +01:00
Jan Kiszka
4a362601ba x86/jailhouse: Add infrastructure for running in non-root cell
The Jailhouse hypervisor is able to statically partition a multicore
system into multiple so-called cells. Linux is used as boot loader and
continues to run in the root cell after Jailhouse is enabled. Linux can
also run in non-root cells.

Jailhouse does not emulate usual x86 devices. It also provides no
complex ACPI but basic platform information that the boot loader
forwards via setup data. This adds the infrastructure to detect when
running in a non-root cell so that the platform can be configured as
required in succeeding steps.

Support is limited to x86-64 so far, primarily because no boot loader
stub exists for i386 and, thus, we wouldn't be able to test the 32-bit
path.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/7f823d077b38b1a70c526b40b403f85688c137d3.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:54 +01:00
Jan Kiszka
a09c5ec00a x86: Introduce and use MP IRQ trigger and polarity defines
MP_IRQDIR_* constants pointed in the right direction but remained unused so
far: It's cleaner to use symbolic values for the IRQ flags in the MP config
table. That also saves some comments.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/60809926663a1d38e2a5db47d020d6e2e7a70019.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:54 +01:00
Jan Kiszka
e348caef8b x86/platform: Control warm reset setup via legacy feature flag
Allow to turn off the setup of BIOS-managed warm reset via a new flag in
x86_legacy_features. Besides the UV1, the upcoming jailhose guest support
needs this switched off.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/44376558129d70a2c1527959811371ef4b82e829.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:53 +01:00
Jan Kiszka
32c9c801a8 x86/apic: Install an empty physflat_init_apic_ldr
As the comment already stated, there is no need for setting up LDR (and
DFR) in physflat mode as it remains unused (see SDM, 10.6.2.1).
flat_init_apic_ldr only served as a placeholder for a nop operation so
far, causing no harm.

That will change when running over the Jailhouse hypervisor. Here we
must not touch LDR in a way that destroys the mapping originally set up
by the Linux root cell. Jailhouse enforces this setting in order to
efficiently validate any IPI requests sent by a cell.

Avoid a needless clash caused by flat_init_apic_ldr by installing a true
nop handler.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/f9867d294cdae4d45ed89d3a2e6adb524f4f6794.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:53 +01:00
Max R. P. Grossmann
a9445e47d8 posix-cpu-timers: Make set_process_cpu_timer() more robust
Because the return value of cpu_timer_sample_group() is not checked,
compilers and static checkers can legitimately warn about a potential use
of the uninitialized variable 'now'. This is not a runtime issue as all call
sites hand in valid clock ids.

Also cpu_timer_sample_group() is invoked unconditionally even when the
result is not used because *oldval is NULL.

Make the invocation conditional and check the return value.

[ tglx: Massage changelog ]

Signed-off-by: Max R. P. Grossmann <m@max.pm>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john.stultz@linaro.org
Link: https://lkml.kernel.org/r/20180108190157.10048-1-m@max.pm
2018-01-14 20:50:59 +01:00
Peter Zijlstra
aa83c45762 x86/tsc: Introduce early tsc clocksource
Without TSC_KNOWN_FREQ the TSC clocksource is registered so late that the
kernel first switches to the HPET. Using HPET on large CPU count machines is
undesirable.

Therefore register a tsc-early clocksource using the preliminary tsc_khz
from quick calibration. Then when the final TSC calibration is done, it
can switch to the tuned frequency.

The only notably problem is that the real tsc clocksource must be marked
with CLOCK_SOURCE_VALID_FOR_HRES, otherwise it will not be selected when
unregistering tsc-early. tsc-early cannot be left registered, because then
the clocksource code would fall back to it when we tsc clocksource is
marked unstable later.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: len.brown@intel.com
Cc: rui.zhang@intel.com
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20171222092243.431585460@infradead.org
2018-01-14 20:18:23 +01:00
Peter Zijlstra
6d671e1b85 x86/time: Unconditionally register legacy timer interrupt
Even without a PIC/PIT the legacy timer interrupt is required for HPET in
legacy replacement mode.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: len.brown@intel.com
Cc: rui.zhang@intel.com
Link: https://lkml.kernel.org/r/20171222092243.382623763@infradead.org
2018-01-14 20:18:23 +01:00
Peter Zijlstra
30c7e5b123 x86/tsc: Allow TSC calibration without PIT
Zhang Rui reported that a Surface Pro 4 will fail to boot with
lapic=notscdeadline. Part of the problem is that that machine doesn't have
a PIT.

If, for some reason, the TSC init has to fall back to TSC calibration, it
relies on the PIT to be present.

Allow TSC calibration to reliably fall back to HPET.

The below results in an accurate TSC measurement when forced on a IVB:

  tsc: Unable to calibrate against PIT
  tsc: No reference (HPET/PMTIMER) available
  tsc: Unable to calibrate against PIT
  tsc: using HPET reference calibration
  tsc: Detected 2792.451 MHz processor

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: len.brown@intel.com
Cc: rui.zhang@intel.com
Link: https://lkml.kernel.org/r/20171222092243.333145937@infradead.org
2018-01-14 20:18:23 +01:00
Andi Kleen
327867faa4 x86/idt: Mark IDT tables __initconst
const variables must use __initconst, not __initdata.

Fix this up for the IDT tables, which got it consistently wrong.

Fixes: 16bc18d895 ("x86/idt: Move 32-bit idt_descr to C code")
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20171222001821.2157-7-andi@firstfloor.org
2018-01-14 20:09:45 +01:00
Andi Kleen
80a3e3949b x86/extable: Mark exception handler functions visible
Mark the C exception handler functions that are directly called through
exception tables visible. LTO needs to know they are accessed from assembler.

[ tglx: Mopped up the wrecked argument alignment. Sigh.... ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20171222001821.2157-6-andi@firstfloor.org
2018-01-14 20:04:16 +01:00
Andi Kleen
7cf1aaa2ad x86/timer: Don't inline __const_udelay
__const_udelay is marked inline, and LTO will happily inline it everywhere

Dropping the inline saves ~44k text in a LTO build.

13999560        1740864 1499136 17239560        1070e08 vmlinux-with-udelay-inline
13954764        1736768 1499136 17190668        1064f0c vmlinux-wo-udelay-inline

Inlining it has no advantage in general, so its the right thing to do.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20171222001821.2157-2-andi@firstfloor.org
2018-01-14 20:03:49 +01:00
Stephen Boyd
bb48711800 arm64: cpu_errata: Add Kryo to Falkor 1003 errata
The Kryo CPUs are also affected by the Falkor 1003 errata, so
we need to do the same workaround on Kryo CPUs. The MIDR is
slightly more complicated here, where the PART number is not
always the same when looking at all the bits from 15 to 4. Drop
the lower 8 bits and just look at the top 4 to see if it's '2'
and then consider those as Kryo CPUs. This covers all the
combinations without having to list them all out.

Fixes: 38fd94b027 ("arm64: Work around Falkor erratum 1003")
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-14 18:49:52 +00:00
Steve Capper
0370b31e48 arm64: Extend early page table code to allow for larger kernels
Currently the early assembler page table code assumes that precisely
1xpgd, 1xpud, 1xpmd are sufficient to represent the early kernel text
mappings.

Unfortunately this is rarely the case when running with a 16KB granule,
and we also run into limits with 4KB granule when building much larger
kernels.

This patch re-writes the early page table logic to compute indices of
mappings for each level of page table, and if multiple indices are
required, the next-level page table is scaled up accordingly.

Also the required size of the swapper_pg_dir is computed at link time
to cover the mapping [KIMAGE_ADDR + VOFFSET, _end]. When KASLR is
enabled, an extra page is set aside for each level that may require extra
entries at runtime.

Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-14 18:49:52 +00:00
Steve Capper
1e1b8c04fa arm64: entry: Move the trampoline to be before PAN
The trampoline page tables are positioned after the early page tables in
the kernel linker script.

As we are about to change the early page table logic to resolve the
swapper size at link time as opposed to compile time, the
SWAPPER_DIR_SIZE variable (currently used to locate the trampline)
will be rendered unsuitable for low level assembler.

This patch solves this issue by moving the trampoline before the PAN
page tables. The offset to the trampoline from ttbr1 can then be
expressed by: PAGE_SIZE + RESERVED_TTBR0_SIZE, which is available to the
entry assembler.

Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-14 18:49:51 +00:00
Steve Capper
9dfe4828aa arm64: Re-order reserved_ttbr0 in linker script
Currently one resolves the location of the reserved_ttbr0 for PAN by
taking a positive offset from swapper_pg_dir. In a future patch we wish
to extend the swapper s.t. its size is determined at link time rather
than comile time, rendering SWAPPER_DIR_SIZE unsuitable for such a low
level calculation.

In this patch we re-arrange the order of the linker script s.t. instead
one computes reserved_ttbr0 by subtracting RESERVED_TTBR0_SIZE from
swapper_pg_dir.

Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-14 18:49:51 +00:00
James Morse
79e9aa59dc arm64: sdei: Add trampoline code for remapping the kernel
When CONFIG_UNMAP_KERNEL_AT_EL0 is set the SDEI entry point and the rest
of the kernel may be unmapped when we take an event. If this may be the
case, use an entry trampoline that can switch to the kernel page tables.

We can't use the provided PSTATE to determine whether to switch page
tables as we may have interrupted the kernel's entry trampoline, (or a
normal-priority event that interrupted the kernel's entry trampoline).
Instead test for a user ASID in ttbr1_el1.

Save a value in regs->addr_limit to indicate whether we need to restore
the original ASID when returning from this event. This value is only used
by do_page_fault(), which we don't call with the SDEI regs.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-14 18:49:50 +00:00
James Morse
83f8ee3a73 arm64: mmu: add the entry trampolines start/end section markers into sections.h
SDEI needs to calculate an offset in the trampoline page too. Move
the extern char[] to sections.h.

This patch just moves code around.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-14 18:49:50 +00:00
James Morse
677a60bd20 firmware: arm_sdei: Discover SDEI support via ACPI
SDEI defines a new ACPI table to indicate the presence of the interface.
The conduit is discovered in the same way as PSCI.

For ACPI we need to create the platform device ourselves as SDEI doesn't
have an entry in the DSDT.

The SDEI platform device should be created after ACPI has been initialised
so that we can parse the table, but before GHES devices are created, which
may register SDE events if they use SDEI as their notification type.

Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-14 18:49:49 +00:00
James Morse
fa31ab77ce arm64: acpi: Remove __init from acpi_psci_use_hvc() for use by SDEI
SDEI inherits the 'use hvc' bit that is also used by PSCI. PSCI does all
its initialisation early, SDEI does its late.

Remove the __init annotation from acpi_psci_use_hvc().

Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-14 18:49:49 +00:00