Commit graph

753663 commits

Author SHA1 Message Date
Bjorn Helgaas
64ae499cf2 Merge branch 'pci/portdrv'
- move pcieport_if.h to drivers/pci/pcie/ to encapsulate it (Frederick
    Lawler)

  - merge pcieport_if.h into portdrv.h (Bjorn Helgaas)

  - move workaround for BIOS PME issue from portdrv to PCI core (Bjorn
    Helgaas)

  - completely disable portdrv with "pcie_ports=compat" (Bjorn Helgaas)

  - remove portdrv link order dependency (Bjorn Helgaas)

  - remove support for unused VC portdrv service (Bjorn Helgaas)

  - simplify portdrv feature permission checking (Bjorn Helgaas)

  - remove "pcie_hp=nomsi" parameter (use "pci=nomsi" instead) (Bjorn
    Helgaas)

  - remove unnecessary "pcie_ports=auto" parameter (Bjorn Helgaas)

  - use cached AER capability offset (Frederick Lawler)

  - don't enable DPC if BIOS hasn't granted AER control (Mika Westerberg)

  - rename pcie-dpc.c to dpc.c (Bjorn Helgaas)

* pci/portdrv:
  PCI/DPC: Rename from pcie-dpc.c to dpc.c
  PCI/DPC: Do not enable DPC if AER control is not allowed by the BIOS
  PCI/AER: Use cached AER Capability offset
  PCI/portdrv: Rename and reverse sense of pcie_ports_auto
  PCI/portdrv: Encapsulate pcie_ports_auto inside the port driver
  PCI/portdrv: Remove unnecessary "pcie_ports=auto" parameter
  PCI/portdrv: Remove "pcie_hp=nomsi" kernel parameter
  PCI/portdrv: Remove unnecessary include of <linux/pci-aspm.h>
  PCI/portdrv: Simplify PCIe feature permission checking
  PCI/portdrv: Remove unused PCIE_PORT_SERVICE_VC
  PCI/portdrv: Remove pcie_port_bus_type link order dependency
  PCI/portdrv: Disable port driver in compat mode
  PCI/PM: Clear PCIe PME Status bit for Root Complex Event Collectors
  PCI/PM: Clear PCIe PME Status bit in core, not PCIe port driver
  PCI/PM: Move pcie_clear_root_pme_status() to core
  PCI/portdrv: Merge pcieport_if.h into portdrv.h
  PCI/portdrv: Move pcieport_if.h to drivers/pci/pcie/

Conflicts:
	drivers/pci/pcie/Makefile
	drivers/pci/pcie/portdrv.h
2018-04-04 13:27:58 -05:00
Bjorn Helgaas
ac30aa5969 Merge branch 'pci/msi'
- don't set up INTx if MSI or MSI-X is enabled to align cris, frv, ia64,
    and mn10300 with x86 (Bjorn Helgaas)

* pci/msi:
  PCI/MSI: Don't set up INTx if MSI or MSI-X is enabled
2018-04-04 13:27:46 -05:00
Bjorn Helgaas
43b90eaed5 Merge branch 'pci/misc'
- use PCI_EXP_DEVCTL2_COMP_TIMEOUT in rapidio/tsi721 (Bjorn Helgaas)

  - remove possible NULL pointer dereference in of_pci_bus_find_domain_nr()
    (Shawn Lin)

  - report quirk timings with dev_info (Bjorn Helgaas)

  - report quirks that take longer than 10ms (Bjorn Helgaas)

  - add and use Altera Vendor ID (Johannes Thumshirn)

  - tidy Makefiles and comments (Bjorn Helgaas)

* pci/misc:
  PCI: Always define the of_node helpers
  PCI: Tidy comments
  PCI: Tidy Makefiles
  mcb: Add Altera PCI ID to mcb-pci
  PCI: Add Altera vendor ID
  PCI: Report quirks that take more than 10ms
  PCI: Report quirk timings with pci_info() instead of pr_debug()
  PCI: Fix NULL pointer dereference in of_pci_bus_find_domain_nr()
  rapidio/tsi721: use PCI_EXP_DEVCTL2_COMP_TIMEOUT macro
2018-04-04 13:27:45 -05:00
Bjorn Helgaas
3da1b6174b Merge branch 'pci/lpc'
- add support for PCI I/O port space that's neither directly accessible
    via CPU in/out instructions nor directly mapped into CPU physical
    memory space (Zhichang Yuan)

  - add support for HiSilicon Hip06/Hip07 LPC I/O space (Zhichang Yuan,
    John Garry)

* pci/lpc:
  MAINTAINERS: Add John Garry as maintainer for HiSilicon LPC driver
  HISI LPC: Add ACPI support
  ACPI / scan: Do not enumerate Indirect IO host children
  ACPI / scan: Rename acpi_is_serial_bus_slave() for more general use
  HISI LPC: Support the LPC host on Hip06/Hip07 with DT bindings
  of: Add missing I/O range exception for indirect-IO devices
  PCI: Apply the new generic I/O management on PCI IO hosts
  PCI: Add fwnode handler as input param of pci_register_io_range()
  PCI: Remove __weak tag from pci_register_io_range()
  lib: Add generic PIO mapping method
2018-04-04 13:27:43 -05:00
Bjorn Helgaas
a5c6ad7840 Merge branch 'pci/hotplug'
- fix possible cpqphp NULL pointer dereference (Shawn Lin)

  - rescan more of the hierarchy on ACPI hotplug to fix Thunderbolt/xHCI
    hotplug (Mika Westerberg)

* pci/hotplug:
  ACPI / hotplug / PCI: Check presence of slot itself in get_slot_status()
  PCI: cpqphp: Fix possible NULL pointer dereference
2018-04-04 13:27:42 -05:00
Bjorn Helgaas
315271b0ff Merge branch 'pci/enumeration'
- add decoding for 16 GT/s link speed (Jay Fang)

  - add interfaces to get max link speed and width (Tal Gilboa)

  - add pcie_bandwidth_capable() to compute max supported link bandwidth
    (Tal Gilboa)

  - add pcie_bandwidth_available() to compute bandwidth available to device
    (Tal Gilboa)

  - add pcie_print_link_status() to log link speed and whether it's limited
    (Tal Gilboa)

  - use PCI core interfaces to report when device performance may be
    limited by its slot instead of doing it in each driver (Tal Gilboa)

* pci/enumeration:
  fm10k: Report PCIe link properties with pcie_print_link_status()
  net/mlx5e: Use pcie_bandwidth_available() to compute bandwidth
  net/mlx5: Report PCIe link properties with pcie_print_link_status()
  net/mlx4_core: Report PCIe link properties with pcie_print_link_status()
  PCI: Add pcie_print_link_status() to log link speed and whether it's limited
  PCI: Add pcie_bandwidth_available() to compute bandwidth available to device
  PCI: Add pcie_bandwidth_capable() to compute max supported link bandwidth
  PCI: Add pcie_get_width_cap() to find max supported link width
  PCI: Add pcie_get_speed_cap() to find max supported link speed
  PCI: Add decoding for 16 GT/s link speed
2018-04-04 13:27:40 -05:00
Bjorn Helgaas
db7a726ea8 Merge branch 'pci/deprecate-get-bus-and-slot'
- remove last user of pci_get_bus_and_slot() and the function itself
    (Sinan Kaya)

* pci/deprecate-get-bus-and-slot:
  PCI: Remove pci_get_bus_and_slot() function
  drm/i915: Deprecate pci_get_bus_and_slot()
2018-04-04 13:27:39 -05:00
Bjorn Helgaas
09baca9841 Merge branch 'pci/aspm'
- skip ASPM common clock warning if BIOS already configured it (Sinan
    Kaya)

  - fix ASPM Coverity warning about threshold_ns (Gustavo A. R. Silva)

* pci/aspm:
  PCI/ASPM: Don't warn if already in common clock mode
  PCI/ASPM: Declare threshold_ns as u32, not u64
2018-04-04 13:27:37 -05:00
Bjorn Helgaas
63d5ce5fc8 Merge branch 'pci/aer'
- move pci_uevent_ers() out of pci.h (Michael Ellerman)

* pci/aer:
  PCI/AER: Move pci_uevent_ers() out of pci.h
2018-04-04 13:27:36 -05:00
Ard Biesheuvel
730b35e08c dt-bindings: i2c: add binding for Socionext SynQuacer I2C
Add a binding for the I2C controller that can be found in the Socionext
SynQuacer SoC.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-04-04 20:23:37 +02:00
Matan Barak
2d93fc8569 IB/mlx5: Add ability to hash by IPSEC_SPI when creating a TIR
When a Raw Ethernet QP is created, we actually create a few objects.
One of these objects is a TIR. Currently, a TIR could hash (and spread
the traffic) by IP or port only. Adding a hashing by IPSec SPI to TIR
creation with the required UAPI bit.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:28 -06:00
Matan Barak
c03faa562d IB/mlx5: Add information for querying IPsec capabilities
Users should be able to query for IPSec support. Adding a few
capabilities bits as part of the driver specific part in
alloc_ucontext:
MLX5_USER_ALLOC_UCONTEXT_FLOW_ACTION_FLAGS_ESP_AES_GCM_REQ_METADATA
	Payload's header is returned with metadata representing the
	IPSec decryption state.
MLX5_USER_ALLOC_UCONTEXT_FLOW_ACTION_FLAGS_ESP_AES_GCM_RX
	Support ESP_AES_GCM in ingress path.
MLX5_USER_ALLOC_UCONTEXT_FLOW_ACTION_FLAGS_ESP_AES_GCM_TX
	Support ESP_AES_GCM in egress path.
MLX5_USER_ALLOC_UCONTEXT_FLOW_ACTION_FLAGS_ESP_AES_GCM_SPI_RSS_ONLY
	Hardware doesn't support matching SPI in flow steering rules
	but just hashing and spreading the traffic accordingly.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:28 -06:00
Aviad Yehezkel
802c212568 IB/mlx5: Add IPsec support for egress and ingress
This commit introduces support for the esp_aes_gcm flow
specification for the Innova device. To that end we add
support for egress steering and some validations that an
IPsec rule is indeed valid.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:27 -06:00
Aviad Yehezkel
363c5a570d {net,IB}/mlx5: Add ipsec helper
Simple wrapper to understand if we are dealing with IPsec flow.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:27 -06:00
Matan Barak
349705c193 IB/mlx5: Add modify_flow_action_esp verb
Adding implementation in mlx5 driver to modify action_xfrm object. This
merely call the accel layer. Currently a user can modify only the
ESN parameters.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:27 -06:00
Aviad Yehezkel
c6475a0bca IB/mlx5: Add implementation for create and destroy action_xfrm
Adding implementation in mlx5 driver to create and destroy action_xfrm
object. This merely call the accel layer.

A user may pass MLX5_IB_XFRM_FLAGS_REQUIRE_METADATA flag which states
that [s]he expects a metadata header to be added to the payload. This
header represents information regarding the transformation's state.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:26 -06:00
Matan Barak
56ab0b38b8 IB/uverbs: Introduce ESP steering match filter
Adding a new ESP steering match filter that could match against
spi and seq used in IPSec protocol.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:26 -06:00
Matan Barak
7d12f8d5a1 IB/uverbs: Add modify ESP flow_action
flow_actions of ESP type could be modified during runtime. This could be
common for example when ESN should be changed. Adding a new
UVERBS_FLOW_ACTION_ESP_MODIFY method for changing ESP parameters of an
existing ESP flow_action.
The new method uses the UVERBS_FLOW_ACTION_ESP_CREATE attributes, but
adds a new IB_FLOW_ACTION_ESP_FLAGS_MOD_ESP_ATTRS which means ESP_ATTRS
should be changed.
In addition, we add a new FLOW_ACTION_ESP_REPLAY_NONE replay type that
could be used when one wants to disable a replay protection over a
specific flow_action.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:26 -06:00
Boris Pismenny
21e82d3e1d IB/uverbs: Introduce egress flow steering
The egress flag indicates that this flow steering rule is for egress
traffic. The scope of an egress rule is port-wide, meaning all packets
originated from that port, which match the steering rule specification
will be effected by this steering rule's action.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:25 -06:00
Matan Barak
9b82844197 IB/uverbs: Add action_handle flow steering specification
Binding a flow_action to flow steering rule requires using a new
specification. Therefore, adding such an IB_FLOW_SPEC_ACTION_HANDLE flow
specification.

Flow steering rules could use flow_action(s) and as of that we need to
avoid deleting flow_action(s) as long as they're being used.
Moreover, when the attached rules are deleted, action_handle reference
count should be decremented. Introducing a new mechanism of flow
resources to keep track on the attached action_handle(s). Later on, this
mechanism should be extended to other attached flow steering resources
like flow counters.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:25 -06:00
Matan Barak
2eb9beaee5 IB/uverbs: Add flow_action create and destroy verbs
A verbs application may receive and transmits packets using a data
path pipeline. Sometimes, the first stage in the receive pipeline or
the last stage in the transmit pipeline involves transforming a
packet, either in order to make it easier for later stages to process
it or to prepare it for transmission over the wire. Such transformation
could be stripping/encapsulating the packet (i.e. vxlan),
decrypting/encrypting it (i.e. ipsec), altering headers, doing some
complex FPGA changes, etc.

Some hardware could do such transformations without software data path
intervention at all. The flow steering API supports steering a
packet (either to a QP or dropping it) and some simple packet
immutable actions (i.e. tagging a packet). Complex actions, that may
change the packet, could bloat the flow steering API extensively.
Sometimes the same action should be applied to several flows.
In this case, it's easier to bind several flows to the same action and
modify it than change all matching flows.

Introducing a new flow_action object that abstracts any packet
transformation (out of a standard and well defined set of actions).
This flow_action object could be tied to a flow steering rule via a
new specification.

Currently, we support esp flow_action, which encrypts or decrypts a
packet according to the given parameters. However, we present a
flexible schema that could be used to other transformation actions tied
to flow rules.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:25 -06:00
Matan Barak
766d8551ad IB/uverbs: Refactor kern_spec_to_ib_spec_filter
The current implementation of kern_spec_to_ib_spec_filter, which takes
a uAPI based flow steering specification and creates the respective kernel
API flow steering structure, gets a ib_uverbs_flow_spec structure.
The new flow_action uAPI gets a match mask and filter from user-space
which aren't encoded in the flow steering's ib_uverbs_flow_spec structure.
Exporting the logic out of kern_spec_to_ib_spec_filter to get user-space
blobs rather than ib_uverbs_flow_spec structure.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:24 -06:00
Boris Pismenny
8510020d29 IB/mlx4: Check for egress flow steering
ConnectX3 doesn't support egress flow steering. Return an EOPNOTSUPP
error when such a flow is being created.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:24 -06:00
Matan Barak
494c5580aa IB/uverbs: Add enum attribute type to ioctl() interface
Methods sometimes need to get one attribute out of a group of
pre-defined attributes. This is an enum-like behavior. Since
this is a common requirement, we add a new ENUM attribute to the
generic uverbs ioctl() layer. This attribute is embedded in methods,
like any other attributes we currently have. ENUM attributes point to
an array of standard UVERBS_ATTR_PTR_IN. The user-space encodes the
enum's attribute id in the id field and the internal PTR_IN attr id in
the enum_data.elem_id field. This ENUM attribute could be shared by
several attributes and it can get UVERBS_ATTR_SPEC_F_MANDATORY flag,
stating this attribute must be supported by the kernel, like any other
attribute.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:24 -06:00
Matan Barak
8c84660bb4 IB/mlx5: Initialize the parsing tree root without the help of uverbs
In order to have a custom parsing tree, a provider driver needs to
assign its parsing tree to ib_device specs_tree field. Otherwise, the
uverbs client assigns a common default parsing tree for it.
In downstream patches, the mlx5_ib driver gains a custom parsing tree,
which contains both the common objects and a new flags field for the
UVERBS_FLOW_ACTION_ESP_CREATE command.
This patch makes mlx5_ib assign its own tree to specs_root, which
later on will be extended.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:23 -06:00
Mike Marshall
8e9ba5c48e Orangefs: documentation updates
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-04-04 14:05:48 -04:00
Paolo Bonzini
6089ae0bd5 kvm: selftests: add sync_regs_test
This includes the infrastructure to map the test into the guest and
run code from the test program inside a VM.

Signed-off-by: Ken Hofsass <hofsass@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-04 19:11:00 +02:00
Paolo Bonzini
783e9e5126 kvm: selftests: add API testing infrastructure
Testsuite contributed by Google and cleaned up by myself for
inclusion in Linux.

Signed-off-by: Ken Hofsass <hofsass@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-04 19:11:00 +02:00
Peng Hao
3140c156e9 kvm: x86: fix a compile warning
fix a "warning: no previous prototype".

Cc: stable@vger.kernel.org
Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-04 19:10:29 +02:00
Wanpeng Li
6c86eedc20 KVM: X86: Add Force Emulation Prefix for "emulate the next instruction"
There is no easy way to force KVM to run an instruction through the emulator
(by design as that will expose the x86 emulator as a significant attack-surface).
However, we do wish to expose the x86 emulator in case we are testing it
(e.g. via kvm-unit-tests). Therefore, this patch adds a "force emulation prefix"
that is designed to raise #UD which KVM will trap and it's #UD exit-handler will
match "force emulation prefix" to run instruction after prefix by the x86 emulator.
To not expose the x86 emulator by default, we add a module parameter that should
be off by default.

A simple testcase here:

    #include <stdio.h>
    #include <string.h>

    #define HYPERVISOR_INFO 0x40000000

    #define CPUID(idx, eax, ebx, ecx, edx) \
        asm volatile (\
        "ud2a; .ascii \"kvm\"; cpuid" \
        :"=b" (*ebx), "=a" (*eax), "=c" (*ecx), "=d" (*edx) \
            :"0"(idx) );

    void main()
    {
        unsigned int eax, ebx, ecx, edx;
        char string[13];

        CPUID(HYPERVISOR_INFO, &eax, &ebx, &ecx, &edx);
        *(unsigned int *)(string + 0) = ebx;
        *(unsigned int *)(string + 4) = ecx;
        *(unsigned int *)(string + 8) = edx;

        string[12] = 0;
        if (strncmp(string, "KVMKVMKVM\0\0\0", 12) == 0)
            printf("kvm guest\n");
        else
            printf("bare hardware\n");
    }

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
[Correctly handle usermode exits. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-04 19:09:40 +02:00
Wanpeng Li
082d06edab KVM: X86: Introduce handle_ud()
Introduce handle_ud() to handle invalid opcode, this function will be
used by later patches.

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-04 19:03:58 +02:00
Paolo Bonzini
4fde8d57cf KVM: vmx: unify adjacent #ifdefs
vmx_save_host_state has multiple ifdefs for CONFIG_X86_64 that have
no other code between them.  Simplify by reducing them to a single
conditional.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-04 18:58:59 +02:00
Arnd Bergmann
51e8a8cc2f x86: kvm: hide the unused 'cpu' variable
The local variable was newly introduced but is only accessed in one
place on x86_64, but not on 32-bit:

arch/x86/kvm/vmx.c: In function 'vmx_save_host_state':
arch/x86/kvm/vmx.c:2175:6: error: unused variable 'cpu' [-Werror=unused-variable]

This puts it into another #ifdef.

Fixes: 35060ed6a1 ("x86/kvm/vmx: avoid expensive rdmsr for MSR_GS_BASE")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-04 18:57:40 +02:00
Arnd Bergmann
87248d31d1 netdevsim: remove incorrect __net_initdata annotations
The __net_initdata section cannot currently be used for structures that
get cleaned up in an exitcall using unregister_pernet_operations:

WARNING: vmlinux.o(.text+0x868c34): Section mismatch in reference from the function nsim_devlink_exit() to the (unknown reference) .init.data:(unknown)
The function nsim_devlink_exit() references
the (unknown reference) __initdata (unknown).
This is often because nsim_devlink_exit lacks a __initdata
annotation or the annotation of (unknown) is wrong.
WARNING: vmlinux.o(.text+0x868c64): Section mismatch in reference from the function nsim_devlink_init() to the (unknown reference) .init.data:(unknown)
WARNING: vmlinux.o(.text+0x8692bc): Section mismatch in reference from the function nsim_fib_exit() to the (unknown reference) .init.data:(unknown)
WARNING: vmlinux.o(.text+0x869300): Section mismatch in reference from the function nsim_fib_init() to the (unknown reference) .init.data:(unknown)

As that warning tells us, discarding the structure after a module is
loaded would lead to a undefined behavior when that module is removed.

It might be possible to change that annotation so it has no effect for
loadable modules, but I have not figured out exactly how to do that, and
we want this to be fixed in -rc1.

This just removes the annotations, just like we do for all other such
modules.

Fixes: 37923ed6b8 ("netdevsim: Add simple FIB resource controller via devlink")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-04 12:53:37 -04:00
Mike Snitzer
5bd5e8d891 dm: remove fmode_t argument from .prepare_ioctl hook
Use the fmode_t that is passed to dm_blk_ioctl() rather than
inconsistently (varies across targets) drop it on the floor by
overriding it with the fmode_t stored in 'struct dm_dev'.

All the persistent reservation functions weren't using the fmode_t they
got back from .prepare_ioctl so remove them.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-04 12:12:39 -04:00
Mike Snitzer
971888c469 dm: hold DM table for duration of ioctl rather than use blkdev_get
Commit 519049afea ("dm: use blkdev_get rather than bdgrab when issuing
pass-through ioctl") inadvertantly introduced a regression relative to
users of device cgroups that issue ioctls (e.g. libvirt).  Using
blkdev_get() in DM's passthrough ioctl support implicitly introduced a
cgroup permissions check that would fail unless care were taken to add
all devices in the IO stack to the device cgroup.  E.g. rather than just
adding the top-level DM multipath device to the cgroup all the
underlying devices would need to be allowed.

Fix this, to no longer require allowing all underlying devices, by
simply holding the live DM table (which includes the table's original
blkdev_get() reference on the blockdevice that the ioctl will be issued
to) for the duration of the ioctl.

Also, bump the DM ioctl version so a user can know that their device
cgroup allow workaround is no longer needed.

Reported-by: Michal Privoznik <mprivozn@redhat.com>
Suggested-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 519049afea ("dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl")
Cc: stable@vger.kernel.org # 4.16
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-04 12:12:38 -04:00
Heinz Mauelshagen
13bc62d4a6 dm raid: fix parse_raid_params() variable range issue
parse_raid_params() compares variable "int value" with INT_MAX.

E.g. related Coverity report excerpt:
   CID 1364818 (#2 of 3): Operands don't affect result (CONSTANT_EXPRESSION_RESULT) [select issue]
1433                        if (value > INT_MAX) {

Fix by changing checks to avoid INT_MAX.

Whilst on it, avoid unnecessary checks against constants
and add check for sane recovery speed min/max.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-04 12:12:37 -04:00
weiyongjun (A)
d4b1aaf53c dm verity: make verity_for_io_block static
Fixes the following sparse warning:

drivers/md/dm-verity-target.c:375:6: warning:
 symbol 'verity_for_io_block' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-04 12:12:36 -04:00
Bert Kenward
458bd99e49 sfc: remove ctpio_dmabuf_start from stats
The ctpio_dmabuf_start entry is not actually a stat and shouldn't
be exposed to ethtool.

Fixes: 2c0b6ee837 ("sfc: expose CTPIO stats on NICs that support them")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-04 12:07:22 -04:00
Eric Dumazet
3d23401283 inet: frags: fix ip6frag_low_thresh boundary
Giving an integer to proc_doulongvec_minmax() is dangerous on 64bit arches,
since linker might place next to it a non zero value preventing a change
to ip6frag_low_thresh.

ip6frag_low_thresh is not used anymore in the kernel, but we do not
want to prematuraly break user scripts wanting to change it.

Since specifying a minimal value of 0 for proc_doulongvec_minmax()
is moot, let's remove these zero values in all defrag units.

Fixes: 6e00f7dd5e ("ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-04 12:04:59 -04:00
Sean Christopherson
c75d0edc8e KVM: VMX: remove bogus WARN_ON in handle_ept_misconfig
Remove the WARN_ON in handle_ept_misconfig() as it is unnecessary
and causes false positives.  Return the unmodified result of
kvm_mmu_page_fault() instead of converting a system error code to
KVM_EXIT_UNKNOWN so that userspace sees the error code of the
actual failure, not a generic "we don't know what went wrong".

  * kvm_mmu_page_fault() will WARN if reserved bits are set in the
    SPTEs, i.e. it covers the case where an EPT misconfig occurred
    because of a KVM bug.

  * The WARN_ON will fire on any system error code that is hit while
    handling the fault, e.g. -ENOMEM from mmu_topup_memory_caches()
    while handling a legitmate MMIO EPT misconfig or -EFAULT from
    kvm_handle_bad_page() if the corresponding HVA is invalid.  In
    either case, userspace should receive the original error code
    and firing a warning is incorrect behavior as KVM is operating
    as designed.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-04 18:00:40 +02:00
Sean Christopherson
2c151b2544 Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown"
The bug that led to commit 95e057e258
was a benign warning (no adverse affects other than the warning
itself) that was detected by syzkaller.  Further inspection shows
that the WARN_ON in question, in handle_ept_misconfig(), is
unnecessary and flawed (this was also briefly discussed in the
original patch: https://patchwork.kernel.org/patch/10204649).

  * The WARN_ON is unnecessary as kvm_mmu_page_fault() will WARN
    if reserved bits are set in the SPTEs, i.e. it covers the case
    where an EPT misconfig occurred because of a KVM bug.

  * The WARN_ON is flawed because it will fire on any system error
    code that is hit while handling the fault, e.g. -ENOMEM can be
    returned by mmu_topup_memory_caches() while handling a legitmate
    MMIO EPT misconfig.

The original behavior of returning -EFAULT when userspace munmaps
an HVA without first removing the memslot is correct and desirable,
i.e. KVM is letting userspace know it has generated a bad address.
Returning RET_PF_EMULATE masks the WARN_ON in the EPT misconfig path,
but does not fix the underlying bug, i.e. the WARN_ON is bogus.

Furthermore, returning RET_PF_EMULATE has the unwanted side effect of
causing KVM to attempt to emulate an instruction on any page fault
with an invalid HVA translation, e.g. a not-present EPT violation
on a VM_PFNMAP VMA whose fault handler failed to insert a PFN.

  * There is no guarantee that the fault is directly related to the
    instruction, i.e. the fault could have been triggered by a side
    effect memory access in the guest, e.g. while vectoring a #DB or
    writing a tracing record.  This could cause KVM to effectively
    mask the fault if KVM doesn't model the behavior leading to the
    fault, i.e. emulation could succeed and resume the guest.

  * If emulation does fail, KVM will return EMULATION_FAILED instead
    of -EFAULT, which is a red herring as the user will either debug
    a bogus emulation attempt or scratch their head wondering why we
    were attempting emulation in the first place.

TL;DR: revert to returning -EFAULT and remove the bogus WARN_ON in
handle_ept_misconfig in a future patch.

This reverts commit 95e057e258.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-04 18:00:36 +02:00
GhantaKrishnamurthy MohanKrishna
4b2e6877b8 tipc: Fix namespace violation in tipc_sk_fill_sock_diag
To fetch UID info for socket diagnostics, we determine the
namespace of user context using tipc socket instance. This
may cause namespace violation, as the kernel will remap based
on UID.

We fix this by fetching namespace info using the calling userspace
netlink socket.

Fixes: c30b70deb5 (tipc: implement socket diagnostics for AF_TIPC)
Reported-by: syzbot+326e587eff1074657718@syzkaller.appspotmail.com
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-04 11:54:35 -04:00
Paolo Abeni
9e8445a56c net: avoid unneeded atomic operation in ip*_append_data()
After commit 694aba690d ("ipv4: factorize sk_wmem_alloc updates
done by __ip_append_data()") and commit 1f4c6eb240 ("ipv6:
factorize sk_wmem_alloc updates done by __ip6_append_data()"),
when transmitting sub MTU datagram, an addtional, unneeded atomic
operation is performed in ip*_append_data() to update wmem_alloc:
in the above condition the delta is 0.

The above cause small but measurable performance regression in UDP
xmit tput test with packet size below MTU.

This change avoids such overhead updating wmem_alloc only if
wmem_alloc_delta is non zero.

The error path is left intentionally unmodified: it's a slow path
and simplicity is preferred to performances.

Fixes: 694aba690d ("ipv4: factorize sk_wmem_alloc updates done by __ip_append_data()")
Fixes: 1f4c6eb240 ("ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data()")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-04 11:53:08 -04:00
Stefan Fritsch
29916968c4 kvm: Add emulation for movups/movupd
This is very similar to the aligned versions movaps/movapd.

We have seen the corresponding emulation failures with openbsd as guest
and with Windows 10 with intel HD graphics pass through.

Signed-off-by: Christian Ehrhardt <christian_ehrhardt@genua.de>
Signed-off-by: Stefan Fritsch <sf@sfritsch.de>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-04 17:52:46 +02:00
Sean Christopherson
add5ff7a21 KVM: VMX: raise internal error for exception during invalid protected mode state
Exit to userspace with KVM_INTERNAL_ERROR_EMULATION if we encounter
an exception in Protected Mode while emulating guest due to invalid
guest state.  Unlike Big RM, KVM doesn't support emulating exceptions
in PM, i.e. PM exceptions are always injected via the VMCS.  Because
we will never do VMRESUME due to emulation_required, the exception is
never realized and we'll keep emulating the faulting instruction over
and over until we receive a signal.

Exit to userspace iff there is a pending exception, i.e. don't exit
simply on a requested event. The purpose of this check and exit is to
aid in debugging a guest that is in all likelihood already doomed.
Invalid guest state in PM is extremely limited in normal operation,
e.g. it generally only occurs for a few instructions early in BIOS,
and any exception at this time is all but guaranteed to be fatal.
Non-vectored interrupts, e.g. INIT, SIPI and SMI, can be cleanly
handled/emulated, while checking for vectored interrupts, e.g. INTR
and NMI, without hitting false positives would add a fair amount of
complexity for almost no benefit (getting hit by lightning seems
more likely than encountering this specific scenario).

Add a WARN_ON_ONCE to vmx_queue_exception() if we try to inject an
exception via the VMCS and emulation_required is true.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-04-04 17:51:55 +02:00
Arnd Bergmann
2a37ce25d9 nvmem: disallow modular CONFIG_NVMEM
The new of_get_nvmem_mac_address() helper function causes a link error
with CONFIG_NVMEM=m:

drivers/of/of_net.o: In function `of_get_nvmem_mac_address':
of_net.c:(.text+0x168): undefined reference to `of_nvmem_cell_get'
of_net.c:(.text+0x19c): undefined reference to `nvmem_cell_read'
of_net.c:(.text+0x1a8): undefined reference to `nvmem_cell_put'

I could not come up with a good solution for this, as the code is always
built-in. Using an #if IS_REACHABLE() check around it would solve the
link time issue but then stop it from working in that configuration.
Making of_nvmem_cell_get() an inline function could also solve that, but
seems a bit ugly since it's somewhat larger than most inline functions,
and it would just bring that problem into the callers.  Splitting the
function into a separate file might be an alternative.

This uses the big hammer by making CONFIG_NVMEM itself a 'bool' symbol,
which avoids the problem entirely but makes the vmlinux larger for anyone
that might use NVMEM support but doesn't need it built-in otherwise.

Fixes: 9217e566bd ("of_net: Implement of_get_nvmem_mac_address helper")
Cc: Mike Looijmans <mike.looijmans@topic.nl>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Mike Looijmans
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-04 11:47:50 -04:00
Tan Xiaojun
48d154e7f5 net: hns3: fix length overflow when CONFIG_ARM64_64K_PAGES
When enable the config item "CONFIG_ARM64_64K_PAGES", the size of PAGE_SIZE
is 65536(64K). But the type of length is u16, it will overflow. So change it
to u32.

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-04 11:46:14 -04:00
Dirk van der Merwe
1489bbd10e nfp: use full 40 bits of the NSP buffer address
The NSP default buffer is a piece of NFP memory where additional
command data can be placed.  Its format has been copied from
host buffer, but the PCIe selection bits do not make sense in
this case.  If those get masked out from a NFP address - writes
to random place in the chip memory may be issued and crash the
device.

Even in the general NSP buffer case, it doesn't make sense to have the
PCIe selection bits there anymore. These are unused at the moment, and
when it becomes necessary, the PCIe selection bits should rather be
moved to another register to utilise more bits for the buffer address.

This has never been an issue because the buffer used to be
allocated in memory with less-than-38-bit-long address but that
is about to change.

Fixes: 1a64821c6a ("nfp: add support for service processor access")
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-04 11:45:24 -04:00
Alexander Graf
92571a1aae lan78xx: Connect phy early
When using wicked with a lan78xx device attached to the system, we
end up with ethtool commands issued on the device before an ifup
got issued. That lead to the following crash:

    Unable to handle kernel NULL pointer dereference at virtual address 0000039c
    pgd = ffff800035b30000
    [0000039c] *pgd=0000000000000000
    Internal error: Oops: 96000004 [#1] SMP
    Modules linked in: [...]
    Supported: Yes
    CPU: 3 PID: 638 Comm: wickedd Tainted: G            E      4.12.14-0-default #1
    Hardware name: raspberrypi rpi/rpi, BIOS 2018.03-rc2 02/21/2018
    task: ffff800035e74180 task.stack: ffff800036718000
    PC is at phy_ethtool_ksettings_get+0x20/0x98
    LR is at lan78xx_get_link_ksettings+0x44/0x60 [lan78xx]
    pc : [<ffff0000086f7f30>] lr : [<ffff000000dcca84>] pstate: 20000005
    sp : ffff80003671bb20
    x29: ffff80003671bb20 x28: ffff800035e74180
    x27: ffff000008912000 x26: 000000000000001d
    x25: 0000000000000124 x24: ffff000008f74d00
    x23: 0000004000114809 x22: 0000000000000000
    x21: ffff80003671bbd0 x20: 0000000000000000
    x19: ffff80003671bbd0 x18: 000000000000040d
    x17: 0000000000000001 x16: 0000000000000000
    x15: 0000000000000000 x14: ffffffffffffffff
    x13: 0000000000000000 x12: 0000000000000020
    x11: 0101010101010101 x10: fefefefefefefeff
    x9 : 7f7f7f7f7f7f7f7f x8 : fefefeff31677364
    x7 : 0000000080808080 x6 : ffff80003671bc9c
    x5 : ffff80003671b9f8 x4 : ffff80002c296190
    x3 : 0000000000000000 x2 : 0000000000000000
    x1 : ffff80003671bbd0 x0 : ffff80003671bc00
    Process wickedd (pid: 638, stack limit = 0xffff800036718000)
    Call trace:
    Exception stack(0xffff80003671b9e0 to 0xffff80003671bb20)
    b9e0: ffff80003671bc00 ffff80003671bbd0 0000000000000000 0000000000000000
    ba00: ffff80002c296190 ffff80003671b9f8 ffff80003671bc9c 0000000080808080
    ba20: fefefeff31677364 7f7f7f7f7f7f7f7f fefefefefefefeff 0101010101010101
    ba40: 0000000000000020 0000000000000000 ffffffffffffffff 0000000000000000
    ba60: 0000000000000000 0000000000000001 000000000000040d ffff80003671bbd0
    ba80: 0000000000000000 ffff80003671bbd0 0000000000000000 0000004000114809
    baa0: ffff000008f74d00 0000000000000124 000000000000001d ffff000008912000
    bac0: ffff800035e74180 ffff80003671bb20 ffff000000dcca84 ffff80003671bb20
    bae0: ffff0000086f7f30 0000000020000005 ffff80002c296000 ffff800035223900
    bb00: 0000ffffffffffff 0000000000000000 ffff80003671bb20 ffff0000086f7f30
    [<ffff0000086f7f30>] phy_ethtool_ksettings_get+0x20/0x98
    [<ffff000000dcca84>] lan78xx_get_link_ksettings+0x44/0x60 [lan78xx]
    [<ffff0000087cbc40>] ethtool_get_settings+0x68/0x210
    [<ffff0000087cc0d4>] dev_ethtool+0x214/0x2180
    [<ffff0000087e5008>] dev_ioctl+0x400/0x630
    [<ffff00000879dd00>] sock_do_ioctl+0x70/0x88
    [<ffff00000879f5f8>] sock_ioctl+0x208/0x368
    [<ffff0000082cde10>] do_vfs_ioctl+0xb0/0x848
    [<ffff0000082ce634>] SyS_ioctl+0x8c/0xa8
    Exception stack(0xffff80003671bec0 to 0xffff80003671c000)
    bec0: 0000000000000009 0000000000008946 0000fffff4e841d0 0000aa0032687465
    bee0: 0000aaaafa2319d4 0000fffff4e841d4 0000000032687465 0000000032687465
    bf00: 000000000000001d 7f7fff7f7f7f7f7f 72606b622e71ff4c 7f7f7f7f7f7f7f7f
    bf20: 0101010101010101 0000000000000020 ffffffffffffffff 0000ffff7f510c68
    bf40: 0000ffff7f6a9d18 0000ffff7f44ce30 000000000000040d 0000ffff7f6f98f0
    bf60: 0000fffff4e842c0 0000000000000001 0000aaaafa2c2e00 0000ffff7f6ab000
    bf80: 0000fffff4e842c0 0000ffff7f62a000 0000aaaafa2b9f20 0000aaaafa2c2e00
    bfa0: 0000fffff4e84818 0000fffff4e841a0 0000ffff7f5ad0cc 0000fffff4e841a0
    bfc0: 0000ffff7f44ce3c 0000000080000000 0000000000000009 000000000000001d
    bfe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000

The culprit is quite simple: The driver tries to access the phy left and right,
but only actually has a working reference to it when the device is up.

The fix thus is quite simple too: Get a reference to the phy on probe already
and keep it even when the device is going down.

With this patch applied, I can successfully run wicked on my system and bring
the interface up and down as many times as I want, without getting NULL pointer
dereferences in between.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-04 11:41:03 -04:00