Commit graph

781637 commits

Author SHA1 Message Date
Toshiaki Makita
af87a3aa1b veth: Add ndo_xdp_xmit
This allows NIC's XDP to redirect packets to veth. The destination veth
device enqueues redirected packets to the napi ring of its peer, then
they are processed by XDP on its peer veth device.
This can be thought as calling another XDP program by XDP program using
REDIRECT, when the peer enables driver XDP.

Note that when the peer veth device does not set driver xdp, redirected
packets will be dropped because the peer is not ready for NAPI.

v4:
- Don't use xdp_ok_fwd_dev() because checking IFF_UP is not necessary.
  Add comments about it and check only MTU.

v2:
- Drop the part converting xdp_frame into skb when XDP is not enabled.
- Implement bulk interface of ndo_xdp_xmit.
- Implement XDP_XMIT_FLUSH bit and drop ndo_xdp_flush.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-10 16:12:21 +02:00
Toshiaki Makita
9fc8d518d9 veth: Handle xdp_frames in xdp napi ring
This is preparation for XDP TX and ndo_xdp_xmit.
This allows napi handler to handle xdp_frames through xdp ring as well
as sk_buff.

v8:
- Don't use xdp_frame pointer address to calculate skb->head and
  headroom.

v7:
- Use xdp_scrub_frame() instead of memset().

v3:
- Revert v2 change around rings and use a flag to differentiate skb and
  xdp_frame, since bulk skb xmit makes little performance difference
  for now.

v2:
- Use another ring instead of using flag to differentiate skb and
  xdp_frame. This approach makes bulk skb transmit possible in
  veth_xmit later.
- Clear xdp_frame feilds in skb->head.
- Implement adjust_tail.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-10 16:12:21 +02:00
Toshiaki Makita
a8d5b4ab35 xdp: Helper function to clear kernel pointers in xdp_frame
xdp_frame has kernel pointers which should not be readable from bpf
programs. When we want to reuse xdp_frame region but it may be read by
bpf programs later, we can use this helper to clear kernel pointers.
This is more efficient than calling memset() for the entire struct.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-10 16:12:20 +02:00
Toshiaki Makita
dc2248220a veth: Avoid drops by oversized packets when XDP is enabled
Oversized packets including GSO packets can be dropped if XDP is
enabled on receiver side, so don't send such packets from peer.

Drop TSO and SCTP fragmentation features so that veth devices themselves
segment packets with XDP enabled. Also cap MTU accordingly.

v4:
- Don't auto-adjust MTU but cap max MTU.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-10 16:12:20 +02:00
Toshiaki Makita
948d4f214f veth: Add driver XDP
This is the basic implementation of veth driver XDP.

Incoming packets are sent from the peer veth device in the form of skb,
so this is generally doing the same thing as generic XDP.

This itself is not so useful, but a starting point to implement other
useful veth XDP features like TX and REDIRECT.

This introduces NAPI when XDP is enabled, because XDP is now heavily
relies on NAPI context. Use ptr_ring to emulate NIC ring. Tx function
enqueues packets to the ring and peer NAPI handler drains the ring.

Currently only one ring is allocated for each veth device, so it does
not scale on multiqueue env. This can be resolved by allocating rings
on the per-queue basis later.

Note that NAPI is not used but netif_rx is used when XDP is not loaded,
so this does not change the default behaviour.

v6:
- Check skb->len only when allocation is needed.
- Add __GFP_NOWARN to alloc_page() as it can be triggered by external
  events.

v3:
- Fix race on closing the device.
- Add extack messages in ndo_bpf.

v2:
- Squashed with the patch adding NAPI.
- Implement adjust_tail.
- Don't acquire consumer lock because it is guarded by NAPI.
- Make poll_controller noop since it is unnecessary.
- Register rxq_info on enabling XDP rather than on opening the device.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-10 16:12:20 +02:00
Toshiaki Makita
b0768a8658 net: Export skb_headers_offset_update
This is needed for veth XDP which does skb_copy_expand()-like operation.

v2:
- Drop skb_copy_header part because it has already been exported now.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-10 16:12:20 +02:00
Daniel Borkmann
c4c2021754 Merge branch 'bpf-sample-cpumap-lb'
Jesper Dangaard Brouer says:

====================
Background: cpumap moves the SKB allocation out of the driver code,
and instead allocate it on the remote CPU, and invokes the regular
kernel network stack with the newly allocated SKB.

The idea behind the XDP CPU redirect feature, is to use XDP as a
load-balancer step in-front of regular kernel network stack.  But the
current sample code does not provide a good example of this.  Part of
the reason is that, I have implemented this as part of Suricata XDP
load-balancer.

Given this is the most frequent feature request I get.  This patchset
implement the same XDP load-balancing as Suricata does, which is a
symmetric hash based on the IP-pairs + L4-protocol.

The expected setup for the use-case is to reduce the number of NIC RX
queues via ethtool (as XDP can handle more per core), and via
smp_affinity assign these RX queues to a set of CPUs, which will be
handling RX packets.  The CPUs that runs the regular network stack is
supplied to the sample xdp_redirect_cpu tool by specifying
the --cpu option multiple times on the cmdline.

I do note that cpumap SKB creation is not feature complete yet, and
more work is coming.  E.g. given GRO is not implemented yet, do expect
TCP workloads to be slower.  My measurements do indicate UDP workloads
are faster.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-10 16:07:51 +02:00
Jesper Dangaard Brouer
1bca4e6b18 samples/bpf: xdp_redirect_cpu load balance like Suricata
This implement XDP CPU redirection load-balancing across available
CPUs, based on the hashing IP-pairs + L4-protocol.  This equivalent to
xdp-cpu-redirect feature in Suricata, which is inspired by the
Suricata 'ippair' hashing code.

An important property is that the hashing is flow symmetric, meaning
that if the source and destination gets swapped then the selected CPU
will remain the same.  This is helps locality by placing both directions
of a flows on the same CPU, in a forwarding/routing scenario.

The hashing INITVAL (15485863 the 10^6th prime number) was fairly
arbitrary choosen, but experiments with kernel tree pktgen scripts
(pktgen_sample04_many_flows.sh +pktgen_sample05_flow_per_thread.sh)
showed this improved the distribution.

This patch also change the default loaded XDP program to be this
load-balancer.  As based on different user feedback, this seems to be
the expected behavior of the sample xdp_redirect_cpu.

Link: 796ec08dd7
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-10 16:07:49 +02:00
Jesper Dangaard Brouer
1139568658 samples/bpf: add Paul Hsieh's (LGPL 2.1) hash function SuperFastHash
Adjusted function call API to take an initval. This allow the API
user to set the initial value, as a seed. This could also be used for
inputting the previous hash.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-10 16:07:49 +02:00
Björn Töpel
eb91e4d4db Revert "xdp: add NULL pointer check in __xdp_return()"
This reverts commit 36e0f12bbf.

The reverted commit adds a WARN to check against NULL entries in the
mem_id_ht rhashtable. Any kernel path implementing the XDP (generic or
driver) fast path is required to make a paired
xdp_rxq_info_reg/xdp_rxq_info_unreg call for proper function. In
addition, a driver using a different allocation scheme than the
default MEM_TYPE_PAGE_SHARED is required to additionally call
xdp_rxq_info_reg_mem_model.

For MEM_TYPE_ZERO_COPY, an xdp_rxq_info_reg_mem_model call ensures
that the mem_id_ht rhashtable has a properly inserted allocator id. If
not, this would be a driver bug. A NULL pointer kernel OOPS is
preferred to the WARN.

Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-10 16:06:48 +02:00
Michael Ellerman
f7a6947cd4 powerpc/uaccess: Enable get_user(u64, *p) on 32-bit
Currently if you build a 32-bit powerpc kernel and use get_user() to
load a u64 value it will fail to build with eg:

  kernel/rseq.o: In function `rseq_get_rseq_cs':
  kernel/rseq.c:123: undefined reference to `__get_user_bad'

This is hitting the check in __get_user_size() that makes sure the
size we're copying doesn't exceed the size of the destination:

  #define __get_user_size(x, ptr, size, retval)
  do {
  	retval = 0;
  	__chk_user_ptr(ptr);
  	if (size > sizeof(x))
  		(x) = __get_user_bad();

Which doesn't immediately make sense because the size of the
destination is u64, but it's not really, because __get_user_check()
etc. internally create an unsigned long and copy into that:

  #define __get_user_check(x, ptr, size)
  ({
  	long __gu_err = -EFAULT;
  	unsigned long  __gu_val = 0;

The problem being that on 32-bit unsigned long is not big enough to
hold a u64. We can fix this with a trick from hpa in the x86 code, we
statically check the type of x and set the type of __gu_val to either
unsigned long or unsigned long long.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:26:10 +10:00
Aneesh Kumar K.V
f405b510c9 powerpc/mm/hash: Remove unnecessary do { } while(0) loop
Avoid coverity false warnings like:

  *** CID 187347:  Control flow issues  (UNREACHABLE)
  /arch/powerpc/mm/hash_native_64.c: 819 in native_flush_hash_range()
  813        slot += hidx & _PTEIDX_GROUP_IX;
  814        hptep = htab_address + slot;
  815        want_v = hpte_encode_avpn(vpn, psize, ssize);
  816        hpte_v = hpte_get_old_v(hptep);
  817
  818        if (!HPTE_V_COMPARE(hpte_v, want_v) || !(hpte_v & HPTE_V_VALID))
  >>>     CID 187347:  Control flow issues  (UNREACHABLE)

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:25:57 +10:00
Nicholas Piggin
e7e8184747 powerpc/64s: move machine check SLB flushing to mm/slb.c
The machine check code that flushes and restores bolted segments in
real mode belongs in mm/slb.c. This will also be used by pseries
machine check and idle code in future changes.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:39 +10:00
Aneesh Kumar K.V
ae24ce5e12 powerpc/powernv/idle: Fix build error
Fix the below build error using strlcpy instead of strncpy

In function 'pnv_parse_cpuidle_dt',
    inlined from 'pnv_init_idle_states' at arch/powerpc/platforms/powernv/idle.c:840:7,
    inlined from '__machine_initcall_powernv_pnv_init_idle_states' at arch/powerpc/platforms/powernv/idle.c:870:1:
arch/powerpc/platforms/powernv/idle.c:820:3: error: 'strncpy' specified bound 16 equals destination size [-Werror=stringop-truncation]
   strncpy(pnv_idle_states[i].name, temp_string[i],
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    PNV_IDLE_NAME_LEN);

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:39 +10:00
Aneesh Kumar K.V
0b6aa1a20a powerpc/mm/tlbflush: update the mmu_gather page size while iterating address range
This patch makes sure we update the mmu_gather page size even if we are
requesting for a fullmm flush. This avoids triggering VM_WARN_ON in code
paths like __tlb_remove_page_size that explicitly check for removing range page
size to be same as mmu gather page size.

Fixes: 5a6099346c ("powerpc/64s/radix: tlb do not flush on page size when fullmm")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:39 +10:00
Mathieu Malaterre
fce278af81 powerpc/mm: remove warning about ‘type’ being set
‘type’ is only used when CONFIG_DEBUG_HIGHMEM is set. So add a possibly
unused tag to variable. Remove warning treated as error with W=1:

  arch/powerpc/mm/highmem.c:59:6: error: variable ‘type’ set but not used [-Werror=unused-but-set-variable]

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:38 +10:00
Mathieu Malaterre
f2c6d0d109 powerpc/32: Include setup.h header file to fix warnings
Make sure to include setup.h to provide the following prototypes:

  - irqstack_early_init
  - setup_power_save
  - initialize_cache_info

Fix the following warnings (treated as error in W=1):

  arch/powerpc/kernel/setup_32.c:198:13: error: no previous prototype for ‘irqstack_early_init’
  arch/powerpc/kernel/setup_32.c:238:13: error: no previous prototype for ‘setup_power_save’
  arch/powerpc/kernel/setup_32.c:253:13: error: no previous prototype for ‘initialize_cache_info’

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:38 +10:00
Mathieu Malaterre
eab00a208e powerpc: Move path variable inside DEBUG_PROM
Add gcc attribute unused for two variables. Fix warnings treated as errors
with W=1:

  arch/powerpc/kernel/prom_init.c:1388:8: error: variable ‘path’ set but not used [-Werror=unused-but-set-variable]

Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:38 +10:00
Mathieu Malaterre
618a89d738 powerpc/powermac: Make some functions static
These functions can all be static, make it so. Fix warnings treated as
errors with W=1:

  arch/powerpc/platforms/powermac/pci.c:1022:6: error: no previous prototype for ‘pmac_pci_fixup_ohci’
  arch/powerpc/platforms/powermac/pci.c:1057:6: error: no previous prototype for ‘pmac_pci_fixup_cardbus’
  arch/powerpc/platforms/powermac/pci.c:1094:6: error: no previous prototype for ‘pmac_pci_fixup_pciata’

Remove has_address declaration and assignment since it's not used.
Also add gcc attribute unused to fix a warning treated as error with
W=1:

  arch/powerpc/platforms/powermac/pci.c:784:19: error: variable ‘has_address’ set but not used
  arch/powerpc/platforms/powermac/pci.c:907:22: error: variable ‘ht’ set but not used

Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:37 +10:00
Mathieu Malaterre
8921305c1e powerpc/powermac: Remove variable x that's never read
Since the value of x is never intended to be read, remove it. Fix
warning treated as error with W=1:

  arch/powerpc/platforms/powermac/udbg_scc.c:76:9: error: variable ‘x’ set but not used

Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:37 +10:00
Mathieu Malaterre
e4ecafb14f cxl: remove a dead branch
In commit 14baf4d9c7 ("cxl: Add guest-specific code") the following code
was added:

	if (afu->crs_len < 0) {
		dev_err(&afu->dev, "Unexpected configuration record size value\n");
		return -EINVAL;
	}

However the variable `crs_len` is of type u64 and cannot be compared < 0.
Remove the dead code section. Fix the following warning treated as error
with W=1:

../drivers/misc/cxl/guest.c:919:19: error: comparison of unsigned expression < 0 is always false [-Werror=type-limits]

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:36 +10:00
Mathieu Malaterre
2fff0f07b8 powerpc/powermac: Add missing include of header pmac.h
The header `pmac.h` was not included, leading to the following warnings,
treated as error with W=1:

  arch/powerpc/platforms/powermac/time.c:69:13: error: no previous prototype for ‘pmac_time_init’ [-Werror=missing-prototypes]
  arch/powerpc/platforms/powermac/time.c:207:15: error: no previous prototype for ‘pmac_get_boot_time’ [-Werror=missing-prototypes]
  arch/powerpc/platforms/powermac/time.c:222:6: error: no previous prototype for ‘pmac_get_rtc_time’ [-Werror=missing-prototypes]
  arch/powerpc/platforms/powermac/time.c:240:5: error: no previous prototype for ‘pmac_set_rtc_time’ [-Werror=missing-prototypes]
  arch/powerpc/platforms/powermac/time.c:259:12: error: no previous prototype for ‘via_calibrate_decr’ [-Werror=missing-prototypes]
  arch/powerpc/platforms/powermac/time.c:311:13: error: no previous prototype for ‘pmac_calibrate_decr’ [-Werror=missing-prototypes]

The function `via_calibrate_decr` was made static to silence a warning.

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:36 +10:00
Markus Elfring
baedcdf505 powerpc/kexec: Use common error handling code in setup_new_fdt()
Add a jump target so that a bit of exception handling can be better
reused at the end of this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:36 +10:00
Boqun Feng
302c7b0c4f powerpc/xmon: Add address lookup for percpu symbols
Currently, in xmon, there is no obvious way to get an address for a
percpu symbol for a particular cpu. Having such an ability would be
good for debugging the system when percpu variables got involved.

Therefore, this patch introduces a new xmon command "lp" to lookup the
address for percpu symbols. Usage of "lp" is similar to "ls", except
that we could add a cpu number to choose the variable of which cpu we
want to lookup. If no cpu number is given, lookup for current cpu.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:35 +10:00
Christophe Leroy
646dbe40fa powerpc/mm: remove huge_pte_offset_and_shift() prototype
huge_pte_offset_and_shift() has never existed

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:35 +10:00
Christophe Leroy
fa54a981ea powerpc/lib: Use patch_site to patch copy_32 functions once cache is enabled
The symbol memcpy_nocache_branch defined in order to allow patching
of memset function once cache is enabled leads to confusing reports
by perf tool.

Using the new patch_site functionality solves this issue.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:35 +10:00
Mahesh Salgaonkar
cd813e1cd7 powerpc/pseries: Fix endianness while restoring of r3 in MCE handler.
During Machine Check interrupt on pseries platform, register r3 points
RTAS extended event log passed by hypervisor. Since hypervisor uses r3
to pass pointer to rtas log, it stores the original r3 value at the
start of the memory (first 8 bytes) pointed by r3. Since hypervisor
stores this info and rtas log is in BE format, linux should make
sure to restore r3 value in correct endian format.

Without this patch when MCE handler, after recovery, returns to code that
that caused the MCE may end up with Data SLB access interrupt for invalid
address followed by kernel panic or hang.

  Severe Machine check interrupt [Recovered]
    NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
    Initiator: CPU
    Error type: SLB [Multihit]
      Effective address: d00000000ca70000
  cpu 0xa: Vector: 380 (Data SLB Access) at [c0000000fc7775b0]
      pc: c0000000009694c0: vsnprintf+0x80/0x480
      lr: c0000000009698e0: vscnprintf+0x20/0x60
      sp: c0000000fc777830
     msr: 8000000002009033
     dar: a803a30c000000d0
    current = 0xc00000000bc9ef00
    paca    = 0xc00000001eca5c00	 softe: 3	 irq_happened: 0x01
      pid   = 8860, comm = insmod
  vscnprintf+0x20/0x60
  vprintk_emit+0xb4/0x4b0
  vprintk_func+0x5c/0xd0
  printk+0x38/0x4c
  init_module+0x1c0/0x338 [bork_kernel]
  do_one_initcall+0x54/0x230
  do_init_module+0x8c/0x248
  load_module+0x12b8/0x15b0
  sys_finit_module+0xa8/0x110
  system_call+0x58/0x6c
  --- Exception: c00 (System Call) at 00007fff8bda0644
  SP (7fffdfbfe980) is in userspace

This patch fixes this issue.

Fixes: a08a53ea4c ("powerpc/le: Enable RTAS events support")
Cc: stable@vger.kernel.org # v3.15+
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:34 +10:00
Hari Bathini
ced1bf52f4 powerpc/fadump: merge adjacent memory ranges to reduce PT_LOAD segements
With dynamic memory allocation support for crash memory ranges array,
there is no hard limit on the no. of crash memory ranges kernel could
export, but program headers count could overflow in the /proc/vmcore
ELF file while exporting each memory range as PT_LOAD segment. Reduce
the likelihood of a such scenario, by folding adjacent crash memory
ranges which minimizes the total number of PT_LOAD segments.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:34 +10:00
Hari Bathini
1bd6a1c4b8 powerpc/fadump: handle crash memory ranges array index overflow
Crash memory ranges is an array of memory ranges of the crashing kernel
to be exported as a dump via /proc/vmcore file. The size of the array
is set based on INIT_MEMBLOCK_REGIONS, which works alright in most cases
where memblock memory regions count is less than INIT_MEMBLOCK_REGIONS
value. But this count can grow beyond INIT_MEMBLOCK_REGIONS value since
commit 142b45a72e ("memblock: Add array resizing support").

On large memory systems with a few DLPAR operations, the memblock memory
regions count could be larger than INIT_MEMBLOCK_REGIONS value. On such
systems, registering fadump results in crash or other system failures
like below:

  task: c00007f39a290010 ti: c00000000b738000 task.ti: c00000000b738000
  NIP: c000000000047df4 LR: c0000000000f9e58 CTR: c00000000010f180
  REGS: c00000000b73b570 TRAP: 0300   Tainted: G          L   X  (4.4.140+)
  MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 22004484  XER: 20000000
  CFAR: c000000000008500 DAR: 000007a450000000 DSISR: 40000000 SOFTE: 0
  ...
  NIP [c000000000047df4] smp_send_reschedule+0x24/0x80
  LR [c0000000000f9e58] resched_curr+0x138/0x160
  Call Trace:
    resched_curr+0x138/0x160 (unreliable)
    check_preempt_curr+0xc8/0xf0
    ttwu_do_wakeup+0x38/0x150
    try_to_wake_up+0x224/0x4d0
    __wake_up_common+0x94/0x100
    ep_poll_callback+0xac/0x1c0
    __wake_up_common+0x94/0x100
    __wake_up_sync_key+0x70/0xa0
    sock_def_readable+0x58/0xa0
    unix_stream_sendmsg+0x2dc/0x4c0
    sock_sendmsg+0x68/0xa0
    ___sys_sendmsg+0x2cc/0x2e0
    __sys_sendmsg+0x5c/0xc0
    SyS_socketcall+0x36c/0x3f0
    system_call+0x3c/0x100

as array index overflow is not checked for while setting up crash memory
ranges causing memory corruption. To resolve this issue, dynamically
allocate memory for crash memory ranges and resize it incrementally,
in units of pagesize, on hitting array size limit.

Fixes: 2df173d9e8 ("fadump: Initialize elfcore header and add PT_LOAD program headers.")
Cc: stable@vger.kernel.org # v3.4+
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
[mpe: Just use PAGE_SIZE directly, fixup variable placement]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:34 +10:00
Christophe Leroy
6bd6d86722 powerpc/cpm1: fix compilation error with CONFIG_PPC_EARLY_DEBUG_CPM
commit e8cb7a55eb ("powerpc: remove superflous inclusions of
asm/fixmap.h") removed inclusion of asm/fixmap.h from files not
including objects from that file.

However, asm/mmu-8xx.h includes  call to __fix_to_virt(). The proper
way would be to include asm/fixmap.h in asm/mmu-8xx.h but it creates
an inclusion loop.

So we have to leave asm/fixmap.h in sysdep/cpm_common.c for
CONFIG_PPC_EARLY_DEBUG_CPM

  CC      arch/powerpc/sysdev/cpm_common.o
In file included from ./arch/powerpc/include/asm/mmu.h:340:0,
                 from ./arch/powerpc/include/asm/reg_8xx.h:8,
                 from ./arch/powerpc/include/asm/reg.h:29,
                 from ./arch/powerpc/include/asm/processor.h:13,
                 from ./arch/powerpc/include/asm/thread_info.h:28,
                 from ./include/linux/thread_info.h:38,
                 from ./arch/powerpc/include/asm/ptrace.h:159,
                 from ./arch/powerpc/include/asm/hw_irq.h:12,
                 from ./arch/powerpc/include/asm/irqflags.h:12,
                 from ./include/linux/irqflags.h:16,
                 from ./include/asm-generic/cmpxchg-local.h:6,
                 from ./arch/powerpc/include/asm/cmpxchg.h:537,
                 from ./arch/powerpc/include/asm/atomic.h:11,
                 from ./include/linux/atomic.h:5,
                 from ./include/linux/mutex.h:18,
                 from ./include/linux/kernfs.h:13,
                 from ./include/linux/sysfs.h:16,
                 from ./include/linux/kobject.h:20,
                 from ./include/linux/device.h:16,
                 from ./include/linux/node.h:18,
                 from ./include/linux/cpu.h:17,
                 from ./include/linux/of_device.h:5,
                 from arch/powerpc/sysdev/cpm_common.c:21:
arch/powerpc/sysdev/cpm_common.c: In function ‘udbg_init_cpm’:
./arch/powerpc/include/asm/mmu-8xx.h:218:25: error: implicit declaration of function ‘__fix_to_virt’ [-Werror=implicit-function-declaration]
 #define VIRT_IMMR_BASE (__fix_to_virt(FIX_IMMR_BASE))
                         ^
arch/powerpc/sysdev/cpm_common.c:75:7: note: in expansion of macro ‘VIRT_IMMR_BASE’
       VIRT_IMMR_BASE);
       ^
./arch/powerpc/include/asm/mmu-8xx.h:218:39: error: ‘FIX_IMMR_BASE’ undeclared (first use in this function)
 #define VIRT_IMMR_BASE (__fix_to_virt(FIX_IMMR_BASE))
                                       ^
arch/powerpc/sysdev/cpm_common.c:75:7: note: in expansion of macro ‘VIRT_IMMR_BASE’
       VIRT_IMMR_BASE);
       ^
./arch/powerpc/include/asm/mmu-8xx.h:218:39: note: each undeclared identifier is reported only once for each function it appears in
 #define VIRT_IMMR_BASE (__fix_to_virt(FIX_IMMR_BASE))
                                       ^
arch/powerpc/sysdev/cpm_common.c:75:7: note: in expansion of macro ‘VIRT_IMMR_BASE’
       VIRT_IMMR_BASE);
       ^
cc1: all warnings being treated as errors
make[1]: *** [arch/powerpc/sysdev/cpm_common.o] Error 1

Fixes: e8cb7a55eb ("powerpc: remove superflous inclusions of asm/fixmap.h")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:33 +10:00
Dan Carpenter
c42d3be0c0 powerpc: Fix size calculation using resource_size()
The problem is the the calculation should be "end - start + 1" but the
plus one is missing in this calculation.

Fixes: 8626816e90 ("powerpc: add support for MPIC message register API")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:33 +10:00
Rashmica Gupta
24576a70e7 Documentation: Update documentation on ppc-memtrace
Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:32 +10:00
Rashmica Gupta
d3da701d33 powerpc/powernv: Allow memory that has been hot-removed to be hot-added
This patch allows the memory removed by memtrace to be readded to the
kernel. So now you don't have to reboot your system to add the memory
back to the kernel or to have a different amount of memory removed.

Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:31 +10:00
Bartosz Golaszewski
563a53f390
spi: davinci: fix a NULL pointer dereference
On non-OF systems spi->controlled_data may be NULL. This causes a NULL
pointer derefence on dm365-evm.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
2018-08-10 11:48:37 +01:00
Josh Poimboeuf
07d981ad4c x86/microcode: Allow late microcode loading with SMT disabled
The kernel unnecessarily prevents late microcode loading when SMT is
disabled.  It should be safe to allow it if all the primary threads are
online.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
2018-08-10 08:32:15 +01:00
David S. Miller
e91e218946 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2018-08-10

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix cpumap and devmap on teardown as they're under RCU context
   and won't have same assumption as running under NAPI protection,
   from Jesper.

2) Fix various sockmap bugs in bpf_tcp_sendmsg() code, e.g. we had
   a bug where socket error was not propagated correctly, from Daniel.

3) Fix incompatible libbpf header license for BTF code and match it
   before it gets officially released with the rest of libbpf which
   is LGPL-2.1, from Martin.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 23:18:29 -07:00
Steve French
e02789a53d smb3: enumerating snapshots was leaving part of the data off end
When enumerating snapshots, the last few bytes of the final
snapshot could be left off since we were miscalculating the
length returned (leaving off the sizeof struct SRV_SNAPSHOT_ARRAY)
See MS-SMB2 section 2.2.32.2. In addition fixup the length used
to allow smaller buffer to be passed in, in order to allow
returning the size of the whole snapshot array more easily.

Sample userspace output with a kernel patched with this
(mounted to a Windows volume with two snapshots).
Before this patch, the second snapshot would be missing a
few bytes at the end.

~/cifs-2.6# ~/enum-snapshots /mnt/file
press enter to issue the ioctl to retrieve snapshot information ...

size of snapshot array = 102
Num snapshots: 2 Num returned: 2 Array Size: 102

Snapshot 0:@GMT-2018.06.30-19.34.17
Snapshot 1:@GMT-2018.06.30-19.33.37

CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-08-09 21:20:01 -05:00
Ronnie Sahlberg
730928c8f4 cifs: update smb2_queryfs() to use compounding
Change smb2_queryfs() to use a Create/QueryInfo/Close compound request.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-08-09 21:19:56 -05:00
Ronnie Sahlberg
b24df3e30c cifs: update receive_encrypted_standard to handle compounded responses
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-08-09 21:19:45 -05:00
Dave Airlie
557ce95051 Merge branch 'drm-next-4.19' of git://people.freedesktop.org/~agd5f/linux into drm-next
More fixes for 4.19:
- Fixes for scheduler
- Fix for SR-IOV
- Fixes for display

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180809200052.2777-1-alexander.deucher@amd.com
2018-08-10 11:43:02 +10:00
Dave Airlie
8511b7da18 drm/imx: ipu-v3 plane offset and IPU id fixes
- Fix U/V plane offsets for odd vertical offsets. Due to wrong operator
   order, the y offset was not rounded down properly for vertically
   chroma subsampled planar formats.
 - Fix IPU id number for boards that don't have an OF alias for their
   single IPU in the device tree. This is necessary to support imx-media
   on i.MX51 and i.MX53 SoCs.
 -----BEGIN PGP SIGNATURE-----
 
 iI0EABYIADUWIQRRO6F6WdpH1R0vGibVhaclGDdiwAUCW2RimhcccC56YWJlbEBw
 ZW5ndXRyb25peC5kZQAKCRDVhaclGDdiwM9hAP928KiDOQQNq4pYDTaq5pL3Tqnq
 Co2evjrCqq8DJpbmPwD/cilKVCZ7rY24U2kV+X/nkNvemHg0uYwnmnfLeolKMAQ=
 =uoGQ
 -----END PGP SIGNATURE-----

Merge tag 'imx-drm-fixes-2018-08-03' of git://git.pengutronix.de/git/pza/linux into drm-next

drm/imx: ipu-v3 plane offset and IPU id fixes

- Fix U/V plane offsets for odd vertical offsets. Due to wrong operator
  order, the y offset was not rounded down properly for vertically
  chroma subsampled planar formats.
- Fix IPU id number for boards that don't have an OF alias for their
  single IPU in the device tree. This is necessary to support imx-media
  on i.MX51 and i.MX53 SoCs.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/1533552680.4204.14.camel@pengutronix.de
2018-08-10 11:37:35 +10:00
Dave Airlie
4abfe15e2a drm/imx: use suspend/resume helpers, add ipu-v3 V4L2 XRGB32/XBGR32 support
- Convert imx_drm_suspend/resume to use the
   drm_mode_config_helper_suspend/ resume functions.
 - Add support for V4L2_PIX_FMT_XRGB32/XBGR32, corresponding to
   DRM_FORMAT_XRGB8888/BGRX8888, respectively.
 -----BEGIN PGP SIGNATURE-----
 
 iI0EABYIADUWIQRRO6F6WdpH1R0vGibVhaclGDdiwAUCW2RkHxcccC56YWJlbEBw
 ZW5ndXRyb25peC5kZQAKCRDVhaclGDdiwLx+AQD9lfaYMq06ETXudTE3or832+eL
 tyLrRLdZ/2I2B1weCQD/ez8+0nYzCwNi85dFDAvu0XieaC0E7my+QmfvsOQBKQA=
 =U3BN
 -----END PGP SIGNATURE-----

Merge tag 'imx-drm-next-2018-08-03' of git://git.pengutronix.de/git/pza/linux into drm-next

drm/imx: use suspend/resume helpers, add ipu-v3 V4L2 XRGB32/XBGR32 support

- Convert imx_drm_suspend/resume to use the
  drm_mode_config_helper_suspend/ resume functions.
- Add support for V4L2_PIX_FMT_XRGB32/XBGR32, corresponding to
  DRM_FORMAT_XRGB8888/BGRX8888, respectively.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/1533552701.4204.15.camel@pengutronix.de
2018-08-10 11:13:36 +10:00
Logan Gunthorpe
10dbc9fedc PCI: Add ACS Redirect disable quirk for Intel Sunrise Point
Intel Sunrise Point PCH hardware has an implementation of the ACS bits that
does not comply with the PCIe standard.  Add a device-specific quirk,
pci_quirk_disable_intel_spt_pch_acs_redir() to disable ACS Redirection on
this system.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: changelog, split to separate patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2018-08-09 17:59:07 -05:00
Logan Gunthorpe
73c47ddef2 PCI: Add device-specific ACS Redirect disable infrastructure
Intel Sunrise Point (SPT) PCH hardware has an implementation of the ACS
bits that does not comply with the PCIe standard.  To deal with this we
need device-specific quirks to disable ACS redirection.

Add a new pci_dev_specific_disable_acs_redir() quirk and a new
.disable_acs_redir() function pointer for use by non-compliant devices.  No
functional change intended.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: split to separate patch, move
pci_dev_specific_disable_acs_redir() declarations to drivers/pci/pci.h]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2018-08-09 17:48:28 -05:00
Logan Gunthorpe
3b269185c1 PCI: Convert device-specific ACS quirks from NULL termination to ARRAY_SIZE
Convert the search for device-specific ACS enable quirks from searching a
NULL-terminated array to iterating through the array, which is always
fixed-size anyway.  No functional change intended.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: changelog, split to separate patch for reviewability]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2018-08-09 17:47:44 -05:00
Logan Gunthorpe
aaca43fda7 PCI: Add "pci=disable_acs_redir=" parameter for peer-to-peer support
To support peer-to-peer traffic on a segment of the PCI hierarchy, we must
disable the ACS redirect bits for select PCI bridges.  The bridges must be
selected before the devices are discovered by the kernel and the IOMMU
groups created.  Therefore, add a kernel command line parameter to specify
devices which must have their ACS bits disabled.

The new parameter takes a list of devices separated by a semicolon.  Each
device specified will have its ACS redirect bits disabled.  This is
similar to the existing 'resource_alignment' parameter.

The ACS Request P2P Request Redirect, P2P Completion Redirect and P2P
Egress Control bits are disabled, which is sufficient to always allow
passing P2P traffic uninterrupted.  The bits are set after the kernel
(optionally) enables the ACS bits itself.  It is also done regardless of
whether the kernel or platform firmware sets the bits.

If the user tries to disable the ACS redirect for a device without the ACS
capability, print a warning to dmesg.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: reorder to add the generic code first and move the
device-specific quirk to subsequent patches]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
2018-08-09 17:37:19 -05:00
Al Viro
4c0d7cd5c8 make sure that __dentry_kill() always invalidates d_seq, unhashed or not
RCU pathwalk relies upon the assumption that anything that changes
->d_inode of a dentry will invalidate its ->d_seq.  That's almost
true - the one exception is that the final dput() of already unhashed
dentry does *not* touch ->d_seq at all.  Unhashing does, though,
so for anything we'd found by RCU dcache lookup we are fine.
Unfortunately, we can *start* with an unhashed dentry or jump into
it.

We could try and be careful in the (few) places where that could
happen.  Or we could just make the final dput() invalidate the damn
thing, unhashed or not.  The latter is much simpler and easier to
backport, so let's do it that way.

Reported-by: "Dae R. Jeong" <threeearcat@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-09 18:07:15 -04:00
Al Viro
119e1ef80e fix __legitimize_mnt()/mntput() race
__legitimize_mnt() has two problems - one is that in case of success
the check of mount_lock is not ordered wrt preceding increment of
refcount, making it possible to have successful __legitimize_mnt()
on one CPU just before the otherwise final mntpu() on another,
with __legitimize_mnt() not seeing mntput() taking the lock and
mntput() not seeing the increment done by __legitimize_mnt().
Solved by a pair of barriers.

Another is that failure of __legitimize_mnt() on the second
read_seqretry() leaves us with reference that'll need to be
dropped by caller; however, if that races with final mntput()
we can end up with caller dropping rcu_read_lock() and doing
mntput() to release that reference - with the first mntput()
having freed the damn thing just as rcu_read_lock() had been
dropped.  Solution: in "do mntput() yourself" failure case
grab mount_lock, check if MNT_DOOMED has been set by racing
final mntput() that has missed our increment and if it has -
undo the increment and treat that as "failure, caller doesn't
need to drop anything" case.

It's not easy to hit - the final mntput() has to come right
after the first read_seqretry() in __legitimize_mnt() *and*
manage to miss the increment done by __legitimize_mnt() before
the second read_seqretry() in there.  The things that are almost
impossible to hit on bare hardware are not impossible on SMP
KVM, though...

Reported-by: Oleg Nesterov <oleg@redhat.com>
Fixes: 48a066e72d ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-09 17:51:32 -04:00
Jason Gunthorpe
922983c2a1 IB/uverbs: Fix reading of 32 bit flags
This is missing a zeroing of the high bits of flags, and is also not
correct for big endian machines. Properly zero extend the 32 bit flags
into the 64 bit stack variable.

Reported-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Fixes: bccd06223f ("IB/uverbs: Add UVERBS_ATTR_FLAGS_IN to the specs language")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
2018-08-09 15:46:07 -06:00
Paul Burton
22f20a1103
MIPS: Remove remnants of UASM_ISA
Commit 33679a5037 ("MIPS: uasm: Remove needless ISA abstraction")
removed use of the MIPS_ISA preprocessor macro, but left a couple of
unused definitions of it behind.

Remove the dead code.

Signed-off-by: Paul Burton <paul.burton@mips.com>
2018-08-09 14:45:00 -07:00