Commit graph

737516 commits

Author SHA1 Message Date
Steven Rostedt (VMware)
841a915d20 vsprintf: Do not have bprintf dereference pointers
When trace_printk() was introduced, it was discussed that making it be as
low overhead as possible, that the processing of the format string should be
delayed until it is read. That is, a "trace_printk()" should not convert
the %d into numbers and so on, but instead, save the fmt string and all the
args in the buffer at the time of recording. When the trace_printk() data is
read, it would then parse the format string and do the conversions of the
saved arguments in the tracing buffer.

The code to perform this was added to vsprintf where vbin_printf() would
save the arguments of a specified format string in a buffer, then
bstr_printf() could be used to convert the buffer with the same format
string into the final output, as if vsprintf() was called in one go.

The issue arises when dereferenced pointers are used. The problem is that
something like %*pbl which reads a bitmask, will save the pointer to the
bitmask in the buffer. Then the reading of the buffer via bstr_printf() will
then look at the pointer to process the final output. Obviously the value of
that pointer could have changed since the time it was recorded to the time
the buffer is read. Worse yet, the bitmask could be unmapped, and the
reading of the trace buffer could actually cause a kernel oops.

Another problem is that user space tools such as perf and trace-cmd do not
have access to the contents of these pointers, and they become useless when
the tracing buffer is extracted.

Instead of having vbin_printf() simply save the pointer in the buffer for
later processing, have it perform the formatting at the time bin_printf() is
called. This will fix the issue of dereferencing pointers at a later time,
and has the extra benefit of having user space tools understand these
values.

Since perf and trace-cmd already can handle %p[sSfF] via saving kallsyms,
their pointers are saved and not processed during vbin_printf(). If they
were converted, it would break perf and trace-cmd, as they would not know
how to deal with the conversion.

Link: http://lkml.kernel.org/r/20171228204025.14a71d8f@gandalf.local.home

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23 15:57:30 -05:00
Andi Kleen
dd3dad0d71 ftrace: Mark function tracer test functions noinline/noclone
The ftrace function tracer self tests calls some functions to verify
the get traced. This relies on them not being inlined. Previously
this was ensured by putting them into another file, but with LTO
the compiler can inline across files, which makes the tests fail.

Mark these functions as noinline and noclone.

Link: http://lkml.kernel.org/r/20171221233732.31896-1-andi@firstfloor.org

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23 15:57:29 -05:00
Ravi Bangoria
0e4d819d08 trace_uprobe: Display correct offset in uprobe_events
Recently, how the pointers being printed with %p has been changed
by commit ad67b74d24 ("printk: hash addresses printed with %p").
This is causing a regression while showing offset in the
uprobe_events file. Instead of %p, use %px to display offset.

Before patch:

  # perf probe -vv -x /tmp/a.out main
  Opening /sys/kernel/debug/tracing//uprobe_events write=1
  Writing event: p:probe_a/main /tmp/a.out:0x58c

  # cat /sys/kernel/debug/tracing/uprobe_events
  p:probe_a/main /tmp/a.out:0x0000000049a0f352

After patch:

  # cat /sys/kernel/debug/tracing/uprobe_events
  p:probe_a/main /tmp/a.out:0x000000000000058c

Link: http://lkml.kernel.org/r/20180106054246.15375-1-ravi.bangoria@linux.vnet.ibm.com

Cc: stable@vger.kernel.org
Fixes: ad67b74d24 ("printk: hash addresses printed with %p")
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23 15:57:29 -05:00
Changbin Du
f4d0706cde tracing: Make sure the parsed string always terminates with '\0'
Always mark the parsed string with a terminated nul '\0' character. This removes
the need for the users to have to append the '\0' before using the parsed string.

Link: http://lkml.kernel.org/r/1516093350-12045-4-git-send-email-changbin.du@intel.com

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23 15:57:28 -05:00
Changbin Du
76638d9650 tracing: Clear parser->idx if only spaces are read
If only spaces were read while parsing the next string, then parser->idx should be
cleared in order to make trace_parser_loaded() return false.

Link: http://lkml.kernel.org/r/1516093350-12045-3-git-send-email-changbin.du@intel.com

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23 15:57:28 -05:00
Changbin Du
921a7acd85 tracing: Detect the string nul character when parsing user input string
User space can pass in a C nul character '\0' along with its input. The
function trace_get_user() will try to process it as a normal character,
and that will fail to parse.

open("/sys/kernel/debug/tracing//set_ftrace_pid", O_WRONLY|O_TRUNC) = 3
write(3, " \0", 2)                      = -1 EINVAL (Invalid argument)

while parse can handle spaces, so below works.

$ echo "" > set_ftrace_pid
$ echo " " > set_ftrace_pid
$ echo -n " " > set_ftrace_pid

Have the parser stop on '\0' and cease any further parsing. Only process
the characters up to the nul '\0' character and do not process it.

Link: http://lkml.kernel.org/r/1516093350-12045-2-git-send-email-changbin.du@intel.com

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23 15:57:27 -05:00
Steven Rostedt (VMware)
2ee5b92a25 tracing: Update stack trace skipping for ORC unwinder
With the addition of ORC unwinder and FRAME POINTER unwinder, the stack
trace skipping requirements have changed.

I went through the tracing stack trace dumps with ORC and with frame
pointers and recalculated the proper values.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23 15:57:00 -05:00
Steven Rostedt (VMware)
6be7fa3c74 ftrace, orc, x86: Handle ftrace dynamically allocated trampolines
The function tracer can create a dynamically allocated trampoline that is
called by the function mcount or fentry hook that is used to call the
function callback that is registered. The problem is that the orc undwinder
will bail if it encounters one of these trampolines. This breaks the stack
trace of function callbacks, which include the stack tracer and setting the
stack trace for individual functions.

Since these dynamic trampolines are basically copies of the static ftrace
trampolines defined in ftrace_*.S, we do not need to create new orc entries
for the dynamic trampolines. Finding the return address on the stack will be
identical as the functions that were copied to create the dynamic
trampolines. When encountering a ftrace dynamic trampoline, we can just use
the orc entry of the ftrace static function that was copied for that
trampoline.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23 15:56:55 -05:00
Jay Cornwall
430a23689d PCI: Add pci_enable_atomic_ops_to_root()
The Atomic Operations feature (PCIe r4.0, sec 6.15) allows atomic
transctions to be requested by, routed through and completed by PCIe
components. Routing and completion do not require software support.
Component support for each is detectable via the DEVCAP2 register.

A Requester may use AtomicOps only if its PCI_EXP_DEVCTL2_ATOMIC_REQ is
set. This should be set only if the Completer and all intermediate routing
elements support AtomicOps.

A concrete example is the AMD Fiji-class GPU (which is capable of making
AtomicOp requests), below a PLX 8747 switch (advertising AtomicOp routing)
with a Haswell host bridge (advertising AtomicOp completion support).

Add pci_enable_atomic_ops_to_root() for per-device control over AtomicOp
requests. This checks to be sure the Root Port supports completion of the
desired AtomicOp sizes and the path to the Root Port supports routing the
AtomicOps.

Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[bhelgaas: changelog, comments, whitespace]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-01-23 14:46:50 -06:00
Linus Torvalds
1f07476ec1 pci-v4.15-fixes-3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJaZ5GHAAoJEFmIoMA60/r8UmAP/AjrS+C+OSMHzM/U/h+UHmou
 DRg5Tg7gG3PjnnKULEQno4BSGM5VJeXpM+QC8Ypqa2I1T4GTHVHiSn6qtOi4yn0R
 k9CAGyeRD1zYH4Mtw4dyUOVE5j0ANoBuji0KklawqaxcPFJDh6FhSCbChPq0WjIA
 ZarpJZ4YKeJF5ZYXFs4G5W6EjnXokYogjZL3lzCOVXoTC9aVfo+NoJ9Hg9P38VA3
 WmYaLUEbIzV40JXkDieZq7nqO8m3mFBtxi7+74BvsjtgozW0og1tPNI6Hige6/U3
 dUAzrdNGQg52T3pTMm8r0+efagDV11UO0IBhVhktNzTTVaUggxluBEza//FaQ85m
 PLm7vsnwtIALujnp74VcFnTsVM2yYa9NeaPNIQ9FQ6kOB7TS0/ELtyHaPLNS8JoM
 lhssdhT3jmVNg86UG2pOi9MJZhyYb6SQmWAGmYopzwEn/idBTOcLsTdcbptF5LfP
 0hbc9BLI0wCF8sv0JXD4IBQFWN264z1vPGNDWD4cnkEMPiAJ23h1ySpS3S0HjZe3
 c6AMEMNj5E1b6unwIebBHfeSj4DUUnfyypb4/oICNxCThQBIzPyXtnlk07DNFMOl
 9pRZBebfUl5xAZeWz2Sxxvs0PqMc8QEDa5j8YzLS3cgz9y1i80Hn6fGxSeHptgC+
 Zb4SYRr5S82kU3SbAtnY
 =tRVW
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fix from Bjorn Helgaas:
 "Fix AMD regression due to not re-enabling the big window on resume
  (Christian König)"

* tag 'pci-v4.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  x86/PCI: Enable AMD 64-bit window on resume
2018-01-23 12:45:40 -08:00
Stuart Hayes
0077a845f7 PCI: Expose ari_enabled in sysfs
Some multifunction PCI devices with more than 8 functions use "alternative
routing-ID interpretation" (ARI), which means the 8-bit device/function
number field will be interpreted as 8 bits specifying the function number
(the device number is 0 implicitly), rather than the upper 5 bits
specifying the device number and the lower 3 bits specifying the function
number. The kernel can enable and use this.

Expose in a sysfs attribute whether the kernel has enabled ARI, so that a
program in userspace won't have to parse PCI devices and PCI configuration
space to figure out if it is enabled. This will allow better predictable
network naming using PCI function numbers without using PCI bus or device
numbers, which is desirable because bus and device numbers can change with
system configuration but function numbers will not.

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-01-23 14:39:24 -06:00
Lukas Wunner
493fb50e95 PCI: pciehp: Assume NoCompl+ for Thunderbolt ports
Certain Thunderbolt 1 controllers claim to support Command Completed events
(value of 0b in the No Command Completed Support field of the Slot
Capabilities register) but in reality they neither set the Command
Completed bit in the Slot Status register nor signal a Command Completed
interrupt:

  8086:1513  CV82524  [Light Ridge 4C  2010]
  8086:151a  DSL2310  [Eagle Ridge 2C  2011]
  8086:151b  CVL2510  [Light Peak 2C   2010]
  8086:1547  DSL3510  [Cactus Ridge 4C 2012]
  8086:1548  DSL3310  [Cactus Ridge 2C 2012]
  8086:1549  DSL2210  [Port Ridge 1C   2011]

All known newer chips (Redwood Ridge and onwards) set No Command Completed
Support, indicating that they do not support Command Completed events.

The user-visible impact is that after unplugging such a device, 2 seconds
elapse until pciehp is unbound.  That's because on ->remove,
pcie_write_cmd() is called via pcie_disable_notification() and every call
to pcie_write_cmd() takes 2 seconds (1 second for each invocation of
pcie_wait_cmd()):

  [  337.942727] pciehp 0000:0a:00.0:pcie204: Timeout on hotplug command 0x1038 (issued 21176 msec ago)
  [  340.014735] pciehp 0000:0a:00.0:pcie204: Timeout on hotplug command 0x0000 (issued 2072 msec ago)

That by itself has always been unpleasant, but the situation has become
worse with commit cc27b735ad ("PCI/portdrv: Turn off PCIe services during
shutdown"):  Now pciehp is unbound on ->shutdown.  Because Thunderbolt
controllers typically have 4 hotplug ports, every reboot and shutdown is
now delayed by 8 seconds, plus another 2 seconds for every attached
Thunderbolt 1 device.

Thunderbolt hotplug slots are not physical slots that one inserts cards
into, but rather logical hotplug slots implemented in silicon.  Devices
appear beyond those logical slots once a PCI tunnel is established on top
of the Thunderbolt Converged I/O switch.  One would expect commands written
to the Slot Control register to be executed immediately by the silicon, so
for simplicity we always assume NoCompl+ for Thunderbolt ports.

Fixes: cc27b735ad ("PCI/portdrv: Turn off PCIe services during shutdown")
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: stable@vger.kernel.org	# v4.12+
Cc: Sinan Kaya <okaya@codeaurora.org>
Cc: Yehezkel Bernat <yehezkel.bernat@intel.com>
Cc: Michael Jamet <michael.jamet@intel.com>
Cc: Andreas Noever <andreas.noever@gmail.com>
2018-01-23 14:28:41 -06:00
Jayachandran C
0ba2e29c7f arm64: Turn on KPTI only on CPUs that need it
Whitelist Broadcom Vulcan/Cavium ThunderX2 processors in
unmap_kernel_at_el0(). These CPUs are not vulnerable to
CVE-2017-5754 and do not need KPTI when KASLR is off.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jayachandran C <jnair@caviumnetworks.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-23 19:59:49 +00:00
Jayachandran C
f3d795d9b3 arm64: Branch predictor hardening for Cavium ThunderX2
Use PSCI based mitigation for speculative execution attacks targeting
the branch predictor. We use the same mechanism as the one used for
Cortex-A CPUs, we expect the PSCI version call to have a side effect
of clearing the BTBs.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jayachandran C <jnair@caviumnetworks.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-23 19:59:20 +00:00
Trond Myklebust
8f39fce84a NFS-over-RDMA client updates for Linux 4.16
New features:
 - xprtrdma tracepoints
 
 Bugfixes and cleanups:
 - Fix memory leak if rpcrdma_buffer_create() fails
 - Fix allocating extra rpcrdma_reps for the backchannel
 - Remove various unused and redundant variables and lock cycles
 - Fix IPv6 support in xprt_rdma_set_port()
 - Fix memory leak by calling buf_free for callback replies
 - Fix "bytes registered" accounting
 - Fix kernel-doc comments
 - SUNRPC tracepoint cleanups for consistent information
 - Optimizations for __rpc_execute()
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlpniEEACgkQ18tUv7Cl
 QOsoGQ//ctg5iKTLlnjobnPdtS4zCQCgD8upx/FLawo/LCkDRiRD+lPNbSWy25db
 Z/YcYslUuwli3J9PoF36DxBAoep+N5LvD2IIwifr72tbwaRR+gK/xRdxdbiFXYod
 9HK+qBN5qRcgKpxPoHHGj9CoF/6ATcqyLhy7bP30UjNhydmlkaYEF1KXeO/l4NQu
 CQUyumdy0M9T8QW4EkXY2vkjS9leC4byI8P4iMaJnq157sYYVxLMryDGayK4ZJNH
 kmeAZoDhvNm+HKyD7vX90+w8/LPQutJ/A7m309KyMHiZZ23KxDW7r9ZclzcOswd1
 NY8HjyUHjl6gvFN9DmtOz95eKmmh1SpCQstw6lKTtNBVzP3+Dw9Lp2tLQFwuCe9y
 uzD/GYcFJECyBoHU4xsm9kV0PuzwAfvXahnylLjf7yV4MlDpjOZQT/vMvA96XHo6
 oOpT3+zikwCHz2FZM0cj5fQ0uQrxsx2zxAcmr8v5StC17Ng94LQH5ByAiOleJ14p
 Az+mAYATNgmSDD07+7UrEH7LD1b2h2xmdbaOa7UXeK2jJaYewGlUEKhKXIhjKJy9
 t2yecGKj4Y7omkNf/0YHle2t6WhNyFqjpISiF1KwuGHuqU5krPCpgKBOnuABcJTx
 D1BuSw4YQ5ZDfb2w0HAbPuwwRnU5FY1K79MIm7VkdE9ozWfmShA=
 =67UR
 -----END PGP SIGNATURE-----

Merge tag 'nfs-rdma-for-4.16-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

NFS-over-RDMA client updates for Linux 4.16

New features:
- xprtrdma tracepoints

Bugfixes and cleanups:
- Fix memory leak if rpcrdma_buffer_create() fails
- Fix allocating extra rpcrdma_reps for the backchannel
- Remove various unused and redundant variables and lock cycles
- Fix IPv6 support in xprt_rdma_set_port()
- Fix memory leak by calling buf_free for callback replies
- Fix "bytes registered" accounting
- Fix kernel-doc comments
- SUNRPC tracepoint cleanups for consistent information
- Optimizations for __rpc_execute()
2018-01-23 14:55:50 -05:00
Amritha Nambiar
bbf0bdd41f i40e: Fix channel addition in reset flow
Fix recreating the channel VSIs during the reset flow to reconfigure
the Tx rings and the queue context associated with the channel VSI.
Also update the next_base_queue for the VSI while rebuilding the
channel VSIs after a reset.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23 11:29:19 -08:00
Alan Brady
e0346f9fcb i40evf: ignore link up if not running
If we receive the link status message from PF with link up before queues
are actually enabled, it will trigger a TX hang.  This fixes the issue
by ignoring a link up message if the VF state is not yet in RUNNING
state.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23 11:29:19 -08:00
Markus Elfring
557450c301 i40e: Delete an error message for a failed memory allocation in i40e_init_interrupt_scheme()
Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23 11:29:19 -08:00
Shiraz Saleem
7b0b1a6d0a i40e: Disable iWARP VSI PETCP_ENA flag on netdev down events
Client close is overloaded to handle both un-registration and
netdev down event. On netdev down, i40iw client close is called
which unregisters the RDMA dev and this is too destructive
since the netdev is still registered.

Do not call client close/open on netdev down/up events. Instead
disable the PE TCP_ENA flag during a netdev down event. This
blocks all TCP traffic to the RDMA Protocol Engine. On netdev up,
re-enable the flag.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23 11:29:19 -08:00
Mitch Williams
bc73234bcd i40e: simplify pointer dereferences
Now that i40e_vsi_config_tc() has the pf and hw variable defined, use
them, instead of dereferencing vsi->back. Much easier to read.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23 11:29:19 -08:00
Mitch Williams
d8a8785660 i40e: check for invalid DCB config
The driver (and the entire netdev layer for that matter) assumes
that TC0 will always be present in our DCB configuration.
Unfortunately, this isn't always the case. Rather than fail to
configure the VSI, let's go ahead and try to make it work, even
though DCB will end up being disabled by the kernel.

If the driver fails to configure DCB, the driver queries what's
valid, then writes that back to the hardware, always forcing TC0.

This fixes a bug where the driver could fail to adhere to ETS BW
allocations if 8 TCs were configured on the switch.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23 11:29:19 -08:00
Sudheer Mogilappagari
07d44190a3 i40e/i40evf: Detect and recover hung queue scenario
In VFs, there is a known issue which can cause writebacks
to not occur when interrupts are disabled and there are
less than 4 descriptors resulting in TX timeout. Timeout
can also occur due to lost interrupt.

The current implementation for detecting and recovering
from hung queues in the PF is problematic because it actually
actively encourages lost interrupts.  By triggering a SW
interrupt, interrupts are forced on.  If we are already in
napi_poll and an interrupt fires, napi_poll will not be
rescheduled and the interrupt is effectively lost; thereby
potentially *causing* hung queues.

This patch checks whether packets are being processed between
every watchdog cycle and determine potential hung queue and
fires triggers SW interrupt only for that particular queue.

Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23 11:29:19 -08:00
Michal Kuchta
d95cd48060 i40e: Fix for blinking activity instead of link LEDs
This fix solves an issue occurring while calling i40e_led_set function
from the driver with "blink" parameter set as TRUE. This call resulted
in Activity LED blinking instead of Link LED, which may lead to errors
in physically identifying the port, since Activity LED may be blinking
for different reasons as well.

Signed-off-by: Michal Kuchta <michal.kuchta@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23 11:29:19 -08:00
Avinash Dayanand
06aa040f03 i40evf: Don't schedule reset_task when device is being removed
When a host disables and enables a PF device, all the associated
VFs are removed and added back in. It also generates a PFR which in turn
resets all the connected VFs. This behaviour is different from that of
Linux guest on Linux host. Hence we end up in a situation where there's
a PFR and device removal at the same time. And watchdog doesn't have a
clue about this and schedules a reset_task. This patch adds code to send
signal to reset_task that the device is currently being removed.

Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23 11:29:19 -08:00
Sudheer Mogilappagari
a558566bef i40evf: remove flush_scheduled_work call in i40evf_remove
flush_schedule_work blocks until completion of all scheduled
work items in global work-queue. This can cause deadlock in some
cases. i40evf_remove() cleans up necessary work items with
cancel_delayed_work_sync and cancel_work_sync. This fix removes
flush_schedule_work call inside i40evf_remove().

Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23 11:29:19 -08:00
Mitch Williams
b356dac8ab i40e: avoid divide by zero
In some weird circumstances with DCB enabled, the firmware can fail to
configure the VSI, leaving us with zero traffic classes. Check for this
state when we configure RSS to avoid a panic.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23 11:29:19 -08:00
Pawel Jablonski
e3a5d6e6fa i40e/i40evf: Enable NVMUpdate to retrieve AdminQ and add preservation flags for NVM update
This patch adds new I40E_NVMUPD_GET_AQ_EVENT state to allow
retrieval of AdminQ events as a result of AdminQ commands sent
to firmware.

Add preservation flags support on X722 devices for NVM update
AdminQ function wrapper. Add new parameter and handling to
nvmupdate admin queue function intended to allow nvmupdate tool
to configure the preservation flags in the AdminQ command.

This is required to implement FlatNVM on X722 devices.

Signed-off-by: Pawel Jablonski <pawel.jablonski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23 11:29:19 -08:00
Keith Busch
62314e405f nvme-pci: Fix queue double allocations
The queue count says the highest queue that's been allocated, so don't
reallocate a queue lower than that.

Fixes: 147b27e4bd ("nvme-pci: allocate device queues storage space at probe")
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-01-23 20:15:34 +01:00
David S. Miller
5ca114400d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
en_rx_am.c was deleted in 'net-next' but had a bug fixed in it in
'net'.

The esp{4,6}_offload.c conflicts were overlapping changes.
The 'out' label is removed so we just return ERR_PTR(-EINVAL)
directly.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23 13:51:56 -05:00
Luis de Bethencourt
1c47a645ba device-dax: Fix trailing semicolon
The trailing semicolon is an empty statement that does no operation.
Removing it since it doesn't do anything.

Signed-off-by: Luis de Bethencourt <luisbg@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-01-23 10:49:03 -08:00
Josh Poimboeuf
e2ac83d74a x86/ftrace: Fix ORC unwinding from ftrace handlers
Steven Rostedt discovered that the ftrace stack tracer is broken when
it's used with the ORC unwinder.  The problem is that objtool is
instructed by the Makefile to ignore the ftrace_64.S code, so it doesn't
generate any ORC data for it.

Fix it by making the asm code objtool-friendly:

- Objtool doesn't like the fact that save_mcount_regs pushes RBP at the
  beginning, but it's never restored (directly, at least).  So just skip
  the original RBP push, which is only needed for frame pointers anyway.

- Annotate some functions as normal callable functions with
  ENTRY/ENDPROC.

- Add an empty unwind hint to return_to_handler().  The return address
  isn't on the stack, so there's nothing ORC can do there.  It will just
  punt in the unlikely case it tries to unwind from that code.

With all that fixed, remove the OBJECT_FILES_NON_STANDARD Makefile
annotation so objtool can read the file.

Link: http://lkml.kernel.org/r/20180123040746.ih4ep3tk4pbjvg7c@treble

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23 13:24:19 -05:00
Benjamin Coddington
0be283f676 SUNRPC: Fix null rpc_clnt dereference in rpc_task_queued tracepoint
Backchannel tasks will not have a reference to the rpc_clnt.  Return -1 for
cl_clid in that case.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trondmy@gmail.com>
2018-01-23 13:18:11 -05:00
Eric W. Biederman
c0f45555b8 signal/memory-failure: Use force_sig_mceerr and send_sig_mceerr
Delegate filling out struct siginfo to functions in kernel/signal.c
to simplify the code.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-23 12:17:48 -06:00
Eric W. Biederman
83b57531c5 mm/memory_failure: Remove unused trapno from memory_failure
Today 4 architectures set ARCH_SUPPORTS_MEMORY_FAILURE (arm64, parisc,
powerpc, and x86), while 4 other architectures set __ARCH_SI_TRAPNO
(alpha, metag, sparc, and tile).  These two sets of architectures do
not interesect so remove the trapno paramater to remove confusion.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-23 12:17:42 -06:00
Prashant Bhole
783687810e bpf: test_maps: cleanup sockmaps when test ends
Bug: BPF programs and maps related to sockmaps test exist
in memory even after test_maps ends.

This patch fixes it as a short term workaround (sockmap
kernel side needs real fixing) by empyting sockmaps when
test ends.

Fixes: 6f6d33f3b3 ("bpf: selftests add sockmap tests")
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
[ daniel: Note on workaround. ]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-23 19:10:09 +01:00
Shannon Nelson
85bc2663a5 ixgbe: register ipsec offload with the xfrm subsystem
With all the support code in place we can now link in the ipsec
offload operations and set the ESP feature flag for the XFRM
subsystem to see.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23 10:09:12 -08:00
Shannon Nelson
a8a43fda27 ixgbe: ipsec offload stats
Add a simple statistic to count the ipsec offloads.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23 10:07:18 -08:00
Shannon Nelson
5925947047 ixgbe: process the Tx ipsec offload
If the skb has a security association referenced in the skb, then
set up the Tx descriptor with the ipsec offload bits.  While we're
here, we fix an oddly named field in the context descriptor struct.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23 10:02:30 -08:00
Shannon Nelson
92103199f1 ixgbe: process the Rx ipsec offload
If the chip sees and decrypts an ipsec offload, set up the skb
sp pointer with the ralated SA info.  Since the chip is rude
enough to keep to itself the table index it used for the
decryption, we have to do our own table lookup, using the
hash for speed.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23 09:52:57 -08:00
Alexei Starovoitov
8e6875250a selftests/bpf: fix test_dev_cgroup
The test incorrectly doing
mkdir /mnt/cgroup-test-work-dirtest-bpf-based-device-cgroup
instead of
mkdir /mnt/cgroup-test-work-dir/test-bpf-based-device-cgroup

somehow such mkdir succeeds and new directory appears:
/mnt/cgroup-test-work-dir/cgroup-test-work-dirtest-bpf-based-device-cgroup

Later cleanup via nftw("/mnt/cgroup-test-work-dir", ...);
doesn't walk this directory.
"rmdir /mnt/cgroup-test-work-dir" succeeds, but bpf program and
dangling cgroup stays in memory.
That's a separate issue on a cgroup side.
For now fix the test.

Fixes: 37f1ba0909 ("selftests/bpf: add a test for device cgroup controller")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-23 18:42:12 +01:00
Shannon Nelson
6d73a1540b ixgbe: restore offloaded SAs after a reset
On a chip reset most of the table contents are lost, so must be
restored.  This scans the driver's ipsec tables and restores both
the filled and empty table slots to their pre-reset values.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23 09:37:09 -08:00
Alexei Starovoitov
1a97cf1fe5 selftests/bpf: speedup test_maps
test_hashmap_walk takes very long time on debug kernel with kasan on.
Reduce the number of iterations in this test without sacrificing
test coverage.
Also add printfs as progress indicator.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-23 18:28:03 +01:00
Shannon Nelson
63a67fe229 ixgbe: add ipsec offload add and remove SA
Add the functions for setting up and removing offloaded SAs (Security
Associations) with the x540 hardware.  We set up the callback structure
but we don't yet set the hardware feature bit to be sure the XFRM service
won't actually try to use us for an offload yet.

The software tables are made up to mimic the hardware tables to make it
easier to track what's in the hardware, and the SA table index is used
for the XFRM offload handle.  However, there is a hashing field in the
Rx SA tracking that will be used to facilitate faster table searches in
the Rx fast path.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23 09:19:02 -08:00
Christian König
a35f2f34b5 dma-buf: make returning the exclusive fence optional
Change reservation_object_get_fences_rcu to make the exclusive fence
pointer optional.

If not specified the exclusive fence is put into the fence array as
well.

This is helpful for a couple of cases where we need all fences in a
single array.

Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180110125341.3618-1-christian.koenig@amd.com
2018-01-23 12:14:26 -05:00
Shannon Nelson
34c822e2fb ixgbe: add ipsec data structures
Set up the data structures to be used by the ipsec offload.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23 09:11:27 -08:00
Shannon Nelson
49a94d74d9 ixgbe: add ipsec engine start and stop routines
Add in the code for running and stopping the hardware ipsec
encryption/decryption engine.  It is good to keep the engine
off when not in use in order to save on the power draw.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23 09:08:57 -08:00
Shannon Nelson
8bbbc5e90b ixgbe: add ipsec register access routines
Add a few routines to make access to the ipsec registers just a little
easier, and throw in the beginnings of an initialization.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23 09:00:18 -08:00
Linus Torvalds
a84a8ab94e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix divide by zero in mlx5, from Talut Batheesh.

 2) Guard against invalid GSO packets coming from untrusted guests and
    arriving in qdisc_pkt_len_init(), from Eric Dumazet.

 3) Similarly add such protection to the various protocol GSO handlers.
    From Willem de Bruijn.

 4) Fix regression added to IGMP source address checking for IGMPv3
    reports, from Felix Feitkau.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  tls: Correct length of scatterlist in tls_sw_sendpage
  be2net: restore properly promisc mode after queues reconfiguration
  net: igmp: fix source address check for IGMPv3 reports
  gso: validate gso_type in GSO handlers
  net: qdisc_pkt_len_init() should be more robust
  ibmvnic: Allocate and request vpd in init_resources
  ibmvnic: Revert to previous mtu when unsupported value requested
  ibmvnic: Modify buffer size and number of queues on failover
  rds: tcp: compute m_ack_seq as offset from ->write_seq
  usbnet: silence an unnecessary warning
  cxgb4: fix endianness for vlan value in cxgb4_tc_flower
  cxgb4: set filter type to 1 for ETH_P_IPV6
  net/mlx5e: Fix fixpoint divide exception in mlx5e_am_stats_compare
2018-01-23 08:52:55 -08:00
Shannon Nelson
beca815403 ixgbe: clean up ipsec defines
Clean up the ipsec/macsec descriptor bit definitions to match the rest
of the defines and file organization.  Also recognise the bit-definition
overlap in the error mask macro.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23 08:41:25 -08:00
Yonghong Song
35136920e1 tools/bpf: fix a test failure in selftests prog test_verifier
Commit 111e6b4531 ("selftests/bpf: make test_verifier run most programs")
enables tools/testing/selftests/bpf/test_verifier unit cases to run
via bpf_prog_test_run command. With the latest code base,
test_verifier had one test case failure:

  ...
  #473/p check deducing bounds from const, 2 FAIL retval 1 != 0
  0: (b7) r0 = 1
  1: (75) if r0 s>= 0x1 goto pc+1
   R0=inv1 R1=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
  2: (95) exit

  from 1 to 3: R0=inv1 R1=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
  3: (d5) if r0 s<= 0x1 goto pc+1
   R0=inv1 R1=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
  4: (95) exit

  from 3 to 5: R0=inv1 R1=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
  5: (1f) r1 -= r0
  6: (95) exit
  processed 7 insns (limit 131072), stack depth 0
  ...

The test case does not set return value in the test
structure and hence the return value from the prog run
is assumed to be 0. However, the actual return value is 1.
As a result, the test failed. The fix is to correctly set
the return value in the test structure.

Fixes: 111e6b4531 ("selftests/bpf: make test_verifier run most programs")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-23 17:36:09 +01:00