Commit graph

67868 commits

Author SHA1 Message Date
Dexuan Cui
ae20b25430 Drivers: hv: vmbus: enable VMBus protocol version 5.0
With VMBus protocol 5.0, we're able to better support new features, e.g.
running two or more VMBus drivers simultaneously in a single VM -- note:
we can't simply load the current VMBus driver twice, instead, a secondary
VMBus driver must be implemented.

This patch adds the support for the new VMBus protocol, which is available
on new Windows hosts, by:

1) We still use SINT2 for compatibility;
2) We must use Connection ID 4 for the Initiate Contact Message, and for
subsequent messages, we must use the Message Connection ID field in
the host-returned VersionResponse Message.

Notes for developers of the secondary VMBus driver:
1) Must use VMBus protocol 5.0 as well;
2) Must use a different SINT number that is not in use.
3) Must use Connection ID 4 for the Initiate Contact Message, and for
subsequent messages, must use the Message Connection ID field in
the host-returned VersionResponse Message.
4) It's possible that the primary VMBus driver using protocol version 4.0
can work with a secondary VMBus driver using protocol version 5.0, but it's
recommended that both should use 5.0 for new Hyper-V features in the future.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-14 16:06:48 +02:00
Bruce Allan
0fccb85ad2 virtchnl: Whitespace and parenthesis cleanup
Clean up existing instances of unnecessary parentheses in if
statement and change order of conditionals to make it easier to read

The opening /* should be followed by a single space and the closing */
should be preceded with a single space.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-05-14 07:05:16 -07:00
Al Viro
2220c5b0a7 make xattr_getsecurity() static
many years overdue...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-14 09:51:34 -04:00
Bartlomiej Zolnierkiewicz
e7deb3c774 drm: shmobile: remove unused MERAM support
Since commit a521422ea4  ("ARM: shmobile: mackerel: Remove Legacy C
board code") MERAM functionality is unused. Remove it.

Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
2018-05-14 15:47:30 +02:00
Ben Hutchings
ea739a287f mtd: Fix comparison in map_word_andequal()
Commit 9e343e87d2 ("mtd: cfi: convert inline functions to macros")
changed map_word_andequal() into a macro, but also changed the right
hand side of the comparison from val3 to val2.  Change it back to use
val3 on the right hand side.

Thankfully this did not cause a regression because all callers
currently pass the same argument for val2 and val3.

Fixes: 9e343e87d2 ("mtd: cfi: convert inline functions to macros")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-05-14 14:46:20 +02:00
Matthias Brugger
0afd32c6bb Merge commit 'f15cd6d991' into v.4.17-next/soc-test 2018-05-14 12:22:03 +02:00
Frederic Weisbecker
48bda43eab softirq/s390: Move default mutators of overwritten softirq mask to s390
s390 is now the last architecture that entirely overwrites
local_softirq_pending() and uses the according default definitions of
set_softirq_pending() and or_softirq_pending().

Just move these to s390 to debloat the generic code complexity.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/1525786706-22846-12-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 11:25:28 +02:00
Frederic Weisbecker
0fd7d86285 softirq/core: Consolidate default local_softirq_pending() implementations
Consolidate and optimize default softirq mask API implementations.
Per-CPU operations are expected to be faster and a few architectures
already rely on them to implement local_softirq_pending() and related
accessors/mutators. Those will be migrated to the new generic code.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/1525786706-22846-6-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 11:25:27 +02:00
Frederic Weisbecker
0f6f47bacb softirq/core: Turn default irq_cpustat_t to standard per-cpu
In order to optimize and consolidate softirq mask accesses, let's
convert the default irq_cpustat_t implementation to per-CPU standard API.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/1525786706-22846-5-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 11:25:27 +02:00
Ingo Molnar
4b96583869 Linux 4.17-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlr4xw8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGNYoH/1d5zyMpVJVUKZ0K
 LuEctCGby1PjSvSOhmMuxFVagFAqfBJXmwWTeohLfLG48r/Yk0AsZQ5HH13/8baj
 k/T8UgUvKZKustndCRp+joQ3Pa1ZpcIFaWRvB8pKFCefJ/F/Lj4B4X1HYI7vLq0K
 /ZBXUdy3ry0lcVuypnaARYAb2O7l/nyZIjZ3FhiuyymWe7Jpo+G7VK922LOMSX/y
 VYFZCWa8nxN+yFhO0ao9X5k7ggIiUrEBtbfNrk19VtAn0hx+OYKW2KfJK/eHNey/
 CKrOT+KAxU8VU29AEIbYzlL3yrQmULcEoIDiqJ/6m5m6JwsEbP6EqQHs0TiuQFpq
 A0MO9rw=
 =yjUP
 -----END PGP SIGNATURE-----

Merge tag 'v4.17-rc5' into irq/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 11:22:59 +02:00
Olof Johansson
71fe67e0e2 ARM: SOC driver update for 4.18
- AEMIF driver update to support board files and remove
    need of mach-davinci aemif code
  - Use percpu counters for qmss datapath stats
  - License update for TI SCI
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJa8hefAAoJEHJsHOdBp5c/Z7gP/RJcEM/bUrmIj+iAf+h/Azp3
 f5KiFrBmwIcRPC4VULRL06uuRgiExWcDY6j3gheKdJzqHOKprRysdRDEkHLKnmoy
 EGUS2HKo6Bbig/G/lMy9YhmrOEqm2tsh008TwSj6V8ZHSXgdyd3R4Kbe0YM5bbVp
 TMUMTuGN6EP1RMAMk4zh9jGxCSDgfzI6FZd2Yf8pxAhsIAa7ssbzGGT85p3lBP3T
 PTQ5h/aMP833gf7Ir7z5wEdqvdmfLVIxyu2bOzbP+rPUnaUGI9E2qaKPgiz8Tw5W
 sCICBv2ELauoSyLLUaJ8BOVU6pI07Cm0DisdUmhuHex3EOm+U1Leg3XX5pyJ/Sh7
 sGeLfFpMYWflinp6owq3J/z4sOQCq6pHugfS+6H3k6+uBX1S4RVkhcLda/Z0zz7p
 LAGpFkSSHP1vFtQLU7phSu7m+v4KqtnUEDZelCLroJIkncpwotYaAjisIOxOif0j
 WgNCHMNdTF/oxGCWBxqk5GH3bvZg53uUK9iy+WjgfKLOe+2RKEM+MbPpmVBWJxag
 ZV9vwRpgmi/5I/cUNt7sTJ8ine66I5N68ps4K6e8I8JNHqBqpcZFQwnZ+x+Ib0l9
 vv3unT92f/7q2XtIGkxkDNgqnsLF/b6sxqmy8jnNme/FIOWPX4iq2TGSnZ151FQ1
 nH3qGLDyn3UAEdBXMnkb
 =uqd7
 -----END PGP SIGNATURE-----

Merge tag 'soc_drivers_for_4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone into next/drivers

ARM: SOC driver update for 4.18

 - AEMIF driver update to support board files and remove
   need of mach-davinci aemif code
 - Use percpu counters for qmss datapath stats
 - License update for TI SCI

* tag 'soc_drivers_for_4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone:
  firmware: ti_sci: Switch to SPDX Licensing
  soc: ti: knav_qmss: Use percpu instead atomic for stats counter
  memory: aemif: add support for board files
  memory: aemif: don't rely on kbuild for driver's name

Signed-off-by: Olof Johansson <olof@lixom.net>
2018-05-14 01:27:47 -07:00
Olof Johansson
e5d9875ecd ti-sysc driver related changes for omap variants
This series improves the ti-sysc interconnect target module driver to
 the point where a most of SoC can be booted with interconnect target
 module data configured in device tree instead of legacy platform data.
 The related device tree changes need some more work though, and can
 wait for v4.19. Also some drivers using nested interconnects like DSS
 need more work.
 
 We can now remove the unused pm-noop code that is not doing anything
 any longer. And we can now initialize things for PM and display pdata
 later to prepare things for using ti-sysc driver.
 
 We also need to add  some more quirk handling so we can boot both with
 platform data and dts data.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEkgNvrZJU/QSQYIcQG9Q+yVyrpXMFAlrsg9QRHHRvbnlAYXRv
 bWlkZS5jb20ACgkQG9Q+yVyrpXNmwxAAzPt1GpHQSw/XhhtK8+DLlqo9fdPQ9C65
 Iw+PyvQYy2bTj0y64VkZ4Msmi3SOhfr5zKhIrwBHEG59/LW81oXqnb9JZHPP+YC+
 /A6ZufXIt8X+Nd1L6id2OD9ItJoLXk7llBvDckwb/zUgcVib9cA79GvWgexfRWBE
 w/bjZfUibddYdhKoCGmvWcZBapDKHMfv8MdN8h0QUyofTIefZZeykRvb1Pmn7Ntl
 vz3QfPUq3oyRfG9PMRI7mjHrW7jxEKgjvWANbUg64UQJN7s1tfa8ICpzycc4/X/a
 pdetH7G+BPaRdeqDCmGrcGHfO4b5HyD7nkTD3R6yzV+Dw8nWl+aWGJHAsPYRUJkd
 o/BroflhqK2ICfEkeK6AWebbicOSlF5P+EEFwp6pHSd/9JiEqR1IkhcCvTdV8CB1
 qyUQxD+iKof+rY5f1EicaGq8HXhkV+9aIOoqBH6C0qObEJDUWvVoGIzDdN2vwVAu
 C2w9WqdQII3R4g2ZX1SmdEqFO/f6PkAoKiyNt+WGBGBUfYo1sfwpkFAEeGU50moJ
 5m9TtLcAbbvgMwy2ttfWcHPn5z3p4Ocf7aN93TZ6RPk6A6R57PzCcYqJ2bXsumeV
 5yaP9w4pbFj+FQuu8jA8s/cSwhIP8SwqwFWKCi2JcU3ugEdJfwF555y5bm0R9MDz
 7W82aAicw+M=
 =jYZ6
 -----END PGP SIGNATURE-----

Merge tag 'omap-for-v4.18/ti-sysc-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into next/soc

ti-sysc driver related changes for omap variants

This series improves the ti-sysc interconnect target module driver to
the point where a most of SoC can be booted with interconnect target
module data configured in device tree instead of legacy platform data.
The related device tree changes need some more work though, and can
wait for v4.19. Also some drivers using nested interconnects like DSS
need more work.

We can now remove the unused pm-noop code that is not doing anything
any longer. And we can now initialize things for PM and display pdata
later to prepare things for using ti-sysc driver.

We also need to add  some more quirk handling so we can boot both with
platform data and dts data.

* tag 'omap-for-v4.18/ti-sysc-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  bus: ti-sysc: Show module information for suspend if DEBUG is enabled
  bus: ti-sysc: Tag sdio and wdt with legacy mode for suspend
  bus: ti-sysc: Detect UARTs for SYSC_QUIRK_LEGACY_IDLE quirk on omap4
  bus: ti-sysc: Detect omap4 type timers for quirk
  bus: ti-sysc: Add initial support for external resets
  bus: ti-sysc: Improve suspend and resume handling
  bus: ti-sysc: Tag some modules resource providers for noirq suspend
  bus: ti-sysc: Add handling for clkctrl opt clocks
  bus: ti-sysc: Make child clock alias handling more generic
  bus: ti-sysc: Handle simple-bus for nested children
  ARM: OMAP2+: Make display related init into device_initcall
  ARM: OMAP2+: Initialize SoC PM later
  ARM: OMAP2+: Only probe SDMA via ti-sysc if configured in dts
  ARM: OMAP2+: Use signed value for sysc register offsets
  ARM: OMAP2+: Allow using ti-sysc for system timers
  ARM: OMAP2+: Drop unused pm-noop

Signed-off-by: Olof Johansson <olof@lixom.net>
2018-05-14 01:18:44 -07:00
Rohit Jain
943d355d7f sched/core: Distinguish between idle_cpu() calls based on desired effect, introduce available_idle_cpu()
In the following commit:

  247f2f6f3c ("sched/core: Don't schedule threads on pre-empted vCPUs")

... we distinguish between idle_cpu() when the vCPU is not running for
scheduling threads.

However, the idle_cpu() function is used in other places for
actually checking whether the state of the CPU is idle or not.

Hence split the use of that function based on the desired return value,
by introducing the available_idle_cpu() function.

This fixes a (slight) regression in that initial vCPU commit, because
some code paths (like the load-balancer) don't care and shouldn't care
if the vCPU is preempted or not, they just want to know if there's any
tasks on the CPU.

Signed-off-by: Rohit Jain <rohit.k.jain@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dhaval.giani@oracle.com
Cc: linux-kernel@vger.kernel.org
Cc: matt@codeblueprint.co.uk
Cc: steven.sistare@oracle.com
Cc: subhra.mazumdar@oracle.com
Link: http://lkml.kernel.org/r/1525883988-10356-1-git-send-email-rohit.k.jain@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 09:12:26 +02:00
Sebastian Andrzej Siewior
a59a68fee0 sched/wait: Include <linux/wait.h> in <linux/swait.h>
kbuild bot reported against an intermediate RT patch that the build
fails with:

>    In file included from include/linux/completion.h:12:0,
>                     from include/linux/rcupdate_wait.h:10,
>                     from kernel/rcu/srcutiny.c:27:
>    kernel/rcu/srcutiny.c: In function 'srcu_drive_gp':
> >> include/linux/swait.h:172:7: error: implicit declaration of function '___wait_is_interruptible'; did you mean '__swait_event_interruptible'?
>       if (___wait_is_interruptible(state) && __int) {  \

That error vanishes a few patches later (in the RT queue) because wait.h
is then pulled in by other means. It does not seem to surface on !RT.
I think that swait should include a header file for a function/macro
(___wait_is_interruptible()) it is using.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20180504104224.20218-1-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 09:12:25 +02:00
Ard Biesheuvel
cb0ba79352 efi: Align efi_pci_io_protocol typedefs to type naming convention
In order to use the helper macros that perform type mangling with the
EFI PCI I/O protocol struct typedefs, align their Linux typenames with
the convention we use for definitionns that originate in the UEFI spec,
and add the trailing _t to each.

Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180504060003.19618-14-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 08:57:48 +02:00
Yazen Ghannam
f9e1bdb9f3 efi: Decode IA32/X64 Processor Error Section
Recognize the IA32/X64 Processor Error Section.

Do the section decoding in a new "cper-x86.c" file and add this to the
Makefile depending on a new "UEFI_CPER_X86" config option.

Print the Local APIC ID and CPUID info from the Processor Error Record.

The "Processor Error Info" and "Processor Context" fields will be
decoded in following patches.

Based on UEFI 2.7 Table 252. Processor Error Record.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180504060003.19618-5-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 08:57:47 +02:00
Yazen Ghannam
742632d237 efi: Fix IA32/X64 Processor Error Record definition
Based on UEFI 2.7 Table 255. Processor Error Record, the "Local APIC_ID"
field is 8 bytes but Linux defines this field as 1 byte.

Fix this in the struct cper_sec_proc_ia definition.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180504060003.19618-4-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 08:57:47 +02:00
Ard Biesheuvel
0b3225ab94 efi: Avoid potential crashes, fix the 'struct efi_pci_io_protocol_32' definition for mixed mode
Mixed mode allows a kernel built for x86_64 to interact with 32-bit
EFI firmware, but requires us to define all struct definitions carefully
when it comes to pointer sizes.

'struct efi_pci_io_protocol_32' currently uses a 'void *' for the
'romimage' field, which will be interpreted as a 64-bit field
on such kernels, potentially resulting in bogus memory references
and subsequent crashes.

Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180504060003.19618-13-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 08:56:29 +02:00
Amir Goldstein
0c8e3fe35d vfs: add the sb_start_intwrite_trylock() helper
Needed by ext4 to test frozen fs before updating s_last_mounted.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2018-05-13 22:40:30 -04:00
Linus Torvalds
66e1c94db3 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/pti updates from Thomas Gleixner:
 "A mixed bag of fixes and updates for the ghosts which are hunting us.

  The scheduler fixes have been pulled into that branch to avoid
  conflicts.

   - A set of fixes to address a khread_parkme() race which caused lost
     wakeups and loss of state.

   - A deadlock fix for stop_machine() solved by moving the wakeups
     outside of the stopper_lock held region.

   - A set of Spectre V1 array access restrictions. The possible
     problematic spots were discuvered by Dan Carpenters new checks in
     smatch.

   - Removal of an unused file which was forgotten when the rest of that
     functionality was removed"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vdso: Remove unused file
  perf/x86/cstate: Fix possible Spectre-v1 indexing for pkg_msr
  perf/x86/msr: Fix possible Spectre-v1 indexing in the MSR driver
  perf/x86: Fix possible Spectre-v1 indexing for x86_pmu::event_map()
  perf/x86: Fix possible Spectre-v1 indexing for hw_perf_event cache_*
  perf/core: Fix possible Spectre-v1 indexing for ->aux_pages[]
  sched/autogroup: Fix possible Spectre-v1 indexing for sched_prio_to_weight[]
  sched/core: Fix possible Spectre-v1 indexing for sched_prio_to_weight[]
  sched/core: Introduce set_special_state()
  kthread, sched/wait: Fix kthread_parkme() completion issue
  kthread, sched/wait: Fix kthread_parkme() wait-loop
  sched/fair: Fix the update of blocked load when newly idle
  stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock
2018-05-13 10:53:08 -07:00
Marc Zyngier
505287525c irqchip/gic-v3: Add support for Message Based Interrupts as an MSI controller
GICv3 offers the possibility to signal SPIs using a pair of doorbells
(SETPI, CLRSPI) under the name of Message Based Interrupts (MBI).
They can be used as either traditional (edge) MSIs, or the more exotic
level-triggered flavour.

Let's implement support for platform MSI, which is the original intent
for this feature.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rob Herring <robh@kernel.org>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lkml.kernel.org/r/20180508121438.11301-8-marc.zyngier@arm.com
2018-05-13 15:59:01 +02:00
Marc Zyngier
6461934371 irqdomain: Let irq_find_host default to DOMAIN_BUS_WIRED
At the beginning of times, irq_find_host() was simple. Each device node
implemented at most one irq domain, and we were happy. Over time, things
have become more complex, and we now have nodes implementing a plurality
of domains, tagged by "bus_token".

Crutially, users of irq_find_host() all expect the most basic domain
to be returned, and not any other domain such as a bus-specific MSI
domain.

So let's change irq_find_host() to first look for a DOMAIN_BUS_WIRED
domain, and only if this fails fallback to DOMAIN_BUS_ANY. Note that
this is consistent with what irq_create_fwspec_mapping is already
doing, see 530cbe100e ("irqdomain: Allow domain lookup with
DOMAIN_BUS_WIRED token").

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rob Herring <robh@kernel.org>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lkml.kernel.org/r/20180508121438.11301-6-marc.zyngier@arm.com
2018-05-13 15:59:00 +02:00
Marc Zyngier
8a22a3e1e7 dma-iommu: Fix compilation when !CONFIG_IOMMU_DMA
Inclusion of include/dma-iommu.h when CONFIG_IOMMU_DMA is not selected
results in the following splat:

In file included from drivers/irqchip/irq-gic-v3-mbi.c:20:0:
./include/linux/dma-iommu.h:95:69: error: unknown type name ‘dma_addr_t’
 static inline int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
                                                                     ^~~~~~~~~~
./include/linux/dma-iommu.h:108:74: warning: ‘struct list_head’ declared inside parameter list will not be visible outside of this definition or declaration
 static inline void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
                                                                          ^~~~~~~~~
scripts/Makefile.build:312: recipe for target 'drivers/irqchip/irq-gic-v3-mbi.o' failed

Fix it by including linux/types.h.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rob Herring <robh@kernel.org>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lkml.kernel.org/r/20180508121438.11301-5-marc.zyngier@arm.com
2018-05-13 15:59:00 +02:00
Marc Zyngier
6988e0e0d2 genirq/msi: Limit level-triggered MSI to platform devices
Nobody would be insane enough to try and use level triggered
MSIs on PCI, but let's make sure it doesn't happen. Also,
let's mandate that the irqchip backing the platform MSI domain
is providing the IRQCHIP_SUPPORTS_LEVEL_MSI flag.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rob Herring <robh@kernel.org>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lkml.kernel.org/r/20180508121438.11301-3-marc.zyngier@arm.com
2018-05-13 15:58:59 +02:00
Marc Zyngier
0be8153cbc genirq/msi: Allow level-triggered MSIs to be exposed by MSI providers
So far, MSIs have been used to signal edge-triggered interrupts, as
a write is a good model for an edge (you can't "unwrite" something).
On the other hand, routing zillions of wires in an SoC because you
need level interrupts is a bit extreme.

People have come up with a variety of schemes to support this, which
involves sending two messages: one to signal the interrupt, and one
to clear it. Since the kernel cannot represent this, we've ended up
with side-band mechanisms that are pretty awful.

Instead, let's acknoledge the requirement, and ensure that, under the
right circumstances, the irq_compose_msg and irq_write_msg can take
as a parameter an array of two messages instead of a pointer to a
single one. We also add some checking that the compose method only
clobbers the second message if the MSI domain has been created with
the MSI_FLAG_LEVEL_CAPABLE flags.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rob Herring <robh@kernel.org>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lkml.kernel.org/r/20180508121438.11301-2-marc.zyngier@arm.com
2018-05-13 15:58:59 +02:00
Greg Kroah-Hartman
176c2572cd soundwire streaming
This contains:
  - Support for SoundWire Streaming
  - Documentation updates for streaming
  - Cadence and Intel driver updates for streaming
  - ASoC API for programming soundwire stream
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJa98rIAAoJEHwUBw8lI4NHrU4P/2h6FLtl1RETg28dcUiGUXj4
 ALgMej7IbkRXaqEClfA6sU9qe9NCi1UrWD4bEnwetjgMki8/WeJP1pSgoRwZT/YA
 38lle3KoGaptBRf2uZe9hfiKKyvp3oI9XPNHEsk6Qqw0qSSUr8t6k5FAWBiupkiV
 s8R/dYDW/qSEmoWODodFFuJFYr1Xok5L95vt4dyCg4FXglr8Sym8Sk6DmjzcqEy/
 w4YsWKvg62RbOZ4hbuZcwY3NunIvvMAx0+A3xweZaLobpCwJ0KB3ApYZeOybF5px
 ODDRNuh76scgz+UMBr796vfvzPAztbUUwcCmXVJrZkWC98OCSOUXfQSZOKnIRmFL
 AZICxsSPOmip++O3C221bKeI65ldXiPHBFzzmzAHOI5IyMfZREu3bbE69x+cwTUi
 1UKrQb0uQRPCQJtaR9hMc4CadRA/SwhOIk42G5aNJ0R4CRpVpVhixuXUHv9+Huaj
 AHSYr3BKCiAW5FshcDcfl9PUvmgULgO66Nk5YURSEdaY39E+1NElHSeKxbbSYRqc
 SNr+gAUKP9Pqfpa8ToOUvb7yg4X72Tw4U9tUbj0i18vb7robkJD4BwlIikUpyBK/
 w8Ew/6WqatmHvX8GpQPRIuE3IbbGKYJquQjTF9tRwzuJBzWFr3JYRU3dZ8g1q+I7
 o9amLP9RcMVwD+qYFKtA
 =KZwI
 -----END PGP SIGNATURE-----

Merge tag 'soundwire-streaming' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire into char-misc-next

Vinod writes:

soundwire streaming

This contains:
 - Support for SoundWire Streaming
 - Documentation updates for streaming
 - Cadence and Intel driver updates for streaming
 - ASoC API for programming soundwire stream
2018-05-13 11:53:18 +02:00
Mathieu Malaterre
63fab6977a ACPI: Add missing prototype_for arch_post_acpi_subsys_init()
In commit e7ff3a4763 (x86/amd: Check for the C1E bug post ACPI subsystem
init) a new function arch_post_acpi_subsys_init() was introduced. This
weak function can potentially be overridden on a per arch basis, introduce
the prototype for clarity.

Silence the following gcc warning (W=1):

  init/main.c:484:20: warning: no previous prototype for ‘arch_post_acpi_subsys_init’ [-Wmissing-prototypes]

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-05-13 11:17:13 +02:00
Vadim Pasternak
98004a78bb platform_data/mlxreg: Document fixes for hotplug device
Remove redunadant description of label in struct mlxreg_hotplug_device.

Change location of access_mode in struct mlxreg_hotplug_device.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2018-05-12 15:38:40 -07:00
Brian Masney
c06c4d7935 staging: iio: tsl2x7x/tsl2772: move out of staging
Move the tsl2772 driver out of staging and into mainline.

Signed-off-by: Brian Masney <masneyb@onstation.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2018-05-12 12:40:04 +01:00
Linus Torvalds
f0ab773f5c Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "13 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  rbtree: include rcu.h
  scripts/faddr2line: fix error when addr2line output contains discriminator
  ocfs2: take inode cluster lock before moving reflinked inode from orphan dir
  mm, oom: fix concurrent munlock and oom reaper unmap, v3
  mm: migrate: fix double call of radix_tree_replace_slot()
  proc/kcore: don't bounds check against address 0
  mm: don't show nr_indirectly_reclaimable in /proc/vmstat
  mm: sections are not offlined during memory hotremove
  z3fold: fix reclaim lock-ups
  init: fix false positives in W+X checking
  lib/find_bit_benchmark.c: avoid soft lockup in test_find_first_bit()
  KASAN: prohibit KASAN+STRUCTLEAK combination
  MAINTAINERS: update Shuah's email address
2018-05-11 18:04:12 -07:00
David S. Miller
b2d6cee117 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The bpf syscall and selftests conflicts were trivial
overlapping changes.

The r8169 change involved moving the added mdelay from 'net' into a
different function.

A TLS close bug fix overlapped with the splitting of the TLS state
into separate TX and RX parts.  I just expanded the tests in the bug
fix from "ctx->conf == X" into "ctx->tx_conf == X && ctx->rx_conf
== X".

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11 20:53:22 -04:00
Sebastian Andrzej Siewior
2075b16e32 rbtree: include rcu.h
Since commit c1adf20052 ("Introduce rb_replace_node_rcu()")
rbtree_augmented.h uses RCU related data structures but does not include
the header file.  It works as long as it gets somehow included before
that and fails otherwise.

Link: http://lkml.kernel.org/r/20180504103159.19938-1-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-11 17:28:45 -07:00
David Rientjes
27ae357fa8 mm, oom: fix concurrent munlock and oom reaper unmap, v3
Since exit_mmap() is done without the protection of mm->mmap_sem, it is
possible for the oom reaper to concurrently operate on an mm until
MMF_OOM_SKIP is set.

This allows munlock_vma_pages_all() to concurrently run while the oom
reaper is operating on a vma.  Since munlock_vma_pages_range() depends
on clearing VM_LOCKED from vm_flags before actually doing the munlock to
determine if any other vmas are locking the same memory, the check for
VM_LOCKED in the oom reaper is racy.

This is especially noticeable on architectures such as powerpc where
clearing a huge pmd requires serialize_against_pte_lookup().  If the pmd
is zapped by the oom reaper during follow_page_mask() after the check
for pmd_none() is bypassed, this ends up deferencing a NULL ptl or a
kernel oops.

Fix this by manually freeing all possible memory from the mm before
doing the munlock and then setting MMF_OOM_SKIP.  The oom reaper can not
run on the mm anymore so the munlock is safe to do in exit_mmap().  It
also matches the logic that the oom reaper currently uses for
determining when to set MMF_OOM_SKIP itself, so there's no new risk of
excessive oom killing.

This issue fixes CVE-2018-1000200.

Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1804241526320.238665@chino.kir.corp.google.com
Fixes: 2129258024 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Signed-off-by: David Rientjes <rientjes@google.com>
Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>	[4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-11 17:28:45 -07:00
Linus Torvalds
4bc871984f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Verify lengths of keys provided by the user is AF_KEY, from Kevin
    Easton.

 2) Add device ID for BCM89610 PHY. Thanks to Bhadram Varka.

 3) Add Spectre guards to some ATM code, courtesy of Gustavo A. R.
    Silva.

 4) Fix infinite loop in NSH protocol code. To Eric Dumazet we are most
    grateful for this fix.

 5) Line up /proc/net/netlink headers properly. This fix from YU Bo, we
    do appreciate.

 6) Use after free in TLS code. Once again we are blessed by the
    honorable Eric Dumazet with this fix.

 7) Fix regression in TLS code causing stalls on partial TLS records.
    This fix is bestowed upon us by Andrew Tomt.

 8) Deal with too small MTUs properly in LLC code, another great gift
    from Eric Dumazet.

 9) Handle cached route flushing properly wrt. MTU locking in ipv4, to
    Hangbin Liu we give thanks for this.

10) Fix regression in SO_BINDTODEVIC handling wrt. UDP socket demux.
    Paolo Abeni, he gave us this.

11) Range check coalescing parameters in mlx4 driver, thank you Moshe
    Shemesh.

12) Some ipv6 ICMP error handling fixes in rxrpc, from our good brother
    David Howells.

13) Fix kexec on mlx5 by freeing IRQs in shutdown path. Daniel Juergens,
    you're the best!

14) Don't send bonding RLB updates to invalid MAC addresses. Debabrata
    Benerjee saved us!

15) Uh oh, we were leaking in udp_sendmsg and ping_v4_sendmsg. The ship
    is now water tight, thanks to Andrey Ignatov.

16) IPSEC memory leak in ixgbe from Colin Ian King, man we've got holes
    everywhere!

17) Fix error path in tcf_proto_create, Jiri Pirko what would we do
    without you!

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (92 commits)
  net sched actions: fix refcnt leak in skbmod
  net: sched: fix error path in tcf_proto_create() when modules are not configured
  net sched actions: fix invalid pointer dereferencing if skbedit flags missing
  ixgbe: fix memory leak on ipsec allocation
  ixgbevf: fix ixgbevf_xmit_frame()'s return type
  ixgbe: return error on unsupported SFP module when resetting
  ice: Set rq_last_status when cleaning rq
  ipv4: fix memory leaks in udp_sendmsg, ping_v4_sendmsg
  mlxsw: core: Fix an error handling path in 'mlxsw_core_bus_device_register()'
  bonding: send learning packets for vlans on slave
  bonding: do not allow rlb updates to invalid mac
  net/mlx5e: Err if asked to offload TC match on frag being first
  net/mlx5: E-Switch, Include VF RDMA stats in vport statistics
  net/mlx5: Free IRQs in shutdown path
  rxrpc: Trace UDP transmission failure
  rxrpc: Add a tracepoint to log ICMP/ICMP6 and error messages
  rxrpc: Fix the min security level for kernel calls
  rxrpc: Fix error reception on AF_INET6 sockets
  rxrpc: Fix missing start of call timeout
  qed: fix spelling mistake: "taskelt" -> "tasklet"
  ...
2018-05-11 14:14:46 -07:00
Jens Axboe
28361c4036 libata: add extra internal command
Bump the internal tag to 32, instead of stealing the last tag in
our regular command space. This works just fine, since we don't
actually need a separate hardware tag for this. Internal commands
cannot coexist with NCQ commands.

As a bonus, we get rid of the special casing of what tag to use
for the internal command.

This is in preparation for utilizing all 32 commands for normal IO.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Tejun Heo <tj@kernel.org>
2018-05-11 13:10:44 -07:00
Jens Axboe
2e2cc676ce libata: use ata_tag_internal() consistently
Some check for the value directly, use the provided helper instead.
Also make it return a bool, since that's what it does.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Tejun Heo <tj@kernel.org>
2018-05-11 13:10:43 -07:00
Jens Axboe
e3ed893964 libata: bump ->qc_active to a 64-bit type
This is in preparation for allowing full usage of the tag space,
which means that our reserved error handling command will be
using an internal tag value of 32. This doesn't fit in a u32, so
move to a u64.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Tejun Heo <tj@kernel.org>
2018-05-11 13:10:43 -07:00
Jens Axboe
5ac40790b4 libata: introduce notion of separate hardware tags
Rigth now these are the same, but drivers should be using ->hw_tag
for their command setup and issue.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Tejun Heo <tj@kernel.org>
2018-05-11 13:10:42 -07:00
Chuck Lever
51cc257a11 svcrdma: Remove unused svc_rdma_op_ctxt
Clean up: Eliminate a structure that is no longer used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11 15:48:57 -04:00
Chuck Lever
99722fe4d5 svcrdma: Persistently allocate and DMA-map Send buffers
While sending each RPC Reply, svc_rdma_sendto allocates and DMA-
maps a separate buffer where the RPC/RDMA transport header is
constructed. The buffer is unmapped and released in the Send
completion handler. This is significant per-RPC overhead,
especially for small RPCs.

Instead, allocate and DMA-map a buffer, and cache it in each
svc_rdma_send_ctxt. This buffer and its mapping can be re-used
for each RPC, saving the cost of memory allocation and DMA
mapping.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11 15:48:57 -04:00
Chuck Lever
986b78894b svcrdma: Remove post_send_wr
Clean up: Now that the send_wr is part of the svc_rdma_send_ctxt,
svc_rdma_post_send_wr is nearly empty.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11 15:48:57 -04:00
Chuck Lever
25fd86eca1 svcrdma: Don't overrun the SGE array in svc_rdma_send_ctxt
Receive buffers are always the same size, but each Send WR has a
variable number of SGEs, based on the contents of the xdr_buf being
sent.

While assembling a Send WR, keep track of the number of SGEs so that
we don't exceed the device's maximum, or walk off the end of the
Send SGE array.

For now the Send path just fails if it exceeds the maximum.

The current logic in svc_rdma_accept bases the maximum number of
Send SGEs on the largest NFS request that can be sent or received.
In the transport layer, the limit is actually based on the
capabilities of the underlying device, not on properties of the
Upper Layer Protocol.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11 15:48:57 -04:00
Chuck Lever
4201c74647 svcrdma: Introduce svc_rdma_send_ctxt
svc_rdma_op_ctxt's are pre-allocated and maintained on a per-xprt
free list. This eliminates the overhead of calling kmalloc / kfree,
both of which grab a globally shared lock that disables interrupts.
Introduce a replacement to svc_rdma_op_ctxt's that is built
especially for the svcrdma Send path.

Subsequent patches will take advantage of this new structure by
allocating real resources which are then cached in these objects.
The allocations are freed when the transport is torn down.

I've renamed the structure so that static type checking can be used
to ensure that uses of op_ctxt and send_ctxt are not confused. As an
additional clean up, structure fields are renamed to conform with
kernel coding conventions.

Additional clean ups:
- Handle svc_rdma_send_ctxt_get allocation failure at each call
  site, rather than pre-allocating and hoping we guessed correctly
- All send_ctxt_put call-sites request page freeing, so remove
  the @free_pages argument
- All send_ctxt_put call-sites unmap SGEs, so fold that into
  svc_rdma_send_ctxt_put

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11 15:48:57 -04:00
Chuck Lever
232627905f svcrdma: Clean up Send SGE accounting
Clean up: Since there's already a svc_rdma_op_ctxt being passed
around with the running count of mapped SGEs, drop unneeded
parameters to svc_rdma_post_send_wr().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11 15:48:57 -04:00
Chuck Lever
f016f305f9 svcrdma: Refactor svc_rdma_dma_map_buf
Clean up: svc_rdma_dma_map_buf does mostly the same thing as
svc_rdma_dma_map_page, so let's fold these together.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11 15:48:57 -04:00
Chuck Lever
eb5d7a622e svcrdma: Allocate recv_ctxt's on CPU handling Receives
There is a significant latency penalty when processing an ingress
Receive if the Receive buffer resides in memory that is not on the
same NUMA node as the the CPU handling completions for a CQ.

The system administrator and the device driver determine which CPU
handles completions. This CPU does not change during life of the CQ.
Further the Upper Layer does not have any visibility of which CPU it
is.

Allocating Receive buffers in the Receive completion handler
guarantees that Receive buffers are allocated on the preferred NUMA
node for that CQ.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11 15:48:57 -04:00
Chuck Lever
3316f06311 svcrdma: Persistently allocate and DMA-map Receive buffers
The current Receive path uses an array of pages which are allocated
and DMA mapped when each Receive WR is posted, and then handed off
to the upper layer in rqstp::rq_arg. The page flip releases unused
pages in the rq_pages pagelist. This mechanism introduces a
significant amount of overhead.

So instead, kmalloc the Receive buffer, and leave it DMA-mapped
while the transport remains connected. This confers a number of
benefits:

* Each Receive WR requires only one receive SGE, no matter how large
  the inline threshold is. This helps the server-side NFS/RDMA
  transport operate on less capable RDMA devices.

* The Receive buffer is left allocated and mapped all the time. This
  relieves svc_rdma_post_recv from the overhead of allocating and
  DMA-mapping a fresh buffer.

* svc_rdma_wc_receive no longer has to DMA unmap the Receive buffer.
  It has to DMA sync only the number of bytes that were received.

* svc_rdma_build_arg_xdr no longer has to free a page in rq_pages
  for each page in the Receive buffer, making it a constant-time
  function.

* The Receive buffer is now plugged directly into the rq_arg's
  head[0].iov_vec, and can be larger than a page without spilling
  over into rq_arg's page list. This enables simplification of
  the RDMA Read path in subsequent patches.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11 15:48:57 -04:00
Chuck Lever
1e5f416074 svcrdma: Simplify svc_rdma_recv_ctxt_put
Currently svc_rdma_recv_ctxt_put's callers have to know whether they
want to free the ctxt's pages or not. This means the human
developers have to know when and why to set that free_pages
argument.

Instead, the ctxt should carry that information with it so that
svc_rdma_recv_ctxt_put does the right thing no matter who is
calling.

We want to keep track of the number of pages in the Receive buffer
separately from the number of pages pulled over by RDMA Read. This
is so that the correct number of pages can be freed properly and
that number is well-documented.

So now, rc_hdr_count is the number of pages consumed by head[0]
(ie., the page index where the Read chunk should start); and
rc_page_count is always the number of pages that need to be released
when the ctxt is put.

The @free_pages argument is no longer needed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11 15:48:57 -04:00
Chuck Lever
2c577bfea8 svcrdma: Remove sc_rq_depth
Clean up: No need to retain rq_depth in struct svcrdma_xprt, it is
used only in svc_rdma_accept().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11 15:48:57 -04:00
Chuck Lever
ecf85b2384 svcrdma: Introduce svc_rdma_recv_ctxt
svc_rdma_op_ctxt's are pre-allocated and maintained on a per-xprt
free list. This eliminates the overhead of calling kmalloc / kfree,
both of which grab a globally shared lock that disables interrupts.
To reduce contention further, separate the use of these objects in
the Receive and Send paths in svcrdma.

Subsequent patches will take advantage of this separation by
allocating real resources which are then cached in these objects.
The allocations are freed when the transport is torn down.

I've renamed the structure so that static type checking can be used
to ensure that uses of op_ctxt and recv_ctxt are not confused. As an
additional clean up, structure fields are renamed to conform with
kernel coding conventions.

As a final clean up, helpers related to recv_ctxt are moved closer
to the functions that use them.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11 15:48:57 -04:00