Commit graph

65589 commits

Author SHA1 Message Date
Linus Torvalds
d32e5f44a5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:

 - fixes for two long standing issues (lock up and a crash) in force
   feedback handling in uinput driver

 - tweak to firmware update timing in Elan I2C touchpad driver.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: elan_i2c - extend Flash-Write delay
  Input: uinput - avoid crash when sending FF request to device going away
  Input: uinput - avoid FF flush when destroying device
2017-09-22 17:23:41 -10:00
Linus Torvalds
c0a3a64e72 Major additions:
 - sysctl and seccomp operation to discover available actions. (tyhicks)
 - new per-filter configurable logging infrastructure and sysctl. (tyhicks)
 - SECCOMP_RET_LOG to log allowed syscalls. (tyhicks)
 - SECCOMP_RET_KILL_PROCESS as the new strictest possible action.
 - self-tests for new behaviors.

Merge tag 'seccomp-v4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp updates from Kees Cook:
 "Major additions:

   - sysctl and seccomp operation to discover available actions
     (tyhicks)

   - new per-filter configurable logging infrastructure and sysctl
     (tyhicks)

   - SECCOMP_RET_LOG to log allowed syscalls (tyhicks)

   - SECCOMP_RET_KILL_PROCESS as the new strictest possible action

   - self-tests for new behaviors"

[ This is the seccomp part of the security pull request during the merge
  window that was nixed due to unrelated problems - Linus ]

* tag 'seccomp-v4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  samples: Unrename SECCOMP_RET_KILL
  selftests/seccomp: Test thread vs process killing
  seccomp: Implement SECCOMP_RET_KILL_PROCESS action
  seccomp: Introduce SECCOMP_RET_KILL_PROCESS
  seccomp: Rename SECCOMP_RET_KILL to SECCOMP_RET_KILL_THREAD
  seccomp: Action to log before allowing
  seccomp: Filter flag to log all actions except SECCOMP_RET_ALLOW
  seccomp: Selftest for detection of filter flag support
  seccomp: Sysctl to configure actions that are allowed to be logged
  seccomp: Operation for checking if an action is available
  seccomp: Sysctl to display available actions
  seccomp: Provide matching filter for introspection
  selftests/seccomp: Refactor RET_ERRNO tests
  selftests/seccomp: Add simple seccomp overhead benchmark
  selftests/seccomp: Add tests for basic ptrace actions
2017-09-22 16:16:41 -10:00
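The "strictest possible action" wording in the seccomp merge above (c0a3a64e72) can be illustrated with a small user-space model. This is a sketch, not kernel code: the constants are the values as I understand them from include/uapi/linux/seccomp.h and should be verified against your headers, and `action_only()` and `strictest()` are hypothetical helper names. It assumes that when several filters return different results, the action with the lowest value wins once the action bits are read as a signed 32-bit integer, which is what would make SECCOMP_RET_KILL_PROCESS outrank SECCOMP_RET_KILL_THREAD.

```python
# Action values as understood from include/uapi/linux/seccomp.h
# (assumption: verify against the headers of the kernel you target).
SECCOMP_RET_KILL_PROCESS = 0x80000000
SECCOMP_RET_KILL_THREAD = 0x00000000
SECCOMP_RET_TRAP = 0x00030000
SECCOMP_RET_ERRNO = 0x00050000
SECCOMP_RET_TRACE = 0x7FF00000
SECCOMP_RET_LOG = 0x7FFC0000
SECCOMP_RET_ALLOW = 0x7FFF0000
SECCOMP_RET_ACTION_FULL = 0xFFFF0000


def action_only(ret):
    """Mask off the data bits and reinterpret the action as a signed 32-bit int."""
    action = ret & SECCOMP_RET_ACTION_FULL
    return action - (1 << 32) if action & 0x80000000 else action


def strictest(filter_results):
    """Return the result with the lowest signed action (hypothetical helper)."""
    return min(filter_results, key=action_only)
```

With this ordering, SECCOMP_RET_LOG sits just below SECCOMP_RET_ALLOW, so a logging filter takes precedence over a plain allow without blocking the syscall.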
Linus Walleij
a9a1d2a782 pinctrl/gpio: Unify namespace for cross-calls
The pinctrl_request_gpio() and pinctrl_free_gpio() break the nice
namespacing in the other cross-calls like pinctrl_gpio_foo().
Just rename them and all references so we have one namespace
with all cross-calls under pinctrl_gpio_*().

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-09-22 11:02:10 +02:00
Dmitry Torokhov
9c4089e87a Immutable branch between MFD, Input and RTC due for the v4.14 merge window

Merge tag 'ib-mfd-input-rtc-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd into next

Merge "Immutable branch between MFD, Input and RTC due for the v4.14
merge window" to have dm355evm_msp.h header moved into the right place.
2017-09-21 16:41:15 -07:00
Dmitry Torokhov
95a0c7c2d6 Immutable branch between MFD and many other subsystems due for the v4.14 merge window

Merge tag 'ib-mfd-many-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd into next

Merge "Immutable branch between MFD and many other subsystems due for
the v4.14 merge window" to get the TWL headers moved to the right place.
2017-09-21 16:38:09 -07:00
Dmitry Torokhov
e8b95728f7 Input: uinput - avoid FF flush when destroying device
Normally, when input device supporting force feedback effects is being
destroyed, we try to "flush" currently playing effects, so that the
physical device does not continue vibrating (or executing other effects).
Unfortunately this does not work well for uinput as flushing of the effects
deadlocks with the destroy action:

- if device is being destroyed because the file descriptor is being closed,
  then there is no one to even service FF requests;

- if device is being destroyed because userspace sent UI_DEV_DESTROY,
  while theoretically it could be possible to service FF requests,
  userspace is unlikely to do so (they'd need to make sure FF handling
  happens on a separate thread) even if kernel solves the issue with FF
  ioctls deadlocking with UI_DEV_DESTROY ioctl on udev->mutex.

To avoid lockups like the one below, let's install a custom input device
flush handler, and avoid trying to flush force feedback effects when we
are destroying the device, and instead rely on uinput to shut off the
device properly.

NMI watchdog: Watchdog detected hard LOCKUP on cpu 3
...
 <<EOE>>  [<ffffffff817a0307>] _raw_spin_lock_irqsave+0x37/0x40
 [<ffffffff810e633d>] complete+0x1d/0x50
 [<ffffffffa00ba08c>] uinput_request_done+0x3c/0x40 [uinput]
 [<ffffffffa00ba587>] uinput_request_submit.part.7+0x47/0xb0 [uinput]
 [<ffffffffa00bb62b>] uinput_dev_erase_effect+0x5b/0x76 [uinput]
 [<ffffffff815d91ad>] erase_effect+0xad/0xf0
 [<ffffffff815d929d>] flush_effects+0x4d/0x90
 [<ffffffff815d4cc0>] input_flush_device+0x40/0x60
 [<ffffffff815daf1c>] evdev_cleanup+0xac/0xc0
 [<ffffffff815daf5b>] evdev_disconnect+0x2b/0x60
 [<ffffffff815d74ac>] __input_unregister_device+0xac/0x150
 [<ffffffff815d75f7>] input_unregister_device+0x47/0x70
 [<ffffffffa00bac45>] uinput_destroy_device+0xb5/0xc0 [uinput]
 [<ffffffffa00bb2de>] uinput_ioctl_handler.isra.9+0x65e/0x740 [uinput]
 [<ffffffff811231ab>] ? do_futex+0x12b/0xad0
 [<ffffffffa00bb3f8>] uinput_ioctl+0x18/0x20 [uinput]
 [<ffffffff81241248>] do_vfs_ioctl+0x298/0x480
 [<ffffffff81337553>] ? security_file_ioctl+0x43/0x60
 [<ffffffff812414a9>] SyS_ioctl+0x79/0x90
 [<ffffffff817a04ee>] entry_SYSCALL_64_fastpath+0x12/0x71

Reported-by: Rodrigo Rivas Costa <rodrigorivascosta@gmail.com>
Reported-by: Clément VUCHENER <clement.vuchener@gmail.com>
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=193741
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2017-09-21 16:31:22 -07:00
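The idea behind the uinput fix above (e8b95728f7) can be modelled in user space. This is only a toy sketch: `ModelDevice`, `erase_effect()` and the `being_destroyed` argument are hypothetical names, not the driver's API. It shows a custom flush handler that skips erasing force feedback effects while the device is going away, because erase requests would need a userspace reply that cannot arrive at that point.

```python
class ModelDevice:
    """Toy stand-in for a uinput-backed input device (all names hypothetical)."""

    def __init__(self, playing_effects):
        self.playing_effects = set(playing_effects)
        self.erase_requests = []  # requests that userspace would have to service

    def erase_effect(self, effect_id):
        # In the real driver this would round-trip to userspace and wait.
        self.erase_requests.append(effect_id)
        self.playing_effects.discard(effect_id)

    def flush(self, being_destroyed):
        """Custom flush handler: skip the FF flush while the device is going away."""
        if being_destroyed:
            return 0  # nobody can service erase requests; rely on device shutdown
        for effect_id in sorted(self.playing_effects):
            self.erase_effect(effect_id)
        return 0
```

A normal flush still erases every playing effect; only the teardown path avoids issuing requests.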
Paolo Abeni
6e617de84e net: avoid a full fib lookup when rp_filter is disabled.
Since commit 1dced6a854 ("ipv4: Restore accept_local behaviour
in fib_validate_source()") a full fib lookup is needed even if
the rp_filter is disabled, if accept_local is false - which is
the default.

What we really need in the above scenario is just checking
that the source IP address is not local, and in most cases we
can do that in a cheaper way by looking up the ifaddr hash table.

This commit adds a helper for such lookup, and uses it to
validate the src address when rp_filter is disabled and no
'local' routes are created by the user space in the relevant
namespace.

A new ipv4 netns flag is added to account for such routes.
We need that to preserve the same behavior we had before this
patch.

It also drops the checks to bail early from __fib_validate_source,
added by the commit 1dced6a854 ("ipv4: Restore accept_local
behaviour in fib_validate_source()"), as they do not give any
measurable performance improvement: if we do the lookup we are
on a slower path.

This improves UDP performance for unconnected sockets
when rp_filter is disabled by 5% and also gives a small but
measurable performance improvement for TCP flood scenarios.

v1 -> v2:
 - use the ifaddr lookup helper in __ip_dev_find(), as suggested
   by Eric
 - fall-back to full lookup if custom local routes are present

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 15:15:22 -07:00
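The lookup strategy described in the commit above (6e617de84e) can be sketched as a toy model. Everything here is hypothetical (the `Namespace` class, its attributes and the lookup counter); it only illustrates the decision: use the cheap address-hash check when no user-created 'local' routes exist, and fall back to the full lookup otherwise.

```python
import ipaddress


class Namespace:
    """Toy model of a network namespace (hypothetical, not a kernel structure)."""

    def __init__(self, local_addrs, has_custom_local_routes=False):
        self.ifaddr_hash = {ipaddress.ip_address(a) for a in local_addrs}
        self.has_custom_local_routes = has_custom_local_routes
        self.full_lookups = 0

    def full_fib_lookup_is_local(self, addr):
        self.full_lookups += 1  # stands in for the expensive path
        return addr in self.ifaddr_hash

    def source_is_local(self, src):
        addr = ipaddress.ip_address(src)
        if self.has_custom_local_routes:
            # User-created 'local' routes are not in the address hash: fall back.
            return self.full_fib_lookup_is_local(addr)
        return addr in self.ifaddr_hash  # cheap hash lookup
```

The per-namespace flag is what preserves the old behaviour: as soon as custom local routes exist, every check goes back through the full lookup.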
Stefan Brüns
c2cbd4276e dmaengine: Mark struct dma_slave_caps kernel-doc correctly, clarify
struct dma_slave_caps documentation omitted the correct kernel-doc
opening comment mark.

Document byte granularity and interpretation of the src/dst_addr_widths
bit flag fields used by struct dma_slave_caps and struct dma_device.

Add punctuation to their "directions" member documentations, and cleanup
wording of the description.

Signed-off-by: Stefan Brüns <stefan.bruens@rwth-aachen.de>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2017-09-21 22:34:28 +05:30
Stefan Brüns
3f7632e1ba dmaengine: List all allowed values for src/dst_addr_width in kernel doc
Commit 93c6ee94c1 ("dma: Support for 3 bytes word size") and
commit 534a729866 ("dmaengine: Add 16 bytes, 32 bytes and 64 bytes
bus widths") added additional values for the allowed word size, but
omitted these from the struct dma_slave_config documentation.

Signed-off-by: Stefan Brüns <stefan.bruens@rwth-aachen.de>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2017-09-21 22:34:28 +05:30
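The two dmaengine documentation commits above (c2cbd4276e and 3f7632e1ba) concern the allowed bus widths and the src/dst_addr_widths bit flag fields. A minimal sketch of one plausible reading follows, assuming the convention that bit N of the mask means "an N-byte access is supported". The width list and the helper names are assumptions to check against enum dma_slave_buswidth and the kernel-doc itself; Python integers are unbounded, so the sketch also ignores the size of the C field.

```python
# Byte widths named in the commits above; treat the exact set as an assumption
# and check enum dma_slave_buswidth in the kernel you target.
ALLOWED_WIDTHS = (1, 2, 3, 4, 8, 16, 32, 64)


def widths_to_mask(widths):
    """Build a bit mask in which bit N means 'an N-byte access is supported'."""
    mask = 0
    for width in widths:
        if width not in ALLOWED_WIDTHS:
            raise ValueError(f"unsupported bus width: {width} bytes")
        mask |= 1 << width
    return mask


def mask_to_widths(mask):
    """Decode such a mask back into a sorted list of byte widths."""
    return [w for w in ALLOWED_WIDTHS if mask & (1 << w)]
```

Note that under this reading the mask is indexed by byte count, not by an enumeration index, so a device supporting 1-byte and 4-byte accesses sets bits 1 and 4.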
Thomas Gleixner
0551968add Revert "genirq: Restrict effective affinity to interrupts actually using it"
This reverts commit 74def747bc.

The change to the helper function is only correct for the /proc/irq/
readout usage, but breaks the existing x86 usage of that function.

Reported-by: Yanko Kaneti <yaneti@declera.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
2017-09-21 11:54:44 +02:00
Yonghong Song
ec9dd352d5 bpf: one perf event close won't free bpf program attached by another perf event
This patch fixes a bug exhibited by the following scenario:
  1. fd1 = perf_event_open with attr.config = ID1
  2. attach bpf program prog1 to fd1
  3. fd2 = perf_event_open with attr.config = ID1
     <this will be successful>
  4. user program closes fd2 and prog1 is detached from the tracepoint.
  5. user program with fd1 does not work properly as the tracepoint
     produces no output any more.

The issue happens at step 4. Multiple perf_event_open can be called
successfully, but there is only one bpf prog pointer in the tp_event.
In the current logic, any fd release for the same tp_event will free
the tp_event->prog.

The fix is to free tp_event->prog only when the closing fd
corresponds to the one which registered the program.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-20 14:10:29 -07:00
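The ownership rule from the bpf commit above (ec9dd352d5) can be replayed with a toy model of the five-step scenario. The class and function names are hypothetical, and `prog_owner` merely stands in for whatever bookkeeping the kernel uses; the point is that closing an fd detaches the program only if that fd registered it.

```python
class TracepointEvent:
    """Toy model of a tracepoint shared by several perf events (names hypothetical)."""

    def __init__(self):
        self.prog = None
        self.prog_owner = None  # which fd attached the program


def attach(tp_event, fd, prog):
    """Attach prog on behalf of fd; only one program fits in the tp_event."""
    if tp_event.prog is not None:
        raise FileExistsError("a program is already attached")
    tp_event.prog = prog
    tp_event.prog_owner = fd


def close_fd(tp_event, fd):
    """Release fd; only the fd that registered the program may detach it."""
    if tp_event.prog is not None and tp_event.prog_owner == fd:
        tp_event.prog = None
        tp_event.prog_owner = None
```

Replaying the scenario: fd1 attaches prog1, fd2 is opened and closed, and prog1 stays attached until fd1 itself is closed.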
Charles Keepax
85e7dd3f87 ASoC: arizona: Add support for setting the output volume limits
The output volume limits allow signals to be limited to specific levels
appropriate for the hardware attached. As this is a property of the
hardware itself these will be configured through device tree.

Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Acked-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
2017-09-20 17:34:35 +01:00
Eric Dumazet
bffa72cf7f net: sk_buff rbnode reorg
skb->rbnode shares space with skb->next, skb->prev and skb->tstamp

Current uses (TCP receive ofo queue and netem) need to save/restore
tstamp, while skb->dev is either NULL (TCP) or a constant for a given
queue (netem).

Since we plan using an RB tree for TCP retransmit queue to speedup SACK
processing with large BDP, this patch exchanges skb->dev and
skb->tstamp.

This saves some overhead in both TCP and netem.

v2: removes the swtstamp field from struct tcp_skb_cb

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 15:20:22 -07:00
Al Viro
3968cf6238 get_compat_sigset()
similar to put_compat_sigset()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-19 17:56:01 -04:00
Al Viro
b8e8e1aa9f get rid of {get,put}_compat_itimerspec()
no users left

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-19 17:56:00 -04:00
Dmitry V. Levin
f454322efb signal: replace sigset_to_compat() with put_compat_sigset()
There are 4 callers of sigset_to_compat() in the entire kernel.  One is
in sparc compat rt_sigaction(2), the rest are in kernel/signal.c itself.
All are followed by copy_to_user(), and all but the sparc one are under
"if it's big-endian..." ifdefs.

Let's transform sigset_to_compat() into put_compat_sigset() that also
calls copy_to_user().

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-19 17:55:54 -04:00
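Why the conversion in the commit above (f454322efb) only matters on big-endian systems can be checked with byte images. This is a sketch under one assumption: a compat sigset stores each 64-bit mask word as two 32-bit words, low half first. The helper names are hypothetical.

```python
import struct


def compat_words(sig_word):
    """Split one 64-bit signal-mask word into (low, high) 32-bit compat words."""
    return sig_word & 0xFFFFFFFF, sig_word >> 32


def native_image(sig_word, byteorder):
    """Bytes of the 64-bit word as stored in memory; byteorder is '<' or '>'."""
    return struct.pack(byteorder + "Q", sig_word)


def compat_image(sig_word, byteorder):
    """Bytes of the two 32-bit compat words, low word first."""
    low, high = compat_words(sig_word)
    return struct.pack(byteorder + "II", low, high)
```

On little-endian the two images are byte-for-byte identical, so a plain copy_to_user() of the native set would suffice; on big-endian the halves are swapped, which is where an explicit conversion is needed.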
Arnd Bergmann
aa767cfb75 of: provide inline helper for of_find_device_by_node
The ipmmu-vmsa driver fails in compile-testing on non-OF platforms:

drivers/iommu/ipmmu-vmsa.o: In function `ipmmu_of_xlate':
ipmmu-vmsa.c:(.text+0x740): undefined reference to `of_find_device_by_node'

It would be reasonable to assume that this interface works but
returns failure on non-OF builds, like it does on machines that
have been booted in another way, so this adds another inline
function helper.

Fixes: 7b2d59611f ("iommu/ipmmu-vmsa: Replace local utlb code with fwspec ids")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rob Herring <robh@kernel.org>
2017-09-19 12:20:44 -05:00
Robert Jarzmik
a5c6951c49 mfd: wm97xx-core: core support for wm97xx Codec
The WM9705, WM9712 and WM9713 are highly integrated codecs, with an
audio codec, DAC and ADC, GPIO unit and a touchscreen interface.

Historically the support was spread across drivers/input/touchscreen and
sound/soc/codecs. The sharing was done through ac97 bus sharing. This
model will not withstand the new AC97 bus model, where codecs are
discovered on runtime.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Acked-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
2017-09-19 17:07:24 +01:00
Greg Kroah-Hartman
850fdec8d2 driver core: remove DRIVER_ATTR
DRIVER_ATTR is no longer in use, and driver authors should be using
DRIVER_ATTR_RW() or DRIVER_ATTR_RO() or DRIVER_ATTR_WO() instead in
order to always get the permissions correct.  So remove it so that no
one can use it anymore.

Acked-by: Alan Tull <atull@kernel.org>
Reviewed-by: Moritz Fischer <mdf@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-19 09:20:33 +02:00
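The point of the commit above (850fdec8d2) is that the remaining macros fix the permission bits instead of letting each driver pick them. A small sketch of the modes involved follows; the mapping is an assumption based on my understanding of the generic __ATTR_RW/__ATTR_RO/__ATTR_WO definitions, and `describe()` is a hypothetical helper.

```python
import stat

# Permission bits that the helper macros are understood to hard-code
# (an assumption based on the __ATTR_RW/__ATTR_RO/__ATTR_WO definitions).
DRIVER_ATTR_MODES = {
    "DRIVER_ATTR_RW": stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IROTH,
    "DRIVER_ATTR_RO": stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH,
    "DRIVER_ATTR_WO": stat.S_IWUSR,
}


def describe(macro):
    """Return the octal mode string for one of the three macros."""
    return format(DRIVER_ATTR_MODES[macro], "04o")
```

Because the mode is implied by the macro name, a driver can no longer create, say, a world-writable attribute by passing a wrong literal.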
Hans de Goede
f4ac647694 libata: Add new med_power_with_dipm link_power_management_policy setting
As described by Matthew Garrett quite a while back:
https://mjg59.dreamwidth.org/34868.html

Intel CPUs starting with the Haswell generation need SATA links to power
down for the "package" part of the CPU to reach low power-states like
PC7 / P8 which bring a significant power-saving with them.

The default max_performance lpm policy does not allow for these high
PC states; both the medium_power and min_power policies do allow this.

The min_power policy saves significantly more power, but there are some
reports of some disks / SSDs not liking min_power leading to system
crashes and in some cases even data corruption has been reported.

Matthew has found a document documenting the default settings of
Intel's IRST Windows driver with which most laptops ship:
https://www-ssl.intel.com/content/dam/doc/reference-guide/sata-devices-implementation-recommendations.pdf

Matthew wrote a patch changing med_power to match those defaults, but
that never got anywhere as some people were reporting issues with the
patch-set that patch was a part of.

This commit is another attempt to make the default IRST driver settings
available under Linux, but instead of changing medium_power and
potentially introducing regressions, this commit adds a new
med_power_with_dipm setting which is identical to the existing
medium_power except that it enables dipm on top, which makes it match
the Windows IRST driver settings, which should hopefully be safe to
use on most devices.

The med_power_with_dipm setting is close to min_power, except that:
a) It does not use host-initiated slumber mode (ASP not set),
   but it does allow device-initiated slumber
b) It does not enable DevSlp mode

On my T440s test laptop I get the following power savings when idle:
medium_power		0.9W
med_power_with_dipm	1.2W
min_power		1.2W

Suggested-by: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-09-18 20:22:04 -07:00
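Selecting the new policy from the libata commit above (f4ac647694) could look like the following sketch. It assumes the policy is exposed per host as /sys/class/scsi_host/hostN/link_power_management_policy and that the accepted strings are the four names used in the commit message; `set_link_policy()` and its `sysfs_root` parameter are hypothetical, the latter added so the function can be tried against a scratch directory.

```python
from pathlib import Path

# Policy names as they appear in the commit message above.
POLICIES = ("max_performance", "medium_power", "med_power_with_dipm", "min_power")


def set_link_policy(policy, sysfs_root="/sys/class/scsi_host"):
    """Write the policy to every host's link_power_management_policy file.

    Returns the list of files written. Needs root on a real system.
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    written = []
    for attr in sorted(Path(sysfs_root).glob("host*/link_power_management_policy")):
        attr.write_text(policy + "\n")
        written.append(attr)
    return written
```

Calling `set_link_policy("med_power_with_dipm")` as root would apply the setting to every SATA host found; given the data corruption reports mentioned above for min_power, validating the policy name before writing seems a reasonable precaution.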
Uwe Kleine-König
ef838a81dd serial: Add common rs485 device tree parsing function
Several drivers have the same device tree parsing code. Create
a common helper function for it.

This patch is based on work done by Sascha Hauer.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-18 18:36:25 +02:00
Masahiro Yamada
14157f8614 mtd: nand: introduce NAND_ROW_ADDR_3 flag
Several drivers check ->chipsize to see if the third row address cycle
is needed.  Instead of embedding magic sizes such as 32MB, 128MB in
drivers, introduce a new flag NAND_ROW_ADDR_3 for clean-up.  Since
nand_scan_ident() knows well about the device, it can handle this
properly.  The flag is set if the row address bit width is greater
than 16.

Delete comments such as "One more address cycle for ..." because
intention is now clear enough from the code.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Wenyou Yang <wenyou.yang@microchip.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2017-09-18 14:55:52 +02:00
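The rule stated in the NAND commit above (14157f8614), that the flag is set when the row address is wider than 16 bits, can be worked through numerically. This sketch assumes the row address simply indexes pages, so its width follows from chip size divided by page size; the flag's bit value here is a placeholder, not the kernel's definition, and both helper names are hypothetical.

```python
NAND_ROW_ADDR_3 = 1 << 0  # placeholder bit value, not the kernel's definition


def row_address_bits(chip_size, page_size):
    """Number of bits needed to address every page of the chip."""
    pages = chip_size // page_size
    return (pages - 1).bit_length()


def row_addr_options(chip_size, page_size):
    """Set the flag when the row address is wider than 16 bits."""
    return NAND_ROW_ADDR_3 if row_address_bits(chip_size, page_size) > 16 else 0
```

Under that assumption the magic sizes mentioned in the commit fall out naturally: 32MB with 512-byte pages and 128MB with 2048-byte pages both give exactly 65536 pages, the largest count that still fits in 16 bits, so only larger chips need the third cycle.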
Geert Uytterhoeven
74378c5c8c driver core: Fix link to device power management documentation
Correct location as of commit 2728b2d2e5 (PM / core / docs:
Convert sleep states API document to reST).

Fixes: 2728b2d2e5 (PM / core / docs: Convert sleep states API document to reST)
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-09-18 13:22:30 +02:00
Guenter Roeck
4b4e02c831 typec: tcpm: Move out of staging
Move tcpm (USB Type-C Port Manager) out of staging.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-18 10:58:31 +02:00
Thomas Garnier
bf29ed1567 syscalls: Use CHECK_DATA_CORRUPTION for addr_limit_user_check
Use CHECK_DATA_CORRUPTION instead of BUG_ON to provide more flexibility
on address limit failures. By default, send a SIGKILL signal to kill the
current process preventing exploitation of a bad address limit.

Make the TIF_FSCHECK flag optional so ARM can use this function.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Pratyush Anand <panand@redhat.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Will Drewry <wad@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-api@vger.kernel.org
Cc: Yonghong Song <yhs@fb.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1504798247-48833-2-git-send-email-keescook@chromium.org
2017-09-17 19:45:32 +02:00
Lars-Peter Clausen
f3ae7d9155 dmaengine: xilinx_dma: Move enum xdma_ip_type to driver file
The enum xdma_ip_type is only used inside the Xilinx DMA driver and not
exported to any consumers (nor should it be). So move it from the global
header to driver file itself.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2017-09-17 18:59:54 +05:30
Linus Torvalds
48bddb143b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix hotplug deadlock in hv_netvsc, from Stephen Hemminger.

 2) Fix double-free in rmnet driver, from Dan Carpenter.

 3) INET connection socket layer can double put request sockets, fix
    from Eric Dumazet.

 4) Don't match collect metadata-mode tunnels if the device is down,
    from Haishuang Yan.

 5) Do not perform TSO6/GSO on ipv6 packets with extensions headers in
    be2net driver, from Suresh Reddy.

 6) Fix scaling error in gen_estimator, from Eric Dumazet.

 7) Fix 64-bit statistics deadlock in systemport driver, from Florian
    Fainelli.

 8) Fix use-after-free in sctp_sock_dump, from Xin Long.

 9) Reject invalid BPF_END instructions in verifier, from Edward Cree.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
  mlxsw: spectrum_router: Only handle IPv4 and IPv6 events
  Documentation: link in networking docs
  tcp: fix data delivery rate
  bpf/verifier: reject BPF_ALU64|BPF_END
  sctp: do not mark sk dumped when inet_sctp_diag_fill returns err
  sctp: fix an use-after-free issue in sctp_sock_dump
  netvsc: increase default receive buffer size
  tcp: update skb->skb_mstamp more carefully
  net: ipv4: fix l3slave check for index returned in IP_PKTINFO
  net: smsc911x: Quieten netif during suspend
  net: systemport: Fix 64-bit stats deadlock
  net: vrf: avoid gcc-4.6 warning
  qed: remove unnecessary call to memset
  tg3: clean up redundant initialization of tnapi
  tls: make tls_sw_free_resources static
  sctp: potential read out of bounds in sctp_ulpevent_type_enabled()
  MAINTAINERS: review Renesas DT bindings as well
  net_sched: gen_estimator: fix scaling error in bytes/packets samples
  nfp: wait for the NSP resource to appear on boot
  nfp: wait for board state before talking to the NSP
  ...
2017-09-16 11:28:59 -07:00
Linus Torvalds
7318413077 Merge branch '4.14-features' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
 "This is the main pull request for 4.14 for MIPS; below a summary of
  the non-merge commits:

  CM:
   - Rename mips_cm_base to mips_gcr_base
   - Specify register size when generating accessors
   - Use BIT/GENMASK for register fields, order & drop shifts
   - Add cluster & block args to mips_cm_lock_other()

  CPC:
   - Use common CPS accessor generation macros
   - Use BIT/GENMASK for register fields, order & drop shifts
   - Introduce register modify (set/clear/change) accessors
   - Use change_*, set_* & clear_* where appropriate
   - Add CM/CPC 3.5 register definitions
   - Use GlobalNumber macros rather than magic numbers
   - Have asm/mips-cps.h include CM & CPC headers
   - Cluster support for topology functions
   - Detect CPUs in secondary clusters

  CPS:
   - Read GIC_VL_IDENT directly, not via irqchip driver

  DMA:
   - Consolidate coherent and non-coherent dma_alloc code
   - Don't use dma_cache_sync to implement fd_cacheflush

  FPU emulation / FP assist code:
   - Another series of 14 commits fixing corner cases such as NaN
     propagation and other special input values.
   - Zero bits 32-63 of the result for a CLASS.D instruction.
   - Enhanced statistics via debugfs
   - Do not use bools for arithmetic. GCC 7.1 moans about this.
   - Correct user fault_addr type

  Generic MIPS:
   - Enhancement of stack backtraces
   - Cleanup from non-existing options
   - Handle non word sized instructions when examining frame
   - Fix detection and decoding of ADDIUSP instruction
   - Fix decoding of SWSP16 instruction
   - Refactor handling of stack pointer in get_frame_info
   - Remove unreachable code from force_fcr31_sig()
   - Convert to using %pOF instead of full_name
   - Remove the R6000 support.
   - Move FP code from *_switch.S to *_fpu.S
   - Remove unused ST_OFF from r2300_switch.S
   - Allow platform to specify multiple its.S files
   - Add #includes to various files to ensure code builds reliably and
     without warnings.
   - Remove __invalidate_kernel_vmap_range
   - Remove plat_timer_setup
   - Declare various variables & functions static
   - Abstract CPU core & VP(E) ID access through accessor functions
   - Store core & VP IDs in GlobalNumber-style variable
   - Unify checks for sibling CPUs
   - Add CPU cluster number accessors
   - Prevent direct use of generic_defconfig
   - Make CONFIG_MIPS_MT_SMP default y
   - Add __ioread64_copy
   - Remove unnecessary inclusions of linux/irqchip/mips-gic.h

  GIC:
   - Introduce asm/mips-gic.h with accessor functions
   - Use new GIC accessor functions in mips-gic-timer
   - Remove counter access functions from irq-mips-gic.c
   - Remove gic_read_local_vp_id() from irq-mips-gic.c
   - Simplify shared interrupt pending/mask reads in irq-mips-gic.c
   - Simplify gic_local_irq_domain_map() in irq-mips-gic.c
   - Drop gic_(re)set_mask() functions in irq-mips-gic.c
   - Remove gic_set_polarity(), gic_set_trigger(), gic_set_dual_edge(),
     gic_map_to_pin() and gic_map_to_vpe() from irq-mips-gic.c.
   - Convert remaining shared reg access, local int mask access and
     remaining local reg access to new accessors
   - Move GIC_LOCAL_INT_* to asm/mips-gic.h
   - Remove GIC_CPU_INT* macros from irq-mips-gic.c
   - Move various definitions to the driver
   - Remove gic_get_usm_range()
   - Remove __gic_irq_dispatch() forward declaration
   - Remove gic_init()
   - Use mips_gic_present() in place of gic_present and remove
     gic_present
   - Move gic_get_c0_*_int() to asm/mips-gic.h
   - Remove linux/irqchip/mips-gic.h
   - Inline __gic_init()
   - Inline gic_basic_init()
   - Make pcpu_masks a per-cpu variable
   - Use pcpu_masks to avoid reading GIC_SH_MASK*
   - Clean up mti, reserved-cpu-vectors handling
   - Use cpumask_first_and() in gic_set_affinity()
   - Let the core set struct irq_common_data affinity

  microMIPS:
   - Fix microMIPS stack unwinding on big endian systems

  MIPS-GIC:
   - SYNC after enabling GIC region

  NUMA:
   - Remove the unused parent_node() macro

  R6:
   - Constify r2_decoder_tables
   - Add accessor & bit definitions for GlobalNumber

  SMP:
   - Constify smp ops
   - Allow boot_secondary SMP op to return errors

  VDSO:
   - Drop gic_get_usm_range() usage
   - Avoid use of linux/irqchip/mips-gic.h

  Platform changes:

  Alchemy:
   - Add devboard machine type to cpuinfo
   - update cpu feature overrides
   - Threaded carddetect irqs for devboards

  AR7:
   - allow NULL clock for clk_get_rate

  BCM63xx:
   - Fix ENETDMA_6345_MAXBURST_REG offset
   - Allow NULL clock for clk_get_rate

  CI20:
   - Enable GPIO and RTC drivers in defconfig
   - Add ethernet and fixed-regulator nodes to DTS

  Generic platform:
   - Move Boston and NI 169445 FIT image source to their own files
   - Include asm/bootinfo.h for plat_fdt_relocated()
   - Include asm/time.h for get_c0_*_int()
   - Allow filtering enabled boards by requirements
   - Don't explicitly disable CONFIG_USB_SUPPORT
   - Bump default NR_CPUS to 16

  JZ4780:
   - Probe the jz4740-rtc driver from devicetree

  Lantiq:
   - Drop check of boot select from the spi-falcon driver.
   - Drop check of boot select from the lantiq-flash MTD driver.
   - Access boot cause register in the watchdog driver through regmap
   - Add device tree binding documentation for the watchdog driver
   - Add docs for the RCU DT bindings.
   - Convert the fpi bus driver to a platform_driver
   - Remove ltq_reset_cause() and ltq_boot_select()
   - Switch to a proper reset driver
   - Switch to a new drivers/soc GPHY driver
   - Add an USB PHY driver for the Lantiq SoCs using the RCU module
   - Use of_platform_default_populate instead of __dt_register_buses
   - Enable MFD_SYSCON to be able to use it for the RCU MFD
   - Replace ltq_boot_select() with dummy implementation.

  Loongson 2F:
   - Allow NULL clock for clk_get_rate

  Malta:
   - Use new GIC accessor functions

  NI 169445:
   - Add support for NI 169445 board.
   - Only include in 32r2el kernels

  Octeon:
   - Add support for watchdog of 78XX SOCs.
   - Add support for watchdog of CN68XX SOCs.
   - Expose support for mips32r1, mips32r2 and mips64r1
   - Enable more drivers in config file
   - Add support for accessing the boot vector.
   - Remove old boot vector code from watchdog driver
   - Define watchdog registers for 70xx, 73xx, 78xx, F75xx.
   - Make CSR functions node aware.
   - Allow access to CIU3 IRQ domains.
   - Misc cleanups in the watchdog driver

  Omega2+:
   - New board, add support and defconfig

  Pistachio:
   - Enable Root FS on NFS in defconfig

  Ralink:
   - Add Mediatek MT7628A SoC
   - Allow NULL clock for clk_get_rate
   - Explicitly request exclusive reset control in the pci-mt7620 PCI driver.

  SEAD3:
   - Only include in 32 bit kernels by default

  VoCore:
   - Add VoCore as a vendor to dt-bindings
   - Add defconfig file"

* '4.14-features' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (167 commits)
  MIPS: Refactor handling of stack pointer in get_frame_info
  MIPS: Stacktrace: Fix microMIPS stack unwinding on big endian systems
  MIPS: microMIPS: Fix decoding of swsp16 instruction
  MIPS: microMIPS: Fix decoding of addiusp instruction
  MIPS: microMIPS: Fix detection of addiusp instruction
  MIPS: Handle non word sized instructions when examining frame
  MIPS: ralink: allow NULL clock for clk_get_rate
  MIPS: Loongson 2F: allow NULL clock for clk_get_rate
  MIPS: BCM63XX: allow NULL clock for clk_get_rate
  MIPS: AR7: allow NULL clock for clk_get_rate
  MIPS: BCM63XX: fix ENETDMA_6345_MAXBURST_REG offset
  mips: Save all registers when saving the frame
  MIPS: Add DWARF unwinding to assembly
  MIPS: Make SAVE_SOME more standard
  MIPS: Fix issues in backtraces
  MIPS: jz4780: DTS: Probe the jz4740-rtc driver from devicetree
  MIPS: Ci20: Enable RTC driver
  watchdog: octeon-wdt: Add support for 78XX SOCs.
  watchdog: octeon-wdt: Add support for cn68XX SOCs.
  watchdog: octeon-wdt: File cleaning.
  ...
2017-09-15 20:43:33 -07:00
Linus Torvalds
9db59599ae * PPC bugfixes
* RCU splat fix
 * swait races fix
 * pointless userspace-triggerable BUG() fix
 * misc fixes for KVM_RUN corner cases
 * nested virt correctness fixes + one host DoS
 * some cleanups
 * clang build fix
 * fix AMD AVIC with default QEMU command line options
 * x86 bugfixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJZvANtAAoJEL/70l94x66DtcIH/0i4fenYamxdq2xiWtsZdbcy
 yfk7mWKEzWGZhP+2X8SSOeetd5mqnIcf2cc4m68UCXpt0zoPEjY0i0D4xrYJHZ03
 R3ifqvtpHByodfT7dOKQPEisO8PdJ5tvecaCMnK3u6SNaNLjAZfhobuLppQHOwQO
 eBvpm0jROpA7ENlDgXtsti8MEdsoWtnmGGrRBY77EGW+t24OpNuGB1EMC0nvcs65
 eChwZ3u8xeU5Ws3Y/DiC8tK8t628znknd8ay02LTZjA303Ftoe192jPpS33V4v15
 kqS6vUFy2lpr9L6wicZtcnnSLtKv+LqecK6o8cxNjzlkOeaZuo9D8UMYsWQfj6w=
 =Ma23
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more KVM updates from Paolo Bonzini:
 - PPC bugfixes
 - RCU splat fix
 - swait races fix
 - pointless userspace-triggerable BUG() fix
 - misc fixes for KVM_RUN corner cases
 - nested virt correctness fixes + one host DoS
 - some cleanups
 - clang build fix
 - fix AMD AVIC with default QEMU command line options
 - x86 bugfixes

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
  kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly
  kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly
  kvm: nVMX: Remove nested_vmx_succeed after successful VM-entry
  kvm,mips: Fix potential swait_active() races
  kvm,powerpc: Serialize wq active checks in ops->vcpu_kick
  kvm: Serialize wq active checks in kvm_vcpu_wake_up()
  kvm,x86: Fix apf_task_wake_one() wq serialization
  kvm,lapic: Justify use of swait_active()
  kvm,async_pf: Use swq_has_sleeper()
  sched/wait: Add swq_has_sleeper()
  KVM: VMX: Do not BUG() on out-of-bounds guest IRQ
  KVM: Don't accept obviously wrong gsi values via KVM_IRQFD
  kvm: nVMX: Don't allow L2 to access the hardware CR8
  KVM: trace events: update list of exit reasons
  KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously
  KVM: X86: Don't block vCPU if there is pending exception
  KVM: SVM: Add irqchip_split() checks before enabling AVIC
  KVM: Add struct kvm_vcpu pointer parameter to get_enable_apicv()
  KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu()
  KVM: x86: fix clang build
  ...
2017-09-15 15:43:55 -07:00
Davidlohr Bueso
8cd641e3c7 sched/wait: Add swq_has_sleeper()
Which is the equivalent of what we have in regular waitqueues.
I'm not crazy about the name, but this also helps us get both
apis closer -- which iirc comes originally from the -net folks.

We also duplicate the comments for the lockless swait_active(),
from wait.h. Future users will make use of this interface.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15 16:57:10 +02:00
Mimi Zohar
711aab1dbb vfs: constify path argument to kernel_read_file_from_path
This patch constifies the path argument to kernel_read_file_from_path().

Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-14 20:18:45 -07:00
Linus Torvalds
e253d98f5b Merge branch 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull nowait read support from Al Viro:
 "Support IOCB_NOWAIT for buffered reads and block devices"

* 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  block_dev: support RWF_NOWAIT on block device nodes
  fs: support RWF_NOWAIT for buffered reads
  fs: support IOCB_NOWAIT in generic_file_buffered_read
  fs: pass iocb to do_generic_file_read
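
From userspace, the nowait read support in this series is reached through preadv2() with the RWF_NOWAIT flag. The sketch below is only an illustration, not part of the series: read_nowait() is a hypothetical helper, it assumes Python 3.7+ on Linux for os.preadv() and os.RWF_NOWAIT, and it falls back to an ordinary blocking pread() when the flag is missing, unsupported, or the data is not immediately available.

```python
import errno
import os

# RWF_NOWAIT is only defined on Linux builds of Python 3.7+; 0 means "unavailable".
RWF_NOWAIT = getattr(os, "RWF_NOWAIT", 0)

# Errors that mean "the nowait attempt could not be served", not "the read failed".
_RETRY_BLOCKING = (errno.EAGAIN, errno.EOPNOTSUPP, errno.EINVAL, errno.ENOSYS)


def read_nowait(fd, length, offset=0):
    """Try a non-blocking positional read, then fall back to a blocking one.

    Returns (data, used_nowait). With RWF_NOWAIT the read may be short.
    """
    if RWF_NOWAIT and hasattr(os, "preadv"):
        buf = bytearray(length)
        try:
            n = os.preadv(fd, [buf], offset, RWF_NOWAIT)
            return bytes(buf[:n]), True
        except OSError as exc:
            if exc.errno not in _RETRY_BLOCKING:
                raise
    # Plain blocking read: always available on Unix.
    return os.pread(fd, length, offset), False
```

A caller that must not stall (an event loop, for instance) would typically hand the fallback read to a worker thread instead of issuing it inline as this sketch does.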
2017-09-14 19:29:55 -07:00
Linus Torvalds
0f0d12728e Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull mount flag updates from Al Viro:
 "Another chunk of fmount preparations from dhowells; only trivial
  conflicts for that part. It separates MS_... bits (very grotty
  mount(2) ABI) from the struct super_block ->s_flags (kernel-internal,
  only a small subset of MS_... stuff).

  This does *not* convert the filesystems to new constants; only the
  infrastructure is done here. The next step in that series is where the
  conflicts would be; that's the conversion of filesystems. It's purely
  mechanical and it's better done after the merge, so if you could run
  something like

	list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$')

	sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \
	        -e 's/\<MS_NOSUID\>/SB_NOSUID/g' \
	        -e 's/\<MS_NODEV\>/SB_NODEV/g' \
	        -e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \
	        -e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \
	        -e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \
	        -e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \
	        -e 's/\<MS_NOATIME\>/SB_NOATIME/g' \
	        -e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \
	        -e 's/\<MS_SILENT\>/SB_SILENT/g' \
	        -e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \
	        -e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \
	        -e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \
	        -e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \
	        $list

  and commit it with something along the lines of 'convert filesystems
  away from use of MS_... constants' as commit message, it would save
  quite a bit of headache next cycle"
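
The sed expressions above rely on the \< and \> word anchors so that only whole MS_... identifiers are renamed. A minimal sketch of the same substitution, assuming Python's \b word boundary is an acceptable stand-in for those anchors; convert() is a hypothetical helper working on a string, not on the file list built by the git grep loop:

```python
import re

# The fourteen flags named in the sed command above.
FLAGS = [
    "RDONLY", "NOSUID", "NODEV", "NOEXEC", "SYNCHRONOUS", "MANDLOCK",
    "DIRSYNC", "NOATIME", "NODIRATIME", "SILENT", "POSIXACL", "KERNMOUNT",
    "I_VERSION", "LAZYTIME",
]

# \b on both sides keeps MS_RDONLY_FOO and XMS_RDONLY untouched.
PATTERN = re.compile(r"\bMS_(" + "|".join(FLAGS) + r")\b")


def convert(source):
    """Rename the listed MS_* constants to their SB_* counterparts."""
    return PATTERN.sub(r"SB_\1", source)
```

Flags outside the list, such as MS_REMOUNT, are left alone, which matches the intent that only the superblock subset is converted.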

* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  VFS: Differentiate mount flags (MS_*) from internal superblock flags
  VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
  vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags
2017-09-14 18:54:01 -07:00
Linus Torvalds
581bfce969 Merge branch 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more set_fs removal from Al Viro:
 "Christoph's 'use kernel_read and friends rather than open-coding
  set_fs()' series"

* 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: unexport vfs_readv and vfs_writev
  fs: unexport vfs_read and vfs_write
  fs: unexport __vfs_read/__vfs_write
  lustre: switch to kernel_write
  gadget/f_mass_storage: stop messing with the address limit
  mconsole: switch to kernel_read
  btrfs: switch write_buf to kernel_write
  net/9p: switch p9_fd_read to kernel_write
  mm/nommu: switch do_mmap_private to kernel_read
  serial2002: switch serial2002_tty_write to kernel_{read/write}
  fs: make the buf argument to __kernel_write a void pointer
  fs: fix kernel_write prototype
  fs: fix kernel_read prototype
  fs: move kernel_read to fs/read_write.c
  fs: move kernel_write to fs/read_write.c
  autofs4: switch autofs4_write to __kernel_write
  ashmem: switch to ->read_iter
2017-09-14 18:13:32 -07:00
Linus Torvalds
cc73fee0ba Merge branch 'work.ipc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull ipc compat cleanup and 64-bit time_t from Al Viro:
 "IPC copyin/copyout sanitizing, including 64bit time_t work from Deepa
  Dinamani"

* 'work.ipc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  utimes: Make utimes y2038 safe
  ipc: shm: Make shmid_kernel timestamps y2038 safe
  ipc: sem: Make sem_array timestamps y2038 safe
  ipc: msg: Make msg_queue timestamps y2038 safe
  ipc: mqueue: Replace timespec with timespec64
  ipc: Make sys_semtimedop() y2038 safe
  get rid of SYSVIPC_COMPAT on ia64
  semtimedop(): move compat to native
  shmat(2): move compat to native
  msgrcv(2), msgsnd(2): move compat to native
  ipc(2): move compat to native
  ipc: make use of compat ipc_perm helpers
  semctl(): move compat to native
  semctl(): separate all layout-dependent copyin/copyout
  msgctl(): move compat to native
  msgctl(): split the actual work from copyin/copyout
  ipc: move compat shmctl to native
  shmctl: split the work from copyin/copyout
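
For context on the y2038 part of this series: a signed 32-bit seconds counter runs out early in 2038, which is why the timestamps move to 64-bit types. The sketch below is a worked illustration only; the helper names are made up here and nothing in it is taken from the patches.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def to_utc(seconds):
    """Convert seconds since the Unix epoch to an aware UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


def wrap_to_int32(seconds):
    """Reinterpret a seconds value as a signed 32-bit integer (two's complement)."""
    return (seconds + 2**31) % 2**32 - 2**31


last_good = to_utc(INT32_MAX)                      # 2038-01-19 03:14:07 UTC
after_wrap = to_utc(wrap_to_int32(INT32_MAX + 1))  # 1901-12-13 20:45:52 UTC
```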
2017-09-14 17:37:26 -07:00
Linus Torvalds
e7cdb60fd2 Merge branch 'zstd-minimal' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull zstd support from Chris Mason:
 "Nick Terrell's patch series to add zstd support to the kernel has been
  floating around for a while. After talking with Dave Sterba, Herbert
  and Phillip, we decided to send the whole thing in as one pull
  request.

  zstd is a big win in speed over zlib and in compression ratio over
  lzo, and the compression team here at FB has gotten great results
  using it in production. Nick will continue to update the kernel side
  with new improvements from the open source zstd userland code.

  Nick has a number of benchmarks for the main zstd code in his lib/zstd
  commit:

      I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB
      of RAM. The VM is running on a MacBook Pro with a 3.1 GHz Intel
      Core i7 processor, 16 GB of RAM, and a SSD. I benchmarked using
      `silesia.tar` [3], which is 211,988,480 B large. Run the following
      commands for the benchmark:

        sudo modprobe zstd_compress_test
        sudo mknod zstd_compress_test c 245 0
        sudo cp silesia.tar zstd_compress_test

      The time is reported by the time of the userland `cp`.
      The MB/s is computed with

        211,988,480 B / time(buffer size, hash)

      which includes the time to copy from userland.
      The Adjusted MB/s is computed with

        211,988,480 B / (time(buffer size, hash) - time(buffer size, none)).

      The memory reported is the amount of memory the compressor
      requests.

        | Method   | Size (B)  | Time (s) | Ratio | MB/s    | Adj MB/s | Mem (MB) |
        |----------|-----------|----------|-------|---------|----------|----------|
        | none     | 211988480 |    0.100 |     1 | 2119.88 |        - |        - |
        | zstd -1  |  73645762 |    1.044 | 2.878 |  203.05 |   224.56 |     1.23 |
        | zstd -3  |  66988878 |    1.761 | 3.165 |  120.38 |   127.63 |     2.47 |
        | zstd -5  |  65001259 |    2.563 | 3.261 |   82.71 |    86.07 |     2.86 |
        | zstd -10 |  60165346 |   13.242 | 3.523 |   16.01 |    16.13 |    13.22 |
        | zstd -15 |  58009756 |   47.601 | 3.654 |    4.45 |     4.46 |    21.61 |
        | zstd -19 |  54014593 |  102.835 | 3.925 |    2.06 |     2.06 |    60.15 |
        | zlib -1  |  77260026 |    2.895 | 2.744 |   73.23 |    75.85 |     0.27 |
        | zlib -3  |  72972206 |    4.116 | 2.905 |   51.50 |    52.79 |     0.27 |
        | zlib -6  |  68190360 |    9.633 | 3.109 |   22.01 |    22.24 |     0.27 |
        | zlib -9  |  67613382 |   22.554 | 3.135 |    9.40 |     9.44 |     0.27 |

      I benchmarked zstd decompression using the same method on the same
      machine. The benchmark file is located in the upstream zstd repo
      under `contrib/linux-kernel/zstd_decompress_test.c` [4]. The
      memory reported is the amount of memory required to decompress
      data compressed with the given compression level. If you know the
      maximum size of your input, you can reduce the memory usage of
      decompression irrespective of the compression level.

        | Method   | Time (s) | MB/s    | Adjusted MB/s | Memory (MB) |
        |----------|----------|---------|---------------|-------------|
        | none     |    0.025 | 8479.54 |             - |           - |
        | zstd -1  |    0.358 |  592.15 |        636.60 |        0.84 |
        | zstd -3  |    0.396 |  535.32 |        571.40 |        1.46 |
        | zstd -5  |    0.396 |  535.32 |        571.40 |        1.46 |
        | zstd -10 |    0.374 |  566.81 |        607.42 |        2.51 |
        | zstd -15 |    0.379 |  559.34 |        598.84 |        4.61 |
        | zstd -19 |    0.412 |  514.54 |        547.77 |        8.80 |
        | zlib -1  |    0.940 |  225.52 |        231.68 |        0.04 |
        | zlib -3  |    0.883 |  240.08 |        247.07 |        0.04 |
        | zlib -6  |    0.844 |  251.17 |        258.84 |        0.04 |
        | zlib -9  |    0.837 |  253.27 |        261.07 |        0.04 |

  I ran a long series of tests and benchmarks on the btrfs side and the
  gains are very similar to the core benchmarks Nick ran"
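
As a cross-check of the compression table, its Ratio, MB/s and Adjusted MB/s columns appear to follow from the 211,988,480 B size of silesia.tar, with 1 MB taken as 10^6 B and the 0.100 s "none" row as the copy-only baseline. The sketch below reproduces the zstd -1 row under those assumptions; the function names are illustrative and not from Nick's commit.

```python
INPUT_BYTES = 211_988_480   # size of silesia.tar as stated in the text
BASELINE_SECONDS = 0.100    # "none" row: time spent only copying from userland


def ratio(compressed_bytes):
    """Compression ratio: input size over compressed size."""
    return INPUT_BYTES / compressed_bytes


def mb_per_s(seconds):
    """Throughput including the copy from userland, in 10^6 B per second."""
    return INPUT_BYTES / 1e6 / seconds


def adjusted_mb_per_s(seconds, baseline=BASELINE_SECONDS):
    """Throughput with the copy-only baseline time subtracted."""
    return INPUT_BYTES / 1e6 / (seconds - baseline)


# zstd -1 row: 73645762 B in 1.044 s
row = (round(ratio(73645762), 3),
       round(mb_per_s(1.044), 2),
       round(adjusted_mb_per_s(1.044), 2))
```

The decompression table can be checked the same way by passing its own "none" time of 0.025 s as the baseline.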

* 'zstd-minimal' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  squashfs: Add zstd support
  btrfs: Add zstd support
  lib: Add zstd modules
  lib: Add xxhash module
2017-09-14 17:30:49 -07:00
Joe Lawrence
439e7271dc livepatch: introduce shadow variable API
Add exported API for livepatch modules:

  klp_shadow_get()
  klp_shadow_alloc()
  klp_shadow_get_or_alloc()
  klp_shadow_free()
  klp_shadow_free_all()

that implement "shadow" variables, which allow callers to associate new
shadow fields to existing data structures.  This is intended to be used
by livepatch modules seeking to emulate additions to data structure
definitions.

See Documentation/livepatch/shadow-vars.txt for a summary of the new
shadow variable API, including a few common use cases.

See samples/livepatch/livepatch-shadow-* for example modules that
demonstrate shadow variables.
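
Conceptually, a shadow variable is extra data looked up by the pair (pointer to the original object, numeric id), so the original structure layout stays untouched. The sketch below models only that idea with a dictionary; ShadowStore and its methods are hypothetical and do not mirror the kernel signatures, and refusing a duplicate allocation by returning None is an assumption of this model.

```python
class ShadowStore:
    """Toy model of shadow variables: data attached to (object, id) pairs."""

    def __init__(self):
        self._vars = {}

    def get(self, obj, var_id):
        """Return the shadow data for (obj, var_id), or None if absent."""
        return self._vars.get((id(obj), var_id))

    def alloc(self, obj, var_id, data):
        """Attach new shadow data; refuse (return None) if it already exists."""
        key = (id(obj), var_id)
        if key in self._vars:
            return None
        self._vars[key] = data
        return data

    def get_or_alloc(self, obj, var_id, data):
        """Return existing shadow data, or attach and return `data`."""
        return self._vars.setdefault((id(obj), var_id), data)

    def free(self, obj, var_id):
        """Detach the shadow data for one (obj, var_id) pair, if any."""
        self._vars.pop((id(obj), var_id), None)

    def free_all(self, var_id):
        """Detach every shadow variable with this id, across all objects."""
        for key in [k for k in self._vars if k[1] == var_id]:
            del self._vars[key]
```

Keying on id(obj) stands in for keying on the object's address, so in this model the parent object must outlive its shadow data, much as a livepatch has to free shadow variables when the parent structure is freed.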

[jkosina@suse.cz: fix __klp_shadow_get_or_alloc() comment as spotted by
 Josh]
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-09-14 23:06:12 +02:00
Linus Torvalds
dff4d1f6fe - Some request-based DM core and DM multipath fixes and cleanups
- Constify a few variables in DM core and DM integrity
 
 - Add bufio optimization and checksum failure accounting to DM integrity
 
 - Fix DM integrity to avoid checking integrity of failed reads
 
 - Fix DM integrity to use init_completion
 
 - A couple DM log-writes target fixes
 
 - Simplify DAX flushing by eliminating the unnecessary flush abstraction
   that was stood up for DM's use.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJZuo8UAAoJEMUj8QotnQNa5BEIANO4mHh1nrzEbH72a4RCLgxV
 H1Pk1zZx/W1bhOOmcRRhxCSM85dPgsCegc5EmpwLZEMavQrP9UZblHcYOUsyIx7W
 S/lWa+soOq/5N2OveROc4WdoWVs50UFmc1+BcClc4YrEe+15XC3R0VMkjX2b/hUL
 o2eYhPjpMlgaorMtRRU6MAooo2fBRQ9m05aPeVgd35fxibrE7PZm+EYW09wa0STi
 9ufuDXJf8+TtFP/38BD41LbUEskuHUZTSDeAJ+3DBaTtfEZcZYxsst4P9JangsHx
 jqqqI9aYzFD2a27fl9WLhCvm40YFiKp5nwzED0RZjzWxVa/jTShX7a49BdzTTfw=
 =rkSB
 -----END PGP SIGNATURE-----

Merge tag 'for-4.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - Some request-based DM core and DM multipath fixes and cleanups

 - Constify a few variables in DM core and DM integrity

 - Add bufio optimization and checksum failure accounting to DM
   integrity

 - Fix DM integrity to avoid checking integrity of failed reads

 - Fix DM integrity to use init_completion

 - A couple DM log-writes target fixes

 - Simplify DAX flushing by eliminating the unnecessary flush
   abstraction that was stood up for DM's use.

* tag 'for-4.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dax: remove the pmem_dax_ops->flush abstraction
  dm integrity: use init_completion instead of COMPLETION_INITIALIZER_ONSTACK
  dm integrity: make blk_integrity_profile structure const
  dm integrity: do not check integrity for failed read operations
  dm log writes: fix >512b sectorsize support
  dm log writes: don't use all the cpu while waiting to log blocks
  dm ioctl: constify ioctl lookup table
  dm: constify argument arrays
  dm integrity: count and display checksum failures
  dm integrity: optimize writing dm-bufio buffers that are partially changed
  dm rq: do not update rq partially in each ending bio
  dm rq: make dm-sq requeuing behavior consistent with dm-mq behavior
  dm mpath: complain about unsupported __multipath_map_bio() return values
  dm mpath: avoid that building with W=1 causes gcc 7 to complain about fall-through
2017-09-14 13:43:16 -07:00
Linus Torvalds
503f04530f fbdev changes for v4.14:
- make fbcon a build-time dependency for fbdev (fbcon was a tristate option
   before, now it is a bool) - this is a first step in preparations for
   making console_lock usage saner (currently it acts like the BKL for
   all things fbdev/fbcon) (Daniel Vetter)
 
 - add fbcon=margin:<color> command line option to select the fbcon margin
   color (David Lechner)
 
 - add DMI quirk table for x86 systems which need fbcon rotation (devices
   like Asus T100HA, GPD Pocket, the GPD win and the I.T.Works TW891)
   (Hans de Goede)
 
 - fix 1bpp logo support for unusual width (needed by LEGO MINDSTORMS EV3)
   (David Lechner)
 
 - enable Xilinx FB driver for ARM ZynqMP platform (Michal Simek)
 
 - fix use after free in the error path of udlfb driver (Anton Vasilyev)
 
 - fix error return code handling in pxa3xx_gcu driver (Gustavo A. R. Silva)
 
 - fix bootparams.screeninfo arguments checking in vgacon (Jan H. Schönherr)
 
 - do not leak uninitialized padding in clk to userspace in the debug code of
   atyfb driver (Vladis Dronov)
 
 - fix compiler warnings in fbcon code and matroxfb driver (Arnd Bergmann)
 
 - convert fbdev subsystem to using %pOF instead of full_name (Rob Herring)
 
 - structures constifications (Arvind Yadav, Bhumika Goyal, Gustavo A. R.
   Silva, Julia Lawall)
 
 - misc cleanups (Gustavo A. R. Silva, Hyun Kwon, Julia Lawall, Kuninori
   Morimoto, Lynn Lei)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJZul4fAAoJEH4ztj+gR8ILqasP/3oau30GS0gLZmg76iuBzLdW
 XUI71k0HrSU+5TeSQUPHFXt/u0fkptPZg7rFdWeXNDExOQ86Vi56tGbICRZ0IOqE
 bFjHDnSYtAHTQYNH3326AA8aZ0BoAFd7cfb1oNN3o2xLMG8DLPT9BVl3FWaFMBvQ
 58OwmqbwYlgarC3qE+xDCRPCetrsSYXSmzR8YLkk37sE2pV4Sej4owcm+KJ5WqQk
 kJ6CE+4y9kpLHe3YYdIdXct5CRUa2qT9V8TrMKbio5b0YMsvFudfkpETRK4ebQ5U
 c7JCTAU77Wbl42PvZBlemVkMn1Ugq9tRrTZO4zLdMQfFV4KN0FIoV+mqmikkiYTt
 ON9OYUsOJMXnoP8cZuoELW4VT57jTwmb9YImZsLDnujMF++E9+tVOQ1xlL2SVIL9
 KQ8a7tDlvzaLqpDKyRRXjeeWmvF1xims3L0MiOgBGe+vZlmg2W6zsnMs+/TnomWi
 ytooSQywNS1ZqXVkOcgsv+Be48l/dRTlL0dHzXLYYWLXc0MPSQ57py/lGMfIS5L7
 YAYLvyTvbBXJNTIkvKCKnoo/P4zPYdu4WG9cvbviWoJM9JvVXtuU1CZCU1T4tyCE
 3DjrkJ6tzXVtdKbIN9olr+Nu+cesnmj5CHC1NzfJli7HqUhIxx5QiPOPrkhNCS+V
 22uHfk0Obd4OpUJaUGnF
 =ApgP
 -----END PGP SIGNATURE-----

Merge tag 'fbdev-v4.14' of git://github.com/bzolnier/linux

Pull fbdev updates from Bartlomiej Zolnierkiewicz:

 - make fbcon a build-time dependency for fbdev (fbcon was a tristate option
   before, now it is a bool) - this is a first step in preparations for
   making console_lock usage saner (currently it acts like the BKL for
   all things fbdev/fbcon) (Daniel Vetter)

 - add fbcon=margin:<color> command line option to select the fbcon
   margin color (David Lechner)

 - add DMI quirk table for x86 systems which need fbcon rotation
   (devices like Asus T100HA, GPD Pocket, the GPD win and the I.T.Works
   TW891) (Hans de Goede)

 - fix 1bpp logo support for unusual width (needed by LEGO MINDSTORMS
   EV3) (David Lechner)

 - enable Xilinx FB driver for ARM ZynqMP platform (Michal Simek)

 - fix use after free in the error path of udlfb driver (Anton Vasilyev)

 - fix error return code handling in pxa3xx_gcu driver (Gustavo A. R.
   Silva)

 - fix bootparams.screeninfo arguments checking in vgacon (Jan H.
   Schönherr)

 - do not leak uninitialized padding in clk to userspace in the debug
   code of atyfb driver (Vladis Dronov)

 - fix compiler warnings in fbcon code and matroxfb driver (Arnd
   Bergmann)

 - convert fbdev subsystem to using %pOF instead of full_name (Rob
   Herring)

 - structures constifications (Arvind Yadav, Bhumika Goyal, Gustavo A.
   R. Silva, Julia Lawall)

 - misc cleanups (Gustavo A. R. Silva, Hyun Kwon, Julia Lawall, Kuninori
   Morimoto, Lynn Lei)

* tag 'fbdev-v4.14' of git://github.com/bzolnier/linux: (75 commits)
  video/console: Update BIOS dates list for GPD win console rotation DMI quirk
  video/console: Add rotated LCD-panel DMI quirk for the VIOS LTH17
  video: fbdev: sis: fix duplicated code for different branches
  video: fbdev: make fb_var_screeninfo const
  video: fbdev: aty: do not leak uninitialized padding in clk to userspace
  vgacon: Prevent faulty bootparams.screeninfo from causing harm
  video: fbdev: make fb_videomode const
  video/console: Add new BIOS date for GPD pocket to dmi quirk table
  fbcon: remove restriction on margin color
  video: ARM CLCD: constify amba_id
  video: fm2fb: constify zorro_device_id
  video: fbdev: annotate fb_fix_screeninfo with const and __initconst
  omapfb: constify omap_video_timings structures
  video: fbdev: udlfb: Fix use after free on dlfb_usb_probe error path
  fbdev: i810: make fb_ops const
  fbdev: matrox: make fb_ops const
  video: fbdev: pxa3xx_gcu: fix error return code in pxa3xx_gcu_probe()
  video: fbdev: Enable Xilinx FB for ZynqMP
  video: fbdev: Fix multiple style issues in xilinxfb
  video: fbdev: udlfb: constify usb_device_id.
  ...
2017-09-14 13:33:33 -07:00
Linus Torvalds
7a95bdb092 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "A few leftovers"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm, page_owner: skip unnecessary stack_trace entries
  arm64: stacktrace: avoid listing stacktrace functions in stacktrace
  mm: treewide: remove GFP_TEMPORARY allocation flag
  IB/mlx4: fix sprintf format warning
  fscache: fix fscache_objlist_show format processing
  lib/test_bitmap.c: use ULL suffix for 64-bit constants
  procfs: remove unused variable
  drivers/media/cec/cec-adap.c: fix build with gcc-4.4.4
  idr: remove WARN_ON_ONCE() when trying to replace negative ID
2017-09-14 12:25:34 -07:00
Tim Chen
11a19c7b09 sched/wait: Introduce wakeup bookmark in wake_up_page_bit
Now that the wait queue scan can break off and bookmark its position, use
this logic in the wake_up_page_bit() function.

Large systems can have very long page wait lists, because multiple pages
share the same wait list. Break the wake up walk here to give other CPUs a
chance to access the list, and to avoid traversing the list with interrupts
disabled for too long. This reduces interrupt and rescheduling latency, and
avoids excessive page wait queue lock hold times.

[ v2: Remove bookmark_wake_function ]

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-14 09:56:18 -07:00
Tim Chen
2554db9165 sched/wait: Break up long wake list walk
We encountered workloads that have very long wake up lists on large
systems. A waker takes a long time to traverse the entire wake list and
execute all the wake functions.

We saw page wait lists that are up to 3700+ entries long in tests of
large 4 and 8 socket systems. It took 0.8 sec to traverse such a list
during wake up. Any other CPU that contends for the list spin lock will
spin for a long time. It is a result of the NUMA balancing migration of
hot pages that are shared by many threads.

Multiple CPUs doing wakeups are queued up behind the lock, and the last one
queued has to wait until all the other CPUs have done all their wakeups.

The page wait list is traversed with interrupts disabled, which caused
various problems. This was the original cause that triggered the NMI
watchdog timer in: https://patchwork.kernel.org/patch/9800303/ . Only
extending the NMI watchdog timer there helped.

This patch bookmarks the waker's scan position in the wake list and breaks
the wake up walk, to allow access to the list before the waker resumes
its walk down the rest of the wait list. It lowers the interrupt and
rescheduling latency.
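
The idea can be sketched in a few lines: wake a bounded batch under the lock, remember where the walk stopped, drop the lock, then resume. This is a simplified model, not the kernel code. It assumes the list does not change between batches (the patch uses a bookmark entry in the list for that case), and the batch size of 64 and all names are assumptions of the sketch.

```python
import threading

WALK_BREAK_CNT = 64  # wakeups per lock hold; assumed value for this sketch


def wake_all(waiters, lock, batch=WALK_BREAK_CNT):
    """Call every waiter's wake function, at most `batch` per lock hold.

    Returns how many times the lock was taken.
    """
    bookmark = 0      # index of the first waiter not yet woken
    lock_holds = 0
    while bookmark < len(waiters):
        with lock:
            lock_holds += 1
            end = min(bookmark + batch, len(waiters))
            for wake in waiters[bookmark:end]:
                wake()
            bookmark = end
        # Lock dropped here: contending CPUs get a chance to take it.
    return lock_holds
```

With the 3700-entry list mentioned above and a batch of 64, the lock would be taken 58 times, each for a short walk, instead of once for the whole list.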

This patch also provides a performance boost when combined with the next
patch to break up page wakeup list walk. We saw 22% improvement in the
will-it-scale file pread2 test on a Xeon Phi system running 256 threads.

[ v2: Merged in Linus' changes to remove the bookmark_wake_function, and
  simplify access to flags. ]

Reported-by: Kan Liang <kan.liang@intel.com>
Tested-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-14 09:56:17 -07:00
Thomas Gleixner
2a1b8ee4f5 watchdog/hardlockup/perf: Implement CPU enable replacement
watchdog_nmi_enable() is an unparseable mess. Provide a clean perf specific
implementation, which will be used when the existing setup/teardown mess is
replaced.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194148.180215498@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:08 +02:00
Thomas Gleixner
178b9f7a36 watchdog/hardlockup/perf: Implement init time perf validation
The watchdog tries to create perf events even after it figured out that
perf is not functional or the requested event is not supported.

That's braindead as this can be done once at init time and if not supported
the NMI watchdog can be turned off unconditionally.

Implement the perf hardlockup detector functionality for that. This creates
a new event create function, which will replace the unholy mess of the
existing one in later patches.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194148.019090547@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:08 +02:00
Thomas Gleixner
6592ad2fcc watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage
Both the perf reconfiguration and the powerpc watchdog_nmi_reconfigure()
need to be done in two steps.

     1) Stop all NMIs
     2) Read the new parameters and start NMIs

Right now watchdog_nmi_reconfigure() is a combination of both. To allow a
clean reconfiguration add a 'run' argument and split the functionality in
powerpc.
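
The two stage sequence can be pictured as follows. This is a toy model rather than the kernel interface: NmiWatchdog, its params dictionary and update_threshold() are hypothetical names; only the 'run' argument and the stop/start ordering come from the description above.

```python
class NmiWatchdog:
    """Toy stand-in for an NMI watchdog that samples shared parameters."""

    def __init__(self, params):
        self.params = params          # user-writable settings
        self.running = False
        self.active_threshold = None  # value the NMI side is actually using
        self.log = []

    def reconfigure(self, run):
        if not run:
            # Stage 1: stop all NMIs, so nothing samples the parameters.
            self.running = False
            self.active_threshold = None
            self.log.append("stop")
        else:
            # Stage 2: read the new parameters and start NMIs.
            self.active_threshold = self.params["threshold"]
            self.running = True
            self.log.append("start threshold=%d" % self.active_threshold)


def update_threshold(wd, new_threshold):
    """Change the threshold only while the NMI side is stopped."""
    wd.reconfigure(run=False)
    wd.params["threshold"] = new_threshold
    wd.reconfigure(run=True)
```

Because the parameter changes only between the two calls, the NMI side never runs with a threshold that differs from the one the rest of the watchdog sees.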

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170912194147.862865570@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:07 +02:00
Thomas Gleixner
7feeb9cd4f watchdog/sysctl: Clean up sysctl variable name space
Reflect that these variables are user interface related and remove the
whitespace damage in the sysctl table while at it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.783210221@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:07 +02:00
Thomas Gleixner
51d4052b01 watchdog/sysctl: Get rid of the #ifdeffery
The sysctl of the nmi_watchdog file prevents writes by setting:

    min = max = 0

if none of the users is enabled. That involves ifdeffery and is completely
non-obvious.

If none of the facilities is enabled, then the file can simply be made read
only. Move the ifdeffery into the header and use a constant for file
permissions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.706073616@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:07 +02:00
Thomas Gleixner
3b371b5936 watchdog/core: Clean up header mess
Having the same #ifdef in various places does not make it more
readable. Collect stuff into one place.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.627096864@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:07 +02:00
Thomas Gleixner
0d85923c7a smpboot/threads, watchdog/core: Avoid runtime allocation
smpboot_update_cpumask_threads_percpu() allocates a temporary cpumask at
runtime. This is suboptimal because the call site needs more code size for
proper error handling than a statically allocated temporary mask requires
data size.

Add a static temporary cpumask. The function is globally serialized, so no
further protection is required.

Remove the half-baked error handling in the watchdog code and get rid of
the export as there are no in-tree modular users of that function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.297288838@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:06 +02:00
Thomas Gleixner
01f0a02701 watchdog/core: Remove the park_in_progress obfuscation
Commit:

  b94f51183b ("kernel/watchdog: prevent false hardlockup on overloaded system")

tries to fix the following issue:

proc_write()
   set_sample_period()    <--- New sample period becomes visible
			  <----- Broken starts
   proc_watchdog_update()
     watchdog_enable_all_cpus()		watchdog_hrtimer_fn()
     update_watchdog_all_cpus()		   restart_timer(sample_period)
        watchdog_park_threads()

					thread->park()
					  disable_nmi()
			  <----- Broken ends

The reason why this is broken is that the update of the watchdog threshold
becomes immediately effective and visible for the hrtimer function which
uses that value to rearm the timer. But the NMI/perf side still uses the
old value up to the point where it is disabled. If the rate has been
lowered then the NMI can run fast enough to 'detect' a hard lockup because
the timer has not fired due to the longer period.
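
A rough arithmetic sketch of that window, in userspace C. It rests on two
assumptions that are not stated in this commit message: that the hrtimer
sample period is 2/5 of the watchdog threshold and that the NMI period equals
the threshold. All function names are hypothetical.

```c
#include <stdbool.h>

/*
 * ASSUMPTION: the hrtimer sample period is 2/5 of the threshold.
 * The factor is illustrative and not taken from this commit.
 */
static long hrtimer_period_ms(long thresh_s)
{
	return thresh_s * 1000 * 2 / 5;
}

/* ASSUMPTION: the NMI/perf side fires once per threshold interval. */
static long nmi_period_ms(long thresh_s)
{
	return thresh_s * 1000;
}

/*
 * During the broken window the hrtimer already uses the new threshold
 * while the NMI still uses the old one. If the new timer period is longer
 * than the old NMI period, two NMIs can occur with no timer tick between
 * them, which looks like a hard lockup.
 */
static bool stale_nmi_can_misfire(long old_thresh_s, long new_thresh_s)
{
	return hrtimer_period_ms(new_thresh_s) > nmi_period_ms(old_thresh_s);
}
```

Under these assumptions, raising the threshold from 10 to 60 seconds gives a
24 second timer period against a stale 10 second NMI period, so the NMI can
fire twice between timer ticks. Raising it from 10 to 20 seconds gives an
8 second timer period, which stays below the stale NMI period.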

The patch 'fixed' this by adding a variable:

proc_write()
   set_sample_period()
					<----- Broken starts
   proc_watchdog_update()
     watchdog_enable_all_cpus()		watchdog_hrtimer_fn()
     update_watchdog_all_cpus()		   restart_timer(sample_period)
         watchdog_park_threads()
	  park_in_progress = 1
					<----- Broken ends
				        nmi_watchdog()
					  if (park_in_progress)
					     return;

The only effect of this variable was to make the window where the breakage
can hit small enough that it was no longer observable in testing. From a
correctness point of view it is a pointless bandaid which merely papers
over the root cause: the unsynchronized update of the variable.

Looking deeper into the related code paths unearthed similar problems in
the watchdog_start()/stop() functions.

 watchdog_start()
	perf_nmi_event_start()
	hrtimer_start()

 watchdog_stop()
	hrtimer_cancel()
	perf_nmi_event_stop()

In both cases the call order is wrong because if the task gets preempted
or the VM gets scheduled out long enough after the first call, then there is
a chance that the next NMI will see a stale hrtimer interrupt count and
trigger a false positive hard lockup splat.

Get rid of park_in_progress so the code can be gradually deobfuscated and
pruned from several layers of duct tape papering over the root cause,
which has been either ignored or not understood at all.

Once this is removed the underlying problem will be fixed by rewriting the
proc interface to do a proper synchronized update.

Address the start/stop() ordering problem as well by reversing the call
order, so this part is at least correct now.
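
The start/stop() ordering problem can be illustrated with a small userspace
simulation in C. It is a simplified model and not the kernel code: struct
wd_sim and all sim_*() helpers are hypothetical, the NMI check is reduced to
"has the timer count moved since the last NMI", and sim_delay() models the
task being preempted for several periods between the two calls.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of one CPU's watchdog state. */
struct wd_sim {
	bool timer_running;		/* hrtimer armed */
	bool nmi_enabled;		/* perf NMI event active */
	unsigned long hrtimer_interrupts; /* bumped by every timer tick */
	unsigned long saved;		/* count seen by the previous NMI */
	bool false_positive;		/* NMI saw a stale count */
};

static void sim_timer_tick(struct wd_sim *s)
{
	if (s->timer_running)
		s->hrtimer_interrupts++;
}

static void sim_nmi(struct wd_sim *s)
{
	if (!s->nmi_enabled)
		return;
	/* An unchanged count is what the NMI reports as a hard lockup. */
	if (s->hrtimer_interrupts == s->saved)
		s->false_positive = true;
	s->saved = s->hrtimer_interrupts;
}

/* Models a long preemption: three periods elapse. */
static void sim_delay(struct wd_sim *s)
{
	for (int i = 0; i < 3; i++) {
		sim_timer_tick(s);
		sim_nmi(s);
	}
}

/* Returns true if starting in the given order yields a false positive. */
static bool sim_start(bool timer_first)
{
	struct wd_sim s = { 0 };

	if (timer_first) {
		s.timer_running = true;
		sim_delay(&s);
		s.nmi_enabled = true;
	} else {
		s.nmi_enabled = true;
		sim_delay(&s);		/* NMI runs, but no timer is counting yet */
		s.timer_running = true;
	}
	sim_delay(&s);
	return s.false_positive;
}

/* Returns true if stopping in the given order yields a false positive. */
static bool sim_stop(bool nmi_first)
{
	struct wd_sim s = { .timer_running = true, .nmi_enabled = true };

	sim_delay(&s);			/* steady state before the stop */
	if (nmi_first) {
		s.nmi_enabled = false;
		sim_delay(&s);
		s.timer_running = false;
	} else {
		s.timer_running = false;
		sim_delay(&s);		/* NMI still runs against a frozen count */
		s.nmi_enabled = false;
	}
	return s.false_positive;
}
```

In this model only the orders "timer first on start" and "NMI first on stop"
stay free of false positives, which matches the reversed call order that this
commit introduces.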

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1709052038270.2393@nanos
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:05 +02:00