Commit graph

767209 commits

Author SHA1 Message Date
Linus Torvalds
b50694381c Merge branch 'stable/for-linus-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
Pull swiotlb fix from Konrad Rzeszutek Wilk:
 "One single fix in here: under Xen the DMA32 heap (in the hypervisor)
  would end up looking like swiss cheese.

  The reason being that for every coherent DMA allocation we didn't do
  the proper hypercall to tell Xen to return the page back to the DMA32
  heap. End result was (eventually) no DMA32 space if you (for example)
  continously unloaded and loaded modules"

* 'stable/for-linus-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
  xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
2018-05-24 14:42:43 -07:00
Wei Hu(Xavier)
d59fcacc4b RDMA/hns: Increase checking CMQ status timeout value
This patch increases checking CMQ status timeout value and
uses the same value with NIC driver to avoid deficiency of
time.

Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-24 15:41:52 -06:00
Wei Hu(Xavier)
5b6eb54f58 RDMA/hns: Modify uar allocation algorithm to avoid bitmap exhaust
This patch modified uar allocation algorithm in hns_roce_uar_alloc
function to avoid bitmap exhaust.

Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-24 15:41:52 -06:00
Yossi Kuperman
1dcbc01f73 net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands
Sandbox QP Commands are retired in the order they are sent. Outstanding
commands are stored in a linked-list in the order they appear. Once a
response is received and the callback gets called, we pull the first
element off the pending list, assuming they correspond.

Sending a message and adding it to the pending list is not done atomically,
hence there is an opportunity for a race between concurrent requests.

Bind both send and add under a critical section.

Fixes: bebb23e6cb ("net/mlx5: Accel, Add IPSec acceleration interface")
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24 14:40:40 -07:00
Eran Ben Elisha
902a545904 net/mlx5e: When RXFCS is set, add FCS data into checksum calculation
When RXFCS feature is enabled, the HW do not strip the FCS data,
however it is not present in the checksum calculated by the HW.

Fix that by manually calculating the FCS checksum and adding it to the SKB
checksum field.

Add helper function to find the FCS data for all SKB forms (linear,
one fragment or more).

Fixes: 102722fc68 ("net/mlx5e: Add support for RXFCS feature flag")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24 14:40:39 -07:00
Huy Nguyen
ecdf2dadee net/mlx5e: Receive buffer support for DCBX
Add dcbnl's set/get buffer configuration callback that allows user to
set/get buffer size configuration and priority to buffer mapping.

By default, firmware controls receive buffer configuration and priority
of buffer mapping based on the changes in pfc settings. When set buffer
call back is triggered, the buffer configuration changes to manual mode.

The manual mode means mlx5 driver will adjust the buffer configuration
accordingly based on the changes in pfc settings.

ConnectX buffer stride is 128 Bytes. If the buffer size is not multiple
of 128, the buffer size will be rounded down to the nearest multiple of
128.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24 14:23:33 -07:00
Huy Nguyen
0696d60853 net/mlx5e: Receive buffer configuration
Add APIs for buffer configuration based on the changes in
pfc configuration, cable len, buffer size configuration,
and priority to buffer mapping.

Note that the xoff fomula is as below
  xoff = ((301+2.16 * len [m]) * speed [Gbps] + 2.72 MTU [B]
  xoff_threshold = buffer_size - xoff
  xon_threshold = xoff_threshold - MTU

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24 14:23:33 -07:00
Huy Nguyen
50b4a3c236 net/mlx5: PPTB and PBMC register firmware command support
Add firmware command interface to read and write PPTB and PBMC
registers.

PPTB register enables mappings priority to a specific receive buffer.

PBMC registers enables changing the receive buffer's configuration such
as buffer size, xon/xoff thresholds, buffer's lossy property and
buffer's shared property.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24 14:23:33 -07:00
Huy Nguyen
df5f1361cc net/mlx5: Add pbmc and pptb in the port_access_reg_cap_mask
Add pbmc and pptb in the port_access_reg_cap_mask. These two
bits determine if device supports receive buffer configuration.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24 14:23:33 -07:00
Huy Nguyen
2c81bfd5ae net/mlx5e: Move port speed code from en_ethtool.c to en/port.c
Move four below functions from en_ethtool.c to en/port.c. These
functions are used by both en_ethtool.c and en_main.c. Future code
can use these functions without ethtool link mode dependency.
  u32 mlx5e_port_ptys2speed(u32 eth_proto_oper);
  int mlx5e_port_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
  int mlx5e_port_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
  u32 mlx5e_port_speed2linkmodes(u32 speed);

Delete the speed field from table mlx5e_build_ptys2ethtool_map. This
table only keeps the mapping between the mlx5e link mode and
ethtool link mode. Add new table mlx5e_link_speed for translation
from mlx5e link mode to actual speed.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24 14:23:33 -07:00
Huy Nguyen
e549f6f9c0 net/dcb: Add dcbnl buffer attribute
In this patch, we add dcbnl buffer attribute to allow user
change the NIC's buffer configuration such as priority
to buffer mapping and buffer size of individual buffer.

This attribute combined with pfc attribute allows advanced user to
fine tune the qos setting for specific priority queue. For example,
user can give dedicated buffer for one or more priorities or user
can give large buffer to certain priorities.

The dcb buffer configuration will be controlled by lldptool.
lldptool -T -i eth2 -V BUFFER prio 0,2,5,7,1,2,3,6
  maps priorities 0,1,2,3,4,5,6,7 to receive buffer 0,2,5,7,1,2,3,6
lldptool -T -i eth2 -V BUFFER size 87296,87296,0,87296,0,0,0,0
  sets receive buffer size for buffer 0,1,2,3,4,5,6,7 respectively

After discussion on mailing list with Jakub, Jiri, Ido and John, we agreed to
choose dcbnl over devlink interface since this feature is intended to set
port attributes which are governed by the netdev instance of that port, where
devlink API is more suitable for global ASIC configurations.

We present an use case scenario where dcbnl buffer attribute configured
by advance user helps reduce the latency of messages of different sizes.

Scenarios description:
On ConnectX-5, we run latency sensitive traffic with
small/medium message sizes ranging from 64B to 256KB and bandwidth sensitive
traffic with large messages sizes 512KB and 1MB. We group small, medium,
and large message sizes to their own pfc enables priorities as follow.
  Priorities 1 & 2 (64B, 256B and 1KB)
  Priorities 3 & 4 (4KB, 8KB, 16KB, 64KB, 128KB and 256KB)
  Priorities 5 & 6 (512KB and 1MB)

By default, ConnectX-5 maps all pfc enabled priorities to a single
lossless fixed buffer size of 50% of total available buffer space. The
other 50% is assigned to lossy buffer. Using dcbnl buffer attribute,
we create three equal size lossless buffers. Each buffer has 25% of total
available buffer space. Thus, the lossy buffer size reduces to 25%. Priority
to lossless  buffer mappings are set as follow.
  Priorities 1 & 2 on lossless buffer #1
  Priorities 3 & 4 on lossless buffer #2
  Priorities 5 & 6 on lossless buffer #3

We observe improvements in latency for small and medium message sizes
as follows. Please note that the large message sizes bandwidth performance is
reduced but the total bandwidth remains the same.
  256B message size (42 % latency reduction)
  4K message size (21% latency reduction)
  64K message size (16% latency reduction)

CC: Ido Schimmel <idosch@idosch.org>
CC: Jakub Kicinski <jakub.kicinski@netronome.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Or Gerlitz <gerlitz.or@gmail.com>
CC: Parav Pandit <parav@mellanox.com>
CC: Aron Silverton <aron.silverton@oracle.com>
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24 14:22:59 -07:00
Linus Torvalds
34b48b8789 Merge candidates for 4.17-rc
- Remove bouncing addresses from the MAINTAINERS file
 - Kernel oops and bad error handling fixes for hfi, i40iw, cxgb4, and hns drivers
 - Various small LOC behavioral/operational bugs in mlx5, hns, qedr and i40iw drivers
 - Two fixes for patches already sent during the merge window
 - A long standing bug related to not decreasing the pinned pages count in the right
   MM was found and fixed
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJbByPQAAoJEDht9xV+IJsa164P/AihB/vbn9MBdK3pe1OSUGTm
 tKZJ/Y6nY/Q/XTJSeM2wNECk8fOrZbKuLBz2XlPRsB2djp4ugC5WWfK9YbwWMGXG
 I5B/lB8VTorQr8E5i9lqqMDQc8aF8VcGJtdqVE3nD4JsVTrQSGiSnw45/BARDUm3
 OycJJMDOWhDj2wnNSa+JfjPemIMDM1jse7DnsJfDsGfTMS/G+6nyzjKIlEnnFZ8/
 PBxhq0q7C5viNDwwn2GsAVUrATTlW48SY0WYhkgMdSl20d2th9wMZqNMqtniz8NP
 lg87SrhzsAPOTlbSWlYYkAnzE7nEhfJyIfYUp2piNJeYuOohYPtO6w99Tqjl/GmU
 uLIYIXtZCxAK1Zb/znc49HkRVL5YFDsQGXdtYy7tvRZPwwR32kowUtpKIWaZFz8O
 BA/x+Zgqu9AlwqSWwQwxmMbUX42RRwhNJDVyTYlXQSSzhfgFaLIZARqb4K6HxeNN
 vZN0BK+x6pX6FI7hpdsqNRtH1oo4SNUBxiuUsrZ7cy7GqYNdUJ6piygDgmERaJxU
 svIUJof/+OoU1QyErQ0JgUEK/3jOHbjxSPb/rjQeqxAnCqhaGOuNGMtdfsGqgvBU
 x/u3eDcbfi/LBErXR46gYtxnOQ8I2BB+m8erUc/GVvCzWrX+R7ELZYpBrP5Pcu/6
 mr2D7hDqgZHbeU8aB8+D
 =uFZh
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "This is pretty much just the usual array of smallish driver bugs.

   - remove bouncing addresses from the MAINTAINERS file

   - kernel oops and bad error handling fixes for hfi, i40iw, cxgb4, and
     hns drivers

   - various small LOC behavioral/operational bugs in mlx5, hns, qedr
     and i40iw drivers

   - two fixes for patches already sent during the merge window

   - a long-standing bug related to not decreasing the pinned pages
     count in the right MM was found and fixed"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (28 commits)
  RDMA/hns: Move the location for initializing tmp_len
  RDMA/hns: Bugfix for cq record db for kernel
  IB/uverbs: Fix uverbs_attr_get_obj
  RDMA/qedr: Fix doorbell bar mapping for dpi > 1
  IB/umem: Use the correct mm during ib_umem_release
  iw_cxgb4: Fix an error handling path in 'c4iw_get_dma_mr()'
  RDMA/i40iw: Avoid panic when reading back the IRQ affinity hint
  RDMA/i40iw: Avoid reference leaks when processing the AEQ
  RDMA/i40iw: Avoid panic when objects are being created and destroyed
  RDMA/hns: Fix the bug with NULL pointer
  RDMA/hns: Set NULL for __internal_mr
  RDMA/hns: Enable inner_pa_vld filed of mpt
  RDMA/hns: Set desc_dma_addr for zero when free cmq desc
  RDMA/hns: Fix the bug with rq sge
  RDMA/hns: Not support qp transition from reset to reset for hip06
  RDMA/hns: Add return operation when configured global param fail
  RDMA/hns: Update convert function of endian format
  RDMA/hns: Load the RoCE dirver automatically
  RDMA/hns: Bugfix for rq record db for kernel
  RDMA/hns: Add rq inline flags judgement
  ...
2018-05-24 14:12:05 -07:00
Tom Stellard
c3032fd967 drm/amdgpu: Use dev_info() to report amdkfd is not supported for this ASIC
This is an important message, so it should be visible to users without
having to enable extra debugging.

Signed-off-by: Tom Stellard <tstellar@redhat.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-05-24 14:07:14 -07:00
Peter Rosin
6a0c0d0d00 i2c: robotfuzz-osif: drop pointless test
In the for-loop test, ret will be either 0 or 1. So, the
comparison is pointless. Drop it, and drop the initializer
which is then also pointless.

Signed-off-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-05-24 22:13:04 +02:00
Peter Rosin
7fb29b958d i2c: robotfuzz-osif: remove pointless local variable
Just use the value directly instead of assigning it to a
variable first. And then drop the unused variable.

Signed-off-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-05-24 22:12:24 +02:00
Shawn Lin
17dd94796c i2c: rk3x: Don't print visible virtual mapping MMIO address
Now %p doesn't print visible pointer address unless the user
really want it. According to Documentation/core-api/printk-formats.rst,
%px should be used instead, otherwise we could see:

rk3x-i2c ff110000.i2c: Initialized RK3xxx I2C bus at (____ptrval____)
rk3x-i2c ff130000.i2c: Initialized RK3xxx I2C bus at (____ptrval____)
rk3x-i2c ff3c0000.i2c: Initialized RK3xxx I2C bus at (____ptrval____)
rk3x-i2c ff3d0000.i2c: Initialized RK3xxx I2C bus at (____ptrval____)

But I don't really understand why we need dump it in the first place!
Let's remove the whole pointless log.

Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-05-24 22:09:05 +02:00
Corey Minyard
048f7c3e35 ipmi: Properly release srcu locks on error conditions
When SRCU was added for handling hotplug, some error conditions
were not handled properly.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
2018-05-24 15:08:30 -05:00
Luis Henriques
6d71021ab3 leds: class: ensure workqueue is initialized before setting brightness
An application can try to set brightness before all the initialization is
done, in particular before the workqueue is initialized with the call to
led_init_core().  Here's a WARNING easy to trigger:

[   36.780813] WARNING: CPU: 3 PID: 1411 at ../kernel/workqueue.c:1444 __queue_work+0x37b/0x420
[   36.780815] Modules linked in: ...
[   36.780868] CPU: 3 PID: 1411 Comm: systemd-backlig Not tainted 4.16.9-1-default #1 openSUSE Tumbleweed (unreleased)
[   36.780868] Hardware name: Dell Inc. Precision 5510/0N8J4R, BIOS 1.6.1 12/11/2017
[   36.780870] RIP: 0010:__queue_work+0x37b/0x420
[   36.780871] RSP: 0018:ffffaced048b7d78 EFLAGS: 00010086
[   36.780873] RAX: 0000000000000000 RBX: ffffffffb3f01440 RCX: 0000000000000000
[   36.780873] RDX: ffffffffc05a90d8 RSI: 0000000000000000 RDI: ffff8eac7dce2700
[   36.780874] RBP: ffff8ea547c16400 R08: ffff8ea547800000 R09: ffff8eac7dc22700
[   36.780875] R10: 0000000000000000 R11: 0000000000000040 R12: 0000000000000003
[   36.780876] R13: 0000000000000200 R14: ffffffffc05a90d0 R15: ffff8eac7dce8600
[   36.780877] FS:  00007f871e61cf40(0000) GS:ffff8eac7dcc0000(0000) knlGS:0000000000000000
[   36.780878] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   36.780879] CR2: 000055c91115e308 CR3: 0000000883ee0005 CR4: 00000000003606e0
[   36.780880] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   36.780880] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   36.780881] Call Trace:
[   36.780886]  queue_work_on+0x81/0x90
[   36.780889]  brightness_store+0x5d/0x90
[   36.780892]  kernfs_fop_write+0x105/0x180
[   36.780894]  __vfs_write+0x26/0x150
[   36.780897]  ? common_file_perm+0x51/0x150
[   36.780900]  ? security_file_permission+0x3c/0xb0
[   36.780901]  vfs_write+0xad/0x1a0
[   36.780903]  SyS_write+0x42/0x90
[   36.780906]  do_syscall_64+0x76/0x140
[   36.780908]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
[   36.780910] RIP: 0033:0x7f871dd04c94
[   36.780910] RSP: 002b:00007ffeb3a57d38 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   36.780912] RAX: ffffffffffffffda RBX: 000055c91115c810 RCX: 00007f871dd04c94
[   36.780912] RDX: 0000000000000001 RSI: 000055c91115c810 RDI: 0000000000000004
[   36.780913] RBP: 00007ffeb3a57e10 R08: 0000000000000003 R09: 0000000000000000
[   36.780914] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[   36.780914] R13: 000055c911158f30 R14: 000055c90f3a9a4e R15: 0000000000000004
[   36.780917] Code: 74 18 e8 49 80 00 00 48 85 c0 74 0e 48 8b 40 20 48 3b 68 08
0f 84 c2 fc ff ff 0f 0b 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b e9
82 fd ff ff 83 cd 02 49 8d 57 60 e9 69 fd ff ff 80 3d
[   36.780942] ---[ end trace 1fce4edad54c4017 ]---

This patch initializes and acquires the led_access mutex early in the
of_led_classdev_register function, so that any application trying to write
to sysfs to set brightness will block until initialization ends.

Signed-off-by: Luis Henriques <lhenriques@suse.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
2018-05-24 22:08:26 +02:00
Wolfram Sang
05d4707d46 i2c: opal: don't check number of messages in the driver
Since commit 1eace8344c ("i2c: add param sanity check to
i2c_transfer()") and b7f6258402 ("i2c: add quirk checks to core"), the
I2C core does this check now. We can remove it here.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Reviewed-by: Peter Rosin <peda@axentia.se>
2018-05-24 22:06:17 +02:00
Wolfram Sang
c216b87065 i2c: ibm_iic: don't check number of messages in the driver
Since commit 1eace8344c ("i2c: add param sanity check to
i2c_transfer()"), the I2C core does this check now. We can remove it
from drivers.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Reviewed-by: Peter Rosin <peda@axentia.se>
2018-05-24 22:05:54 +02:00
Fabio Estevam
ef9fc0bad5 i2c: imx: Switch to SPDX identifier
Adopt the SPDX license identifier headers to ease license compliance
management.

Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-05-24 22:04:34 +02:00
Joe Perches
5657a819a8 block drivers/block: Use octal not symbolic permissions
Convert the S_<FOO> symbolic permissions to their octal equivalents as
using octal and not symbolic permissions is preferred by many as more
readable.

see: https://lkml.org/lkml/2016/8/2/1945

Done with automated conversion via:
$ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace <files...>

Miscellanea:

o Wrapped modified multi-line calls to a single line where appropriate
o Realign modified multi-line calls to open parenthesis

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-24 13:38:59 -06:00
Heiner Kallweit
87e5808d52 net: phy: replace bool members in struct phy_device with bit-fields
In struct phy_device we have a number of flags being defined as type
bool. Similar to e.g. struct pci_dev we can save some space by using
bit-fields.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 15:35:58 -04:00
Ilia Lin
f50e5ddae8
dt-bindings: qcom_spmi: Document SAW support
Document the DT bindings for the SAW regulators.

The saw-leader is the only property that is configurable in DT.

The saw-slave property allows ganging (grouping) of
several regulators so that their outputs can be combined.

Signed-off-by: Ilia Lin <ilialin@codeaurora.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
2018-05-24 20:23:48 +01:00
Ilia Lin
0caecaa872
regulator: qcom_spmi: Add support for SAW
Add support for SAW controlled regulators.
The regulators defined as SAW controlled in the device tree
will be controlled through special CPU registers instead of direct
SPMI accesses.
This is required especially for CPU supply regulators to synchronize
with clock scaling and for Automatic Voltage Switching.

Signed-off-by: Ilia Lin <ilialin@codeaurora.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
2018-05-24 20:23:43 +01:00
Suman Anna
e759176c7f hwspinlock/u8500: Switch to SPDX license identifier
Use the appropriate SPDX license identifier in the U8500 HWSEM
driver source file and drop the previous boilerplate license text.

Cc: Mathieu J. Poirier <mathieu.poirier@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2018-05-24 12:05:11 -07:00
Suman Anna
0c3e890b1e hwspinlock: sprd: Switch to SPDX license identifier
Use the appropriate SPDX license identifiers in the Spreadtrum hardware
spinlock driver source file and drop the previous boilerplate license text.

Cc: Baolin Wang <baolin.wang@spreadtrum.com>
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2018-05-24 12:04:48 -07:00
Suman Anna
50a522805e hwspinlock/sirf: Switch to SPDX license identifier
Use the appropriate SPDX license identifier in the CSR's SIRF hardware
spinlock driver source file and drop the previous boilerplate license text.

Cc: Wei Chen <wei.chen@csr.com>
Cc: Barry Song <baohua@kernel.org>
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2018-05-24 12:04:47 -07:00
Suman Anna
8ba01caeec hwspinlock: qcom: Switch to SPDX license identifier
Use the appropriate SPDX license identifier in the Qualcomm Hwspinlock
driver source file and drop the previous boilerplate license text.

Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2018-05-24 12:04:46 -07:00
Suman Anna
357ace03f0 hwspinlock/omap: Switch to SPDX license identifier
Use the appropriate SPDX license identifier in the OMAP hwspinlock
driver source file and drop the previous boilerplate license text.

Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2018-05-24 12:04:43 -07:00
Suman Anna
eebba71e1c hwspinlock/core: Switch to SPDX license identifier
Use the appropriate SPDX license identifier in the Hwspinlock core
driver source files and drop the previous boilerplate license text.

Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2018-05-24 12:04:32 -07:00
Linus Torvalds
d7b66b4ab0 for-4.17-rc6-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlsG/ecACgkQxWXV+ddt
 WDvr3w/8D12pwR9sPcEwxD4pvoLv7LP1VRQy2u+ivSifdBD7MueKh3y0igUMyARR
 LERsK0zUsTQGkkC6c7ZYd4cT9PikPpXtO1P9iATFAKqR/YMDIV/haSqT8DwbI/qb
 7F+ZMeTy1LzL01YlYBrGVDxP8AWVO2Dml6JolYxzplILSLvdPH6G8xOSjei/p9sm
 RK5ERHJENEI0l/cThpiLoAEWjzciPtR39T5Hq45onHyCs3bjJCcx51/QE8sBsl8x
 +BKvCmL40UKd30YKudJZYDM6NgMgWENhfTtIZQIInv99sMNCxIgTEUdX8ExdyjRZ
 24rst/BuQz4d8r/8zqE/hdFsHRGWwnEiYmGWylanPY5KdQ41ULfXC06xuoNOLoW8
 KQwD8SWv+W5vEJW0UQz5cb3vUgv5RnUzPvcmMfSztLeo2K4zj6zCK5L6XJwIJNbM
 1AJR7R4TRkQdf5QEeziFl738Yv1AgsPQuKSiiFa9YwXMLU8dYXlx14ioUzBL8MLe
 1wZPJ03x/N7eKJ0g6OIAAVfUTFFejv4Z2B2IDoObuLLsPwTdK6tS+9tJ5mos7ngG
 Vf1ZVmhmeJdw1qwK8ROzAJHkK807KgGO7LWmA7tIVLwWuZX14F7xLQIg3Ux3MhIh
 NhoBTFy2AGmdE0hFYv/4FA5dnUOU4VTVYVw3QUV4DMc0XIodZrE=
 =iYyx
 -----END PGP SIGNATURE-----

Merge tag 'for-4.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fix from David Sterba:
 "A one-liner that prevents leaking an internal error value 1 out of the
  ftruncate syscall.

  This has been observed in practice. The steps to reproduce make a
  common pattern (open/write/fync/ftruncate) but also need the
  application to not check only for negative values and happens only for
  compressed inlined files.

  The conditions are narrow but as this could break userspace I think
  it's better to merge it now and not wait for the merge window"

* tag 'for-4.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: fix error handling in btrfs_truncate()
2018-05-24 11:47:43 -07:00
Lukas Wunner
009f8c90f5 ALSA: hda - Fix runtime PM
Before commit 3b5b899ca6 ("ALSA: hda: Make use of core codec functions
to sync power state"), hda_set_power_state() returned the response to
the Get Power State verb, a 32-bit unsigned integer whose expected value
is 0x233 after transitioning a codec to D3, and 0x0 after transitioning
it to D0.

The response value is significant because hda_codec_runtime_suspend()
does not clear the codec's bit in the codec_powered bitmask unless the
AC_PWRST_CLK_STOP_OK bit (0x200) is set in the response value.  That in
turn prevents the HDA controller from runtime suspending because
azx_runtime_idle() checks that the codec_powered bitmask is zero.

Since commit 3b5b899ca6, hda_set_power_state() only returns 0x0 or
0x1, thereby breaking runtime PM for any HDA controller.  That's because
an inline function introduced by the commit returns a bool instead of a
32-bit unsigned int.  The change was likely erroneous and resulted from
copying and pasting snd_hda_check_power_state(), which is immediately
preceding the newly introduced inline function.  Fix it.

Link: https://bugs.freedesktop.org/show_bug.cgi?id=106597
Fixes: 3b5b899ca6 ("ALSA: hda: Make use of core codec functions to sync power state")
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Abhijeet Kumar <abhijeet.kumar@intel.com>
Reported-and-tested-by: Gunnar Krüger <taijian@posteo.de>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2018-05-24 20:16:47 +02:00
Jingqi Liu
0ea3286e2d KVM: x86: Expose CLDEMOTE CPU feature to guest VM
The CLDEMOTE instruction hints to hardware that the cache line that
contains the linear address should be moved("demoted") from
the cache(s) closest to the processor core to a level more distant
from the processor core. This may accelerate subsequent accesses
to the line by other cores in the same coherence domain,
especially if the line was written by the core that demotes the line.

This patch exposes the cldemote feature to the guest.

The release document ref below link:
https://software.intel.com/sites/default/files/managed/c5/15/\
architecture-instruction-set-extensions-programming-reference.pdf
This patch has a dependency on https://lkml.org/lkml/2018/4/23/928

Signed-off-by: Jingqi Liu <jingqi.liu@intel.com>
Reviewed-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-05-24 20:15:22 +02:00
Liran Alon
cd9a491f6e KVM: nVMX: Emulate L1 individual-address invvpid by L0 individual-address invvpid
When vmcs12 uses VPID, all TLB entries populated by L2 are tagged with
vmx->nested.vpid02. Currently, INVVPID executed by L1 is emulated by L0
by using INVVPID single/global-context to flush all TLB entries
tagged with vmx->nested.vpid02 regardless of INVVPID type executed by
L1.

However, we can easily optimize the case of L1 INVVPID on an
individual-address. Just INVVPID given individual-address tagged with
vmx->nested.vpid02.

Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
[Squashed with a preparatory patch that added the !operand.vpid line.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-05-24 19:45:45 +02:00
Liran Alon
6f1e03bcab KVM: nVMX: Don't flush TLB when vmcs12 uses VPID
Since commit 5c614b3583 ("KVM: nVMX: nested VPID emulation"),
vmcs01 and vmcs02 don't share the same VPID. vmcs01 uses vmx->vpid
while vmcs02 uses vmx->nested.vpid02. This was done such that TLB
flush could be avoided when switching between L1 and L2.

However, the above mentioned commit only changed L2 VMEntry logic to
not flush TLB when switching from L1 to L2. It forgot to also remove
the TLB flush which is done when simulating a VMExit from L2 to L1.

To fix this issue, on VMExit from L2 to L1 we flush TLB only in case
vmcs01 enables VPID and vmcs01->vpid==vmcs02->vpid. This happens when
vmcs01 enables VPID and vmcs12 does not.

Fixes: 5c614b3583 ("KVM: nVMX: nested VPID emulation")

Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-05-24 19:45:40 +02:00
Liran Alon
6bce30c7d9 KVM: nVMX: Use vmx local var for referencing vpid02
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-05-24 19:45:37 +02:00
Marek Vasut
0bbf6b9230 PCI: rcar: Remove IRQ mappings in rcar_pcie_enable_msi() failpath
The rcar_pcie_enable_msi() creates IRQ mappings using irq_create_mapping()
before requesting the IRQs using devm_request_irq(). If devm_request_irq()
fails for some reason, rcar_pcie_enable_msi() does not remove the mapping.

Pull out the code for disposing IRQ mappings from rcar_pcie_teardown_msi()
into a separate function and call it from both rcar_pcie_teardown_msi()
and rcar_pcie_enable_msi() failpath to remove the mappings correctly.

Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Phil Edworthy <phil.edworthy@renesas.com>
Cc: Simon Horman <horms+renesas@verge.net.au>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: linux-renesas-soc@vger.kernel.org
2018-05-24 18:40:05 +01:00
Marek Vasut
1aea58be73 PCI: rcar: Teardown MSI setup if rcar_pcie_enable() fails
If the rcar_pcie_enable() fails and MSIs are enabled, the setup done in
rcar_pcie_enable_msi() is never undone. Add a function to tear down the
MSI setup by disabling the MSI handling in the PCIe block, deallocating
the pages requested for the MSIs and zapping the IRQ mapping.

Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Phil Edworthy <phil.edworthy@renesas.com>
Cc: Simon Horman <horms+renesas@verge.net.au>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: linux-renesas-soc@vger.kernel.org
2018-05-24 18:39:59 +01:00
Marek Vasut
53f1aebfbe PCI: rcar: Add missing irq_dispose_mapping() into failpath
The rcar_pcie_get_resources() is another misnomer with a side effect.
The function does not only get resources, but also maps MSI IRQs via
irq_of_parse_and_map(). In case anything fails afterward, the IRQ
mapping must be disposed through irq_dispose_mapping() which is not
done.

This patch handles irq_of_parse_and_map() failures in by disposing
of the mapping in rcar_pcie_get_resources() as well as in probe.

Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Phil Edworthy <phil.edworthy@renesas.com>
Cc: Simon Horman <horms+renesas@verge.net.au>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: linux-renesas-soc@vger.kernel.org
2018-05-24 18:39:24 +01:00
Marek Vasut
80b8471e69 PCI: rcar: Pull bus clock enable/disable from rcar_pcie_get_resources()
The rcar_pcie_get_resources() is another misnomer with a side effect.
The function does not only get resources, but also enables/disables bus
clock. This is forgotten in the probe() function though and if anything
in probe() fails after rcar_pcie_get_resources() is called, the bus
clock are never disabled.

This patch pulls the clock handling out of the rcar_pcie_get_resources()
and enables clock after all the resources were requested. Moreover, this
patch also always disables the clock in case of failure.

Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Phil Edworthy <phil.edworthy@renesas.com>
Cc: Simon Horman <horms+renesas@verge.net.au>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: linux-renesas-soc@vger.kernel.org
2018-05-24 18:38:31 +01:00
Dan Carpenter
86bf20cb57 KVM: x86: prevent integer overflows in KVM_MEMORY_ENCRYPT_REG_REGION
This is a fix from reviewing the code, but it looks like it might be
able to lead to an Oops.  It affects 32bit systems.

The KVM_MEMORY_ENCRYPT_REG_REGION ioctl uses a u64 for range->addr and
range->size but the high 32 bits would be truncated away on a 32 bit
system.  This is harmless but it's also harmless to prevent it.

Then in sev_pin_memory() the "uaddr + ulen" calculation can wrap around.
The wrap around can happen on 32 bit or 64 bit systems, but I was only
able to figure out a problem for 32 bit systems.  We would pick a number
which results in "npages" being zero.  The sev_pin_memory() would then
return ZERO_SIZE_PTR without allocating anything.

I made it illegal to call sev_pin_memory() with "ulen" set to zero.
Hopefully, that doesn't cause any problems.  I also changed the type of
"first" and "last" to long, just for cosmetic reasons.  Otherwise on a
64 bit system you're saving "uaddr >> 12" in an int and it truncates the
high 20 bits away.  The math works in the current code so far as I can
see but it's just weird.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
[Brijesh noted that the code is only reachable on X86_64.]
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-05-24 19:32:20 +02:00
Sean Christopherson
a1d588e951 KVM: x86: remove obsolete EXPORT... of handle_mmio_page_fault
handle_mmio_page_fault() was recently moved to be an internal-only
MMU function, i.e. it's static and no longer defined in kvm_host.h.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-05-24 19:32:20 +02:00
Viresh Kumar
9ad14c0016 PM / Domain: Return 0 on error from of_genpd_opp_to_performance_state()
of_genpd_opp_to_performance_state() should return 0 on errors, as its
doc comment describes. While it follows that mostly, it returns a
negative error number on one of the failures.

Fix that.

Fixes: 6e41766a6a "PM / Domain: Implement of_genpd_opp_to_performance_state()"
Reported-by: Rajendra Nayak <rnayak@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-05-24 19:08:14 +02:00
Joonsoo Kim
d883c6cf3b Revert "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE"
This reverts the following commits that change CMA design in MM.

 3d2054ad8c ("ARM: CMA: avoid double mapping to the CMA area if CONFIG_HIGHMEM=y")

 1d47a3ec09 ("mm/cma: remove ALLOC_CMA")

 bad8c6c0b1 ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE")

Ville reported a following error on i386.

  Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
  microcode: microcode updated early to revision 0x4, date = 2013-06-28
  Initializing CPU#0
  Initializing HighMem for node 0 (000377fe:00118000)
  Initializing Movable for node 0 (00000001:00118000)
  BUG: Bad page state in process swapper  pfn:377fe
  page:f53effc0 count:0 mapcount:-127 mapping:00000000 index:0x0
  flags: 0x80000000()
  raw: 80000000 00000000 00000000 ffffff80 00000000 00000100 00000200 00000001
  page dumped because: nonzero mapcount
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0-rc5-elk+ #145
  Hardware name: Dell Inc. Latitude E5410/03VXMC, BIOS A15 07/11/2013
  Call Trace:
   dump_stack+0x60/0x96
   bad_page+0x9a/0x100
   free_pages_check_bad+0x3f/0x60
   free_pcppages_bulk+0x29d/0x5b0
   free_unref_page_commit+0x84/0xb0
   free_unref_page+0x3e/0x70
   __free_pages+0x1d/0x20
   free_highmem_page+0x19/0x40
   add_highpages_with_active_regions+0xab/0xeb
   set_highmem_pages_init+0x66/0x73
   mem_init+0x1b/0x1d7
   start_kernel+0x17a/0x363
   i386_start_kernel+0x95/0x99
   startup_32_smp+0x164/0x168

The reason for this error is that the span of MOVABLE_ZONE is extended
to whole node span for future CMA initialization, and, normal memory is
wrongly freed here.  I submitted the fix and it seems to work, but,
another problem happened.

It's so late time to fix the later problem so I decide to reverting the
series.

Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Laura Abbott <labbott@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-24 10:07:50 -07:00
Seth Forshee
f3f1a18330 fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems
The user in control of a super block should be allowed to freeze
and thaw it. Relax the restrictions on the FIFREEZE and FITHAW
ioctls to require CAP_SYS_ADMIN in s_user_ns.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2018-05-24 12:04:28 -05:00
Eric W. Biederman
b1d749c5c3 capabilities: Allow privileged user in s_user_ns to set security.* xattrs
A privileged user in s_user_ns will generally have the ability to
manipulate the backing store and insert security.* xattrs into
the filesystem directly. Therefore the kernel must be prepared to
handle these xattrs from unprivileged mounts, and it makes little
sense for commoncap to prevent writing these xattrs to the
filesystem. The capability and LSM code have already been updated
to appropriately handle xattrs from unprivileged mounts, so it
is safe to loosen this restriction on setting xattrs.

The exception to this logic is that writing xattrs to a mounted
filesystem may also cause the LSM inode_post_setxattr or
inode_setsecurity callbacks to be invoked. SELinux will deny the
xattr update by virtue of applying mountpoint labeling to
unprivileged userns mounts, and Smack will deny the writes for
any user without global CAP_MAC_ADMIN, so loosening the
capability check in commoncap is safe in this respect as well.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2018-05-24 12:03:31 -05:00
Eric W. Biederman
bc6155d132 fs: Allow superblock owner to access do_remount_sb()
Superblock level remounts are currently restricted to global
CAP_SYS_ADMIN, as is the path for changing the root mount to
read only on umount. Loosen both of these permission checks to
also allow CAP_SYS_ADMIN in any namespace which is privileged
towards the userns which originally mounted the filesystem.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2018-05-24 12:02:25 -05:00
Marek Vasut
0353ff29b3 PCI: rcar: Poll more often in rcar_pcie_wait_for_dl()
The data link active signal usually takes ~20 uSec to be asserted, poll
the bit more often to avoid useless delays in this function.

Use udelay() instead of usleep() for such a small delay as suggested by
the timer documentation.

Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
[lorenzo.pieralisi@arm.com: updated commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Phil Edworthy <phil.edworthy@renesas.com>
Cc: Simon Horman <horms+renesas@verge.net.au>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: linux-renesas-soc@vger.kernel.org
2018-05-24 18:00:44 +01:00
Ming Lei
e6fc464987 blk-mq: avoid starving tag allocation after allocating process migrates
When the allocation process is scheduled back and the mapped hw queue is
changed, fake one extra wake up on previous queue for compensating wake
up miss, so other allocations on the previous queue won't be starved.

This patch fixes one request allocation hang issue, which can be
triggered easily in case of very low nr_request.

The race is as follows:

1) 2 hw queues, nr_requests are 2, and wake_batch is one

2) there are 3 waiters on hw queue 0

3) two in-flight requests in hw queue 0 are completed, and only two
   waiters of 3 are waken up because of wake_batch, but both the two
   waiters can be scheduled to another CPU and cause to switch to hw
   queue 1

4) then the 3rd waiter will wait for ever, since no in-flight request
   is in hw queue 0 any more.

5) this patch fixes it by the fake wakeup when waiter is scheduled to
   another hw queue

Cc: <stable@vger.kernel.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>

Modified commit message to make it clearer, and make it apply on
top of the 4.18 branch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-24 11:00:39 -06:00