Commit graph

791450 commits

Author SHA1 Message Date
David S. Miller
3325cf9e51 Merge branch 'defza-fddi'
Maciej W. Rozycki says:

====================
FDDI: DEC FDDIcontroller 700 TURBOchannel adapter support

 This is an update to <http://patchwork.ozlabs.org/patch/342737/>.  I
believe I have addressed all the requests made in the previous review
round.

 There is still one `checkpatch.pl' warning remaining:

WARNING: quoted string split across lines
+       pr_info("%s: ROM rev. %.4s, firmware rev. %.4s, RMC rev. %.4s, "
+               "SMT ver. %u\n", fp->name, rom_rev, fw_rev, rmc_rev, smt_ver);

total: 0 errors, 1 warnings, 2458 lines checked

however I think the value of staying within 80 columns is higher than the
value of having the string on a single line.  This is because with all the
formatting specifiers there it is not directly greppable based on the
final output produced to the kernel log on one hand, e.g.:

tc2: ROM rev. 1.0, firmware rev. 1.2, RMC rev. A, SMT ver. 1

while it can be easily tracked down by grepping for an obvious substring
such as "RMC rev" on the other.

 The issue with MMIO barriers I discussed in the course of the original
review turned out mostly irrelevant to this driver, because as I have
learnt in a recent Alpha/Linux discussion starting here:
<https://marc.info/?i=alpine.LRH.2.02.1808161556450.13597%20()%20file01%20!%20intranet%20!%20prod%20!%20int%20!%20rdu2%20!%20redhat%20!%20com>
our MMIO API mandates the `readX' and `writeX' accessors to be strongly
ordered with respect to each other, even if that is not implicitly
enforced by hardware.

 Consequently I have removed all the explicit ordering barriers and
instead submitted a fix for MIPS MMIO implementation, which currently does
not guarantee strong ordering (the MIPS architecture does not define bus
ordering rules except in terms of SYNC barriers), as recorded here:
<https://patchwork.linux-mips.org/project/linux-mips/list/?series=1538>.

 Enforcing strong MMIO ordering can be costly however and is often
unnecessary, e.g. when using PIO to access network frame data in onboard
packet memory.  I have therefore retained the information that would be
lost by the removal of barriers, by defining accessor wrappers suffixed by
`_o' and `_u', for accesses that have to be ordered and can be unordered
respectively.

 If we ever have an API defined for weakly-ordered MMIO accesses, then
these wrappers can be redefined accordingly.  Right now they all expand to
the respective `_relaxed' accessors, because, again, enforcing the
ordering WRT DMA transfers can be costly and we don't need it here except
in one place, where I chose to use explicit `dma_rmb' instead.

 Similarly I have replaced the completion barriers with a read back from
the respective MMIO location (all adapter MMIO registers can be read with
no side effects incurred), which will serve its purpose on the basis of
MMIO being strongly ordered (although a read from TURBOchannel is going to
be slower than `iob', making the delay incurred unnecessarily longer).

 And last but not least, I have split off the SMT Tx network tap support
to a separate change, 2/2 in this series, so that it does not block the
driver proper and can be discussed separately.

 I think it has value in that it makes the view of the outgoing network
traffic complete, as if one actually physically tapped into the outgoing
line of the ring, between the station being examined and its downstream
neighbour.  Without this part only traffic passed from applications
through the whole protocol stack can be captured and this is only a part
of the view.

 With the `dev_queue_xmit_nit' interface now exported it's only
`ptype_all' that remains private, and to define a properly abstracted API
I propose to provide am exported `dev_nit_active' predicate that tells
whether any taps are active.  This predicate is then used accordingly.

 NB if there is a long-term maintenance concern about the `dev_nit_active'
predicate, then well, corresponding inline code currently present in
`xmit_one' has to be maintained anyway, and if the resulting changes
require `defza' to be updated accordingly, then I am going to handle it;
after some 20 years with Linux it's not that I am going to disappear
anywhere anytime.  And once I am dead, which is inevitably going to happen
sooner or later, then the driver can simply be ripped from the kernel.
Though I suspect that at that point no DECstation Linux users may survive
anymore, even though hardware, being as sturdy as it is, likely will.

 I have a patch for `tcpdump' to actually decode SMT frames, which I plan
to upstream sometime.  Here's a sample of SMT traffic captured through the
`defza' driver in a small network of 4 stations and no concentrators,
printed in the most verbose mode:

01:16:59.138381 4f 00:60:b0:58:41:e7 00:60:b0:58:41:e7 73: SMT NIF ann vid:1 tid:00000270 sid:00-00-00-60-b0-58-41-e7 len:40: UNA: 00 00 00 06 0d 1a 02 ae StationDescr: 00 01 02 00 StationState: 00 00 30 00 MACFrameStatusFunctions.3: 00 00 00 01
01:17:00.332750 4f 08:00:2b:a3:a3:29 08:00:2b:a3:a3:29 73: SMT NIF ann vid:1 tid:0000013b sid:00-00-08-00-2b-a3-a3-29 len:40: UNA: 00 00 00 06 0d 1a 82 e7 StationDescr: 00 01 02 00 StationState: 00 00 30 00 MACFrameStatusFunctions.3: 00 00 00 01
01:17:00.354479 4f 00:60:b0:58:40:75 00:60:b0:58:40:75 73: SMT NIF ann vid:1 tid:0000029c sid:00-00-00-60-b0-58-40-75 len:40: UNA: 00 00 10 00 d4 74 b6 ae StationDescr: 00 01 02 00 StationState: 00 00 31 00 MACFrameStatusFunctions.3: 00 00 00 01
01:17:00.442175 4f 00:60:b0:58:41:e7 Broadcast 73: SMT NIF req vid:1 tid:00000271 sid:00-00-00-60-b0-58-41-e7 len:40: UNA: 00 00 00 06 0d 1a 02 ae StationDescr: 00 01 02 00 StationState: 00 00 30 00 MACFrameStatusFunctions.3: 00 00 00 01
01:17:00.448657 41 08:00:2b:a3:a3:29 00:60:b0:58:41:e7 73: SMT NIF rsp vid:1 tid:00000271 sid:00-00-08-00-2b-a3-a3-29 len:40: UNA: 00 00 00 06 0d 1a 82 e7 StationDescr: 00 01 02 00 StationState: 00 00 30 00 MACFrameStatusFunctions.3: 00 00 00 01
01:17:01.015152 4f 08:00:2b:a3:a3:29 Broadcast 73: SMT NIF req vid:1 tid:0000013c sid:00-00-08-00-2b-a3-a3-29 len:40: UNA: 00 00 00 06 0d 1a 82 e7 StationDescr: 00 01 02 00 StationState: 00 00 30 00 MACFrameStatusFunctions.3: 00 00 00 01
01:17:01.111644 41 08:00:2b:2e:6d:75 08:00:2b:a3:a3:29 73: SMT NIF rsp vid:1 tid:0000013c sid:00-00-08-00-2b-2e-6d-75 len:40: UNA: 00 00 10 00 d4 c5 c5 94 StationDescr: 00 01 01 00 StationState: 00 00 11 00 MACFrameStatusFunctions.2: 00 00 00 01
01:17:04.814603 4f 08:00:2b:2e:6d:75 Broadcast 73: SMT NIF req vid:1 tid:0000013c sid:00-00-08-00-2b-2e-6d-75 len:40: UNA: 00 00 10 00 d4 c5 c5 94 StationDescr: 00 01 01 00 StationState: 00 00 11 00 MACFrameStatusFunctions.2: 00 00 00 01
01:17:04.814939 4f 08:00:2b:2e:6d:75 Broadcast 73: SMT NIF req vid:1 tid:0000013c sid:00-00-08-00-2b-2e-6d-75 len:40: UNA: 00 00 10 00 d4 c5 c5 94 StationDescr: 00 01 01 00 StationState: 00 00 11 00 MACFrameStatusFunctions.2: 00 00 00 01
01:17:04.820960 4f 08:00:2b:2e:6d:75 08:00:2b:2e:6d:75 73: SMT NIF ann vid:1 tid:0000013b sid:00-00-08-00-2b-2e-6d-75 len:40: UNA: 00 00 10 00 d4 c5 c5 94 StationDescr: 00 01 01 00 StationState: 00 00 11 00 MACFrameStatusFunctions.2: 00 00 00 01

 Questions, comments?  Otherwise, please apply.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 21:46:07 -07:00
Maciej W. Rozycki
9f9a742db4 FDDI: defza: Support capturing outgoing SMT traffic
DEC FDDIcontroller 700 (DEFZA) uses a Tx/Rx queue pair to communicate
SMT frames with adapter's firmware.  Any SMT frame received from the RMC
via the Rx queue is queued back by the driver to the SMT Rx queue for
the firmware to process.  Similarly the firmware uses the SMT Tx queue
to supply the driver with SMT frames which are queued back to the Tx
queue for the RMC to send to the ring.

When a network tap is attached to an FDDI interface handled by `defza'
any incoming SMT frames captured are queued to our usual processing of
network data received, which in turn delivers them to any listening
taps.

However the outgoing SMT frames produced by the firmware bypass our
network protocol stack and are therefore not delivered to taps.  This in
turn means that taps are missing a part of network traffic sent by the
adapter, which may make it more difficult to track down network problems
or do general traffic analysis.

Call `dev_queue_xmit_nit' then in the SMT Tx path, having checked that
a network tap is attached, with a newly-created `dev_nit_active' helper
wrapping the usual condition used in the transmit path.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 21:46:06 -07:00
Maciej W. Rozycki
61414f5ec9 FDDI: defza: Add support for DEC FDDIcontroller 700 TURBOchannel adapter
Add support for the DEC FDDIcontroller 700 (DEFZA), Digital Equipment
Corporation's first-generation FDDI network interface adapter, made for
TURBOchannel and based on a discrete version of what eventually became
Motorola's widely used CAMEL chipset.

The CAMEL chipset is present for example in the DEC FDDIcontroller
TURBOchannel, EISA and PCI adapters (DEFTA/DEFEA/DEFPA) that we support
with the `defxx' driver, however the host bus interface logic and the
firmware API are different in the DEFZA and hence a separate driver is
required.

There isn't much to say about the driver except that it works, but there
is one peculiarity to mention.  The adapter implements two Tx/Rx queue
pairs.

Of these one pair is the usual network Tx/Rx queue pair, in this case
used by the adapter to exchange frames with the ring, via the RMC (Ring
Memory Controller) chip.  The Tx queue is handled directly by the RMC
chip and resides in onboard packet memory.  The Rx queue is maintained
via DMA in host memory by adapter's firmware copying received data
stored by the RMC in onboard packet memory.

The other pair is used to communicate SMT frames with adapter's
firmware.  Any SMT frame received from the RMC via the Rx queue must be
queued back by the driver to the SMT Rx queue for the firmware to
process.  Similarly the firmware uses the SMT Tx queue to supply the
driver with SMT frames that must be queued back to the Tx queue for the
RMC to send to the ring.

This solution was chosen because the designers ran out of PCB space and
could not squeeze in more logic onto the board that would be required to
handle this SMT frame traffic without the need to involve the driver, as
with the later DEFTA/DEFEA/DEFPA adapters.

Finally the driver does some Frame Control byte decoding, so to avoid
magic numbers some macros are added to <linux/if_fddi.h>.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 21:46:06 -07:00
Serhey Popovych
df52eab23d tun: Consistently configure generic netdev params via rtnetlink
Configuring generic network device parameters on tun will fail in
presence of IFLA_INFO_KIND attribute in IFLA_LINKINFO nested attribute
since tun_validate() always return failure.

This can be visualized with following ip-link(8) command sequences:

  # ip link set dev tun0 group 100
  # ip link set dev tun0 group 100 type tun
  RTNETLINK answers: Invalid argument

with contrast to dummy and veth drivers:

  # ip link set dev dummy0 group 100
  # ip link set dev dummy0 type dummy

  # ip link set dev veth0 group 100
  # ip link set dev veth0 group 100 type veth

Fix by returning zero in tun_validate() when @data is NULL that is
always in case since rtnl_link_ops->maxtype is zero in tun driver.

Fixes: f019a7a594 ("tun: Implement ip link del tunXXX")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 21:40:31 -07:00
David Disseldorp
33b3f8ca51 scsi: target: stash sess_err_stats on Data-Out timeout
sess_err_stats are currently filled on NOP ping timeout, but not Data-Out
timeout. Stash details of Data-Out timeouts using a
ISCSI_SESS_ERR_CXN_TIMEOUT value for last_sess_failure_type.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16 00:38:37 -04:00
David Disseldorp
dce6190ca7 scsi: target: split out helper for cxn timeout error stashing
Replace existing nested code blocks with helper function calls.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16 00:38:37 -04:00
David Disseldorp
c62ae3005b scsi: target: log NOP ping timeouts as errors
Events resulting in connection outages like this should be logged as
errors. Include the I_T Nexus in the message to aid path identification.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16 00:38:36 -04:00
David Disseldorp
d9a771fd42 scsi: target: log Data-Out timeouts as errors
Data-Out timeouts resulting in connection outages should be logged as
errors. Include the I_T Nexus in the message to aid path identification.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16 00:38:36 -04:00
David Disseldorp
df711553f4 scsi: target: use ISCSI_IQN_LEN in iscsi_target_stat
Move the ISCSI_IQN_LEN definition up, so that it can be used in more
places instead of a hardcoded value.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16 00:38:36 -04:00
Wenwen Wang
58f5bbe331 ethtool: fix a privilege escalation bug
In dev_ethtool(), the eth command 'ethcmd' is firstly copied from the
use-space buffer 'useraddr' and checked to see whether it is
ETHTOOL_PERQUEUE. If yes, the sub-command 'sub_cmd' is further copied from
the user space. Otherwise, 'sub_cmd' is the same as 'ethcmd'. Next,
according to 'sub_cmd', a permission check is enforced through the function
ns_capable(). For example, the permission check is required if 'sub_cmd' is
ETHTOOL_SCOALESCE, but it is not necessary if 'sub_cmd' is
ETHTOOL_GCOALESCE, as suggested in the comment "Allow some commands to be
done by anyone". The following execution invokes different handlers
according to 'ethcmd'. Specifically, if 'ethcmd' is ETHTOOL_PERQUEUE,
ethtool_set_per_queue() is called. In ethtool_set_per_queue(), the kernel
object 'per_queue_opt' is copied again from the user-space buffer
'useraddr' and 'per_queue_opt.sub_command' is used to determine which
operation should be performed. Given that the buffer 'useraddr' is in the
user space, a malicious user can race to change the sub-command between the
two copies. In particular, the attacker can supply ETHTOOL_PERQUEUE and
ETHTOOL_GCOALESCE to bypass the permission check in dev_ethtool(). Then
before ethtool_set_per_queue() is called, the attacker changes
ETHTOOL_GCOALESCE to ETHTOOL_SCOALESCE. In this way, the attacker can
bypass the permission check and execute ETHTOOL_SCOALESCE.

This patch enforces a check in ethtool_set_per_queue() after the second
copy from 'useraddr'. If the sub-command is different from the one obtained
in the first copy in dev_ethtool(), an error code EINVAL will be returned.

Fixes: f38d138a7d ("net/ethtool: support set coalesce per queue")
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 21:37:58 -07:00
Wenwen Wang
2bb3207dbb ethtool: fix a missing-check bug
In ethtool_get_rxnfc(), the eth command 'cmd' is compared against
'ETHTOOL_GRXFH' to see whether it is necessary to adjust the variable
'info_size'. Then the whole structure of 'info' is copied from the
user-space buffer 'useraddr' with 'info_size' bytes. In the following
execution, 'info' may be copied again from the buffer 'useraddr' depending
on the 'cmd' and the 'info.flow_type'. However, after these two copies,
there is no check between 'cmd' and 'info.cmd'. In fact, 'cmd' is also
copied from the buffer 'useraddr' in dev_ethtool(), which is the caller
function of ethtool_get_rxnfc(). Given that 'useraddr' is in the user
space, a malicious user can race to change the eth command in the buffer
between these copies. By doing so, the attacker can supply inconsistent
data and cause undefined behavior because in the following execution 'info'
will be passed to ops->get_rxnfc().

This patch adds a necessary check on 'info.cmd' and 'cmd' to confirm that
they are still same after the two copies in ethtool_get_rxnfc(). Otherwise,
an error code EINVAL will be returned.

Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 21:37:01 -07:00
Jian-Hong Pan
d49c88d767 r8169: Enable MSI-X on RTL8106e
Originally, we have an issue where r8169 MSI-X interrupt is broken after
S3 suspend/resume on RTL8106e of ASUS X441UAR.

02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller [10ec:8136]
(rev 07)
	Subsystem: ASUSTeK Computer Inc. RTL810xE PCI Express Fast
Ethernet controller [1043:200f]
	Flags: bus master, fast devsel, latency 0, IRQ 16
	I/O ports at e000 [size=256]
	Memory at ef100000 (64-bit, non-prefetchable) [size=4K]
	Memory at e0000000 (64-bit, prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [70] Express Endpoint, MSI 01
	Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
	Capabilities: [d0] Vital Product Data
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Virtual Channel
	Capabilities: [160] Device Serial Number 01-00-00-00-36-4c-e0-00
	Capabilities: [170] Latency Tolerance Reporting
	Kernel driver in use: r8169
	Kernel modules: r8169

We found the all of the values in PCI BAR=4 of the ethernet adapter
become 0xFF after system resumes.  That breaks the MSI-X interrupt.
Therefore, we can only fall back to MSI interrupt to fix the issue at
that time.

However, there is a commit which resolves the drivers getting nothing in
PCI BAR=4 after system resumes.  It is 04cb3ae895d7 "PCI: Reprogram
bridge prefetch registers on resume" by Daniel Drake.

After apply the patch, the ethernet adapter works fine before suspend
and after resume.  So, we can revert the workaround after the commit
"PCI: Reprogram bridge prefetch registers on resume" is merged into main
tree.

This patch reverts commit 7bb05b85bc
"r8169: don't use MSI-X on RTL8106e".

Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=201181
Fixes: 7bb05b85bc ("r8169: don't use MSI-X on RTL8106e")
Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 21:31:53 -07:00
Xiang Chen
3bccfba831 scsi: hisi_sas: Update v3 hw AIP_LIMIT and CFG_AGING_TIME register values
Update registers as follows:
- Default value of AIP timer is 1ms, and it is easy for some expanders to
  cause IO error. Change the value to max value 65ms to avoid IO error for
  those expanders.

- A CQ completion will be reported by HW when 4 CQs have occurred or the
  aging timer expires, whichever happens first. Sor serial IO scenario, it
  will still wait 8us for every IO before it is reported. So in the
  situation, the performance is poor. So to improve it, change the limit
  time to the least value.
  For other scenario, it does little affect to the performance.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16 00:27:04 -04:00
Xiang Chen
784b46b7cb scsi: hisi_sas: Use block layer tag instead for IPTT
Currently we use the IPTT defined in LLDD to identify IOs. Actually for
IOs which are from the block layer, they have tags to identify them. So
for those IOs, use tag of the block layer directly, and for IOs which is
not from the block layer (such as internal IOs from libsas/LLDD), reserve
96 IPTTs for them.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16 00:27:04 -04:00
Xiang Chen
6ecf5ba13c scsi: hisi_sas: unmask interrupts ent72 and ent74
The interrupts of ent72 and ent74 are not processed by PCIe AER handling,
so we need to unmask the interrupts and process them first in the driver.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16 00:27:04 -04:00
Xiang Chen
3e178f3ecf scsi: hisi_sas: Free slot later in slot_complete_vx_hw()
If an SSP/SMP IO times out, it may be actually in reality be
simultaneously processing completion of the slot in
slot_complete_vx_hw().

Then if the slot is freed in slot_complete_vx_hw() (this IPTT is freed
and it may be re-used by other slot), and we may abort the wrong slot in
hisi_sas_abort_task().

So to solve the issue, free the slot after the check of
SAS_TASK_STATE_ABORTED in slot_complete_vx_hw().

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16 00:27:04 -04:00
Xiang Chen
584f53fe5f scsi: hisi_sas: Fix the race between IO completion and timeout for SMP/internal IO
If SMP/internal IO times out, we will possibly free the task immediately.

However if the IO actually completes at the same time, the IO completion
may refer to task which has been freed.

So to solve the issue, flush the tasklet to finish IO completion before
free'ing slot/task.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16 00:27:04 -04:00
Luo Jiaxing
1668e3b6f8 scsi: hisi_sas: Move evaluation of hisi_hba in hisi_sas_task_prep()
In evaluating hisi_hba, the sas_port may be NULL, so for safety relocate
the the check to value possible NULL deference.

Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16 00:27:04 -04:00
Luo Jiaxing
5a54691f87 scsi: hisi_sas: Feed back linkrate(max/min) when re-attached
At directly attached situation, if the user modifies the sysfs interface
of maximum_linkrate and minimum_linkrate to renegotiate the linkrate
between SAS controller and target, the value of both files mentioned
above should have change to user setting after renegotiate is over, but
it remains unchanged.

To fix this bug, maximum_linkrate and minimum_linkrate will be directly
fed back to relevant sas_phy structure.

Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16 00:27:03 -04:00
Nicholas Bellinger
38fe73cc2c scsi: target: Fix target_wait_for_sess_cmds breakage with active signals
With the addition of commit 00d909a107 ("scsi: target: Make the session
shutdown code also wait for commands that are being aborted") in v4.19-rc, it
incorrectly assumes no signals will be pending for task_struct executing the
normal session shutdown and I/O quiesce code-path.

For example, iscsi-target and iser-target issue SIGINT to all kthreads as part
of session shutdown.  This has been the behaviour since day one.

As-is when signals are pending with se_cmds active in se_sess->sess_cmd_list,
wait_event_interruptible_lock_irq_timeout() returns a negative number and
immediately kills the machine because of the do while (ret <= 0) loop that was
added in commit 00d909a107 to spin while backend I/O is taking any amount of
extended time (say 30 seconds) to complete.

Here's what it looks like in action with debug plus delayed backend I/O
completion:

[ 4951.909951] se_sess: 000000003e7e08fa before target_wait_for_sess_cmds
[ 4951.914600] target_wait_for_sess_cmds: signal_pending: 1
[ 4951.918015] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 0
[ 4951.921639] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 1
[ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 2
[ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 3
[ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 4
[ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 5
[ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 6
[ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 7
[ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 8
[ 4951.921944] wait_event_interruptible_lock_irq_timeout ret: -512 signal_pending: 1 loop count: 9

... followed by the usual RCU CPU stalls and deadlock.

There was never a case pre commit 00d909a107 where
wait_for_complete(&se_cmd->cmd_wait_comp) was able to be interrupted, so to
address this for v4.19+ moving forward go ahead and use
wait_event_lock_irq_timeout() instead so new code works with all fabric
drivers.

Also for commit 00d909a107, fix a minor regression in
target_release_cmd_kref() to only wake_up the new se_sess->cmd_list_wq only
when shutdown has actually been triggered via se_sess->sess_tearing_down.

Fixes: 00d909a107 ("scsi: target: Make the session shutdown code also wait for commands that are being aborted")
Cc: <stable@vger.kernel.org> # v4.19+
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Tested-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Reviewed-by: Bryant G. Ly <bly@catalogicsoftware.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16 00:11:13 -04:00
Nicholas Bellinger
25ab0bc334 scsi: sched/wait: Add wait_event_lock_irq_timeout for TASK_UNINTERRUPTIBLE usage
Short of reverting commit 00d909a107 ("scsi: target: Make the session
shutdown code also wait for commands that are being aborted") for v4.19,
target-core needs a wait_event_t macro can be executed using
TASK_UNINTERRUPTIBLE to function correctly with existing fabric drivers that
expect to run with signals pending during session shutdown and active se_cmd
I/O quiesce.

The most notable is iscsi-target/iser-target, while ibmvscsi_tgt invokes
session shutdown logic from userspace via configfs attribute that could also
potentially have signals pending.

So go ahead and introduce wait_event_lock_irq_timeout() to achieve this, and
update + rename __wait_event_lock_irq_timeout() to make it accept 'state' as a
parameter.

Fixes: 00d909a107 ("scsi: target: Make the session shutdown code also wait for commands that are being aborted")
Cc: <stable@vger.kernel.org> # v4.19+
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Reviewed-by: Bryant G. Ly <bly@catalogicsoftware.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16 00:11:13 -04:00
Hannes Reinecke
0b4aafc332 scsi: libfc: retry PRLI if we cannot analyse the payload
When we fail to analyse the payload of a PRLI response we should reset
the state machine to retry the PRLI; eventually we will be getting a
proper frame.  Not doing so will result in a stuck state machine and the
port never to be presented to the systsm.

Suggested-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Arun Easi <arun.easi@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-15 23:53:37 -04:00
Thomas Abraham
aad1271a48 scsi: libfc: check fc_frame_payload_get() return value for null
We should not assume the payload of a PRLI or PLOGI respons is always
present.

Signed-off-by: Thomas Abraham <tabraham@suse.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Arun Easi <arun.easi@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-15 23:53:36 -04:00
Hannes Reinecke
a33e5bfb29 scsi: core: Allow state transitions from OFFLINE to BLOCKED
When an RSCN gets delayed (or not being sent at all), the transport class
will detect an error, EH kicks in, and eventually will be setting the
device to offline.  If we receive an RSCN after that, the device will
stay in 'offline'.  This patch allows for an 'offline' to 'blocked'
transition, thereby allowing the device to become active again.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-15 23:53:29 -04:00
Christoph Hellwig
86117d7f95 scsi: esp_scsi: remove union in esp_cmd_priv
The dma_addr_t member is unused ever since we switched the SCSI
layer to send down single-segement command using a scatterlist
as well many years ago.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-15 23:00:39 -04:00
Christoph Hellwig
3f9295b65e scsi: esp_scsi: move dma mapping into the core code
Except for the mac_esp driver, which uses PIO or pseudo DMA, all drivers
share the same dma mapping calls.  Move the dma mapping into the core
code using the scsi_dma_map / scsi_dma_unmap helpers, with a special
identify mapping variant triggered off a new ESP_FLAG_NO_DMA_MAP flag
for mac_esp.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-15 23:00:38 -04:00
Christoph Hellwig
44b1b4d24b scsi: esp_scsi: remove the dev argument to scsi_esp_register
We can simplify use esp->dev now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-15 23:00:38 -04:00
Christoph Hellwig
98cda6a2e0 scsi: esp_scsi: use strong typing for the dev field
esp->dev is a void pointer that points either to a struct device, or a
struct platform_device.  As we can easily get from the device to the
platform_device if needed change it to always point to a struct device
and properly type the pointer to avoid errors.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-15 23:00:38 -04:00
Christoph Hellwig
10c0cd38ce scsi: sun_esp: don't use GFP_ATOMIC for command block allocation
esp_sbus_map_command_block is called straight from the probe routine
without any locks held, so we can safely use GFP_KERNEL here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-15 23:00:38 -04:00
Christoph Hellwig
d47b3bd797 scsi: am53c974: use the generic DMA API
Remove usage of the legacy PCI DMA API.  To make this easier we also
store a struct device instead of pci_dev in the dev field of struct
esp.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-15 23:00:38 -04:00
Venkat Gopalakrishnan
5adaf1e8d5 scsi: ufs: make UFS Tx lane1 clock optional for QCOM platforms
Per Qcom's UFS host controller HW design, the UFS Tx lane1 clock could be
muxed with Tx lane0 clock, hence keep Tx lane1 clock optional by ignoring
it if it is not provided in device tree. This change also performs some
cleanup to lanes per direction checks when enable/disable lane clocks
just for symmetry.

Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Reviewed-by: Vivek Gautam <vivek.gautam@codeaurora.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-15 22:55:44 -04:00
Arnd Bergmann
664e68bcab scsi: ufs: fix integer type usage in uapi header
We get a warning from 'make headers_check' about a newly introduced usage
of integer types in the scsi/scsi_bsg_ufs.h uapi header:

usr/include/scsi/scsi_bsg_ufs.h:18: found __[us]{8,16,32,64} type without #include <linux/types.h>

Aside from the missing linux/types.h inclusion, I also noticed that it
uses the wrong types: 'u32' is not available at all in user space, and
'uint32_t' depends on the inclusion of a standard header that we should
not include from kernel headers.

Change the all to __u32 and similar types here.

I also note the usage of '__be32' and '__be16' that seems unfortunate for
a user space API. I wonder if it would be better to define the interface
in terms of a CPU-endian structure and convert it in kernel space.

Fixes: e77044c5a8 ("scsi: ufs-bsg: Add support for uic commands in ufs_bsg_request()")
Fixes: df032bf27a ("scsi: ufs: Add a bsg endpoint that supports UPIUs")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-15 22:44:30 -04:00
Christoph Hellwig
416c461372 scsi: lpfc: remove a bogus pci_dma_sync_single_for_device call
dma_alloc_coherent allocates memory that can be used by the cpu and the
device at the same time, calls to pci_dma_sync_* are not required, and in
fact actively harmful on some architectures like arm.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-15 22:41:00 -04:00
Christoph Hellwig
67d98f0a83 scsi: megaraid_mbox: remove bogus use of pci_dma_sync_sg_* APIs
The dma_map_sg / dma_unmap_sg APIs called from scsi_dma_map /
scsi_dma_unmap already transfer memory ownership to the device or cpu
respectively.  Adding additional calls to pci_dma_sync_sg_* will in fact
lead to data corruption if we end up using swiotlb for some reason.

Also remove the now pointless megaraid_mbox_sync_scb function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-15 22:39:23 -04:00
Jens Axboe
804186fa95 xsysace: convert to blk-mq
Straight forward conversion, using an internal list to enable the
driver to pull requests at will.

Acked-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-15 20:08:24 -06:00
Jens Axboe
77218ddf46 paride: convert pf to blk-mq
Tested-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-15 20:08:15 -06:00
Jens Axboe
99fe8b02a8 paride: convert pd to blk-mq
Tested-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-15 20:08:12 -06:00
Jens Axboe
89c6b16509 paride: convert pcd to blk-mq
Tested-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-15 20:08:07 -06:00
Jens Axboe
fab1adcf95 ps3disk: convert to blk-mq
Convert from the old request_fn style driver to blk-mq.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-15 20:07:56 -06:00
Jens Axboe
9316a9ed68 blk-mq: provide helper for setting up an SQ queue and tag set
This pattern is repeated throughout all the blk-mq conversions.
Provide a basic helper to get it done.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-15 20:05:18 -06:00
YueHaibing
de038597be null_blk: remove set but not used variable 'q'
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/block/null_blk_main.c: In function 'end_cmd':
drivers/block/null_blk_main.c:609:24: warning:
 variable 'q' set but not used [-Wunused-but-set-variable]

It not used any more after commit
e50b1e327a ("null_blk: remove legacy IO path")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-15 20:02:59 -06:00
David S. Miller
a06ecbfe78 Revert "sparc: Convert to using %pOFn instead of device_node.name"
This reverts commit 0b9871a3a8.

Causes crashes with qemu, interacts badly with commit commit
6d0a70a284 ("vsprintf: print OF node name using full_name")
etc.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 18:32:54 -07:00
Jakub Kicinski
0b592b5a01 tools: bpftool: add map create command
Add a way of creating maps from user space.  The command takes
as parameters most of the attributes of the map creation system
call command.  After map is created its pinned to bpffs.  This makes
it possible to easily and dynamically (without rebuilding programs)
test various corner cases related to map creation.

Map type names are taken from bpftool's array used for printing.
In general these days we try to make use of libbpf type names, but
there are no map type names in libbpf as of today.

As with most features I add the motivation is testing (offloads) :)

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15 16:39:21 -07:00
Alexei Starovoitov
2f1d774f7d Merge branch 'bpftool_sockmap'
John Fastabend says:

====================
The first patch adds support for attaching programs to maps. This is
needed to support sock{map|hash} use from bpftool. Currently, I carry
around custom code to do this so doing it using standard bpftool will
be great.

The second patch adds a compat mode to ignore non-zero entries in
the map def. This allows using bpftool with maps that have a extra
fields that the user knows can be ignored. This is needed to work
correctly with maps being loaded by other tools or directly via
syscalls.

v3: add bash completion and doc updates for --mapcompat
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15 16:13:15 -07:00
John Fastabend
c034a177d3 bpf: bpftool, add flag to allow non-compat map definitions
Multiple map definition structures exist and user may have non-zero
fields in their definition that are not recognized by bpftool and
libbpf. The normal behavior is to then fail loading the map. Although
this is a good default behavior users may still want to load the map
for debugging or other reasons. This patch adds a --mapcompat flag
that can be used to override the default behavior and allow loading
the map even when it has additional non-zero fields.

For now the only user is 'bpftool prog' we can switch over other
subcommands as needed. The library exposes an API that consumes
a flags field now but I kept the original API around also in case
users of the API don't want to expose this. The flags field is an
int in case we need more control over how the API call handles
errors/features/etc in the future.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15 16:13:14 -07:00
John Fastabend
b7d3826c2e bpf: bpftool, add support for attaching programs to maps
Sock map/hash introduce support for attaching programs to maps. To
date I have been doing this with custom tooling but this is less than
ideal as we shift to using bpftool as the single CLI for our BPF uses.
This patch adds new sub commands 'attach' and 'detach' to the 'prog'
command to attach programs to maps and then detach them.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15 16:13:14 -07:00
Alexei Starovoitov
7d1f12b8b2 Merge branch 'ipv6_sk_lookup_fixes'
Joe Stringer says:

====================
This series includes a couple of fixups for the IPv6 socket lookup
helper, to make the API more consistent (always supply all arguments in
network byte-order) and to allow its use when IPv6 is compiled as a
module.
====================

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15 16:08:40 -07:00
Joe Stringer
5ef0ae84f0 bpf: Fix IPv6 dport byte-order in bpf_sk_lookup
Commit 6acc9b432e ("bpf: Add helper to retrieve socket in BPF")
mistakenly passed the destination port in network byte-order to the IPv6
TCP/UDP socket lookup functions, which meant that BPF writers would need
to either manually swap the byte-order of this field or otherwise IPv6
sockets could not be located via this helper.

Fix the issue by swapping the byte-order appropriately in the helper.
This also makes the API more consistent with the IPv4 version.

Fixes: 6acc9b432e ("bpf: Add helper to retrieve socket in BPF")
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15 16:08:39 -07:00
Joe Stringer
8a615c6b03 bpf: Allow sk_lookup with IPv6 module
This is a more complete fix than d71019b54b ("net: core: Fix build
with CONFIG_IPV6=m"), so that IPv6 sockets may be looked up if the IPv6
module is loaded (not just if it's compiled in).

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15 16:08:39 -07:00
Matthew Wilcox
a309d5db58 idr: Change documentation license
This documentation was inadvertently released under the CC-BY-SA-4.0
license.  It was intended to be released under GPL-2.0 or later.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-15 16:31:29 -04:00