Commit graph

767209 commits

Author SHA1 Message Date
Benjamin Herrenschmidt
ac6e31d35f ARM: dts: Add Aspeed SoC USB controllers to device-tree
This adds the USB controllers to the DT template of the
AST24xx and AST25xx SoCs.

This patch doesn't enable them by default on any board specific
.dts yet. This will be done when we have the necessary clock/reset
and pinmux support. In the meantime though, this will work if
u-boot configures things properly.

For the AST2400 I only added pinmux definition for port 1
which is dual USB1/USB2. There are additional USB1 only ports
that might require more work but I don't have HW to test at
hand so I'm leaving that to whoever cares.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-05-25 13:57:30 +09:30
James Feist
c4043ecac3 ARM: dts: aspeed: Add S2600WF BMC Machine
S2600WF is a Intel platform family with an ASPEED AST2500 BMC.

Signed-off-by: James Feist <james.feist@linux.intel.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-05-25 13:57:22 +09:30
Brian Yang
25337c7354 ARM: dts: aspeed: Add Inventec Lanyang BMC
The Inventec Lanyang is Power 9 platform with ast2500 BMC.

Signed-off-by: Brian Yang <yang.brianc.w@inventec.com>
Acked-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-05-25 13:57:22 +09:30
Amithash Prasad
c808a10435 ARM: dts: aspeed: Add Portwell Neptune machine
Initial introduction of Portwell Neptune family equipped with
Aspeed 2500 BMC SoC. Neptune is a x86 server development kit with a
ASPEED ast2500 BMC manufactured by Portwell. Specifically, This
adds the neptune platform device tree file including the flash
layout used by the neptune machines.

Signed-off-by: Amithash Prasad <amithash@fb.com>
Acked-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-05-25 13:57:22 +09:30
Eddie James
f45ffcc634 ARM: dts: aspeed: witherspoon: Set alternate boot
Set watchdog 2 to boot from the alternate flash chip when the watchdog
timer expires and the system is reset. This enables "brick protection."

Signed-off-by: Eddie James <eajames@linux.vnet.ibm.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-05-25 13:57:22 +09:30
Christopher Bostic
af8f533b2a ARM: dts: aspeed: witherspoon: Add gpio keys for power supply presence
Signed-off-by: Christopher Bostic <cbostic@linux.vnet.ibm.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-05-25 13:57:22 +09:30
Brad Bishop
2013b14f90 ARM: dts: aspeed: witherspoon: Enable checkstop and cooling gpio keys
Enable gpio-keys events for the checkstop and water/air cooled
gpios for use by applications on the Witherspoon system.

Signed-off-by: Brad Bishop <bradleyb@fuzziesquirrel.com>
Acked-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-05-25 13:57:22 +09:30
Lei YU
fa41a7fdcd ARM: dts: aspeed: zaius: Add pcie-e2b-present gpio key
Add GPIO key to check presence of PCIE E2B.

Signed-off-by: Lei YU <mine260309@gmail.com>
Acked-by: Xo Wang <xow@google.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-05-25 13:57:22 +09:30
Lei YU
168dbc3565 ARM: dts: aspeed: romulus: Add id-button gpio key
Signed-off-by: Lei YU <mine260309@gmail.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
2018-05-25 13:57:22 +09:30
Inki Dae
12678199c7 drm/exynos: scaler: fix static checker warning
drivers/gpu/drm/exynos/exynos_drm_scaler.c:402 scaler_task_done()
warn: signedness bug returning '(-22)'

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
2018-05-25 13:09:57 +09:00
Niklas Cassel
5ec3444c83 firmware: qcom: scm: Fix crash in qcom_scm_call_atomic1()
qcom_scm_call_atomic1() can crash with a NULL pointer dereference at
qcom_scm_call_atomic1+0x30/0x48.

disassembly of qcom_scm_call_atomic1():
...
<0xc08d73b0 <+12>: ldr r3, [r12]
... (no instruction explicitly modifies r12)
0xc08d73cc <+40>: smc 0
... (no instruction explicitly modifies r12)
0xc08d73d4 <+48>: ldr r3, [r12] <- crashing instruction
...

Since the first ldr is successful, and since r12 isn't explicitly
modified by any instruction between the first and the second ldr,
it must have been modified by the smc call, which is ok,
since r12 is caller save according to the AAPCS.

Add r12 to the clobber list so that the compiler knows that the
callee potentially overwrites the value in r12.
Clobber descriptions may not in any way overlap with an input or
output operand.

Signed-off-by: Niklas Cassel <niklas.cassel@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Andy Gross <andy.gross@linaro.org>
2018-05-24 22:36:45 -05:00
David S. Miller
9c5904904b Merge branch 'nfp-offload-LAG-for-tc-flower-egress'
Jakub Kicinski says:

====================
nfp: offload LAG for tc flower egress

This series from John adds bond offload to the nfp driver.  Patch 5
exposes the hash type for NETDEV_LAG_TX_TYPE_HASH to make sure nfp
hashing matches that of the software LAG.  This may be unnecessarily
conservative, let's see what LAG maintainers think :)

John says:

This patchset sets up the infrastructure and offloads output actions for
when a TC flower rule attempts to egress a packet to a LAG port.

Firstly it adds some of the infrastructure required to the flower app and
to the nfp core. This includes the ability to change the MAC address of a
repr, a function for combining lookup and write to a FW symbol, and the
addition of private data to a repr on a per app basis.

Patch 6 continues by implementing notifiers that track Linux bonds and
communicates to the FW those which enslave reprs, along with the current
state of reprs within the bond.

Patch 7 ensures bonds are synchronised with FW by receiving and acting
upon cmsgs sent to the kernel. These may request that a bond message is
retransmitted when FW can process it, or may request a full sync of the
bonds defined in the kernel.

Patch 8 offloads a flower action when that action requires egressing to a
pre-defined Linux bond.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 23:10:57 -04:00
John Hurley
7e24a59311 nfp: flower: compute link aggregation action
If the egress device of an offloaded rule is a LAG port, then encode the
output port to the NFP with a LAG identifier and the offloaded group ID.

A prelag action is also offloaded which must be the first action of the
series (although may appear after other pre-actions - e.g. tunnels). This
causes the FW to check that it has the necessary information to output to
the requested LAG port. If it does not, the packet is sent to the kernel
before any other actions are applied to it.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 23:10:57 -04:00
John Hurley
2e1cc5226b nfp: flower: implement host cmsg handler for LAG
Adds the control message handler to synchronize offloaded group config
with that of the kernel. Such messages are sent from fw to driver and
feature the following 3 flags:

- Data: an attached cmsg could not be processed - store for retransmission
- Xon: FW can accept new messages - retransmit any stored cmsgs
- Sync: full sync requested so retransmit all kernel LAG group info

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 23:10:57 -04:00
John Hurley
bb9a8d0311 nfp: flower: monitor and offload LAG groups
Monitor LAG events via the NETDEV_CHANGEUPPER/NETDEV_CHANGELOWERSTATE
notifiers to maintain a list of offloadable groups. Sync these groups with
HW via a delayed workqueue to prevent excessive re-configuration. When the
workqueue is triggered it may generate multiple control messages for
different groups. These messages are linked via a batch ID and flags to
indicate a new batch and the end of a batch.

Update private data in each repr to track their LAG lower state flags. The
state of a repr is used to determine the active netdevs that can be
offloaded. For example, in active-backup mode, we only offload the netdev
currently active.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 23:10:57 -04:00
John Hurley
f44aa9ef79 net: include hash policy in LAG changeupper info
LAG upper event notifiers contain the tx type used by the LAG device.
Extend this to also include the hash policy used for tx types that
utilize hashing.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 23:10:57 -04:00
John Hurley
b945245297 nfp: flower: add per repr private data for LAG offload
Add a bitmap to each flower repr to track its state if it is enslaved by a
bond. This LAG state may be different to the port state - for example, the
port may be up but LAG state may be down due to the selection in an
active/backup bond.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 23:10:56 -04:00
John Hurley
898bc7d634 nfp: flower: check for/turn on LAG support in firmware
Check if the fw contains the _abi_flower_balance_sync_enable symbol. If it
does then write a 1 to this indicating that the driver is willing to
receive NIC to kernel LAG related control messages.

If the write is successful, update the list of extra features supported by
the fw and add a stub to accept LAG cmsgs.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 23:10:56 -04:00
John Hurley
1945ca7a81 nfp: nfpcore: add rtsym writing function
Add an rtsym API function that combines the lookup of a symbol and the
writing of a value to it. Values can be written as unsigned 32 or 64 bits.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 23:10:56 -04:00
John Hurley
24f132e29c nfp: add ndo_set_mac_address for representors
Adding a netdev to a bond requires that its mac address can be modified.
The default eth_mac_addr is sufficient to satisfy this requirement.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 23:10:56 -04:00
Stephen Hemminger
d97cde6ab5 hv_netvsc: fix bogus ifalias on network device
If the guest network adapter is not configured with DeviceNaming
enabled on the host, then the query for friendly name will return
success but with a zero length name. Which then leads to a garbage value
(stack contents) for ifalias.

Fix is simple, just don't set name if  host doesn't return it.

Fixes: 0fe554a46a ("hv_netvsc: propogate Hyper-V friendly name into interface alias")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 23:06:33 -04:00
Govindarajulu Varadarajan
322eaa06d5 enic: set DMA mask to 47 bit
In commit 624dbf55a3 ("driver/net: enic: Try DMA 64 first, then
failover to DMA") DMA mask was changed from 40 bits to 64 bits.
Hardware actually supports only 47 bits.

Fixes: 624dbf55a3 ("driver/net: enic: Try DMA 64 first, then failover to DMA")
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 23:05:30 -04:00
David S. Miller
de0a8267e2 Merge branch 'net-Update-fib_table_lookup-tracepoints'
David Ahern says:

====================
net: Update fib_table_lookup tracepoints

Update the FIB lookup tracepoints to include ip proto and port fields
from the flow struct. In the process make the IPv4 tracepoint inline
with IPv6 which is much easier to use and follow the lookup and result.

Remove the tracepoint in fib_validate_source which does not provide
value above the fib_table_lookup which immediately follows it.

v2
- move CREATE_TRACE_POINTS for the v6 tracepoint to route.c to handle
  its need for an internal function to convert route type to error and
  handle IPv6 as a module or builtin. Reported by kbuild robot.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 23:01:23 -04:00
David Ahern
c949cbbbe5 net/ipv4: Remove tracepoint in fib_validate_source
Tracepoint does not add value and the call to fib_lookup follows
it which shows the same information and the fib lookup result.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 23:01:15 -04:00
David Ahern
30d444d300 net/ipv6: Udate fib6_table_lookup tracepoint
Commit bb0ad1987e ("ipv6: fib6_rules: support for match on sport, dport
and ip proto") added support for protocol and ports to FIB rules.
Update the FIB lookup tracepoint to dump the parameters.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 23:01:15 -04:00
David Ahern
9f323973c9 net/ipv4: Udate fib_table_lookup tracepoint
Commit 4a2d73a4fb ("ipv4: fib_rules: support match on sport, dport
and ip proto") added support for protocol and ports to FIB rules.
Update the FIB lookup tracepoint to dump the parameters.

In addition, make the IPv4 tracepoint similar to the IPv6 one where
the lookup parameters and result are dumped in 1 event. It is much
easier to use and understand the outcome of the lookup.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 23:00:31 -04:00
Cong Wang
aaa908ffbe net_sched: switch to rcu_work
Commit 05f0fe6b74 ("RCU, workqueue: Implement rcu_work") introduces
new API's for dispatching work in a RCU callback. Now we can just
switch to the new API's for tc filters. This could get rid of a lot
of code.

Cc: Tejun Heo <tj@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:56:15 -04:00
Eric Biggers
af8d3c7c00 ppp: remove the PPPIOCDETACH ioctl
The PPPIOCDETACH ioctl effectively tries to "close" the given ppp file
before f_count has reached 0, which is fundamentally a bad idea.  It
does check 'f_count < 2', which excludes concurrent operations on the
file since they would only be possible with a shared fd table, in which
case each fdget() would take a file reference.  However, it fails to
account for the fact that even with 'f_count == 1' the file can still be
linked into epoll instances.  As reported by syzbot, this can trivially
be used to cause a use-after-free.

Yet, the only known user of PPPIOCDETACH is pppd versions older than
ppp-2.4.2, which was released almost 15 years ago (November 2003).
Also, PPPIOCDETACH apparently stopped working reliably at around the
same time, when the f_count check was added to the kernel, e.g. see
https://lkml.org/lkml/2002/12/31/83.  Also, the current 'f_count < 2'
check makes PPPIOCDETACH only work in single-threaded applications; it
always fails if called from a multithreaded application.

All pppd versions released in the last 15 years just close() the file
descriptor instead.

Therefore, instead of hacking around this bug by exporting epoll
internals to modules, and probably missing other related bugs, just
remove the PPPIOCDETACH ioctl and see if anyone actually notices.  Leave
a stub in place that prints a one-time warning and returns EINVAL.

Reported-by: syzbot+16363c99d4134717c05b@syzkaller.appspotmail.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Guillaume Nault <g.nault@alphalink.fr>
Tested-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:55:07 -04:00
David S. Miller
1bb58d2d3c Merge branch 'Mirroring-tests-involving-VLAN'
Petr Machata says:

====================
Mirroring tests involving VLAN

This patchset tests mirror-to-gretap with various underlay
configurations involving VLAN netdevice in particular. Some of the tests
involve bridges as well, but tests aimed specifically at testing bridges
(i.e. FDB, STP) are not part of this patchset.

In patches #1-#6, the codebase is adapted to support the new tests.

In patch #7, a test for mirroring to VLAN is introduced.

Patches #8-#10 add three tests where VLAN is part of underlay path after
gretap encapsulation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:26:20 -04:00
Petr Machata
181d95f8e1 selftests: forwarding: Test mirror-to-gre w/ UL 802.1d+VLAN
Test for "tc action mirred egress mirror" that mirrors to GRE when the
underlay route points at an 802.1d bridge and packet egresses through a
VLAN device.

Besides testing basic connectivity, this also tests that the traffic is
properly tagged.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:26:20 -04:00
Petr Machata
a08fb9f1ad selftests: forwarding: Test mirror-to-gre w/ UL VLAN
Test for "tc action mirred egress mirror" that mirrors to a gretap
netdevice whose underlay route points at a vlan device.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:26:20 -04:00
Petr Machata
0056042f80 selftests: forwarding: Test mirror-to-gre w/ UL VLAN+802.1q
Test for "tc action mirred egress mirror" that mirrors to GRE when the
underlay route points at a vlan device on top of a bridge device with
vlan filtering (802.1q).

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:26:19 -04:00
Petr Machata
35388a6a0c selftests: forwarding: Test mirror-to-vlan
Test for "tc action mirred egress mirror" that mirrors to a vlan device.
- test_vlan() tests that the packets get mirrored
- test_tagged_vlan() tests that the mirrored packets have correct inner
  VLAN tag.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:26:19 -04:00
Petr Machata
87c0c046e8 selftests: forwarding: lib: Extract trap_{, un}install()
A mirror-to-vlan test that's coming next needs to install the trap
unconditionally. Therefore extract from slow_path_trap_{,un}install()
a more generic functions trap_install() and trap_uninstall(), and covert
the former two to conditional wrappers around these.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:26:19 -04:00
Petr Machata
1893150fd5 selftests: forwarding: mirror_gre_lib: Support VLAN
Add full_test_span_gre_dir_vlan_ips() and full_test_span_gre_dir_vlan()
to support mirror-to-gre tests that involve VLAN.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:26:19 -04:00
Petr Machata
0e7a504c09 selftests: forwarding: lib: Support VLAN devices
Add vlan_create() and vlan_destroy() to manage VLAN netdevices.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:26:19 -04:00
Petr Machata
91bac7f997 selftests: forwarding: Add $h3's clsact to mirror_topo_lib.sh
Having a clsact qdisc on $h3 is useful in several tests, and will be
useful in more tests to come. Move the registration from all the tests
that need it into the topology file itself.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:26:19 -04:00
Petr Machata
d5ea2bfc80 selftests: forwarding: mirror_gre_lib: Extract generic functions
For non-GRE mirroring tests, a functions along the lines of
do_test_span_gre_dir_ips() and test_span_gre_dir_ips() are necessary,
but such that they don't assume tunnels are involved. Extract the code
from mirror_gre_lib.sh to mirror_lib.sh and convert to just use a given
device without assuming it's named "h3-$tundev". Convert the two
above-mentioned functions to wrappers that pass along the correct device
name.

Add test_span_dir() and fail_test_span_dir() to round up the API for use
by following patches.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:26:19 -04:00
Petr Machata
74ed089d48 selftests: forwarding: Split mirror_gre_topo_lib.sh
Move generic parts of mirror_gre_topo_lib.sh into a new file
mirror_topo_lib.sh. Reuse the functions in GRE topo, adding the tunnel
devices as necessary.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:26:19 -04:00
David S. Miller
90fed9c946 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2018-05-24

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Björn Töpel cleans up AF_XDP (removes rebind, explicit cache alignment from uapi, etc).

2) David Ahern adds mtu checks to bpf_ipv{4,6}_fib_lookup() helpers.

3) Jesper Dangaard Brouer adds bulking support to ndo_xdp_xmit.

4) Jiong Wang adds support for indirect and arithmetic shifts to NFP

5) Martin KaFai Lau cleans up BTF uapi and makes the btf_header extensible.

6) Mathieu Xhonneux adds an End.BPF action to seg6local with BPF helpers allowing
   to edit/grow/shrink a SRH and apply on a packet generic SRv6 actions.

7) Sandipan Das adds support for bpf2bpf function calls in ppc64 JIT.

8) Yonghong Song adds BPF_TASK_FD_QUERY command for introspection of tracing events.

9) other misc fixes from Gustavo A. R. Silva, Sirio Balmelli, John Fastabend, and Magnus Karlsson
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:20:51 -04:00
David S. Miller
49a473f5b5 Merge branch 'ibmvnic-Failover-hardening'
Thomas Falcon says:

====================
ibmvnic: Failover hardening

Introduce additional transport event hardening to handle
events during device reset. In the driver's current state,
if a transport event is received during device reset, it can
cause the device to become unresponsive as invalid operations
are processed as the backing device context changes. After
a transport event, the device expects a request to begin the
initialization process. If the driver is still processing
a previously queued device reset in this state, it is likely
to fail as firmware will reject any commands other than the
one to initialize the client driver's Command-Response Queue.

Instead of failing and becoming dormant, the driver will make
one more attempt to recover and continue operation. This is
achieved by setting a state flag, which if true will direct
the driver to clean up all allocated resources and perform
a hard reset in an attempt to bring the driver back to an
operational state.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:19:26 -04:00
Thomas Falcon
2770a7984d ibmvnic: Introduce hard reset recovery
Introduce a recovery hard reset to handle reset failure as a result of
change of device context following a transport event, such as a
backing device failover or partition migration. These operations reset
the device context to its initial state. If this occurs during a reset,
any initialization commands are likely to fail with an invalid state
error as backing device firmware requests reinitialization.

When this happens, make one more attempt by performing a hard reset,
which frees any resources currently allocated and performs device
initialization. If a transport event occurs during a device reset, a
flag is set which will trigger a new hard reset following the
completionof the current reset event.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:19:26 -04:00
Thomas Falcon
06e43d7f9f ibmvnic: Set resetting state at earliest possible point
Set device resetting state at the earliest possible point: as soon as a
reset is successfully scheduled. The reset state is toggled off when
all resets have been processed to completion.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:19:26 -04:00
Thomas Falcon
8a348450a0 ibmvnic: Create separate initialization routine for resets
Instead of having one initialization routine for all cases, create
a separate, simpler function for standard initialization, such as during
device probe. Use the original initialization function to handle
device reset scenarios. The goal of this patch is to avoid having
a single, cluttered init function to handle all possible
scenarios.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:19:26 -04:00
Thomas Falcon
ab5ec33b9a ibmvnic: Handle error case when setting link state
If setting the link state is not successful, print a warning
with the resulting return code and return it to be handled
by the caller.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:19:26 -04:00
Thomas Falcon
17c8705838 ibmvnic: Return error code if init interrupted by transport event
If device init is interrupted by a failover, set the init return
code so that it can be checked and handled appropriately by the
init routine.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:19:26 -04:00
Thomas Falcon
9c4eaabd1b ibmvnic: Check CRQ command return codes
Check whether CRQ command is successful before awaiting a response
from the management partition. If the command was not successful, the
driver may hang waiting for a response that will never come.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:19:26 -04:00
Thomas Falcon
5153698e55 ibmvnic: Introduce active CRQ state
Introduce an "active" state for a IBM vNIC Command-Response Queue. A CRQ
is considered active once it has initialized or linked with its partner by
sending an initialization request and getting a successful response back
from the management partition.  Until this has happened, do not allow CRQ
commands to be sent other than the initialization request.

This change will avoid a protocol error in case of a device transport
event occurring during a initialization. When the driver receives a
transport event notification indicating that the backing hardware
has changed and needs reinitialization, any further commands other
than the initialization handshake with the VIOS management partition
will result in an invalid state error. Instead of sending a command
that will be returned with an error, print a warning and return an
error that will be handled by the caller.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:19:25 -04:00
Thomas Falcon
c3f2241547 ibmvnic: Mark NAPI flag as disabled when released
Set adapter NAPI state as disabled if they are removed. This will allow
them to be enabled again if reallocated in case of a hard reset.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:19:25 -04:00
Willem de Bruijn
730c54d594 ipv4: remove warning in ip_recv_error
A precondition check in ip_recv_error triggered on an otherwise benign
race. Remove the warning.

The warning triggers when passing an ipv6 socket to this ipv4 error
handling function. RaceFuzzer was able to trigger it due to a race
in setsockopt IPV6_ADDRFORM.

  ---
  CPU0
    do_ipv6_setsockopt
      sk->sk_socket->ops = &inet_dgram_ops;

  ---
  CPU1
    sk->sk_prot->recvmsg
      udp_recvmsg
        ip_recv_error
          WARN_ON_ONCE(sk->sk_family == AF_INET6);

  ---
  CPU0
    do_ipv6_setsockopt
      sk->sk_family = PF_INET;

This socket option converts a v6 socket that is connected to a v4 peer
to an v4 socket. It updates the socket on the fly, changing fields in
sk as well as other structs. This is inherently non-atomic. It races
with the lockless udp_recvmsg path.

No other code makes an assumption that these fields are updated
atomically. It is benign here, too, as ip_recv_error cares only about
the protocol of the skbs enqueued on the error queue, for which
sk_family is not a precise predictor (thanks to another isue with
IPV6_ADDRFORM).

Link: http://lkml.kernel.org/r/20180518120826.GA19515@dragonet.kaist.ac.kr
Fixes: 7ce875e5ec ("ipv4: warn once on passing AF_INET6 socket to ip_recv_error")
Reported-by: DaeRyong Jeong <threeearcat@gmail.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:16:57 -04:00