Antoine Tenart says:
====================
net: mvpp2: small improvements
Those 3 patches are small improvements to the Marvell PPv2 driver. The
series does not conflict with the one sent about phylink and
1000/2500baseX support, so the two series can live in parallel.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Prevent flood of RX error prints during heavy traffic with weak signal
in link by checking net_ratelimit() before using netdev_err().
Signed-off-by: Yan Markman <ymarkman@marvell.com>
[Antoine: small rework, commit message]
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove special stop/start handling from the set_mac_address callback.
All this special care is not needed, and can be removed. It also
simplifies the up/down status in the driver and helps avoiding possible
link status mismatch issues.
Signed-off-by: Yan Markman <ymarkman@marvell.com>
[Antoine: commit message]
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid repeating the check for free aggregated descriptors when it
already failed at the beginning of the function.
Signed-off-by: Yan Markman <ymarkman@marvell.com>
[Antoine: commit message]
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 0a67487403 ("selftests/bpf: Only run tests if !bpf_disabled")
forgot to check return value of fopen.
This caused some confusion, when running test_verifier (from
tools/testing/selftests/bpf/) on an older kernel (< v4.4) as it will
simply seqfault.
This fix avoids the segfault and prints an error, but allow program to
continue. Given the sysctl was introduced in 1be7f75d16 ("bpf:
enable non-root eBPF programs"), we know that the running kernel
cannot support unpriv, thus continue with unpriv_disabled = true.
Fixes: 0a67487403 ("selftests/bpf: Only run tests if !bpf_disabled")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Antoine Tenart says:
====================
net: mvpp2: phylink conversion
This series convert the Marvell PPv2 driver to phylink (models the MAC
to PHY link).
One important point is the PPv2 driver supports two probe modes: device
tree and ACPI. This series only brings phylink support for the device
tree mode, as the ACPI one will need further work. Still, the driver
should be working as before when using ACPI. This split should be
temporary, and was discussed with Marcin (in Cc.) who added ACPI support
to the driver.
Also as the SFP cages on both DB boards can be considered as non-wired.
We thus chose not to describe those SFP cages and we use fixed-link.
The rest of the series uses phylink to add support for 1000BaseX and
2500BaseX modes in the PPv2 driver. To do this, two patches are needed
in the common PHY framework (patches 3 and 4). The last 4 patches modify
the device tree to use the new PPv2 functionalities.
The series has been tested for the device tree mode on the 7040-db,
8040-db and 8040-mcbin boards, to ensure all the interface where working
as expected.
@Dave: patches 7 to 10 should go through the mvebu tree (Gregory in
Cc.) to avoid any conflict with the other mvebu dt patches taken during
this cycle.
The series is based on today's net-next.
Since v2:
- Removed the SFP description from the DB boards, as their SFP cages
are wired properly. We now use fixed-link.
- Because of this rework, split the series in two, so that the SFP
part is reviewed separately.
- Small fixes in the phylink patch.
- Rebased on the latest net-next branch.
Since v1:
- Chose a different approach to the SFP changes, as the previous ones
weren't valid and reworked both BD boards device trees.
- Misc fixes.
- Added Kishon's acked-by on one patch.
- Rebaed on latest net-next branch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the 2500Base-X PHY mode support in the Marvell PPv2
driver. 2500Base-X is quite close to 1000Base-X and SGMII modes and uses
nearly the same code path.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the 1000Base-X PHY mode support in the Marvell PPv2
driver. 1000Base-X is quite close the SGMII and uses nearly the same
code path.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allow the CP110 comphy to configure some lanes in the
2.5G SGMII mode. This mode is quite close to SGMII and uses nearly the
same code path.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds one more generic PHY mode to the phy_mode enum, to allow
configuring generic PHYs to the 2.5G SGMII mode by using the set_mode
callback.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Acked-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert the PPv2 driver to implement phylink helpers, and use phylink in
DT mode. The other mode supported is ACPI, which will need further work
in order to be entirely compatible with phylink.
The MAC and GoP configuration functions were completely moved to fit
into the phylink helpers. When a PHY is always present between the MAC
and the physical port, phylink only is used, but when this is not the
case (the MAC directly is connected to the physical port) the link IRQ
is used to detect changes in the link state and call phylink_mac_change.
The ACPI mode do not uses phylink as of now, and the changes shouldn't
impact its use.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cosmetic patch to align the ethtool functions to ops definitions. This
patch does not change in any way the driver's behaviour.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a pure refactoring of the function, preparing for some further
cleanups. The thing was pretty illegible, and the core functionality
still is, but now the core loop is a bit more isolated from the thing
that goes on around it.
This was "inspired" by the confluence of kworker workqueue name cleanups
by Tejun, currently scheduled for 4.18, and commit 7f7ccc2ccc ("proc:
do not access cmdline nor environ from file-backed areas").
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The first pull request for 4.18. As usual new features and bug fixes
but nothing really special.
I also merged wireless-drivers due to an iwlwifi patch dependency.
Major changes:
iwlwifi
* implement Traffic Condition Monitor and use it for scan, BT coex and
to detect when the AP doesn't support UAPSD properly
* some more work for the 22000 family of devices;
* introduce AMSDU rate control offload
qtnfmac
* DFS offload support
rsi
* roaming enhancements
* increase max supported aggregation subframes
* don't advertise 5 GHz support if the device doesn't support it
brcmfmac
* add support for BCM4366E chipset
* add support for bcm43364 wireless chipset
ath10k
* enable temperature reads for QCA6174 and QCA9377
* add firmware memory dump support for QCA9984
* continue adding WCN3990 support via SNOC bus
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJa/TreAAoJEG4XJFUm622boV0IAI/tTu3obIdhdlnZJsjat/wH
tmQX2rZl0g7kbthVU+WqPA1KgvK/HEX1SUIP0leARl6FDqxrBzE1G4P1fOY3JIaZ
+T3UG9LgFM3hoXtJ1VRdvi8rTBVU67TTOrQCVD7AapGWfQwn6AXfy4ARUEqBjkrA
SxDemdAwIks3miMU3EnsRlzLaI56R7l1mk0Xr30tM5Coq721AcWE6FBz6lqmFnTC
3vdDzpMRIiTt5zLICJZYgAB3akiaJEqHnIAv+y0sbXG1gHDhKcfEH674SM6FCB2N
3TP7EpzzxH/FYB0i+zOFg6wnAqUngLLnwkG/ciniVi75feb+gbaKqWHT8FNfx04=
=rF/V
-----END PGP SIGNATURE-----
Merge tag 'wireless-drivers-next-for-davem-2018-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for 4.18
The first pull request for 4.18. As usual new features and bug fixes
but nothing really special.
I also merged wireless-drivers due to an iwlwifi patch dependency.
Major changes:
iwlwifi
* implement Traffic Condition Monitor and use it for scan, BT coex and
to detect when the AP doesn't support UAPSD properly
* some more work for the 22000 family of devices;
* introduce AMSDU rate control offload
qtnfmac
* DFS offload support
rsi
* roaming enhancements
* increase max supported aggregation subframes
* don't advertise 5 GHz support if the device doesn't support it
brcmfmac
* add support for BCM4366E chipset
* add support for bcm43364 wireless chipset
ath10k
* enable temperature reads for QCA6174 and QCA9377
* add firmware memory dump support for QCA9984
* continue adding WCN3990 support via SNOC bus
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
On the rebase of the following commit on the new seccomp actions_logged
function, one audit_context access was missed.
commit cdfb6b341f
("audit: use inline function to get audit context")
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
As documented in Documentation/timers/timers-howto.txt,
replace msleep(1) with usleep_range().
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When perf data is recorded with the call-graph option enabled, the
callchain shown by perf script shows the binary offsets of the symbols
as the ip. This is incorrect for kernel symbols as the ip values are
always off by a fixed offset depending on the architecture. If the
offsets from the start of the symbols are printed, they are also
incorrect for both kernel and userspace symbols.
Without the call-graph option, the callchain shows the virtual addresses
of the symbols rather than their binary offsets. The offsets printed in
this case are also correct.
This fixes the inconsistency in perf script's output.
This can be verified on a powerpc64le system running Fedora 27 as
follows:
# cat /proc/kallsyms | grep sys_write
...
c0000000004025a0 T sys_write
c0000000004025a0 T __se_sys_write
...
# perf probe -a sys_write
Before applying this patch:
# perf record -e probe:sys_write -g ~/test
# perf script -F ip,sym,symoff
4125b0 sys_write+0x8000000000008010
1b9e0 system_call+0x8000000000008058
118234 __GI___libc_write+0xffff0000f52c0024
92c74 _IO_file_write@@GLIBC_2.17+0xffff0000f52c0044
5afbfd8a [unknown]
91a60 new_do_write+0xffff0000f52c0090
94638 _IO_do_write@@GLIBC_2.17+0xffff0000f52c0038
94bbc _IO_file_overflow@@GLIBC_2.17+0xffff0000f52c014c
95a24 __overflow+0xffff0000f52c0064
84548 _IO_puts+0xffff0000f52c0218
440 main+0xffffffffe0000020
236a0 generic_start_main.isra.0+0xffff0000f52c0140
23898 __libc_start_main+0xffff0000f52c00b8
0 [unknown]
...
# perf record -e probe:sys_write ~/test
# perf script -F ip,sym,symoff
c0000000004025b0 sys_write+0x10
...
After applying this patch:
# perf record -e probe:sys_write -g ~/test
# perf script -F ip,sym,symoff
c0000000004025b0 sys_write+0x10
c00000000000b9e0 system_call+0x58
7fffb70d8234 __GI___libc_write+0x24
7fffb7052c74 _IO_file_write@@GLIBC_2.17+0x44
5afc1818 [unknown]
7fffb7051a60 new_do_write+0x90
7fffb7054638 _IO_do_write@@GLIBC_2.17+0x38
7fffb7054bbc _IO_file_overflow@@GLIBC_2.17+0x14c
7fffb7055a24 __overflow+0x64
7fffb7044548 _IO_puts+0x218
10000440 main+0x20
7fffb6fe36a0 generic_start_main.isra.0+0x140
7fffb6fe3898 __libc_start_main+0xb8
0 [unknown]
...
# perf record -e probe:sys_write ~/test
# perf script -F ip,sym,symoff
c0000000004025b0 sys_write+0x10
...
Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Link: http://lkml.kernel.org/r/20180517063326.6319-1-sandipan@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Introduce an new common helper to avoid redundancy.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ERSPAN only support version 1 and 2. When packets send to an
erspan device which does not have proper version number set,
drop the packet. In real case, we observe multicast packets
sent to the erspan pernet device, erspan0, which does not have
erspan version configured.
Reported-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng says:
====================
tcp: default RACK loss recovery
This patch set implements the features correspond to the
draft-ietf-tcpm-rack-03 version of the RACK draft.
https://datatracker.ietf.org/meeting/101/materials/slides-101-tcpm-update-on-tcp-rack-00
1. SACK: implement equivalent DUPACK threshold heuristic in RACK to
replace existing RFC6675 recovery (tcp_mark_head_lost).
2. Non-SACK: simplify RFC6582 NewReno implementation
3. RTO: apply RACK's time-based approach to avoid spuriouly
marking very recently sent packets lost.
4. with (1)(2)(3), make RACK the exclusive fast recovery mechanism to
mark losses based on time on S/ACK. Tail loss probe and F-RTO remain
enabled by default as complementary mechanisms to send probes in
CA_Open and CA_Loss states. The probes would solicit S/ACKs to trigger
RACK time-based loss detection.
All Google web and internal servers have been running RACK-only mode
(4) for a while now. a/b experiments indicate RACK/TLP on average
reduces recovery latency by 10% compared to RFC6675. RFC6675
is default-off now but can be enabled by disabling RACK (sysctl
net.ipv4.tcp_recovery=0) for unseen issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
An RTO event indicates the head has not been acked for a long time
after its last (re)transmission. But the other packets are not
necessarily lost if they have been only sent recently (for example
due to application limit). This patch would prohibit marking packets
sent within an RTT to be lost on RTO event, using similar logic in
TCP RACK detection.
Normally the head (SND.UNA) would be marked lost since RTO should
fire strictly after the head was sent. An exception is when the
most recent RACK RTT measurement is larger than the (previous)
RTO. To address this exception the head is always marked lost.
Congestion control interaction: since we may not mark every packet
lost, the congestion window may be more than 1 (inflight plus 1).
But only one packet will be retransmitted after RTO, since
tcp_retransmit_timer() calls tcp_retransmit_skb(...,segs=1). The
connection still performs slow start from one packet (with Cubic
congestion control).
This commit was tested in an A/B test with Google web servers,
and showed a reduction of 2% in (spurious) retransmits post
timeout (SlowStartRetrans), and correspondingly reduced DSACKs
(DSACKIgnoredOld) by 7%.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create and export a new helper tcp_rack_skb_timeout and move tcp_is_rack
to prepare the final RTO change.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously when TCP times out, it first updates cwnd and ssthresh,
marks packets lost, and then updates congestion state again. This
was fine because everything not yet delivered is marked lost,
so the inflight is always 0 and cwnd can be safely set to 1 to
retransmit one packet on timeout.
But the inflight may not always be 0 on timeout if TCP changes to
mark packets lost based on packet sent time. Therefore we must
first mark the packet lost, then set the cwnd based on the
(updated) inflight.
This is not a pure refactor. Congestion control may potentially
break if it uses (not yet updated) inflight to compute ssthresh.
Fortunately all existing congestion control modules does not do that.
Also it changes the inflight when CA_LOSS_EVENT is called, and only
westwood processes such an event but does not use inflight.
This change has two other minor side benefits:
1) consistent with Fast Recovery s.t. the inflight is updated
first before tcp_enter_recovery flips state to CA_Recovery.
2) avoid intertwining loss marking with state update, making the
code more readable.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor using a new helper, tcp_timeout_mark_loss(), that marks packets
lost upon RTO.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The previous approach for the lost and retransmit bits was to
wipe the slate clean: zero all the lost and retransmit bits,
correspondingly zero the lost_out and retrans_out counters, and
then add back the lost bits (and correspondingly increment lost_out).
The new approach is to treat this very much like marking packets
lost in fast recovery. We don’t wipe the slate clean. We just say
that for all packets that were not yet marked sacked or lost, we now
mark them as lost in exactly the same way we do for fast recovery.
This fixes the lost retransmit accounting at RTO time and greatly
simplifies the RTO code by sharing much of the logic with Fast
Recovery.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a rewrite of NewReno loss recovery implementation that is
simpler and standalone for readability and better performance by
using less states.
Note that NewReno refers to RFC6582 as a modification to the fast
recovery algorithm. It is used only if the connection does not
support SACK in Linux. It should not to be confused with the Reno
(AIMD) congestion control.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch disables RFC6675 loss detection and make sysctl
net.ipv4.tcp_recovery = 1 controls a binary choice between RACK
(1) or RFC6675 (0).
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for the classic DUPACK threshold rule
(#DupThresh) in RACK.
When the number of packets SACKed is greater or equal to the
threshold, RACK sets the reordering window to zero which would
immediately mark all the unsacked packets below the highest SACKed
sequence lost. Since this approach is known to not work well with
reordering, RACK only uses it if no reordering has been observed.
The DUPACK threshold rule is a particularly useful extension to the
fast recoveries triggered by RACK reordering timer. For example
data-center transfers where the RTT is much smaller than a timer
tick, or high RTT path where the default RTT/4 may take too long.
Note that this patch differs slightly from RFC6675. RFC6675
considers a packet lost when at least #DupThresh higher-sequence
packets are SACKed.
With RACK, for connections that have seen reordering, RACK
continues to use a dynamically-adaptive time-based reordering
window to detect losses. But for connections on which we have not
yet seen reordering, this patch considers a packet lost when at
least one higher sequence packet is SACKed and the total number
of SACKed packets is at least DupThresh. For example, suppose a
connection has not seen reordering, and sends 10 packets, and
packets 3, 5, 7 are SACKed. RFC6675 considers packets 1 and 2
lost. RACK considers packets 1, 2, 4, 6 lost.
There is some small risk of spurious retransmits here due to
reordering. However, this is mostly limited to the first flight of
a connection on which the sender receives SACKs from reordering.
And RFC 6675 and FACK loss detection have a similar risk on the
first flight with reordering (it's just that the risk of spurious
retransmits from reordering was slightly narrower for those older
algorithms due to the margin of 3*MSS).
Also the minimum reordering window is reduced from 1 msec to 0
to recover quicker on short RTT transfers. Therefore RACK is more
aggressive in marking packets lost during recovery to reduce the
reordering window timeouts.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Let tools that need to have those variables with the sysctl current
values use a function that will read them.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-1ljj3oeo5kpt2n1icfd9vowe@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The early versions of am33xx devices, related to ES1.0 SoC revision
have errata limiting mq support. That's the same errata as
commit 7da1160002 ("drivers: net: cpsw: add am335x errata workarround for
interrutps")
AM33xx Errata [1] Advisory 1.0.9
http://www.ti.com/lit/er/sprz360f/sprz360f.pdf
After additional investigation were found that drivers w/a is
propagated on all AM33xx SoCs and on DM814x. But the errata exists
only for ES1.0 of AM33xx family, limiting mq support for revisions
after ES1.0. So, disable mq support only for related SoCs and use
separate polls for revisions allowing mq.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not read as commonly as 'page_size', so it makes sense to read it
lazily, caching its value when it is first read.
Less files open unconditionally at startup.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-35xhrq91u94uc1djtclek1ie@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Rick bisected a regression on large systems which use the x2apic cluster
mode for interrupt delivery to the commit wich reworked the cluster
management.
The problem is caused by a missing initialization of the clusterid field
in the shared cluster data structures. So all structures end up with
cluster ID 0 which only allows sharing between all CPUs which belong to
cluster 0. All other CPUs with a cluster ID > 0 cannot share the data
structure because they cannot find existing data with their cluster
ID. This causes malfunction with IPIs because IPIs are sent to the wrong
cluster and the caller waits for ever that the target CPU handles the IPI.
Add the missing initialization when a upcoming CPU is the first in a
cluster so that the later booting CPUs can find the data and share it for
proper operation.
Fixes: 023a611748 ("x86/apic/x2apic: Simplify cluster management")
Reported-by: Rick Warner <rick@microway.com>
Bisected-by: Rick Warner <rick@microway.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Rick Warner <rick@microway.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1805171418210.1947@nanos.tec.linutronix.de
Thomas Falcon says:
====================
ibmvnic: Fix bugs and memory leaks
This is a small patch series fixing up some bugs and memory leaks
in the ibmvnic driver. The first fix frees up previously allocated
memory that should be freed in case of an error. The second fixes
a reset case that was failing due to TX/RX queue IRQ's being
erroneously disabled without being enabled again. The final patch
fixes incorrect reallocated of statistics buffers during a device
reset, resulting in loss of statistics information and a memory leak.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Move initialization of statistics buffers from ibmvnic_init function
into ibmvnic_probe. In the current state, ibmvnic_init will be called
again during a device reset, resulting in the allocation of new
buffers without freeing the old ones.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not necessary to disable interrupt lines here during a reset
to handle a non-fatal firmware error. Move that call within the code
block that handles the other cases that do require interrupts to be
disabled and re-enabled.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the firmware map fails for whatever reason, remember to free
up the memory after.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Updating the FIB tracepoint for the recent change to allow rules using
the protocol and ports exposed a few places where the entries in the flow
struct are not initialized.
For __fib_validate_source add the call to fib4_rules_early_flow_dissect
since it is invoked for the input path. For netfilter, add the memset on
the flow struct to avoid future problems like this. In ip_route_input_slow
need to set the fields if the skb dissection does not happen.
Fixes: bfff486265 ("net: fib_rules: support for match on ip_proto, sport and dport")
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
scatterlist code expects virt_to_page() to work, which fails with
CONFIG_VMAP_STACK=y.
Fixes: c46234ebb4 ("tls: RX path for ktls")
Signed-off-by: Matt Mullins <mmullins@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adopt it from the kernel sources, will be used soon.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-oubheiqj8edo5rzewt11cbn0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The error messages at sanity checks of memory pages tend to repeat too
many times once when it hits, and without the rate limit, it may flood
and become unreadable. Replace such messages with the *_ratelimited()
variant.
Bugzilla: http://bugzilla.opensuse.org/show_bug.cgi?id=1093027
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Not anymore accessed outside this library, keep it private.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-wg1m07flfrg1rm06jjzie8si@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
That takes care of using the right call to get the tracing_path
directory, the one that will end up calling tracing_path_set() to figure
out where tracefs is mounted.
One more step in doing just lazy reading of system structures to reduce
the number of operations done unconditionaly at 'perf' start.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-42zzi0f274909bg9mxzl81bu@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Instead of accessing the trace_events_path variable directly, that may
not have been properly initialized wrt detecting where tracefs is
mounted.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-id7hzn1ydgkxbumeve5wapqz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When using for_each_event() we needlessly rebuild the whole path to
the tracepoint directory, reuse the dir_path instead, saving some cycles
and reducing the size of the next patch.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-54bcs15n0cp6gwcgpc4hptyc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* x86 fixes: PCID, UMIP, locking
* Improved support for recent Windows version that have a 2048 Hz
APIC timer.
* Rename KVM_HINTS_DEDICATED CPUID bit to KVM_HINTS_REALTIME
* Better behaved selftests.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJa/bkTAAoJEL/70l94x66Dzf8IAJ1GqtXi0CNbq8MvU4QIqw0L
HLIRoe/QgkTeTUa2fwirEuu5I+/wUyPvy5sAIsn/F5eiZM7nciLm+fYzw6F2uPIm
lSCqKpVwmh8dPl1SBaqPnTcB1HPVwcCgc2SF9Ph7yZCUwFUtoeUuPj8v6Qy6y21g
jfobHFZa3MrFgi7kPxOXSrC1qxuNJL9yLB5mwCvCK/K7jj2nrGJkLLDuzgReCqvz
isOdpof3hz8whXDQG5cTtybBgE9veym4YqJY8R5ANXBKqbFlhaNF1T3xXrdPMISZ
7bsGgkhYEOqeQsPrFwzAIiFxe2DogFwkn1BcvJ1B+duXrayt5CBnDPRB6Yxg00M=
=H0d0
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
- ARM/ARM64 locking fixes
- x86 fixes: PCID, UMIP, locking
- improved support for recent Windows version that have a 2048 Hz APIC
timer
- rename KVM_HINTS_DEDICATED CPUID bit to KVM_HINTS_REALTIME
- better behaved selftests
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME
KVM: arm/arm64: VGIC/ITS save/restore: protect kvm_read_guest() calls
KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock
KVM: arm/arm64: VGIC/ITS: Promote irq_lock() in update_affinity
KVM: arm/arm64: Properly protect VGIC locks from IRQs
KVM: X86: Lower the default timer frequency limit to 200us
KVM: vmx: update sec exec controls for UMIP iff emulating UMIP
kvm: x86: Suppress CR3_PCID_INVD bit only when PCIDs are enabled
KVM: selftests: exit with 0 status code when tests cannot be run
KVM: hyperv: idr_find needs RCU protection
x86: Delay skip of emulated hypercall instruction
KVM: Extend MAX_IRQ_ROUTES to 4096 for all archs
Writes to ZCR_EL1 are self-synchronising, and so may be expensive
in typical implementations.
This patch adopts the approach used for costly system register
writes elsewhere in the kernel: the system register write is
suppressed if it would not change the stored value.
Since the common case will be that of switching between tasks that
use the same vector length as one another, prediction hit rates on
the conditional branch should be reasonably good, with lower
expected amortised cost than the unconditional execution of a
heavyweight self-synchronising instruction.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We have a core fix in the compat code for covering a potential race
(double references), but it's a very minor change.
The rest are all small device-specific quirks, as well as a correction
of the new UAC3 support code.
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAlr8J4oOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE/u9BAAonj61ZwTiKYQS6Zgv/yXnhGeaqMgnu1AG/pf
c3MI9mjR1E+WZy8CehCgNuvd9b6rc5PwrNgmTP58nu/DMZB1DeQkWJgv2fNm3y1c
byuBHG+xH2AdH+mpjIWcMU857T75oaDaj3Gu36ORacCDGOHsdL0OyynT0y/C0LUd
SwEegucAFc9Ft2vb4WfRprm9RiohT7WEyU/G+nACderaIDE12B4/CtC3l64QPWxN
uJydQ4io92qkCMOCXBupGmwUvCCkwB+acTSLRUgKd/IEbp8cTrnOgNpgJmB6TLXY
fj1UO6pi+fp9yXdyWwrDCqsvlrXbmDu25Sqy1CVEA/iApC5mFwFaEvLl5eEhldSV
+o2r6O7N3IOGsMjlAov7lp1wqUgqSaeOWRjFAeNxs5lc+G4Cts27x9XMvpKPNUON
pCAs9C+hdpcIS/ZAdpdO0JVashK6rVIP0oaUBRkTT4kKz5E6TUXbhJ1tUPpzJT0j
98jYQOJmBhQfAfTuN54rpciv5NUA1b/KpV17BothpL6Npe0WYjS037fcwfj8u1DH
T+2NjqZLYUkhzwzU3sDokRJcjCm/Wq2qv2aON/6CQR9LCdIJFKoWc5i7a5/v3Rm5
xXUQgCEPzJ9kzqQguZjn/fQnnMxgK++sYiJP+TKPNxyYn4LlJ+UQBk0/dazLKQtd
dEN+zzo=
=lBzh
-----END PGP SIGNATURE-----
Merge tag 'sound-4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"We have a core fix in the compat code for covering a potential race
(double references), but it's a very minor change.
The rest are all small device-specific quirks, as well as a correction
of the new UAC3 support code"
* tag 'sound-4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Use Class Specific EP for UAC3 devices.
ALSA: hda/realtek - Clevo P950ER ALC1220 Fixup
ALSA: usb: mixer: volume quirk for CM102-A+/102S+
ALSA: hda: Add Lenovo C50 All in one to the power_save blacklist
ALSA: control: fix a redundant-copy issue
KVM_HINTS_DEDICATED seems to be somewhat confusing:
Guest doesn't really care whether it's the only task running on a host
CPU as long as it's not preempted.
And there are more reasons for Guest to be preempted than host CPU
sharing, for example, with memory overcommit it can get preempted on a
memory access, post copy migration can cause preemption, etc.
Let's call it KVM_HINTS_REALTIME which seems to better
match what guests expect.
Also, the flag most be set on all vCPUs - current guests assume this.
Note so in the documentation.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If is_enabled() is not defined, regulator core will assume
this regulator is already enabled, then it can NOT be really
enabled after disabled.
Based on Li Jun's patch from the NXP kernel tree.
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Mark Brown <broonie@kernel.org>