Commit graph

900375 commits

Author SHA1 Message Date
Douglas Anderson
c5c689d322 dt-bindings: timer: Use non-empty ranges in example
On many arm64 qcom device trees, running `make dtbs_check` yells:

  timer@17c20000: #size-cells:0:0: 1 was expected

It appears that someone was trying to assert the fact that sub-nodes
describing frames would never have a size that's more than 32-bits
big.  That does indeed appear to be true for all cases I could find.

Currently many arm64 qcom device tree files have a #address-cells and
about in commit bede7d2dc8 ("arm64: dts: qcom: sdm845: Increase
address and size cells for soc").  That means the only way we can
shrink them down is to use a non-empty ranges.

Since forever it has said in "writing-bindings.txt" to "DO use
non-empty 'ranges' to limit the size of child buses/devices".  I guess
we should start listening to it.

I believe (but am not certain) that this also means that we should use
"ranges" to simplify the "reg" of our sub devices by specifying an
offset.  Let's update the example in the bindings to make this
obvious.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Rob Herring <robh@kernel.org>
2020-01-23 14:34:15 -06:00
David S. Miller
9bbc8be29d mlx5-updates-2020-01-22
This series provides updates to mlx5 driver.
 1) Misc small cleanups
 2) Some SW steering updates including header copy support
 3) Full ethtool statistics support for E-Switch uplink representor
 Some refactoring was required to share the bare-metal NIC ethtool
 stats with the Uplink representor. On Top of this Vlad converts the
 ethtool stats support in E-Swtich vports representors to use the mlx5e
 "stats groups" infrastructure and then applied all applicable stats
 to the uplink representor netdev.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl4pPoQACgkQSD+KveBX
 +j43XAf/fFagUye8gFCFI2Yg0tbaDXKKvp+Z6zfA8K8URVXSksSE6dr9VraWVO8C
 dtYvyv+u9IKerdi0j/uuHOHvuYAr/MwY9AV/uxltU3bxKfIbQp6puRoSU5CPVKZN
 20dqN/zpbm0Tzk/k8WSpeZaPg+im5sFn7HUIoSL14AJW6KrjU6jP/pBjQMJVIcMs
 Yq+QBsC9PWSfH3O4MX5qYg8IRbtXWIQplVeSU9A5Rj9+ParbMmyKOhRwq7ZcNDMM
 UMA/y2z7FZ0ENQ1dajs4Yu/MiAX50sfU9M0TXwbCxim5437l26lQw3sum62Zr/Oz
 rmyyVhNAX7Vm+SkHKaX85p2pOcejkQ==
 =GB6e
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2020-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2020-01-22

This series provides updates to mlx5 driver.
1) Misc small cleanups
2) Some SW steering updates including header copy support
3) Full ethtool statistics support for E-Switch uplink representor
Some refactoring was required to share the bare-metal NIC ethtool
stats with the Uplink representor. On Top of this Vlad converts the
ethtool stats support in E-Swtich vports representors to use the mlx5e
"stats groups" infrastructure and then applied all applicable stats
to the uplink representor netdev.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 21:27:23 +01:00
David S. Miller
42c9bdae23 Merge branch 'Fixes-for-SONIC-ethernet-driver'
Finn Thain says:

====================
Fixes for SONIC ethernet driver

Various SONIC driver problems have become apparent over the years,
including tx watchdog timeouts, lost packets and duplicated packets.

The problems are mostly caused by bugs in buffer handling, locking and
(re-)initialization code.

This patch series resolves these problems.

This series has been tested on National Semiconductor hardware (macsonic),
qemu-system-m68k (macsonic) and qemu-system-mips64el (jazzsonic).

The emulated dp8393x device used in QEMU also has bugs.
I have fixed the bugs that I know of in a series of patches at,
https://github.com/fthain/qemu/commits/sonic

Changed since v1:
 - Minor revisions as described in commit logs.
 - Deferred net-next patches.
Changed since v2:
 - Minor revisions as described in commit logs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 21:24:37 +01:00
Finn Thain
686f85d71d net/sonic: Prevent tx watchdog timeout
Section 5.5.3.2 of the datasheet says,

    If FIFO Underrun, Byte Count Mismatch, Excessive Collision, or
    Excessive Deferral (if enabled) errors occur, transmission ceases.

In this situation, the chip asserts a TXER interrupt rather than TXDN.
But the handler for the TXDN is the only way that the transmit queue
gets restarted. Hence, an aborted transmission can result in a watchdog
timeout.

This problem can be reproduced on congested link, as that can result in
excessive transmitter collisions. Another way to reproduce this is with
a FIFO Underrun, which may be caused by DMA latency.

In event of a TXER interrupt, prevent a watchdog timeout by restarting
transmission.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 21:24:37 +01:00
Finn Thain
772f66421d net/sonic: Fix CAM initialization
Section 4.3.1 of the datasheet says,

    This bit [TXP] must not be set if a Load CAM operation is in
    progress (LCAM is set). The SONIC will lock up if both bits are
    set simultaneously.

Testing has shown that the driver sometimes attempts to set LCAM
while TXP is set. Avoid this by waiting for command completion
before and after giving the LCAM command.

After issuing the Load CAM command, poll for !SONIC_CR_LCAM rather than
SONIC_INT_LCD, because the SONIC_CR_TXP bit can't be used until
!SONIC_CR_LCAM.

When in reset mode, take the opportunity to reset the CAM Enable
register.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 21:24:37 +01:00
Finn Thain
27e0c31c5f net/sonic: Fix command register usage
There are several issues relating to command register usage during
chip initialization.

Firstly, the SONIC sometimes comes out of software reset with the
Start Timer bit set. This gets logged as,

    macsonic macsonic eth0: sonic_init: status=24, i=101

Avoid this by giving the Stop Timer command earlier than later.

Secondly, the loop that waits for the Read RRA command to complete has
the break condition inverted. That's why the for loop iterates until
its termination condition. Call the helper for this instead.

Finally, give the Receiver Enable command after clearing interrupts,
not before, to avoid the possibility of losing an interrupt.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 21:24:37 +01:00
Finn Thain
3f4b7e6a2b net/sonic: Quiesce SONIC before re-initializing descriptor memory
Make sure the SONIC's DMA engine is idle before altering the transmit
and receive descriptors. Add a helper for this as it will be needed
again.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 21:24:37 +01:00
Finn Thain
89ba879e95 net/sonic: Fix receive buffer replenishment
As soon as the driver is finished with a receive buffer it allocs a new
one and overwrites the corresponding RRA entry with a new buffer pointer.

Problem is, the buffer pointer is split across two word-sized registers.
It can't be updated in one atomic store. So this operation races with the
chip while it stores received packets and advances its RRP register.
This could result in memory corruption by a DMA write.

Avoid this problem by adding buffers only at the location given by the
RWP register, in accordance with the National Semiconductor datasheet.

Re-factor this code into separate functions to calculate a RRA pointer
and to update the RWP.

Fixes: efcce83936 ("[PATCH] macsonic/jazzsonic network drivers update")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 21:24:37 +01:00
Finn Thain
94b1663495 net/sonic: Improve receive descriptor status flag check
After sonic_tx_timeout() calls sonic_init(), it can happen that
sonic_rx() will subsequently encounter a receive descriptor with no
flags set. Remove the comment that says that this can't happen.

When giving a receive descriptor to the SONIC, clear the descriptor
status field. That way, any rx descriptor with flags set can only be
a newly received packet.

Don't process a descriptor without the LPKT bit set. The buffer is
still in use by the SONIC.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 21:24:37 +01:00
Finn Thain
eaabfd19b2 net/sonic: Avoid needless receive descriptor EOL flag updates
The while loop in sonic_rx() traverses the rx descriptor ring. It stops
when it reaches a descriptor that the SONIC has not used. Each iteration
advances the EOL flag so the SONIC can keep using more descriptors.
Therefore, the while loop has no definite termination condition.

The algorithm described in the National Semiconductor literature is quite
different. It consumes descriptors up to the one with its EOL flag set
(which will also have its "in use" flag set). All freed descriptors are
then returned to the ring at once, by adjusting the EOL flags (and link
pointers).

Adopt the algorithm from datasheet as it's simpler, terminates quickly
and avoids a lot of pointless descriptor EOL flag changes.

Fixes: efcce83936 ("[PATCH] macsonic/jazzsonic network drivers update")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 21:24:37 +01:00
Finn Thain
9e311820f6 net/sonic: Fix receive buffer handling
The SONIC can sometimes advance its rx buffer pointer (RRP register)
without advancing its rx descriptor pointer (CRDA register). As a result
the index of the current rx descriptor may not equal that of the current
rx buffer. The driver mistakenly assumes that they are always equal.
This assumption leads to incorrect packet lengths and possible packet
duplication. Avoid this by calling a new function to locate the buffer
corresponding to a given descriptor.

Fixes: efcce83936 ("[PATCH] macsonic/jazzsonic network drivers update")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 21:24:37 +01:00
Finn Thain
427db97df1 net/sonic: Fix interface error stats collection
The tx_aborted_errors statistic should count packets flagged with EXD,
EXC, FU, or BCM bits because those bits denote an aborted transmission.
That corresponds to the bitmask 0x0446, not 0x0642. Use macros for these
constants to avoid mistakes. Better to leave out FIFO Underruns (FU) as
there's a separate counter for that purpose.

Don't lump all these errors in with the general tx_errors counter as
that's used for tx timeout events.

On the rx side, don't count RDE and RBAE interrupts as dropped packets.
These interrupts don't indicate a lost packet, just a lack of resources.
When a lack of resources results in a lost packet, this gets reported
in the rx_missed_errors counter (along with RFO events).

Don't double-count rx_frame_errors and rx_crc_errors.

Don't use the general rx_errors counter for events that already have
special counters.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 21:24:37 +01:00
Finn Thain
e3885f5761 net/sonic: Use MMIO accessors
The driver accesses descriptor memory which is simultaneously accessed by
the chip, so the compiler must not be allowed to re-order CPU accesses.
sonic_buf_get() used 'volatile' to prevent that. sonic_buf_put() should
have done so too but was overlooked.

Fixes: efcce83936 ("[PATCH] macsonic/jazzsonic network drivers update")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 21:24:37 +01:00
Finn Thain
5fedabf5a7 net/sonic: Clear interrupt flags immediately
The chip can change a packet's descriptor status flags at any time.
However, an active interrupt flag gets cleared rather late. This
allows a race condition that could theoretically lose an interrupt.
Fix this by clearing asserted interrupt flags immediately.

Fixes: efcce83936 ("[PATCH] macsonic/jazzsonic network drivers update")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 21:24:36 +01:00
Finn Thain
865ad2f220 net/sonic: Add mutual exclusion for accessing shared state
The netif_stop_queue() call in sonic_send_packet() races with the
netif_wake_queue() call in sonic_interrupt(). This causes issues
like "NETDEV WATCHDOG: eth0 (macsonic): transmit queue 0 timed out".
Fix this by disabling interrupts when accessing tx_skb[] and next_tx.
Update a comment to clarify the synchronization properties.

Fixes: efcce83936 ("[PATCH] macsonic/jazzsonic network drivers update")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 21:24:36 +01:00
David S. Miller
790e01149a Merge branch 'Add-PHY-IDs-for-DP83825-6'
Dan Murphy says:

====================
Add PHY IDs for DP83825/6

Adding new PHY IDs for the DP83825 and DP83826 TI Ethernet PHYs to the DP83822
PHY driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 21:21:12 +01:00
Dan Murphy
2ace13e10d net: phy: DP83822: Add support for additional DP83825 devices
Add PHY IDs for the DP83825CS, DP83825CM and the DP83825S devices to the
DP83822 driver.

Signed-off-by: Dan Murphy <dmurphy@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 21:21:12 +01:00
Dan Murphy
783da36bb9 phy: dp83826: Add phy IDs for DP83826N and 826NC
Add phy IDs to the DP83822 phy driver for the DP83826N
and the DP83826NC devices.  The register map and features
are the same for basic enablement.

Signed-off-by: Dan Murphy <dmurphy@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 21:21:12 +01:00
Madalin Bucur
457bfc0a4b net: fsl/fman: rename IF_MODE_XGMII to IF_MODE_10G
As the only 10G PHY interface type defined at the moment the code
was developed was XGMII, although the PHY interface mode used was
not XGMII, XGMII was used in the code to denote 10G. This patch
renames the 10G interface mode to remove the ambiguity.

Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 21:18:57 +01:00
David S. Miller
72b5917413 Merge branch 'net-fsl-fman-address-erratum-A011043'
Madalin Bucur says:

====================
net: fsl/fman: address erratum A011043

This addresses a HW erratum on some QorIQ DPAA devices.

MDIO reads to internal PCS registers may result in having
the MDIO_CFG[MDIO_RD_ER] bit set, even when there is no
error and read data (MDIO_DATA[MDIO_DATA]) is correct.
Software may get false read error when reading internal
PCS registers through MDIO. As a workaround, all internal
MDIO accesses should ignore the MDIO_CFG[MDIO_RD_ER] bit.
When the issue was present, one could see such errors
during boot:

  mdio_bus ffe4e5000: Error while reading PHY0 reg at 3.3
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 21:17:13 +01:00
Madalin Bucur
1d3ca681b9 net/fsl: treat fsl,erratum-a011043
When fsl,erratum-a011043 is set, adjust for erratum A011043:
MDIO reads to internal PCS registers may result in having
the MDIO_CFG[MDIO_RD_ER] bit set, even when there is no
error and read data (MDIO_DATA[MDIO_DATA]) is correct.
Software may get false read error when reading internal
PCS registers through MDIO. As a workaround, all internal
MDIO accesses should ignore the MDIO_CFG[MDIO_RD_ER] bit.

Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 21:17:13 +01:00
Madalin Bucur
73d527aef6 powerpc/fsl/dts: add fsl,erratum-a011043
Add fsl,erratum-a011043 to internal MDIO buses.
Software may get false read error when reading internal
PCS registers through MDIO. As a workaround, all internal
MDIO accesses should ignore the MDIO_CFG[MDIO_RD_ER] bit.

Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 21:17:13 +01:00
Madalin Bucur
2934d2c678 dt-bindings: net: add fsl,erratum-a011043
Add an entry for erratum A011043: the MDIO_CFG[MDIO_RD_ER]
bit may be falsely set when reading internal PCS registers.
MDIO reads to internal PCS registers may result in having
the MDIO_CFG[MDIO_RD_ER] bit set, even when there is no
error and read data (MDIO_DATA[MDIO_DATA]) is correct.
Software may get false read error when reading internal
PCS registers through MDIO. As a workaround, all internal
MDIO accesses should ignore the MDIO_CFG[MDIO_RD_ER] bit.

Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 21:17:13 +01:00
Manish Chopra
22e984493a qlcnic: Fix CPU soft lockup while collecting firmware dump
Driver while collecting firmware dump takes longer time to
collect/process some of the firmware dump entries/memories.
Bigger capture masks makes it worse as it results in larger
amount of data being collected and results in CPU soft lockup.
Place cond_resched() in some of the driver flows that are
expectedly time consuming to relinquish the CPU to avoid CPU
soft lockup panic.

Signed-off-by: Shahed Shaikh <shshaikh@marvell.com>
Tested-by: Yonggen Xu <Yonggen.Xu@dell.com>
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23 21:13:33 +01:00
Linus Torvalds
4703d91199 XArray updates for 5.5
Primarily bugfixes, mostly around handling index wrap-around correctly.
 A couple of doc fixes and adding missing APIs.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAl4pJv4ACgkQDpNsjXcp
 gj5OXgf/Q52Am0tLS6iC7XtcojjidnSFzpAqn9OGo7zuTsP+QURsiZAXJpjvUF8y
 kGIcD9R/tobEADkyDnj8D64yO+CqT1PlkaYvO5R+nYx/WTtmvr3/O/7Y4/ZTAwcf
 tTvj3yXlKbsA+9YJa34t43Ul4EDCOrQTz+oIZUQsiaRNzIdmCCguLg5gOkIhoXt6
 wJwp2Fc3YN6McBBpZH2z0GRwdJWPcixtKH0RNhOyGUiWBhkWYFXNvNEdxKJmTXdK
 qqkoW9So8IkyyoXdW8ZpQqCCk9tPRv9+7adDFVwl/T8q/RBrHNtcyQqJL3EsScsi
 9IqCrKldzRenk0v6lG+mgJEuTvYs8A==
 =7cpi
 -----END PGP SIGNATURE-----

Merge tag 'xarray-5.5' of git://git.infradead.org/users/willy/linux-dax

Pull XArray fixes from Matthew Wilcox:
 "Primarily bugfixes, mostly around handling index wrap-around
  correctly.

  A couple of doc fixes and adding missing APIs.

  I had an oops live on stage at linux.conf.au this year, and it turned
  out to be a bug in xas_find() which I can't prove isn't triggerable in
  the current codebase. Then in looking for the bug, I spotted two more
  bugs.

  The bots have had a few days to chew on this with no problems
  reported, and it passes the test-suite (which now has more tests to
  make sure these problems don't come back)"

* tag 'xarray-5.5' of git://git.infradead.org/users/willy/linux-dax:
  XArray: Add xa_for_each_range
  XArray: Fix xas_find returning too many entries
  XArray: Fix xa_find_after with multi-index entries
  XArray: Fix infinite loop with entry at ULONG_MAX
  XArray: Add wrappers for nested spinlocks
  XArray: Improve documentation of search marks
  XArray: Fix xas_pause at ULONG_MAX
2020-01-23 11:37:19 -08:00
Linus Torvalds
34597c85be Various tracing fixes:
- Fix a function comparison warning for a xen trace event macro
  - Fix a double perf_event linking to a trace_uprobe_filter for multiple events
  - Fix suspicious RCU warnings in trace event code for using
     list_for_each_entry_rcu() when the "_rcu" portion wasn't needed.
  - Fix a bug in the histogram code when using the same variable
  - Fix a NULL pointer dereference when tracefs lockdown enabled and calling
     trace_set_default_clock()
 
 This v2 version contains:
 
  - A fix to a bug found with the double perf_event linking patch
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXinakBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qhNZAQCi86p9eW3f3w7hM2hZcirC+mQKVZgp
 2rO4zIAK5V6G7gEAh6I7VZa50a6AE647ZjryE7ufTRUhmSFMWoG0kcJ7OAk=
 =/J9n
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.5-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Various tracing fixes:

   - Fix a function comparison warning for a xen trace event macro

   - Fix a double perf_event linking to a trace_uprobe_filter for
     multiple events

   - Fix suspicious RCU warnings in trace event code for using
     list_for_each_entry_rcu() when the "_rcu" portion wasn't needed.

   - Fix a bug in the histogram code when using the same variable

   - Fix a NULL pointer dereference when tracefs lockdown enabled and
     calling trace_set_default_clock()

   - A fix to a bug found with the double perf_event linking patch"

* tag 'trace-v5.5-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/uprobe: Fix to make trace_uprobe_filter alignment safe
  tracing: Do not set trace clock if tracefs lockdown is in effect
  tracing: Fix histogram code when expression has same var as value
  tracing: trigger: Replace unneeded RCU-list traversals
  tracing/uprobe: Fix double perf_event linking on multiprobe uprobe
  tracing: xen: Ordered comparison of function pointers
2020-01-23 11:23:37 -08:00
Linus Torvalds
fa0a4e3b54 A fix for a potential use-after-free from Jeff, marked for stable.
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAl4p1+MTHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi4YtCACPHyE8aoDTHZF8UZ9bHKNFVt4C1bRx
 ihFB6/PzmIfFw4Cbf+yTW85q3zqJ/6eJIOZF4dlwoFWK+osSk8sYRaOvlEovysbR
 sYiAbcOxePj9tSPdrWLYB/5ELtwMTloxBo7mPiJYt127UntWlPGfiz4sdHJBt1zI
 IBPOIeACJKGe0+Wtj0mGsXk+WhEB3nFk2DINnLuFc4tG6yXkFNq5/fnXrgVTlUTF
 4EwDQgHBUIqKDJarSyIBzud6VVshS7VaMAu8h9kwPScN4sG1y4ucgFzXIc4JfqRN
 TnEV48hdRQMVuQtsvuzAMPQvsjMlIXUSTGZzs4XPbEBjgAP8+MP+PJvL
 =XVg1
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-5.5-rc8' of https://github.com/ceph/ceph-client

Pull ceph fix from Ilya Dryomov:
 "A fix for a potential use-after-free from Jeff, marked for stable"

* tag 'ceph-for-5.5-rc8' of https://github.com/ceph/ceph-client:
  ceph: hold extra reference to r_parent over life of request
2020-01-23 11:21:35 -08:00
Linus Torvalds
3a83c8c81c Power management fix for 5.5-rc8
Prevent the kernel from crashing during resume from hibernation
 if free pages contain leftover data from the restore kernel and
 init_on_free is set (Alexander Potapenko).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl4psjQSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxpOcP/1UTGUr+VdGfBBjG5WlgCY0Jrd50Y78b
 RoVNDR/NvSVSuIs44AgrnfyQiz2Y8jG6qY1iSAbIXmFl37/4+kGhkNmd6pV/xFUc
 TdZZotFwFlRQjmeQxxH0kNXuAY6nJ2RwELWrjXqM8PuNjNKIEpfS+0fSaWexHqIm
 MDArxcDHkvZU5SnnRQM+LkT/EmbEheB7tgm7vGGqMLsSKc0gUsBmVCURe/lLAH5o
 EUKX4FI2jCy+LlmSdZ3EDjf1cstm3YXLiegTLSq1Jh3mFHXkFTwJMmidiz21qXJh
 Hc4r3iG0NZ37J8HXpwuq++KlhvNbhHJz+ZgC1IYls16RNYh5mUzxtMHWdSyyqlrW
 +z8gBVUyeJUYos5Kjb/NKSt43gnz7Uhy0UVbQXD66hgajXe71CQZQq/D5CMeTdJL
 jWNaeGYnhskz3IW2vnrs9Ucf6RHHWezXk51kVsyJXadiLhTdOv7DKahDKVwC/Hvf
 kyN1W0F5PZpF50yYmnhJgqDfxkGBNKpwXxTAGk6X0WQFaWeh/2FkX045UdJzBbHu
 fa1taTM/5RfPlbWq0wLPeHHSP4M2I0ndeWXZk88vUwwMfm9Wo+FNBhgs/EZfeuKD
 16sVMsX0r7R1bG+hEj+mvNeLWqfgz7MpGludCkV1dHIDkn2esxx6JqaWxR/UnLHb
 D3fZM/cKGdpY
 =kDFG
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
 "Prevent the kernel from crashing during resume from hibernation if
  free pages contain leftover data from the restore kernel and
  init_on_free is set (Alexander Potapenko)"

* tag 'pm-5.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: hibernate: fix crashes with init_on_free=1
2020-01-23 11:10:21 -08:00
Linus Torvalds
a572582b1a pci-v5.5-fixes-2
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl4prP8UHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vwQuRAAu5diJEEOerqKmnJavh+rk9eXnnO3
 39vsfkQpAyafSINHMaWOIHWyW9V7tEq41tEQF6TXwamlG+6gNBTv+GOPQ2MOJHT7
 cDpXCcnanbkwy1FuXMVjvzbDD0zfbi5Xsl7TGLc5NJbVpRjlXAN+sLzJlcs0Rzvb
 98Zm2Vtq4x4CTJiA+vF/em9VJ2shRZhQKpxMo7d3sR8ueoLnus36Ymdt3rf+HF9R
 ih8LdfYRkUqnd04IcX/z0N702fnZds7udbXN5/b46BA+Jt8KF2u0Md+sXbVyiBMn
 LWtbV3l3Rnqn0kwmYdw30MxJbRFyd19HO0ACPPtcalhLuJlMuEWFOxW/7VgJQ8wQ
 aEE/+BWxujiyEcPYSEVN6Jb5I95H7CgkXeWSqwVgvr3skw7XpCn7TRt5Ouo+yQus
 kUsX0+m/R1bO3KTktHRh7IsxVxwYqz11esD1rXuuMUsMa5WRde7dfsrN43x+pheI
 8IJf8hAInkH4w2xn+81qtrgdFy9tKvTBxO0hyuE34DzNvhUDpCALAKxcv83QHM9O
 YsNblrYSMnNSppOMCik6FVh0yy4HAHRC1iGNZXa0EqHiClRM/BQ4YtF/Wf8TczSy
 9aYz0m+nW68W/+E8di+2UvxX5ax/EPjd4i7auq7bSAzdaLJYGOxXukPK1YcaHHGC
 CDe5KelXO9hDGr4=
 =aIDk
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fix from Bjorn Helgaas:
 "Mark ATS as broken on AMD Navi14 GPU rev 0xc5 (Alex Deucher)"

* tag 'pci-v5.5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken
2020-01-23 11:08:15 -08:00
Gustavo A. R. Silva
987f028b86 char: hpet: Use flexible-array member
Old code in the kernel uses 1-byte and 0-byte arrays to indicate the
presence of a "variable length array":

struct something {
    int length;
    u8 data[1];
};

struct something *instance;

instance = kmalloc(sizeof(*instance) + size, GFP_KERNEL);
instance->length = size;
memcpy(instance->data, source, size);

There is also 0-byte arrays. Both cases pose confusion for things like
sizeof(), CONFIG_FORTIFY_SOURCE, etc.[1] Instead, the preferred mechanism
to declare variable-length types such as the one above is a flexible array
member[2] which need to be the last member of a structure and empty-sized:

struct something {
        int stuff;
        u8 data[];
};

Also, by making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
unadvertenly introduced[3] to the codebase from now on.

[1] https://github.com/KSPP/linux/issues/21
[2] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Link: https://lore.kernel.org/r/20200120235326.GA29231@embeddedor.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-23 19:54:26 +01:00
Colin Ian King
5336da37a5 partitions/ldm: fix spelling mistake "to" -> "too"
There is a spelling mistake in a ldm_error message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-23 11:41:45 -07:00
Dave Hansen
45fc24e89b x86/mpx: remove MPX from arch/x86
From: Dave Hansen <dave.hansen@linux.intel.com>

MPX is being removed from the kernel due to a lack of support
in the toolchain going forward (gcc).

This removes all the remaining (dead at this point) MPX handling
code remaining in the tree.  The only remaining code is the XSAVE
support for MPX state which is currently needd for KVM to handle
VMs which might use MPX.

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: x86@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
2020-01-23 10:41:20 -08:00
Dave Hansen
42222eae17 mm: remove arch_bprm_mm_init() hook
From: Dave Hansen <dave.hansen@linux.intel.com>

MPX is being removed from the kernel due to a lack of support
in the toolchain going forward (gcc).

arch_bprm_mm_init() is used at execve() time.  The only non-stub
implementation is on x86 for MPX.  Remove the hook entirely from
all architectures and generic code.

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: x86@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
2020-01-23 10:41:16 -08:00
Dave Hansen
aa9ccb7b47 x86/mpx: remove bounds exception code
From: Dave Hansen <dave.hansen@linux.intel.com>

MPX is being removed from the kernel due to a lack of support
in the toolchain going forward (gcc).

Remove the other user-visible ABI: signal handling.  This code
should basically have been inactive after the prctl()s were
removed, but there may be some small ABI remnants from this code.
Remove it.

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: x86@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
2020-01-23 10:41:15 -08:00
Dave Hansen
4ba68d0005 x86/mpx: remove build infrastructure
From: Dave Hansen <dave.hansen@linux.intel.com>

MPX is being removed from the kernel due to a lack of support
in the toolchain going forward (gcc).

Remove the Kconfig option and the Makefile line.  This makes
arch/x86/mm/mpx.c and anything under an #ifdef for
X86_INTEL_MPX dead code.

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: x86@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
2020-01-23 10:41:14 -08:00
Dave Hansen
3a1255396b x86/alternatives: add missing insn.h include
From: Dave Hansen <dave.hansen@linux.intel.com>

While testing my MPX removal series, Borislav noted compilation
failure with an allnoconfig build.

Turned out to be a missing include of insn.h in alternative.c.
With MPX, it got it implicitly from:

	asm/mmu_context.h -> asm/mpx.h -> asm/insn.h

Fixes: c3d6324f84 ("x86/alternatives: Teach text_poke_bp() to emulate instructions")
Reported-by: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: x86@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
2020-01-23 10:41:13 -08:00
Zong Li
6435f773d8
riscv: mm: add support for CONFIG_DEBUG_VIRTUAL
This patch implements CONFIG_DEBUG_VIRTUAL to do additional checks on
virt_to_phys and __pa_symbol calls. virt_to_phys used for linear mapping
check, and __pa_symbol used for kernel symbol check. In current RISC-V,
kernel image maps to linear mapping area. If CONFIG_DEBUG_VIRTUAL is
disable, these two functions calculate the offset on the address feded
directly without any checks.

The result of test_debug_virtual as follows:

[    0.358456] ------------[ cut here ]------------
[    0.358738] virt_to_phys used for non-linear address: (____ptrval____) (0xffffffd000000000)
[    0.359174] WARNING: CPU: 0 PID: 1 at arch/riscv/mm/physaddr.c:16 __virt_to_phys+0x3c/0x50
[    0.359409] Modules linked in:
[    0.359630] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc3-00002-g5133c5c0ca13 #57
[    0.359861] epc: ffffffe000253d1a ra : ffffffe000253d1a sp : ffffffe03aa87da0
[    0.360019]  gp : ffffffe000ae03b0 tp : ffffffe03aa88000 t0 : ffffffe000af2660
[    0.360175]  t1 : 0000000000000064 t2 : 00000000000000b7 s0 : ffffffe03aa87dc0
[    0.360330]  s1 : ffffffd000000000 a0 : 000000000000004f a1 : 0000000000000000
[    0.360492]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffe000a84358
[    0.360672]  a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000
[    0.360876]  s2 : ffffffe000ae0600 s3 : ffffffe00000fc7c s4 : ffffffe0000224b0
[    0.361067]  s5 : ffffffe000030890 s6 : ffffffe000022470 s7 : 0000000000000008
[    0.361267]  s8 : ffffffe0000002c4 s9 : ffffffe000ae0640 s10: ffffffe000ae0630
[    0.361453]  s11: 0000000000000000 t3 : 0000000000000000 t4 : 000000000001e6d0
[    0.361636]  t5 : ffffffe000ae0a18 t6 : ffffffe000aee54e
[    0.361806] status: 0000000000000120 badaddr: 0000000000000000 cause: 0000000000000003
[    0.362056] ---[ end trace aec0bf78d4978122 ]---
[    0.362404] PA: 0xfffffff080200000 for VA: 0xffffffd000000000
[    0.362607] PA: 0x00000000baddd2d0 for VA: 0xffffffe03abdd2d0

Signed-off-by: Zong Li <zong.li@sifive.com>
Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com>
Tested-by: Paul Walmsley <paul.walmsley@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-01-23 10:40:06 -08:00
Coly Li
e3de04469a bcache: reap from tail of c->btree_cache in bch_mca_scan()
When shrink btree node cache from c->btree_cache in bch_mca_scan(),
no matter the selected node is reaped or not, it will be rotated from
the head to the tail of c->btree_cache list. But in bcache journal
code, when flushing the btree nodes with oldest journal entry, btree
nodes are iterated and slected from the tail of c->btree_cache list in
btree_flush_write(). The list_rotate_left() in bch_mca_scan() will
make btree_flush_write() iterate more nodes in c->btree_list in reverse
order.

This patch just reaps the selected btree node cache, and not move it
from the head to the tail of c->btree_cache list. Then bch_mca_scan()
will not mess up c->btree_cache list to btree_flush_write().

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-23 11:40:02 -07:00
Coly Li
d5c9c470b0 bcache: reap c->btree_cache_freeable from the tail in bch_mca_scan()
In order to skip the most recently freed btree node cahce, currently
in bch_mca_scan() the first 3 caches in c->btree_cache_freeable list
are skipped when shrinking bcache node caches in bch_mca_scan(). The
related code in bch_mca_scan() is,

 737 list_for_each_entry_safe(b, t, &c->btree_cache_freeable, list) {
 738         if (nr <= 0)
 739                 goto out;
 740
 741         if (++i > 3 &&
 742             !mca_reap(b, 0, false)) {
             		lines free cache memory
 746         }
 747         nr--;
 748 }

The problem is, if virtual memory code calls bch_mca_scan() and
the calculated 'nr' is 1 or 2, then in the above loop, nothing will
be shunk. In such case, if slub/slab manager calls bch_mca_scan()
for many times with small scan number, it does not help to shrink
cache memory and just wasts CPU cycles.

This patch just selects btree node caches from tail of the
c->btree_cache_freeable list, then the newly freed host cache can
still be allocated by mca_alloc(), and at least 1 node can be shunk.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-23 11:40:02 -07:00
Coly Li
125d98edd1 bcache: remove member accessed from struct btree
The member 'accessed' of struct btree is used in bch_mca_scan() when
shrinking btree node caches. The original idea is, if b->accessed is
set, clean it and look at next btree node cache from c->btree_cache
list, and only shrink the caches whose b->accessed is cleaned. Then
only cold btree node cache will be shrunk.

But when I/O pressure is high, it is very probably that b->accessed
of a btree node cache will be set again in bch_btree_node_get()
before bch_mca_scan() selects it again. Then there is no chance for
bch_mca_scan() to shrink enough memory back to slub or slab system.

This patch removes member accessed from struct btree, then once a
btree node ache is selected, it will be immediately shunk. By this
change, bch_mca_scan() may release btree node cahce more efficiently.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-23 11:40:02 -07:00
Guoju Fang
d44330b7f1 bcache: print written and keys in trace_bcache_btree_write
It's useful to dump written block and keys on btree write, this patch
add them into trace_bcache_btree_write.

Signed-off-by: Guoju Fang <fangguoju@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-23 11:40:02 -07:00
Coly Li
2aa8c52938 bcache: avoid unnecessary btree nodes flushing in btree_flush_write()
the commit 91be66e131 ("bcache: performance improvement for
btree_flush_write()") was an effort to flushing btree node with oldest
btree node faster in following methods,
- Only iterate dirty btree nodes in c->btree_cache, avoid scanning a lot
  of clean btree nodes.
- Take c->btree_cache as a LRU-like list, aggressively flushing all
  dirty nodes from tail of c->btree_cache util the btree node with
  oldest journal entry is flushed. This is to reduce the time of holding
  c->bucket_lock.

Guoju Fang and Shuang Li reported that they observe unexptected extra
write I/Os on cache device after applying the above patch. Guoju Fang
provideed more detailed diagnose information that the aggressive
btree nodes flushing may cause 10x more btree nodes to flush in his
workload. He points out when system memory is large enough to hold all
btree nodes in memory, c->btree_cache is not a LRU-like list any more.
Then the btree node with oldest journal entry is very probably not-
close to the tail of c->btree_cache list. In such situation much more
dirty btree nodes will be aggressively flushed before the target node
is flushed. When slow SATA SSD is used as cache device, such over-
aggressive flushing behavior will cause performance regression.

After spending a lot of time on debug and diagnose, I find the real
condition is more complicated, aggressive flushing dirty btree nodes
from tail of c->btree_cache list is not a good solution.
- When all btree nodes are cached in memory, c->btree_cache is not
  a LRU-like list, the btree nodes with oldest journal entry won't
  be close to the tail of the list.
- There can be hundreds dirty btree nodes reference the oldest journal
  entry, before flushing all the nodes the oldest journal entry cannot
  be reclaimed.
When the above two conditions mixed together, a simply flushing from
tail of c->btree_cache list is really NOT a good idea.

Fortunately there is still chance to make btree_flush_write() work
better. Here is how this patch avoids unnecessary btree nodes flushing,
- Only acquire c->journal.lock when getting oldest journal entry of
  fifo c->journal.pin. In rested locations check the journal entries
  locklessly, so their values can be changed on other cores
  in parallel.
- In loop list_for_each_entry_safe_reverse(), checking latest front
  point of fifo c->journal.pin. If it is different from the original
  point which we get with locking c->journal.lock, it means the oldest
  journal entry is reclaim on other cores. At this moment, all selected
  dirty nodes recorded in array btree_nodes[] are all flushed and clean
  on other CPU cores, it is unncessary to iterate c->btree_cache any
  longer. Just quit the list_for_each_entry_safe_reverse() loop and
  the following for-loop will skip all the selected clean nodes.
- Find a proper time to quit the list_for_each_entry_safe_reverse()
  loop. Check the refcount value of orignial fifo front point, if the
  value is larger than selected node number of btree_nodes[], it means
  more matching btree nodes should be scanned. Otherwise it means no
  more matching btee nodes in rest of c->btree_cache list, the loop
  can be quit. If the original oldest journal entry is reclaimed and
  fifo front point is updated, the refcount of original fifo front point
  will be 0, then the loop will be quit too.
- Not hold c->bucket_lock too long time. c->bucket_lock is also required
  for space allocation for cached data, hold it for too long time will
  block regular I/O requests. When iterating list c->btree_cache, even
  there are a lot of maching btree nodes, in order to not holding
  c->bucket_lock for too long time, only BTREE_FLUSH_NR nodes are
  selected and to flush in following for-loop.
With this patch, only btree nodes referencing oldest journal entry
are flushed to cache device, no aggressive flushing for  unnecessary
btree node any more. And in order to avoid blocking regluar I/O
requests, each time when btree_flush_write() called, at most only
BTREE_FLUSH_NR btree nodes are selected to flush, even there are more
maching btree nodes in list c->btree_cache.

At last, one more thing to explain: Why it is safe to read front point
of c->journal.pin without holding c->journal.lock inside the
list_for_each_entry_safe_reverse() loop ?

Here is my answer: When reading the front point of fifo c->journal.pin,
we don't need to know the exact value of front point, we just want to
check whether the value is different from the original front point
(which is accurate value because we get it while c->jouranl.lock is
held). For such purpose, it works as expected without holding
c->journal.lock. Even the front point is changed on other CPU core and
not updated to local core, and current iterating btree node has
identical journal entry local as original fetched fifo front point, it
is still safe. Because after holding mutex b->write_lock (with memory
barrier) this btree node can be found as clean and skipped, the loop
will quite latter when iterate on next node of list c->btree_cache.

Fixes: 91be66e131 ("bcache: performance improvement for btree_flush_write()")
Reported-by: Guoju Fang <fangguoju@gmail.com>
Reported-by: Shuang Li <psymon@bonuscloud.io>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-23 11:40:02 -07:00
Coly Li
7a0bc2a896 bcache: add code comments for state->pool in __btree_sort()
To explain the pages allocated from mempool state->pool can be
swapped in __btree_sort(), because state->pool is a page pool,
which allocates pages by alloc_pages() indeed.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-23 11:40:02 -07:00
Ben Dooks (Codethink)
0e0c12316d lib: crc64: include <linux/crc64.h> for 'crc64_be'
The crc64_be() is declared in <linux/crc64.h> so include
this where the symbol is defined to avoid the following
warning:

lib/crc64.c:43:12: warning: symbol 'crc64_be' was not declared. Should it be static?

Signed-off-by: Ben Dooks (Codethink) <ben.dooks@codethink.co.uk>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-23 11:40:01 -07:00
Christoph Hellwig
6321bef028 bcache: use read_cache_page_gfp to read the superblock
Avoid a pointless dependency on buffer heads in bcache by simply open
coding reading a single page.  Also add a SB_OFFSET define for the
byte offset of the superblock instead of using magic numbers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-23 11:40:01 -07:00
Christoph Hellwig
475389ae5c bcache: store a pointer to the on-disk sb in the cache and cached_dev structures
This allows to properly build the superblock bio including the offset in
the page using the normal bio helpers.  This fixes writing the superblock
for page sizes larger than 4k where the sb write bio would need an offset
in the bio_vec.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-23 11:40:01 -07:00
Christoph Hellwig
cfa0c56db9 bcache: return a pointer to the on-disk sb from read_super
Returning the properly typed actual data structure insteaf of the
containing struct page will save the callers some work going
forward.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-23 11:40:01 -07:00
Christoph Hellwig
fc8f19cc5d bcache: transfer the sb_page reference to register_{bdev,cache}
Avoid an extra reference count roundtrip by transferring the sb_page
ownership to the lower level register helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-23 11:40:01 -07:00
Coly Li
ae3cd29991 bcache: fix use-after-free in register_bcache()
The patch "bcache: rework error unwinding in register_bcache" introduces
a use-after-free regression in register_bcache(). Here are current code,
	2510 out_free_path:
	2511         kfree(path);
	2512 out_module_put:
	2513         module_put(THIS_MODULE);
	2514 out:
	2515         pr_info("error %s: %s", path, err);
	2516         return ret;
If some error happens and the above code path is executed, at line 2511
path is released, but referenced at line 2515. Then KASAN reports a use-
after-free error message.

This patch changes line 2515 in the following way to fix the problem,
	2515         pr_info("error %s: %s", path?path:"", err);

Signed-off-by: Coly Li <colyli@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-23 11:40:01 -07:00
Coly Li
29cda393bc bcache: properly initialize 'path' and 'err' in register_bcache()
Patch "bcache: rework error unwinding in register_bcache" from
Christoph Hellwig changes the local variables 'path' and 'err'
in undefined initial state. If the code in register_bcache() jumps
to label 'out:' or 'out_module_put:' by goto, these two variables
might be reference with undefined value by the following line,

	out_module_put:
	        module_put(THIS_MODULE);
	out:
	        pr_info("error %s: %s", path, err);
	        return ret;

Therefore this patch initializes these two local variables properly
in register_bcache() to avoid such issue.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-23 11:40:01 -07:00