In bnxt_rfs_capable(), add call to reserve vnic resources to support
NTUPLE. Return true if we can successfully reserve enough vnics.
Otherwise, reserve the minimum 1 VNIC for normal operations not
supporting NTUPLE and return false.
Also, suppress warning message about not enough resources for NTUPLE when
only 1 RX ring is in use. NTUPLE filters by definition require multiple
RX rings.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new method will call firmware to reserve the desired tx, rx, cmpl
rings, ring groups, stats context, and vnic resources. A second query
call will check the actual resources that firmware is able to reserve.
The driver will then trim and adjust based on the actual resources
provided by firmware. The driver will then reserve the final resources
in use.
This method is a more flexible way of using hardware resources. The
resources are not fixed and can by adjusted by firmware. The driver
adapts to the available resources that the firmware can reserve for
the driver.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In combined mode, the driver is currently not setting RX and TX ring
numbers the same when firmware can allocate more RX than TX or vice versa.
This will confuse the user as the ethtool convention assumes they are the
same in combined mode. Fix it by adding bnxt_trim_dflt_sh_rings() to trim
RX and TX ring numbers to be the same as the completion ring number in
combined mode.
Note that if TCs are enabled and/or XDP is enabled, the number of TX rings
will not be the same as RX rings in combined mode.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new API HWRM_FUNC_RESOURCE_QCAPS provides min and max hardware
resources. Use the new API when it is supported by firmware.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for new firmware APIs to allocate hardware resources,
add a new struct bnxt_hw_resc to hold various min, max and reserved
resources. This new structure is common for PFs and VFs.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After SRIOV has been enabled and disabled, the MSIX vectors assigned to
the VFs have to be re-initialized. Otherwise they cannot be re-used by
the PF. For example, increasing the number of PF rings after disabling
SRIOV may fail if the PF uses MSIX vectors previously assigned to the VFs.
To fix this, we add logic in bnxt_restore_pf_fw_resources() to close the
NIC, clear and re-init MSIX, and re-open the NIC.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new __bnxt_close_nic() function to do all the work previously done
in bnxt_close_nic() except waiting for SRIOV configuration. The new
function will be used in the next patch as part of SRIOV cleanup.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The version has new firmware APIs to allocate PF/VF resources more
flexibly.
New toolchains were used to generate this file, resulting in a one-time
large diffstat.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull timer fix from Thomas Gleixner:
"A one-liner fix which prevents deferrable timers becoming stale when
the system does not switch into NOHZ mode"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers: Unconditionally check deferrable base
Martin Blumenstingl says:
====================
dwmac-meson8b: clock fixes for Meson8b
this series is now successfully tested, thus we think it's ready to be
applied to your net-next tree.
Emiliano reported [0] that he couldn't get dwmac-meson8b to work on his
Odroid-C1. This is the (hopefully) final version of this series, which
was successfully tested.
Due to the fact that the public S805/S905/S912 datasheets all seem to
be outdated regarding the description of the PRG_ETH0 (also called
PRG_ETHERNET_ADDR0) register Linus Lüssing offered to help testing with
an oscilloscope and an Odroid-C1. I would like to say HUGE thanks to him
at this point as he spent hours figuring out the effects of the bits
that are (though to be) relevant to get Ethernet working on the
Odroid-C1.
We tested three scenarios, all based on version 3 of this series:
1) MPLL2 at ~500MHz, m250_div set to 1, bit 10 enabled
this resulted in a clock rate twice as high as expected at the RGMII TX
clock pin (250MHz instead of 125MHz for Gbit connections and 50MHz
instead of 25MHz for 100Mbit/s connections). it did not change the
rate at the XTAL_IN pin of PHY (which stayed consistenly at 25MHz)
2) MPLL2 at ~250MHz, m250_div set to 1, bit 10 disabled
the oscilloscope shows "no clock" for the RGMII TX clock pin at it's
highest resolution (and random rates at lower resolutions). XTAL_IN is
still at 25MHz
3) MPLL2 at ~250MHz, m250_div set to 1, bit 10 enabled
this resulted in a 125MHz signal at the RGMII TX clock pin for Gbit
speeds and 25MHz for 100Mbit/s - both values are as expected. The rate
on the XTAL_IN pin was at 25MHz
-> boot-logs (with the PRG_ETH0 register value) and screenshots from the
readings of the oscilloscope can be found at:
https://metameute.de/~tux/linux/amlogic/odroidc1/ethernet/
Version 4 of this series is based on the results from Linus Lüssing's
help with the oscilloscope and Odroid-C1.
Unfortunately I don't have any Meson8b boards with RGMII PHY so I could
only partially test this. @Emiliano: Could you please give this version
a try and let me know about the results (preferably with a "Tested-by"
if it works)?
You obviously still need your two "ARM: dts: meson8b" patches which
- add the amlogic,meson8b-dwmac" compatible to meson8b.dtsi
- enable Ethernet on the Odroid-C1 (according to your last thest a TX
delay of 4ns is required to make it work properly)
When testing on Meson8b this also needs a fix for the MPLL clock driver:
"clk: meson: mpll: use 64-bit maths in params_from_rate", see:
https://patchwork.kernel.org/patch/10131677/
I have tested this myself on a Khadas VIM (GXL SoC, internal RMII PHY)
and a Khadas VIM2 (GXM SoC, external RGMII PHY). Both are still working
fine (so let's hope that this also fixes your Meson8b issue :)).
changes since v4 at [4]:
- dropped "RFT" status since Jerome tested this series successfully!
- dropped PATCH #2 ("simplify generating the clock names"). I will
improve the whole clock registration in a separate series. since that
patch didn't really improve anything I dropped it for now
- added Jerome's Acked-/Reviewed-/Tested-by's - many thanks!
changes since v3 at [3]:
- renamed the function PATCH #1 from meson8b_init_rgmii_clk to
meson8b_init_rgmii_tx_clk since we now know what the register bits
mean
- rewrote PATCH #3 because bit 10 is a gate clock and it seems that
there is an internal fixed divide-by-2 clock. see the patch
description for a detailed explanation
- updated the description of PATCH #4 and #5 as the clock we're trying
to fix is the "RGMII TX" clock (old version stated that this is the
"RGMII clock" or "PHY reference clock"). also updated the numbers in
the description now that we have the clock hierarchy right (at least
we hope so)
changes since v2 at [2]:
- added PATCH #2 to make the following patch easier
- Emiliano reported that there's currently another bug in the
dwmac-meson8b driver which prevents it from working with RGMII PHYs on
Meson8b: bit 10 of the PRG_ETH0 register is configures a clock gate
(instead of a divide by 5 or divide by 10 clock divider). This has not
been visible on GXBB and later due to the input clock which always led
to a selection of "divide by 10" (which is done internally in the IP
block, but the bit actually means "enable RGMII clock output").
PATCH #3 was added to address this issue.
- the commit message of PATCH #4 and #5 (formerly PATCH #2 and #3) were
updated and the patch itself rebased because the m25_div clock was
removed with the new PATCH #3 (so some of the statements were not
valid anymore)
changes since v1 at [1]:
- changed the subject of the cover-letter to indicate that this is all
about the RGMII clock
- added PATCH #1 which ensures that we don't unnecessarily change the
parent clocks in RMII mode (and also makes the code easier to
understand)
- changed subject of PATCH #2 (formerly PATCH #1) to state that this
is about the RGMII clock
- added Jerome's Reviewed-by to PATCH #2 (formerly PATCH #1)
- replaced PATCH #3 (formerly PATCH #2) with one that sets
CLK_SET_RATE_PARENT on the mux and thus re-configures the MPLL2 clock
on Meson8b correctly
[0] http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005596.html
[1] http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005848.html
[2] http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005861.html
[3] http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005899.html
[4] http://lists.infradead.org/pipermail/linux-amlogic/2018-January/006125.html
====================
Tested-by: Emiliano Ingrassia <ingrassia@epigenesys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On Meson8b the only valid input clock is MPLL2. The bootloader
configures that to run at 500002394Hz which cannot be divided evenly
down to 125MHz using the m250_div clock. Currently the common clock
framework chooses a m250_div of 2 - with the internal fixed
"divide by 10" this results in a RGMII TX clock of 125001197Hz (120Hz
above the requested 125MHz).
Letting the common clock framework propagate the rate changes up to the
parent of m250_mux allows us to get the best possible clock rate. With
this patch the common clock framework calculates a rate of
very-close-to-250MHz (249999701Hz to be exact) for the MPLL2 clock
(which is the mux input). Dividing that by 2 (which is an internal,
fixed divider for the RGMII TX clock) gives us an RGMII TX clock of
124999850Hz (which is only 150Hz off the requested 125MHz, compared to
1197Hz based on the MPLL2 rate set by u-boot and the Amlogic GPL kernel
sources).
SoCs from the Meson GX series are not affected by this change because
the input clock is FCLK_DIV2 whose rate cannot be changed (which is fine
since it's running at 1GHz, so it's already a multiple of 250MHz and
125MHz).
Fixes: 566e825162 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
Suggested-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Meson8b only supports MPLL2 as clock input. The rate of the MPLL2 clock
set by Odroid-C1's u-boot is close to (but not exactly) 500MHz. The
exact rate is 500002394Hz, which is calculated in
drivers/clk/meson/clk-mpll.c using the following formula:
DIV_ROUND_UP_ULL((u64)parent_rate * SDM_DEN, (SDM_DEN * n2) + sdm)
Odroid-C1's u-boot configures MPLL2 with the following values:
- SDM_DEN = 16384
- SDM = 1638
- N2 = 5
The 250MHz clock (m250_div) inside dwmac-meson8b driver is derived from
the MPLL2 clock. Due to MPLL2 running slightly faster than 500MHz the
common clock framework chooses a divider which is too big to generate
the 250MHz clock (a divider of 2 would be needed, but this is rounded up
to a divider of 3). This breaks the RTL8211F RGMII PHY on Odroid-C1
because it requires a (close to) 125MHz RGMII TX clock (on Gbit speeds,
the IP block internally divides that down to 25MHz on 100Mbit/s
connections and 2.5MHz on 10Mbit/s connections - we don't need any
special configuration for that).
Round the divider to the closest value to prevent this issue on Meson8b.
This means we'll now end up with a clock rate for the RGMII TX clock of
125001197Hz (= 125MHz plus 1197Hz), which is close-enough to 125MHz.
This has no effect on the Meson GX SoCs since there fclk_div2 is used as
input clock, which has a rate of 1000MHz (and thus is divisible cleanly
to 250MHz and 125MHz).
Fixes: 566e825162 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
Reported-by: Emiliano Ingrassia <ingrassia@epigenesys.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tests (using an oscilloscope and an Odroid-C1 board with a RTL8211F
RGMII PHY) have shown that the PRG_ETH0 register behaves as follows:
- bit 4 is a mux to choose between two parent clocks. according to the
public S805 datasheet the only supported parent clock is MPLL2 (this
was not verified using the oscilloscope).
The public S805/S905 datasheet claims that this bit is reserved.
- bits 9:7 control a one-based divider (register value 1 means "divide
by 1", etc.) for the input clock. we call this clock the "m250_div"
clock because it's value is always supposed to be (close to) 250MHz
(see below for an explanation).
The description in the public S805/S905 datasheet is a bit cryptic,
but it comes down to "input clock = 250MHz * value" (which could also
be expressed as "250MHz = input clock / value")
- there seems to be an internal fixed divide-by-2 clock which takes the
output from the m250_div and divides it by 2. This is not unusual on
Amlogic SoCs, since the SDIO (MMC) driver also uses an internal fixed
divide-by-2 clock.
This is not documented in the public S805/S905 datasheet
- bit 10 controls a gate clock which enables or disables the RGMII TX
clock (which is an output on the MAC/SoC and an input in the PHY). we
call this the "rgmii_tx_en" clock. if this bit is set to "0" the RGMII
TX clock output is close to 0
The description for this bit in the public S805/S905 datasheet is
"Generate 25MHz clock for PHY". Based on these tests it's believed
that this is wrong, and should probably read "Generate the 125MHz
RGMII TX clock for the PHY"
- the RGMII TX clock has to be set to 125MHz - the IP block adjusts the
output (automatically) depending on the line speed (RGMII specifies
that Gbit connections use a 125MHz clock, 100Mbit/s connections use a
25MHz clock and 10Mbit/s connections use a 2.5MHz clock. only Gbit and
100Mbit/s were tested with an oscilloscope). Due to the requirement
that this clock always has to be set to 125MHz and due to the fixed
divide-by-2 parent clock this means that m250_div will always end up
with a rate of (close to) 250MHz.
- bits 6:5 are the TX delay, which is also named "clock phase" in some
of Amlogic's older GPL kernel sources.
The PHY also has an XTAL_IN pin where a 25MHz clock has to be provided.
Tests with the oscilloscope have shown that this is routed to a crystal
right next to the RTL8211F PHY. The same seems to be true on the Khadas
VIM2 (which uses a GXM SoC) board - however the 25MHz crystal is on the
other side of the PCB there.
This updates the clocks in the dwmac-meson8b driver by replacing the
"m25_div" with the "rgmii_tx_en" clock and additionally introducing a
fixed divide-by-2 clock between "m250_div" and "rgmii_tx_en".
Now we also need to set a frequency of 125MHz on the RGMII clock
(opposed to the 25MHz we set before, with that non-existing
divide-by-5-or-10 divider).
Special thanks go to Linus Lüssing for testing the various bits and
checking the results with an oscilloscope on his Odroid-C1!
Fixes: 566e825162 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
Reported-by: Emiliano Ingrassia <ingrassia@epigenesys.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neither the m25_div_clk nor the m250_div_clk or m250_mux_clk are used in
RMII mode. The m25_div_clk output is routed to the RGMII PHY's "RGMII
clock".
This means that we don't need to configure the clocks in RMII mode. The
driver however did this - with no effect since the clocks are not routed
to the PHY in RMII mode.
While here also rename meson8b_init_clk to meson8b_init_rgmii_tx_clk to
make it easier to understand the code.
Fixes: 566e825162 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Tested-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As per 90caccdd8c ("bpf: fix bpf_tail_call() x64 JIT"), the index used
for array lookup is defined to be 32-bit wide. Update a misleading
comment that suggests it is 64-bit wide.
Fixes: 39c13c204b ("arm: eBPF JIT compiler")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
When the source and destination register are identical, our JIT does not
generate correct code, which leads to kernel oopses.
Fix this by (a) generating more efficient code, and (b) making use of
the temporary earlier if we will overwrite the address register.
Fixes: 39c13c204b ("arm: eBPF JIT compiler")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
When an eBPF program tail-calls another eBPF program, it enters it after
the prologue to avoid having complex stack manipulations. This can lead
to kernel oopses, and similar.
Resolve this by always using a fixed stack layout, a CPU register frame
pointer, and using this when reloading registers before returning.
Fixes: 39c13c204b ("arm: eBPF JIT compiler")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
The stack layout documentation incorrectly suggests that the BPF JIT
scratch space starts immediately below BPF_FP. This is not correct,
so let's fix the documentation to reflect reality.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Move the stack documentation towards the top of the file, where it's
relevant for things like the register layout.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
As per 2dede2d8e9 ("ARM EABI: stack pointer must be 64-bit aligned
after a CPU exception") the stack should be aligned to a 64-bit boundary
on EABI systems. Ensure that the eBPF JIT appropraitely aligns the
stack.
Fixes: 39c13c204b ("arm: eBPF JIT compiler")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Avoid the 'bx' instruction on CPUs that have no support for Thumb and
thus do not implement this instruction by moving the generation of this
opcode to a separate function that selects between:
bx reg
and
mov pc, reg
according to the capabilities of the CPU.
Fixes: 39c13c204b ("arm: eBPF JIT compiler")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Commit 0dfb33a0d7 ("sch_red: report backlog information") copied
child's backlog into RED's backlog. Back then RED did not maintain
its own backlog counts. This has changed after commit 2ccccf5fb4
("net_sched: update hierarchical backlog too") and commit d7f4f332f0
("sch_red: update backlog as well"). Copying is no longer necessary.
Tested:
$ tc -s qdisc show dev veth0
qdisc red 1: root refcnt 2 limit 400000b min 30000b max 30000b ecn
Sent 20942 bytes 221 pkt (dropped 0, overlimits 0 requeues 0)
backlog 1260b 14p requeues 14
marked 0 early 0 pdrop 0 other 0
qdisc tbf 2: parent 1: rate 1Kbit burst 15000b lat 3585.0s
Sent 20942 bytes 221 pkt (dropped 0, overlimits 138 requeues 0)
backlog 1260b 14p requeues 14
Recently RED offload was added. We need to make sure drivers don't
depend on resetting the stats. This means backlog should be treated
like any other statistic:
total_stat = new_hw_stat - prev_hw_stat;
Adjust mlxsw.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Nogah Frankel <nogahf@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the user has not set max_beb_per1024 using either the cmdline or
Kconfig options for doing so, use the MTD function 'max_bad_blocks' to
compute the UBI bad_peb_limit.
Signed-off-by: Jeff Westfahl <jeff.westfahl@ni.com>
Signed-off-by: Zach Brown <zach.brown@ni.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electron.com>
Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
fs/ubifs/tnc.c: In function ‘search_dh_cookie’:
fs/ubifs/tnc.c:1893: warning: ‘err’ is used uninitialized in this function
Indeed, err is always used uninitialized.
According to an original review comment from Hyunchul, acknowledged by
Richard, err should be initialized to -ENOENT to avoid the first call to
tnc_next(). But we can achieve the same by reordering the code.
Fixes: 781f675e2d ("ubifs: Fix unlink code wrt. double hash lookups")
Reported-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
struct timeval is deprecated because it cannot represent times
past 2038. In this driver, the only use of this structure is
to capture debug information. This is easily changed to ktime_t,
which we then format as needed when printing it later.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20180117154916.219273-1-arnd@arndb.de
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Inline macro for MODULE_LICENSE to make the license information easy to
find, eg with grep. Inline the other module-related macros at the same
time.
A simplified version of the semantic patch for the MODULE_LICENSE
case is as follows: (http://coccinelle.lip6.fr/)
// <smpl>
@s@
identifier i; expression e;
@@
@@
declarer name MODULE_LICENSE;
identifier s.i;
expression s.e;
@@
MODULE_LICENSE(
- i
+ e
);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
[dtor: added a couple of drivers missed by the script, removed a few unused
DRIVER_VERSION macros]
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Return value for mipi_dsi_shutdown_peripheral() is unchecked.
Check it and return any errors if they come up. Even if
mipi_dsi_shutdown_peripheral() fails, continue attempting to
disable.
Cc: Philippe Cornu <philippe.cornu@st.com>
Reviewed-by: Philippe Cornu <philippe.cornu@st.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20180116222217.240939-1-seanpaul@chromium.org
When testing that the timeout fired, we need to be sure we have waited
just long enough for the timeout to have occurred and for the softirq
(on another cpu) to have completed. Sleeping for an arbitrary amount is
prone to error, so wait for the timeout instead and complain if it was
too late.
v2: Use wait_event_timeout to provide an upper bound
v3: Fix inverted check for wait_event_timeout timing out
v4: Restore the check that the fences aren't signalled too early, by
inspecting them before the expected timeout.
References: https://bugs.freedesktop.org/show_bug.cgi?id=104670
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180117135713.2324-1-chris@chris-wilson.co.uk
Fixes: 4246a0b63b ("block: add a bi_error field to struct bio")
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Define the bit positions instead of macros using the magic values,
and move the expanded helpers to calculate the size and size unit into
the implementation C file.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Refactor the call to nvme_map_cmb, and change the conditions for probing
for the CMB. First remove the version check as NVMe TPs always apply
to earlier versions of the spec as well. Second check for the whole CMBSZ
register for support of the CMB feature instead of just the size field
inside of it to simplify the code a bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
When connectivity is lost to a device, the association is terminated
and the blk-mq queues are quiesced/stopped. When connectivity is
re-established, they are resumed.
If connectivity is lost for a sufficient amount of time that the
controller is then deleted, the delete path starts tearing down queues,
and eventually calling nvme_ns_remove(). It appears that pending
commands may cause blk_cleanup_queue() to never complete and the
teardown stalls.
Correct by starting the ns queues after transitioning to a DELETING
state, allowing pending commands to be flushed with io failures. Thus
the delete path is clear when reached.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
When connectivity is lost to a device, the association is terminated
and the blk-mq queues are quiesced/stopped. When connectivity is
re-established, they are resumed.
If an admin command is received while connectivity is list, the ioctl
queues the command on the admin_q and the command stalls (the thread
issuing the ioctl hangs/waits). if the connectivity is lost long
enough such that the controller is then deleted, the delete code
makes its calls to initiate the delete, which then expects the core
layer to call the transport when all references are removed and the
controller can be freed. Unfortunately, nothing in this path dequeued
the admin command, so a reference sits outstanding and things stop,
hanging the delete indefinitely.
Correct by unquiescing the admin queue in the delete association. This
means any admin command (which should only be from an ioctl) issued
after connectivity is lost will detect the controller is in a
reconnecting state and will (fast) fail the command. Thus, a pending
reference can no longer be created. Once connectivity is re-established,
a new ioctl/admin command would see proper device state and function again.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
After commit:
923218f616 ("blk-mq: don't allocate driver tag upfront for flush rq")
we no longer use the 'can_block' argument in
blk_mq_sched_insert_request(). Kill it.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Added actual commit message as to why it's being removed.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blk_insert_cloned_request() is called in the fast path of a dm-rq driver
(e.g. blk-mq request-based DM mpath). blk_insert_cloned_request() uses
blk_mq_request_bypass_insert() to directly append the request to the
blk-mq hctx->dispatch_list of the underlying queue.
1) This way isn't efficient enough because the hctx spinlock is always
used.
2) With blk_insert_cloned_request(), we completely bypass underlying
queue's elevator and depend on the upper-level dm-rq driver's elevator
to schedule IO. But dm-rq currently can't get the underlying queue's
dispatch feedback at all. Without knowing whether a request was issued
or not (e.g. due to underlying queue being busy) the dm-rq elevator will
not be able to provide effective IO merging (as a side-effect of dm-rq
currently blindly destaging a request from its elevator only to requeue
it after a delay, which kills any opportunity for merging). This
obviously causes very bad sequential IO performance.
Fix this by updating blk_insert_cloned_request() to use
blk_mq_request_direct_issue(). blk_mq_request_direct_issue() allows a
request to be issued directly to the underlying queue and returns the
dispatch feedback (blk_status_t). If blk_mq_request_direct_issue()
returns BLK_SYS_RESOURCE the dm-rq driver will now use DM_MAPIO_REQUEUE
to _not_ destage the request. Whereby preserving the opportunity to
merge IO.
With this, request-based DM's blk-mq sequential IO performance is vastly
improved (as much as 3X in mpath/virtio-scsi testing).
Signed-off-by: Ming Lei <ming.lei@redhat.com>
[blk-mq.c changes heavily influenced by Ming Lei's initial solution, but
they were refactored to make them less fragile and easier to read/review]
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
No functional change. Just makes code flow more logically.
In following commit, __blk_mq_try_issue_directly() will be used to
return the dispatch result (blk_status_t) to DM. DM needs this
information to improve IO merging.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We know this WARN_ON is harmless and in reality it may be trigged,
so convert it to printk() and dump_stack() to avoid to confusing
people.
Also add comment about two releated races here.
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Stefan Haberland <sth@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "jianchao.wang" <jianchao.w.wang@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When hctx->next_cpu is set from possible online CPUs, there is one
race in which hctx->next_cpu may be set as >= nr_cpu_ids, and finally
break workqueue.
The race can be triggered in the following two sitations:
1) when one CPU is becoming DEAD, blk_mq_hctx_notify_dead() is called
to dispatch requests from the DEAD cpu context, but at that
time, this DEAD CPU has been cleared from 'cpu_online_mask', so all
CPUs in hctx->cpumask may become offline, and cause hctx->next_cpu set
a bad value.
2) blk_mq_delay_run_hw_queue() is called from CPU B, and found the queue
should be run on the other CPU A, then CPU A may become offline at the
same time and all CPUs in hctx->cpumask become offline.
This patch deals with this issue by re-selecting next CPU, and making
sure it is set correctly.
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Stefan Haberland <sth@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-by: "jianchao.wang" <jianchao.w.wang@oracle.com>
Tested-by: "jianchao.wang" <jianchao.w.wang@oracle.com>
Fixes: 20e4d81393 ("blk-mq: simplify queue mapping & schedule with each possisble CPU")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The latest merge between net and net-next introduced a complier assert in
mlx5 driver. In hca_cap_bits older fields are kept along with newer
fields that should have replaced them.
Fixes: c02b3741eb ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Fix various per event 'max-stack' and 'call-graph=dwarf' issues,
mostly in 'perf trace', allowing to use 'perf trace --call-graph' with
'dwarf' and 'fp' to setup the callgraph details for the syscall events
and make that apply to other events, whilhe allowing to override that on
a per-event basis, using '-e sched:*switch/call-graph=dwarf/' for
instance (Arnaldo Carvalho de Melo)
- Improve the --time percent support in record/report/script (Jin Yao)
- Fix copyfile_offset update of output offset (Jiri Olsa)
- Add python script to profile and resolve physical mem type (Kan Liang)
- Add ARM Statistical Profiling Extensions (SPE) support (Kim Phillips)
- Remove trailing semicolon in the evlist code (Luis de Bethencourt)
- Fix incorrect handling of type _TERM_DRV_CFG (Mathieu Poirier)
- Use asprintf when possible in libtraceevent (Federico Vaga)
- Fix bad force_token escape sequence in libtraceevent (Michael Sartain)
- Add UL suffix to MISSING_EVENTS in libtraceevent (Michael Sartain)
- value of unknown symbolic fields in libtraceevent (Jan Kiszka)
- libtraceevent updates: (Steven Rostedt)
o Show value of flags that have not been parsed
o Simplify pointer print logic and fix %pF
o Handle new pointer processing of bprint strings
o Show contents (in hex) of data of unrecognized type records
o Fix get_field_str() for dynamic strings
- Add missing break in FALSE case of pevent_filter_clear_trivial() (Taeung Song)
- Fix failed memory allocation for get_cpuid_str (Thomas Richter)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEELb9bqkb7Te0zijNb1lAW81NSqkAFAlpfcqoACgkQ1lAW81NS
qkBDxQ/8CdJGxeHyookAaEYAUvezRBQKi7fwkZN+wYwWE5YpsgLI4MONT9boQGBE
siPCrRcPjGU7455Xka2/wC067lZkfn7aTEHrVODUld3uLrnNtsa0zuKogaF9+D8E
4MxW5OsMyfn+hx9JGxQcH59euHeeJcMY5Rj+067x6KgMX9znl6wFJfBTkLC8tXhx
lRhpC3sbENHorLg+u00MCtlAIIHc9iqsUR32CCsGfFJTLQc2Asy1Xb0he/GRYLrX
vdrflQ7TB7I/c+yQayBPpKSLQvhj/F0ltHtWBzH3haCdp97NWQ+gaYxeO6MZ3j7Z
6zTrUh9ICG65Nw4SnOHHiz9Y0u0prMSkHdcfCl03d4AJf+iHfXoZwWiEMZhzBmdQ
rGDNEdXigORkjOLGG7oFlQYny3GQzH4cY7+r138dpicZmys6z4wh9yU1I1ex8wcL
EfGxpzGi7uKoACu3Rlxv+ByeLTsiCnkUUcmeQg9qq1M99Bw3qP19VXiFlaCE2NPT
gL8iKo392oVyyJzMZgFAxzDQ+2YlvcsjBZ6kW7YG6Jl8H3AqRPDoSLXrs890PQzR
BJTkRUEajddVv/1lzSXy67RypQdcGADqNC2F5/4akyfAF2jCBJj4VVnxlvky7gxj
r6u9h8zpi0pV5Rk4qSicu4IvidfNLpK6gWqfoJIEoJDA2/CYZP8=
=dEKk
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo-4.16-20180117' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
- Fix various per event 'max-stack' and 'call-graph=dwarf' issues,
mostly in 'perf trace', allowing to use 'perf trace --call-graph' with
'dwarf' and 'fp' to setup the callgraph details for the syscall events
and make that apply to other events, whilhe allowing to override that on
a per-event basis, using '-e sched:*switch/call-graph=dwarf/' for
instance (Arnaldo Carvalho de Melo)
- Improve the --time percent support in record/report/script (Jin Yao)
- Fix copyfile_offset update of output offset (Jiri Olsa)
- Add python script to profile and resolve physical mem type (Kan Liang)
- Add ARM Statistical Profiling Extensions (SPE) support (Kim Phillips)
- Remove trailing semicolon in the evlist code (Luis de Bethencourt)
- Fix incorrect handling of type _TERM_DRV_CFG (Mathieu Poirier)
- Use asprintf when possible in libtraceevent (Federico Vaga)
- Fix bad force_token escape sequence in libtraceevent (Michael Sartain)
- Add UL suffix to MISSING_EVENTS in libtraceevent (Michael Sartain)
- value of unknown symbolic fields in libtraceevent (Jan Kiszka)
- libtraceevent updates: (Steven Rostedt)
o Show value of flags that have not been parsed
o Simplify pointer print logic and fix %pF
o Handle new pointer processing of bprint strings
o Show contents (in hex) of data of unrecognized type records
o Fix get_field_str() for dynamic strings
- Add missing break in FALSE case of pevent_filter_clear_trivial() (Taeung Song)
- Fix failed memory allocation for get_cpuid_str (Thomas Richter)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When powering up, the SATA controller may fail to mount the HDD. The SATA
controller will lock up, preventing it from negotiating to a lower speed or
transmitting data. Root cause is power supply noise creating resonance at 6 Ghz
and 3 GHz frequencies, which causes instability in the Clock-Data Recovery
(CDR) frontend module, resulting in false acquisition of the clock at SATA
6G/3G speeds.
The SATA controller may fail to mount the HDD and lock up, requiring a power
cycle. Broadcom chips suspected of being susceptible to this issue include
BCM7445, BCM7439, and BCM7366.
The Kernel implements an error recovery mechanism that resets the SATA PHY and
digital controller when the controller locks up. During this error recovery
process, typically there is less activity on the board and Broadcom STB chip,
so that the power supply is less noisy, thus allowing the SATA controller to
lock correctly.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Implement the calibration callback to allow turning on the Clock-Data
Recovery module clamping when necessary, e.g: during failure to identify
a SATA hard drive from ahci_brcm.c::brcm_ahci_read_id.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
'struct frame' uses two variables to store the sent timestamp - 'struct
timeval' and jiffies. jiffies is used to avoid discrepancies caused by
updates to system time. 'struct timeval' is deprecated because it uses
32-bit representation for seconds which will overflow in year 2038.
This patch does the following:
- Replace the use of 'struct timeval' and jiffies with ktime_t, which
is the recommended type for timestamping
- ktime_t provides both long range (like jiffies) and high resolution
(like timeval). Using ktime_get (monotonic time) instead of wall-clock
time prevents any discprepancies caused by updates to system time.
[updates by Arnd below]
The original patch from Tina never went anywhere as we discussed how
to keep the impact on performance minimal. I've started over now but
arrived at basically the same patch that she had originally, except for
an slightly improved tsince_hr() function. I'm making it more robust
against overflows, and also optimize explicitly for the common case
in which a frame is less than 4.2 seconds old, using only a 32-bit
division in that case.
This should make the new version more efficient than the old code,
since we replace the existing two 32-bit division in do_gettimeofday()
plus one multiplication with a single single 32-bit division in
tsince_hr() and drop the double bookkeeping. It's also more efficient
than the ktime_get_us() API we discussed before, since that would
also rely on multiple divisions.
Link: https://lists.linaro.org/pipermail/y2038/2015-May/000276.html
Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com>
Cc: Ed Cashin <ed.cashin@acm.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It looks like in all cases 'struct vmw_connector_state' is used. But
only in stdu connectors, was atomic_{duplicate,destroy}_state() properly
subclassed. Leading to writes beyond the end of the allocated connector
state block and all sorts of fun memory corruption related crashes.
Fixes: d7721ca711 "drm/vmwgfx: Connector atomic state"
Cc: <stable@vger.kernel.org>
Signed-off-by: Rob Clark <rclark@redhat.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
btcoex configure antenna by rfe_type that is RF type programmed in efuse.
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
This commit implement the common function to sort old features, and add
more new features that are get_supported_feature, get_supported_version,
get_ant_det_val, ble_scan_type, ble_scan_para, bt_dev_info,
forbidden_slot_val, afh_map and etc.
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Each of these typedefs are only referenced in a single location later
in this header. Thus, they are easily removed.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Remove global variables, so btcoexist can support multiple instances
simultaneously.
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>