Currently when probing returns an error, the netdev is freed but
phylink_disconnect is not called.
Create a common function between the unbind path and the error path,
call it the opposite of dpaa2_switch_probe_port: dpaa2_switch_remove_port,
and call it from both the unbind and the error path.
Fixes: 84cba72956 ("dpaa2-switch: integrate the MAC endpoint support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is an ASSERT_RTNL in phylink_disconnect_phy which triggers
whenever dpaa2_switch_port_disconnect_mac is called.
To follow the pattern established by dpaa2_eth_disconnect_mac, take the
rtnl_mutex every time we call dpaa2_switch_port_disconnect_mac.
Fixes: 84cba72956 ("dpaa2-switch: integrate the MAC endpoint support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When disable_irq_nosync for an interrupt is called from within its
interrupt handler, this interrupt is only marked as disabled with the
intention to mask it when it triggers again.
The AIC hardware however automatically masks the interrupt when it is read.
aic_irq_eoi then unmasks it again if it's not disabled *and* not masked.
This results in a state mismatch between the hardware state and the
state kept in irq_data: The hardware interrupt is masked but
IRQD_IRQ_MASKED is not set. Any further calls to unmask_irq will directly
return and the interrupt can never be enabled again.
Fix this by keeping the hardware and irq_data state in sync by unmasking in
aic_irq_eoi if and only if the irq_data state also assumes the interrupt to
be unmasked.
Fixes: 76cde26394 ("irqchip/apple-aic: Add support for the Apple Interrupt Controller")
Signed-off-by: Sven Peter <sven@svenpeter.dev>
Acked-by: Hector Martin <marcan@marcan.st>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210812100942.17206-1-sven@svenpeter.dev
Gerhard Engleder says:
====================
Add Xilinx GMII2RGMII loopback support
The Xilinx GMII2RGMII driver overrides PHY driver functions in order to
configure the device according to the link speed of the PHY attached to
it. This is implemented for a normal link but not for loopback.
Andrew told me to use phy_loopback and this changes make phy_loopback
work in combination with Xilinx GMII2RGMII.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Configure speed if loopback is used. read_status is not called for
loopback.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct phy_device contains a pointer to the PHY driver and nearly
everywhere this pointer is used to access the PHY driver. Only
mdio_bus_phy_may_suspend() is still using to_phy_driver() instead of the
PHY driver pointer. Uniform PHY driver access by eliminating
to_phy_driver() use in mdio_bus_phy_may_suspend().
Only phy_bus_match() and phy_probe() are still using to_phy_driver(),
because PHY driver pointer is not available there.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
phy_read_status and various other PHY functions support PHY specific
overriding of driver functions by using a PHY specific pointer to the
PHY driver. Add support of PHY specific override to phy_loopback too.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steen Hegelund says:
====================
Adding Frame DMA functionality to Sparx5
v2:
Removed an unused variable (proc_ctrl) from sparx5_fdma_start.
This add frame DMA functionality to the Sparx5 platform.
Until now the Sparx5 SwitchDev driver has been using register based
injection and extraction when sending frames to/from the host CPU.
With this series the Frame DMA functionality now added.
The Frame DMA is only used if the Frame DMA interrupt is configured in the
device tree; otherwise the existing register based injection and extraction
is used.
The Sparx5 has two ports that can be used for sending and receiving frames,
but there are 8 channels that can be configured: 6 for injection and 2 for
extraction.
The additional channels can be used for more advanced scenarios e.g. where
virtual cores are used, but currently the driver only uses port 0 and
channel 0 and 6 respectively.
DCB (data control block) structures are passed to the Frame DMA with
suitable information about frame start/end etc, as well as pointers to DB
(data blocks) buffers.
The Frame DMA engine can use interrupts to signal back when the frames have
been injected or extracted.
There is a limitation on the DB alignment also for injection: Block must
start on 16byte boundaries, and this is why the driver currently copies the
data to into separate buffers.
The Sparx5 switch core needs a IFH (Internal Frame Header) to pass
information from the port to the switch core, and this header is added
before injection and stripped after extraction.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds the interrupt for the Sparx5 Frame DMA.
If this configuration is present the Sparx5 SwitchDev driver will use the
Frame DMA feature, and if not it will use register based injection and
extraction for sending and receiving frames to the CPU.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This add frame DMA functionality to the Sparx5 platform.
Ethernet frames can be extracted or injected autonomously to or from the
device’s DDR3/DDR3L memory and/or PCIe memory space. Linked list data
structures in memory are used for injecting or extracting Ethernet frames.
The FDMA generates interrupts when frame extraction or injection is done
and when the linked lists need updating.
The FDMA implements two extraction channels, one per switch core port
towards the VCore CPU system and a total of six injection channels.
Extraction channels are mapped one-to-one to the CPU ports, while injection
channels can be individually assigned to any CPU port.
- FDMA channel 0 through 5 corresponds to CPU port 0 injection direction
FDMA_CH_CFG[channel].CH_INJ_PORT is set to 0.
- FDMA channel 0 through 5 corresponds to CPU port 1 injection direction when
FDMA_CH_CFG[channel].CH_INJ_PORT is set to 1.
- FDMA channel 6 corresponds to CPU port 0 extraction direction.
- FDMA channel 7 corresponds to CPU port 1 extraction direction.
The FDMA implements a strict priority scheme among channels. Extraction
channels are prioritized over injection channels and secondarily channels
with higher channel number are prioritized over channels with lower number.
On the other hand, ports are being served on an equal-bandwidth principle
both on injection and extraction directions. The equal-bandwidth principle
will not force an equal bandwidth. Instead, it ensures that the ports
perform at their best considering the operating conditions.
When more than one injection channel is enabled for injection on the same
CPU port, priority determines which channel can inject data. Ownership
is re-arbitrated on frame boundaries.
The FDMA processes linked lists of DMA Control Block Structures (DCBs). The
DCBs have the same basic structure for both injection and extraction. A DCB
must be placed on a 64-bit word-aligned address in memory. Each DCB has a
per-channel configurable amount of associated data blocks in memory, where
the frame data is stored.
The data blocks that are used by extraction channels must be placed on
64-bit word aligned addresses in memory, and their length must be a
multiple of 128 bytes.
A DCB carries the pointer to the next DCB of the linked list, the INFO word
which holds information for the DCB, and a pair of status word and memory
pointer for every data block that it is associated with.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The device string on these can differ, apparently, including typos. I've
bought 2 of these in 2012 and googling shows many folks out there with
that broken spelling in their dmesg.
Signed-off-by: Ulrich Spörlein <uqs@FreeBSD.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
This commit adds support for the Guitar Hero Live PS4 dongles.
These dongles require a "magic" USB control message to be sent
every 8 seconds otherwise the dongle will not report events where
the strumbar is hit while a fret is being held.
Note that the GHL_GUITAR_POKE_INTERVAL is reduced to 8 seconds in order
to support PS3, Wii U, and PS4 GHL dongles.
Also note that the constant for vendor id 0x1430 has been renamed from
Activision to RedOctane as self-declared by the device.
Co-developed-by: Pascal Giard <pascal.giard@etsmtl.ca>
Signed-off-by: Pascal Giard <pascal.giard@etsmtl.ca>
Signed-off-by: Daniel Nguyen <daniel.nguyen.1@ens.etsmtl.ca>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Commit c49c336378 ("HID: support for initialization of some Thrustmaster
wheels") messed up the Makefile and quirks during the refactoring of this
commit.
Luckily, ./scripts/checkkconfigsymbols.py warns on non-existing configs:
HID_TMINIT
Referencing files: drivers/hid/Makefile, drivers/hid/hid-quirks.c
Following the discussion (see Link), CONFIG_HID_THRUSTMASTER is the
intended config for CONFIG_HID_TMINIT and the file hid-tminit.c was
actually added as hid-thrustmaster.c.
So, clean up Makefile and adapt quirks to that refactoring.
Fixes: c49c336378 ("HID: support for initialization of some Thrustmaster wheels")
Link: https://lore.kernel.org/linux-input/CAKXUXMx6dByO03f3dX0X5zjvQp0j2AhJBg0vQFDmhZUhtKxRxw@mail.gmail.com/
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
A quirk was recently added for Elan devices that has same device match
as an entry earlier in the list. The i2c_hid_lookup_quirk function will
always return the last match in the list, so the new entry shadows the
old entry. The quirk in the previous entry, I2C_HID_QUIRK_BOGUS_IRQ,
silenced a flood of messages which have reappeared in the 5.13 kernel.
This change moves the two quirk flags into the same entry.
Fixes: ca66a6770b (HID: i2c-hid: Skip ELAN power-on command after reset)
Signed-off-by: Jim Broadus <jbroadus@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Prevent the ASUS Claymore keyboard from sending a suspend event
when the device sleeps itself. The suspend event causes a system
suspend if uncaught.
Signed off by: Luke D Jones <luke@ljones.dev>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Add dynamic debug for debugging sensors states during
initialization, stop, suspend and resume.
Reviewed-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Add support for power management routines.
Reviewed-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Earlier platforms don’t have sensor status checking mechanism.
Sensors are always enabled without checking sensor status.
Hence invoke hid probe only after the sensor is enabled by
checking sensor status.
Reviewed-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Sometimes sensor enable/disable may take time, without checking the
actual status bits from MP2 FW can lead the amd-sfh to misbehave.
Hence add a status check of enable/disable command
by waiting on the command response before sending the next
command to FW.
Reviewed-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Existing amd-sfh driver is programming the MP2 firmware period field in
units of jiffies, but the MP2 firmware expects in milliseconds unit.
Changing it to milliseconds.
Fixes: 4b2c53d93a ("SFH:Transport Driver to add support of AMD Sensor Fusion Hub (SFH)")
Reviewed-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Add driver for Intel Keem Bay SoC PCIe controller. This controller
is based on DesignWare PCIe core.
In Root Complex mode, only internal reference clock is possible for
Keem Bay A0. For Keem Bay B0, external reference clock can be used
and will be the default configuration. Currently, keembay_pcie_of_data
structure has one member. It will be expanded later to handle this
difference.
Endpoint mode link initialization is handled by the boot firmware.
Link: https://lore.kernel.org/r/20210805211010.29484-3-srikanth.thokala@intel.com
Signed-off-by: Wan Ahmad Zainie <wan.ahmad.zainie.wan.mohamad@intel.com>
Signed-off-by: Srikanth Thokala <srikanth.thokala@intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Document DT bindings for PCIe controller found on Intel Keem Bay SoC.
Link: https://lore.kernel.org/r/20210805211010.29484-2-srikanth.thokala@intel.com
Signed-off-by: Wan Ahmad Zainie <wan.ahmad.zainie.wan.mohamad@intel.com>
Signed-off-by: Srikanth Thokala <srikanth.thokala@intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
In commit 6df6ba974a ("PCI: aardvark: Remove PCIe outbound window
configuration") was removed aardvark PCIe outbound window configuration and
commit description said that was recommended solution by HW designers.
But that commit completely removed support for configuring PCIe IO
resources without removing PCIe IO 'ranges' from DTS files. After that
commit PCIe IO space started to be treated as PCIe MEM space and accessing
it just caused kernel crash.
Moreover implementation of PCIe outbound windows prior that commit was
incorrect. It completely ignored offset between CPU address and PCIe bus
address and expected that in DTS is CPU address always same as PCIe bus
address without doing any checks. Also it completely ignored size of every
PCIe resource specified in 'ranges' DTS property and expected that every
PCIe resource has size 128 MB (also for PCIe IO range). Again without any
check. Apparently none of PCIe resource has in DTS specified size of 128
MB. So it was completely broken and thanks to how aardvark mask works,
configuration was completely ignored.
This patch reverts back support for PCIe outbound window configuration but
implementation is a new without issues mentioned above. PCIe outbound
window is required when DTS specify in 'ranges' property non-zero offset
between CPU and PCIe address space. To address recommendation by HW
designers as specified in commit description of 6df6ba974a, set default
outbound parameters as PCIe MEM access without translation and therefore
for this PCIe 'ranges' it is not needed to configure PCIe outbound window.
For PCIe IO space is needed to configure aardvark PCIe outbound window.
This patch fixes kernel crash when trying to access PCIe IO space.
Link: https://lore.kernel.org/r/20210624215546.4015-2-pali@kernel.org
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: stable@vger.kernel.org # 6df6ba974a ("PCI: aardvark: Remove PCIe outbound window configuration")
The implict soft-mask table addresses get relocated if they use a
relative symbol like a label. This is right for code that runs relocated
but not for unrelocated. The scv interrupt vectors run unrelocated, so
absolute addresses are required for their soft-mask table entry.
This fixes crashing with relocated kernels, usually an asynchronous
interrupt hitting in the scv handler, then hitting the trap that checks
whether r1 is in userspace.
Fixes: 325678fd05 ("powerpc/64s: add a table of implicit soft-masked addresses")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210820103431.1701240-1-npiggin@gmail.com
Add the missing unlock before return from function arm_smmu_device_group()
in the error handling case.
Fixes: b1a1347912 ("iommu/arm-smmu: Fix race condition during iommu_group creation")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210820074949.1946576-1-yangyingliang@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
For devices that only support the BATTERY_VOLTAGE (0x1001) feature, UPower
requires the additional information provided by this patch, to set them up.
Signed-off-by: Hamza Mahfooz <someguy@effective-light.com>
Reviewed-by: Filipe Laíns <lains@riseup.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
- bump version strings, by Simon Wunderlich
- update docs about move IRC channel away from freenode,
by Sven Eckelmann (updated, added missing sign-off)
- Switch to kstrtox.h for kstrtou64, by Sven Eckelmann
- Update NULL checks, by Sven Eckelmann (2 patches)
- remove remaining skb-copy calls for broadcast packets,
by Linus Lüssing
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmEfZ9gWHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoR68D/4sL2MjvDRlhuUjyVlBaD0NH3uY
mvCvgZB2AXCNFe3ivz3C4yvMg+YMF/yk3GrI8kbPR6TGPO4H4SeU9LOGFSymIFGV
CGicXCAB7Q3Wa1/1Pkao3mRvoEUJRMgLtWBrkqhZJYWN8rosjw6RiECM4H4YyQ51
xy83T4cR6anU10JX5aIaqJIv/eZyB4oW64yh1EmQTxbcSfCbcWKgY5qZM8DbKNMR
hy9jo32ugPDb7e4JTyy06r6u1TFOie1R1mkzhb0+Xm4AxpaoCRf7RnSsNgGa+Ch/
Id5YGGCAetEex5kedw1rvR8OG9DXvKT3eWT5mY7QSCKhG6GslXXS5CPJXNeTY2uP
O5X/+KVjo6FQHim/GiAIv5MwHWHcCDT099bOqom9nGS+kEFh/plPwpaF2eU9CWJt
wHEkq7es6/hbSeXF6AQTpgpVJRKznpW5BqrFt0JqE0MMKCahs6OdvacSIEH0gjKn
pvULqtoSwgWKsYuETgXmGMC3iNX76DAKQH4M4Z2ycuGbwh7ctfxDKwbbt2Webhvs
Uv7e5iaqqVwlpeoZo+GBn7EwODuQCgFdMYoqmkF9lG2F3OK6UPgOtUUB9nEcKICv
ylnM1p3V73Bpp6HZbC5FXU9fJOprt3F23nS29twLeBV7KEZ9irjzZHR2gjLr5UJA
HOsDxVz00m5xAV9mhA==
=APg0
-----END PGP SIGNATURE-----
Merge tag 'batadv-next-pullrequest-20210820' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This (updated) cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- update docs about move IRC channel away from freenode,
by Sven Eckelmann (updated, added missing sign-off)
- Switch to kstrtox.h for kstrtou64, by Sven Eckelmann
- Update NULL checks, by Sven Eckelmann (2 patches)
- remove remaining skb-copy calls for broadcast packets,
by Linus Lüssing
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
thrustmaster_interrupts() does not free memory for send_buf when
usb_interrupt_msg() fails. This is fixed by the given patch.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Evgeny Novikov <novikov@ispras.ru>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
thrustmaster_remove() does not release memory for
tm_wheel->change_request. This is fixed by the patch.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Evgeny Novikov <novikov@ispras.ru>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
When thrustmaster_probe() handles errors of usb_submit_urb() it does not
free allocated resources and fails. The patch fixes that.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Evgeny Novikov <novikov@ispras.ru>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Use usb_get_dev() to increment the reference count of the usb device
structure in order to avoid releasing the structure while it is still in
use. And use usb_put_dev() to decrement the reference count and thus,
when it will be equal to 0 the structure will be released.
Signed-off-by: Salah Triki <salah.triki@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
This function logs a warning if the workqueue gets too big.
In order to save a few cycles, use 'atomic_inc_return()' instead of an
'atomic_inc()/atomic_read()' sequence.
This axes a line of code and saves a 'atomic_read()' call.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
drivers/spi/spi-stm32.c:915:23-25: WARNING !A || A && B is equivalent to !A || B
Condition !A || A && B is equivalent to !A || B.
Generated by: scripts/coccinelle/misc/excluded_middle.cocci
Fixes: 7ceb0b8a3c ("spi: stm32: finalize message either on dma callback or EOT")
CC: Alain Volmat <alain.volmat@foss.st.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Reviewed-by: Alain Volmat <alain.volmat@foss.st.com>
Link: https://lore.kernel.org/r/20210713191004.GA14729@5eb5c2cbef84
Signed-off-by: Mark Brown <broonie@kernel.org>
This driver is assuming that all adg->clk[i] is not NULL.
Because of this prerequisites, for_each_rsnd_clk() is possible to work
for all clk without checking NULL. In other words, all adg->clk[i]
should not NULL.
Some SoC might doesn't have clk_a/b/c/i. devm_clk_get() returns error in
such case. This driver calls rsnd_adg_null_clk_get() and use null_clk
instead of NULL in such cases.
But devm_clk_get() might returns NULL even though such clocks exist, but
it doesn't mean error (user deliberately chose to disable the feature).
NULL clk itself is not error from clk point of view, but is error from
this driver point of view because it is not assuming such case.
But current code is using IS_ERR() which doesn't care NULL.
This driver uses IS_ERR_OR_NULL() instead of IS_ERR() for clk check.
And it uses ERR_CAST() to clarify null_clk error.
One concern here is that it unconditionally uses null_clk if clk_a/b/c/i
was error. It is correct if it doesn't exist, but is not correct if it
returns error even though it exist.
It needs to check "clock-names" from DT before calling devm_clk_get() to
handling such case. But let's assume it is overkill so far.
Link: https://lore.kernel.org/r/YMCmhfQUimHCSH/n@mwanda
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://lore.kernel.org/r/87v940wyf9.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Don't populate arrays on the stack but instead them static const.
Makes the object code smaller by 48 bytes.
Before:
text data bss dec hex filename
20938 916 104 21958 55c6 ./sound/soc/sh/rcar/core.o
After:
text data bss dec hex filename
20890 916 104 21910 5596 ./sound/soc/sh/rcar/core.o
gcc version 11.1.0)
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://lore.kernel.org/r/87tujkwydx.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Mark Brown <broonie@kernel.org>
This series introduces the support for two new mlx5 features:
1) Sample offload for tunneled traffic
2) devlink rate objects support
1) From Chris Mi: Sample offload for tunneled traffic
=====================================================
Background and solution
-----------------------
Currently the sample offload actions send the encapsulated packet
to software. This series de-capsulates the packet before performing
the sampling and set the tunnel properties on the skb metadata
fields to make the behavior consistent with OVS sFlow.
If de-capsulating first, we can't use the same match like before in
default table. So instantiate a post action instance to continue
processing the action list. If HW can preserve reg_c, also use the
post action instance.
Post action infrastructure
--------------------------
Some tc actions are modeled in hardware using multiple tables
causing a tc action list split. For example, CT action is modeled
by jumping to a ct table which is controlled by nf flow table.
sFlow jumps in hardware to a sample table, which continues to a
"default table" where it should continue processing the action list.
Multi table actions are modeled in hardware using a unique fte_id.
The fte_id is set before jumping to a table. Split actions continue
to a post-action table where the matched fte_id value continues the
execution the tc action list.
This series also introduces post action infrastructure. Both ct and
sample use it.
Sample for tunnel in TC SW
--------------------------
tc filter add dev vxlan1 protocol ip parent ffff: prio 3 \
flower src_mac 24:25:d0:e1:00:00 dst_mac 02:25:d0:13:01:02 \
enc_src_ip 192.168.1.14 enc_dst_ip 192.168.1.13 \
enc_dst_port 4789 enc_key_id 4 \
action sample rate 1 group 6 \
action tunnel_key unset \
action mirred egress redirect dev enp4s0f0_1
MLX5 sample HW offload
----------------------
For the following typical flow table:
+-------------------------------+
+ original flow table +
+-------------------------------+
+ original match +
+-------------------------------+
+ sample action + other actions +
+-------------------------------+
We translate the tc filter with sample action to the following HW model:
+---------------------+
+ original flow table +
+---------------------+
+ original match +
+---------------------+
| set fte_id (if reg_c preserve cap)
| do decap
v
+------------------------------------------------+
+ Flow Sampler Object +
+------------------------------------------------+
+ sample ratio +
+------------------------------------------------+
+ sample table id | default table id +
+------------------------------------------------+
| |
v v
+-----------------------------+ +-------------------+
+ sample table + + default table +
+-----------------------------+ +-------------------+
+ forward to management vport + |
+-----------------------------+ |
+-------+------+
| |reg_c preserve cap
| |or decap action
v v
+-----------------+ +-------------+
+ per vport table + + post action +
+-----------------+ +-------------+
+ original match +
+-----------------+
+ other actions +
+-----------------+
2) From Dmytro Linkin: devlink rate object support for mlx5_core driver
=======================================================================
HIGH-LEVEL OVERVIEW
Devlink leaf rate objects created per vport (VF/SF, and PF on BlueField)
in switchdev mode on devlink port registration.
Implement devlink ops callbacks to create/destroy rate groups, set TX
rate values of the vport/group, assign vport to the group.
Driver accepts TX rate values as fraction of 1Mbps.
Refactor existing eswitch QoS infrastructure to be accessible by legacy
NDO rate API and new devlink rate API. NDO rate API is not
removed/disabled in switchdev mode to not break existing users. Rate
values configured with NDO rate API are not visible for devlink
infrastructure, therefore APIs should not be used simultaneously.
IMPLEMENTATION DETAILS
Driver provide two level rate hierarchy to manage bandwidth - group
level and vport level. Initially each vport added to internal unlimited
group created by default. Each rate element (vport or group) receive
bandwidth relative to its parent element (for groups the parent is a
physical link itself) in a Round Robin manner, where element get
bandwidth value according to its weight. Example:
Created four rate groups with tx_share limits:
$ devlink port function rate add \
pci/0000:06:00.0/group_1 tx_share 30gbit
$ devlink port function rate add \
pci/0000:06:00.0/group_2 tx_share 20gbit
$ devlink port function rate add \
pci/0000:06:00.0/group_3 tx_share 20gbit
$ devlink port function rate add \
pci/0000:06:00.0/group_4 tx_share 10gbit
Weights created in HW for each group are relative to the bigest tx_share
value, which is 30gbit:
<group_1> 1.0
<group_2> 0.67
<group_3> 0.67
<group_4> 0.33
Assuming link speed is 50 Gbit/sec and each group can sustain such
amount of traffic, maximum bandwidth is 50 / (1.0 + 0.67 + 0.67 + 0.33)
= ~18.75 Gbit/sec. Normilized bandwidth values for groups:
<group_1> 18.75 * 1.0 = 18.75 Gbit/sec
<group_2> 18.75 * 0.67 = 12.5 Gbit/sec
<group_3> 18.75 * 0.67 = 12.5 Gbit/sec
<group_4> 18.75 * 0.33 = 6.25 Gbit/sec
If in example above group_1 doesn't produce any traffic, then maximum
bandwidth becomes 50 / (0.67 + 0.67 + 0.33) = ~30.0 Gbit/sec. Normalized
values:
<group_2> 30.0 * 0.67 = 20.0 Gbit/sec
<group_3> 30.0 * 0.67 = 20.0 Gbit/sec
<group_4> 30.0 * 0.33 = 10.0 Gbit/sec
Same normalization applied to each vport in the group.
Normalized values are internal, therefore driver provides QoS
tracepoints for next events:
* vport rate element creation/deletion:
* vport rate element configuration;
* group rate element creation/deletion;
* group rate element configuration.
PATCHES OVERVIEW
1 - Moving and isolation of eswitch QoS logic in separate file;
2 - Implement devlink leaf rate object support for vports;
3 - Implement rate groups creation/deletion;
4 - Implement TX rate management for the groups;
5 - Implement parent set for vports;
6 - Eswitch QoS tracepoints.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmEfNKEACgkQSD+KveBX
+j4DVgf/ZTX3n7xDVrqNgM3hkUOT7QKVCq5zUlDKw1IizE6I+8xB6LO3KmUPyWIn
+VBXE3c+aqUNZu4XlqdkVn1JskVdfTdAGXIIeBgMQskJ/VFCNqU4E9uieEmpiHnI
DUUkEI6eBURJSu1KPD0xdMqtdGJE+/KjwmfZFnCsa4uxmRuV7B0BdxzGIA6AMFKn
+jNS/PFbQM6bUDqP2UUwd97sThtTzDIVH86gu36yK/mdcwdLreqKeuxoHJWePWHC
qLReBC5OQ9zXD2F1Dv2u3WU7EJT7qyCLNrBUrTwHcR9N0Di+2a6lGvjRL5tjWKKC
KOrNkkviurmPp+VieJU+rHHYQwjoGQ==
=XfOY
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2021-08-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2021-08-19
This series introduces the support for two new mlx5 features:
1) Sample offload for tunneled traffic
2) devlink rate objects support
1) From Chris Mi: Sample offload for tunneled traffic
=====================================================
Background and solution
-----------------------
Currently the sample offload actions send the encapsulated packet
to software. This series de-capsulates the packet before performing
the sampling and set the tunnel properties on the skb metadata
fields to make the behavior consistent with OVS sFlow.
If de-capsulating first, we can't use the same match like before in
default table. So instantiate a post action instance to continue
processing the action list. If HW can preserve reg_c, also use the
post action instance.
Post action infrastructure
--------------------------
Some tc actions are modeled in hardware using multiple tables
causing a tc action list split. For example, CT action is modeled
by jumping to a ct table which is controlled by nf flow table.
sFlow jumps in hardware to a sample table, which continues to a
"default table" where it should continue processing the action list.
Multi table actions are modeled in hardware using a unique fte_id.
The fte_id is set before jumping to a table. Split actions continue
to a post-action table where the matched fte_id value continues the
execution the tc action list.
This series also introduces post action infrastructure. Both ct and
sample use it.
Sample for tunnel in TC SW
--------------------------
tc filter add dev vxlan1 protocol ip parent ffff: prio 3 \
flower src_mac 24:25:d0:e1:00:00 dst_mac 02:25:d0:13:01:02 \
enc_src_ip 192.168.1.14 enc_dst_ip 192.168.1.13 \
enc_dst_port 4789 enc_key_id 4 \
action sample rate 1 group 6 \
action tunnel_key unset \
action mirred egress redirect dev enp4s0f0_1
MLX5 sample HW offload
----------------------
For the following typical flow table:
+-------------------------------+
+ original flow table +
+-------------------------------+
+ original match +
+-------------------------------+
+ sample action + other actions +
+-------------------------------+
We translate the tc filter with sample action to the following HW model:
+---------------------+
+ original flow table +
+---------------------+
+ original match +
+---------------------+
| set fte_id (if reg_c preserve cap)
| do decap
v
+------------------------------------------------+
+ Flow Sampler Object +
+------------------------------------------------+
+ sample ratio +
+------------------------------------------------+
+ sample table id | default table id +
+------------------------------------------------+
| |
v v
+-----------------------------+ +-------------------+
+ sample table + + default table +
+-----------------------------+ +-------------------+
+ forward to management vport + |
+-----------------------------+ |
+-------+------+
| |reg_c preserve cap
| |or decap action
v v
+-----------------+ +-------------+
+ per vport table + + post action +
+-----------------+ +-------------+
+ original match +
+-----------------+
+ other actions +
+-----------------+
2) From Dmytro Linkin: devlink rate object support for mlx5_core driver
=======================================================================
HIGH-LEVEL OVERVIEW
Devlink leaf rate objects created per vport (VF/SF, and PF on BlueField)
in switchdev mode on devlink port registration.
Implement devlink ops callbacks to create/destroy rate groups, set TX
rate values of the vport/group, assign vport to the group.
Driver accepts TX rate values as fraction of 1Mbps.
Refactor existing eswitch QoS infrastructure to be accessible by legacy
NDO rate API and new devlink rate API. NDO rate API is not
removed/disabled in switchdev mode to not break existing users. Rate
values configured with NDO rate API are not visible for devlink
infrastructure, therefore APIs should not be used simultaneously.
IMPLEMENTATION DETAILS
Driver provide two level rate hierarchy to manage bandwidth - group
level and vport level. Initially each vport added to internal unlimited
group created by default. Each rate element (vport or group) receive
bandwidth relative to its parent element (for groups the parent is a
physical link itself) in a Round Robin manner, where element get
bandwidth value according to its weight. Example:
Created four rate groups with tx_share limits:
$ devlink port function rate add \
pci/0000:06:00.0/group_1 tx_share 30gbit
$ devlink port function rate add \
pci/0000:06:00.0/group_2 tx_share 20gbit
$ devlink port function rate add \
pci/0000:06:00.0/group_3 tx_share 20gbit
$ devlink port function rate add \
pci/0000:06:00.0/group_4 tx_share 10gbit
Weights created in HW for each group are relative to the bigest tx_share
value, which is 30gbit:
<group_1> 1.0
<group_2> 0.67
<group_3> 0.67
<group_4> 0.33
Assuming link speed is 50 Gbit/sec and each group can sustain such
amount of traffic, maximum bandwidth is 50 / (1.0 + 0.67 + 0.67 + 0.33)
= ~18.75 Gbit/sec. Normilized bandwidth values for groups:
<group_1> 18.75 * 1.0 = 18.75 Gbit/sec
<group_2> 18.75 * 0.67 = 12.5 Gbit/sec
<group_3> 18.75 * 0.67 = 12.5 Gbit/sec
<group_4> 18.75 * 0.33 = 6.25 Gbit/sec
If in example above group_1 doesn't produce any traffic, then maximum
bandwidth becomes 50 / (0.67 + 0.67 + 0.33) = ~30.0 Gbit/sec. Normalized
values:
<group_2> 30.0 * 0.67 = 20.0 Gbit/sec
<group_3> 30.0 * 0.67 = 20.0 Gbit/sec
<group_4> 30.0 * 0.33 = 10.0 Gbit/sec
Same normalization applied to each vport in the group.
Normalized values are internal, therefore driver provides QoS
tracepoints for next events:
* vport rate element creation/deletion:
* vport rate element configuration;
* group rate element creation/deletion;
* group rate element configuration.
PATCHES OVERVIEW
1 - Moving and isolation of eswitch QoS logic in separate file;
2 - Implement devlink leaf rate object support for vports;
3 - Implement rate groups creation/deletion;
4 - Implement TX rate management for the groups;
5 - Implement parent set for vports;
6 - Eswitch QoS tracepoints.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
* kvm-arm64/pkvm-fixed-features-prologue:
: Rework a bunch of common infrastructure as a prologue
: to Fuad Tabba's protected VM fixed feature series.
KVM: arm64: Upgrade trace_kvm_arm_set_dreg32() to 64bit
KVM: arm64: Add config register bit definitions
KVM: arm64: Add feature register flag definitions
KVM: arm64: Track value of cptr_el2 in struct kvm_vcpu_arch
KVM: arm64: Keep mdcr_el2's value as set by __init_el2_debug
KVM: arm64: Restore mdcr_el2 from vcpu
KVM: arm64: Refactor sys_regs.h,c for nVHE reuse
KVM: arm64: Fix names of config register fields
KVM: arm64: MDCR_EL2 is a 64-bit register
KVM: arm64: Remove trailing whitespace in comment
KVM: arm64: placeholder to check if VM is protected
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/mmu/vmid-cleanups:
: Cleanup the stage-2 configuration by providing a single helper,
: and tidy up some of the ordering requirements for the VMID
: allocator.
KVM: arm64: Upgrade VMID accesses to {READ,WRITE}_ONCE
KVM: arm64: Unify stage-2 programming behind __load_stage2()
KVM: arm64: Move kern_hyp_va() usage in __load_guest_stage2() into the callers
Signed-off-by: Marc Zyngier <maz@kernel.org>
Switch KVM/arm64 to the generic entry code, courtesy of Oliver Upton
* kvm-arm64/generic-entry:
KVM: arm64: Use generic KVM xfer to guest work function
entry: KVM: Allow use of generic KVM entry w/o full generic support
KVM: arm64: Record number of signal exits as a vCPU stat
Signed-off-by: Marc Zyngier <maz@kernel.org>
PSCI fixes from Oliver Upton:
- Plug race on reset
- Ensure that a pending reset is applied before userspace accesses
- Reject PSCI requests with illegal affinity bits
* kvm-arm64/psci/cpu_on:
selftests: KVM: Introduce psci_cpu_on_test
KVM: arm64: Enforce reserved bits for PSCI target affinities
KVM: arm64: Handle PSCI resets before userspace touches vCPU state
KVM: arm64: Fix read-side race on updates to vcpu reset state
Signed-off-by: Marc Zyngier <maz@kernel.org>
- Add support for Foxconn Mediatek Chip
- Add support for LG LGSBWAC92/TWCM-K505D
- hci_h5 flow control fixes and suspend support
- Switch to use lock_sock for SCO and RFCOMM
- Various fixes for extended advertising
- Reword Intel's setup on btusb unifying the supported generations
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmEe1xEZHGx1aXoudm9u
LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKX/YEACMlYxmWJn2birrH5h4c+FA
6hzoDw+Kp+/Qo0FYPgWw6ady+cKuh50itKz8W050JR+n9eVdRehZ3Rlr/Yv2ol51
TSTjRKPbeDmtkGzC9h+dVBgkEERF88mF8FZiFXp+9vG/dfS4Lq2WdWzEFuYmfZyD
ZMuI9PsepmprORVI37B1WjZfdUo2XeA9ZKHUVSesgarNg55mZ4T/WEFnEc8KH2rX
HiqAeX+H2lt38ZEru7l5Jp6mNnzJJKLcnFjWMHXia865B8dHqC++goMXdJ8Tqcm8
NLs2W1RZgZocVwovwQ17bTiu41VnN7LdVpCig5RGcn1YtQUPcYzqBI971ixQCJUN
7vjqyMV3i+nLLD3FZmD+qYMYH/M2LaLH6fbaN0KBDlElCDHT7/Qu9N2nGreyiqKc
uuEXVHbGou3sj/LkBpNKJOGtmNkUo0XN93/giu89ZHGc7BLN1tUJM9NYWaiO1TcD
YiD0LO/lqmggCs9SQH0DBTUDNZ1vUDOzmVeD/tu/NqnixzSMseyqeThshZhxz6UT
7fBXvwixl+AhrN2lIxmS4WAtEwOPvaayUW8af7kESlrC4RoFvq+QaghT1D4NDpcq
llYlg/gt97Wy3AnIsnvEjd0s+lxGN6byIOBgTfC4jAfPAYA4oxd7N+1vMPFyChUV
MwatwE+IE1u2hjQWhMhVuA==
=INb2
-----END PGP SIGNATURE-----
Merge tag 'for-net-next-2021-08-19' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Luiz Augusto von Dentz says:
====================
bluetooth-next pull request for net-next:
- Add support for Foxconn Mediatek Chip
- Add support for LG LGSBWAC92/TWCM-K505D
- hci_h5 flow control fixes and suspend support
- Switch to use lock_sock for SCO and RFCOMM
- Various fixes for extended advertising
- Reword Intel's setup on btusb unifying the supported generations
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Prevent kmemleak from peeking into the HYP data, which is fatal
in protected mode.
* kvm-arm64/mmu/kmemleak-pkvm:
KVM: arm64: Unregister HYP sections from kmemleak in protected mode
arm64: Move .hyp.rodata outside of the _sdata.._edata range
Signed-off-by: Marc Zyngier <maz@kernel.org>