Commit graph

1105317 commits

Author SHA1 Message Date
Mark Brown
795dd8d3b8
Clean up usage of the endianness flag
Merge series from Charles Keepax <ckeepax@opensource.cirrus.com>:

Before componentisation any part registered as a CODEC would have
automatically supported both little and big endian, ie. the core
would duplicate any supported LE or BE PCM format to support the other
endian as well. As componentisation removed the distinction between
CODEC drivers and platform drivers, a flag was added to specify
if this behaviour is required for a particular component. However,
as most systems tend to use little endian the absence of the flag
is rarely noticed. Also the naming of the flag "endianness" is a
little unobvious as to if it should be applied to a particular
component.

This series adds a comment to better explain the meaning of the
flag and then tidys up the usage of the flag. A couple of uses
of the flag are removed where is has been used inappropriately
on the CPU side of the DAI link, this is clearly not valid in the
cases it has been used, and I suspect never would be valid. Then
some redundant formats are removed, since they would be covered by
existing endianness flags. And finally a bunch of devices that are
missing the flag have it added.

It is worth noting that since componenisation there are now a couple
of cases where it is not entire clear to me that the flag should
be applied to all CODECs as it was before. In those cases I haven't
updated the driver to add the flag and they are outlined here:

1) Build into the AP CODECs, these are actual silicon inside the main
processor and they typically receive audio directly from an internal
bus. It is not obvious to me that these can happily ignore endian. On
the CODEC side these include: jz4725b.c, jz4760.c, jz4770.c,
rk3328_codec.c, lpass-va-macro.c, lpass-rx-macro.c, lpass-tx-macro.c,
lpass-wsa-macro.c. There are also some examples of this scattered
around the various platform support directories in sound/soc.

2) Devices behind non-audio buses, SPI just moves bits and doesn't
really define an endian for audio data on the bus. Thus it seems the
CODEC probably can care about the endian. The only devices that fall
into this group (mostly for AoV) are: rt5514-spi.c, rt5677-spi.c,
cros_ec_codec.c (only the AoV).

3) CODECs with no DAIs, these could specify the flag and plenty of
them do; CODECs from the initial conversion to componentisation. But
the flag makes no difference here since there is nothing for it to
apply to. This includes purely analogue CODECs: aw8738.c, ssm2305.c,
tpa6130a2.c, tda7419.c, max9759.c, max9768.c, max9877.c, lm4857.c,
simple-mux.c, simple-amplifier.c. And devices that only do jack
detection: ts3a227e.c, mt6359-accdet.c.

If there are any opinions on adding the flag to any of those three
groups they would be greatfully received. But I am leaning towards
leaving 1,2 without endianness flags since it feels inappropriate,
and removing the endian flag from devices in catagory 3 that already
have it. Assuming no one objects to that I will do a follow up
series for that.
2022-05-10 12:12:50 +01:00
Joey Gouly
205f3991a2 arm64: vdso: fix makefile dependency on vdso.so
There is currently no dependency for vdso*-wrap.S on vdso*.so, which means that
you can get a build that uses a stale vdso*-wrap.o.

In commit a5b8ca97fb, the file that includes the vdso.so was moved and renamed
from arch/arm64/kernel/vdso/vdso.S to arch/arm64/kernel/vdso-wrap.S, when this
happened the Makefile was not updated to force the dependcy on vdso.so.

Fixes: a5b8ca97fb ("arm64: do not descend to vdso directories twice")
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220510102721.50811-1-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-05-10 12:08:40 +01:00
Robin Murphy
628bf55b62 iommu/arm-smmu: Force identity domains for legacy binding
When using the legacy "mmu-masters" DT binding, we reject DMA domains
since we have no guarantee of driver probe order and thus can't rely on
client drivers getting the correct DMA ops. However, we can do better
than fall back to the old no-default-domain behaviour now, by forcing an
identity default domain instead. This also means that detaching from a
VFIO domain can actually work - that looks to have been broken for over
6 years, so clearly isn't something that legacy binding users care
about, but we may as well make the driver code make sense anyway.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/9805e4c492cb972bdcdd57999d2d001a2d8b5aab.1652171938.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-05-10 12:01:31 +01:00
Vincent Whitchurch
f7b6fc3273 mmc: core: Support zeroout using TRIM for eMMC
If an eMMC card supports TRIM and indicates that it erases to zeros, we can
use it to support hardware offloading of REQ_OP_WRITE_ZEROES, so let's add
support for this.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Reviewed-by: Avri Altman <Avri.Altman@wdc.com>
Link: https://lore.kernel.org/r/20220429152118.3617303-1-vincent.whitchurch@axis.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2022-05-10 12:51:32 +02:00
Kees Cook
dc5306a8c0 decnet: Use container_of() for struct dn_neigh casts
Clang's structure layout randomization feature gets upset when it sees
struct neighbor (which is randomized) cast to struct dn_neigh:

net/decnet/dn_route.c:1123:15: error: casting from randomized structure pointer type 'struct neighbour *' to 'struct dn_neigh *'
			gateway = ((struct dn_neigh *)neigh)->addr;
				   ^

Update all the open-coded casts to use container_of() to do the conversion
instead of depending on strict member ordering.

Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/lkml/202205041247.WKBEHGS5-lkp@intel.com
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Yajun Deng <yajun.deng@linux.dev>
Cc: Zheng Yongjun <zhengyongjun3@huawei.com>
Cc: Bill Wendling <morbo@google.com>
Cc: linux-decnet-user@lists.sourceforge.net
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220508102217.2647184-1-keescook@chromium.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10 12:21:51 +02:00
Jing Xia
846a3351dd writeback: Avoid skipping inode writeback
We have run into an issue that a task gets stuck in
balance_dirty_pages_ratelimited() when perform I/O stress testing.
The reason we observed is that an I_DIRTY_PAGES inode with lots
of dirty pages is in b_dirty_time list and standard background
writeback cannot writeback the inode.
After studing the relevant code, the following scenario may lead
to the issue:

task1                                   task2
-----                                   -----
fuse_flush
 write_inode_now //in b_dirty_time
  writeback_single_inode
   __writeback_single_inode
                                 fuse_write_end
                                  filemap_dirty_folio
                                   __xa_set_mark:PAGECACHE_TAG_DIRTY
    lock inode->i_lock
    if mapping tagged PAGECACHE_TAG_DIRTY
    inode->i_state |= I_DIRTY_PAGES
    unlock inode->i_lock
                                   __mark_inode_dirty:I_DIRTY_PAGES
                                      lock inode->i_lock
                                      -was dirty,inode stays in
                                      -b_dirty_time
                                      unlock inode->i_lock

   if(!(inode->i_state & I_DIRTY_All))
      -not true,so nothing done

This patch moves the dirty inode to b_dirty list when the inode
currently is not queued in b_io or b_more_io list at the end of
writeback_single_inode.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
CC: stable@vger.kernel.org
Fixes: 0ae45f63d4 ("vfs: add support for a lazytime mount option")
Signed-off-by: Jing Xia <jing.xia@unisoc.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220510023514.27399-1-jing.xia@unisoc.com
2022-05-10 12:04:21 +02:00
Colin Ian King
ecd17a87eb x25: remove redundant pointer dev
Pointer dev is being assigned a value that is never used, the assignment
and the variable are redundant and can be removed. Also replace null check
with the preferred !ptr idiom.

Cleans up clang scan warning:
net/x25/x25_proc.c:94:26: warning: Although the value stored to 'dev' is
used in the enclosing expression, the value is never actually read
from 'dev' [deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20220508214500.60446-1-colin.i.king@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10 11:59:22 +02:00
Oliver Upton
249838b766 KVM: arm64: pkvm: Don't mask already zeroed FEAT_SVE
FEAT_SVE is already masked by the fixed configuration for
ID_AA64PFR0_EL1; don't try and mask it at runtime.

No functional change intended.

Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220509162559.2387784-3-oupton@google.com
2022-05-10 10:58:43 +01:00
Oliver Upton
4d2e469e16 KVM: arm64: pkvm: Drop unnecessary FP/SIMD trap handler
The pVM-specific FP/SIMD trap handler just calls straight into the
generic trap handler. Avoid the indirection and just call the hyp
handler directly.

Note that the BUILD_BUG_ON() pattern is repeated in
pvm_init_traps_aa64pfr0(), which is likely a better home for it.

No functional change intended.

Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220509162559.2387784-2-oupton@google.com
2022-05-10 10:58:42 +01:00
Paolo Abeni
a12af6f860 Merge branch 'this-is-a-patch-series-for-ethernet-driver-of-sunplus-sp7021-soc'
Wells Lu says:

====================
This is a patch series for Ethernet driver of Sunplus SP7021 SoC.

Sunplus SP7021 is an ARM Cortex A7 (4 cores) based SoC. It integrates
many peripherals (ex: UART, I2C, SPI, SDIO, eMMC, USB, SD card and
etc.) into a single chip. It is designed for industrial control
applications.

Refer to:
https://sunplus.atlassian.net/wiki/spaces/doc/overview
https://tibbo.com/store/plus1.html
====================

Link: https://lore.kernel.org/r/1652004800-3212-1-git-send-email-wellslutw@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10 11:31:34 +02:00
Wells Lu
fd3040b939 net: ethernet: Add driver for Sunplus SP7021
Add driver for Sunplus SP7021 SoC.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Wells Lu <wellslutw@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10 11:31:32 +02:00
Wells Lu
0cfeca62b5 devicetree: bindings: net: Add bindings doc for Sunplus SP7021.
Add bindings documentation for Sunplus SP7021 SoC.

Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Wells Lu <wellslutw@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10 11:31:32 +02:00
Charan Teja Reddy
ef3a6b7050 dma-buf: call dma_buf_stats_setup after dmabuf is in valid list
When dma_buf_stats_setup() fails, it closes the dmabuf file which
results into the calling of dma_buf_file_release() where it does
list_del(&dmabuf->list_node) with out first adding it to the proper
list. This is resulting into panic in the below path:
__list_del_entry_valid+0x38/0xac
dma_buf_file_release+0x74/0x158
__fput+0xf4/0x428
____fput+0x14/0x24
task_work_run+0x178/0x24c
do_notify_resume+0x194/0x264
work_pending+0xc/0x5f0

Fix it by moving the dma_buf_stats_setup() after dmabuf is added to the
list.

Fixes: bdb8d06dfe ("dmabuf: Add the capability to expose DMA-BUF stats in sysfs")
Signed-off-by: Charan Teja Reddy <quic_charante@quicinc.com>
Tested-by: T.J. Mercier <tjmercier@google.com>
Acked-by: T.J. Mercier <tjmercier@google.com>
Cc: <stable@vger.kernel.org> # 5.15.x+
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1652125797-2043-1-git-send-email-quic_charante@quicinc.com
2022-05-10 11:30:51 +02:00
Wan Jiabing
85f5b3c437 cpufreq: mediatek: Fix potential deadlock problem in mtk_cpufreq_set_target
Fix following coccichek error:
./drivers/cpufreq/mediatek-cpufreq.c:199:2-8: preceding lock on line
./drivers/cpufreq/mediatek-cpufreq.c:208:2-8: preceding lock on line

mutex_lock is acquired but not released before return.
Use 'goto out' to help releasing the mutex_lock.

Fixes: c210063b40 ("cpufreq: mediatek: Add opp notification support")
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Rex-BC Chen <rex-bc.chen@mediatek.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2022-05-10 14:40:56 +05:30
Ravi Bangoria
3d47083b9f perf/amd/ibs: Use interrupt regs ip for stack unwinding
IbsOpRip is recorded when IBS interrupt is triggered. But there is
a skid from the time IBS interrupt gets triggered to the time the
interrupt is presented to the core. Meanwhile processor would have
moved ahead and thus IbsOpRip will be inconsistent with rsp and rbp
recorded as part of the interrupt regs. This causes issues while
unwinding stack using the ORC unwinder as it needs consistent rip,
rsp and rbp. Fix this by using rip from interrupt regs instead of
IbsOpRip for stack unwinding.

Fixes: ee9f8fce99 ("x86/unwind: Add the ORC unwinder")
Reported-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220429051441.14251-1-ravi.bangoria@amd.com
2022-05-10 11:00:45 +02:00
Manuel Ullmann
1809c30b6e net: atlantic: always deep reset on pm op, fixing up my null deref regression
The impact of this regression is the same for resume that I saw on
thaw: the kernel hangs and nothing except SysRq rebooting can be done.

Fixes regression in commit cbe6c3a8f8 ("net: atlantic: invert deep
par in pm functions, preventing null derefs"), where I disabled deep
pm resets in suspend and resume, trying to make sense of the
atl_resume_common() deep parameter in the first place.

It turns out, that atlantic always has to deep reset on pm
operations. Even though I expected that and tested resume, I screwed
up by kexec-rebooting into an unpatched kernel, thus missing the
breakage.

This fixup obsoletes the deep parameter of atl_resume_common, but I
leave the cleanup for the maintainers to post to mainline.

Suspend and hibernation were successfully tested by the reporters.

Fixes: cbe6c3a8f8 ("net: atlantic: invert deep par in pm functions, preventing null derefs")
Link: https://lore.kernel.org/regressions/9-Ehc_xXSwdXcvZqKD5aSqsqeNj5Izco4MYEwnx5cySXVEc9-x_WC4C3kAoCqNTi-H38frroUK17iobNVnkLtW36V6VWGSQEOHXhmVMm5iQ=@protonmail.com/
Reported-by: Jordan Leppert <jordanleppert@protonmail.com>
Reported-by: Holger Hoffstaette <holger@applied-asynchrony.com>
Tested-by: Jordan Leppert <jordanleppert@protonmail.com>
Tested-by: Holger Hoffstaette <holger@applied-asynchrony.com>
CC: <stable@vger.kernel.org> # 5.10+
Signed-off-by: Manuel Ullmann <labre@posteo.de>
Link: https://lore.kernel.org/r/87bkw8dfmp.fsf@posteo.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10 10:29:54 +02:00
Hyeonggon Yoo
f8d9f46e87 MAINTAINERS: add myself as reviewer for slab
Recently I was involved in slab subsystem (reviewing struct slab,
SLUB debugfs and etc). I would like to help maintainers and people
working on slab allocators by reviewing and testing their work.

Let me be Cc'd on patches related to slab.

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/20220507073506.241963-1-42.hyeyoo@gmail.com
2022-05-10 10:08:16 +02:00
Matthew Gerlach
ae23f746d7
fpga: dfl: Allow Port to be linked to FME's DFL
Currently we use PORTn_OFFSET to locate PORT DFLs, and PORT DFLs are not
connected FME DFL. But for some cases (e.g. Intel Open FPGA Stack device),
PORT DFLs are connected to FME DFL directly, so we don't need to search
PORT DFLs via PORTn_OFFSET again. If BAR value of PORTn_OFFSET is 0x7
(FME_PORT_OFST_BAR_SKIP) then driver will skip searching the DFL for that
port. If BAR value is invalid, return -EINVAL.

Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com>
Signed-off-by: Tianfei Zhang <tianfei.zhang@intel.com>
Acked-by: Wu Hao <hao.wu@intel.com>
Link: https://lore.kernel.org/r/20220505100617.703672-1-tianfei.zhang@intel.com
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
2022-05-10 16:05:38 +08:00
Tianfei zhang
2b28c9e0fe
Documentation: fpga: dfl: add link address of feature id table
This patch adds the link address of feature id table in documentation.

Signed-off-by: Tianfei zhang <tianfei.zhang@intel.com>
Reviewed-by: Matthew Gerlach <matthew.gerlach@linux.intel.com>
Acked-by: Moritz Fischer <mdf@kernel.org>
Acked-by: Wu Hao <hao.wu@intel.com>
Link: https://lore.kernel.org/r/20220419032942.427429-3-tianfei.zhang@intel.com
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
2022-05-10 16:05:27 +08:00
Tianfei zhang
88b3f3ff38
fpga: dfl: check feature type before parse irq info
Previously the feature IDs defined are unique, no matter
which feature type. But currently we want to extend its
usage to have a per-type feature ID space, so this patch
adds feature type checking as well just before look into
feature ID for different features which have irq info.

Signed-off-by: Tianfei zhang <tianfei.zhang@intel.com>
Reviewed-by: Matthew Gerlach <matthew.gerlach@linux.intel.com>
Acked-by: Wu Hao <hao.wu@intel.com>
Acked-by: Moritz Fischer <mdf@kernel.org>
Link: https://lore.kernel.org/r/20220419032942.427429-2-tianfei.zhang@intel.com
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
2022-05-10 16:05:15 +08:00
Nava kishore Manne
838a84382a
fpga: fpga-region: fix kernel-doc formatting issues
To fix below kernel-doc warnings this patch does the following
->Replaced Return\Returns with 'Return:' keyword.
->Added 'Return' description For __init of_fpga_region_init()' API.
->Added description for 'child_regions_with_firmware()' API.

warning: No description found for return value of
'of_fpga_region_find'.
warning: No description found for return value of
'of_fpga_region_get_bridges'.
warning: missing initial short description on line:
* child_regions_with_firmware
warning: No description found for return value of
'child_regions_with_firmware'.
warning: No description found for return value of
'of_fpga_region_notify_pre_apply'.
warning: No description found for return value of
'of_fpga_region_notify'.
warning: No description found for return value of
'of_fpga_region_init'.

Signed-off-by: Nava kishore Manne <nava.manne@xilinx.com>
Link: https://lore.kernel.org/r/20220423170235.2115479-6-nava.manne@xilinx.com
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
2022-05-10 16:05:00 +08:00
Nava kishore Manne
baf7d27d03
fpga: Use tab instead of space indentation
In FPGA Makefile has both space and tab indentation, to
make them align use tab instead of space indentation.

Signed-off-by: Nava kishore Manne <nava.manne@xilinx.com>
Link: https://lore.kernel.org/r/20220423170235.2115479-5-nava.manne@xilinx.com
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
2022-05-10 16:04:50 +08:00
Nava kishore Manne
3f3f9cb67f
fpga: fpga-mgr: fix kernel-doc warnings
warnings: No description found for return value of 'xxx'

In-order to fix the above kernel-doc warnings added the
'Return' description for 'devm_fpga_mgr_register_full()'
and 'devm_fpga_mgr_register()' APIs.

Signed-off-by: Nava kishore Manne <nava.manne@xilinx.com>
Link: https://lore.kernel.org/r/20220423170235.2115479-4-nava.manne@xilinx.com
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
2022-05-10 16:04:11 +08:00
Nava kishore Manne
57ce2e406f
fpga: fix for coding style issues
fixes the below checks reported by checkpatch.pl:
- Lines should not end with a '('
- Alignment should match open parenthesis

Signed-off-by: Nava kishore Manne <nava.manne@xilinx.com>
Link: https://lore.kernel.org/r/20220423170235.2115479-3-nava.manne@xilinx.com
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
2022-05-10 16:03:52 +08:00
Xiubo Li
642d51fb07 ceph: check folio PG_private bit instead of folio->private
The pages in the file mapping maybe reclaimed and reused by other
subsystems and the page->private maybe used as flags field or
something else, if later that pages are used by page caches again
the page->private maybe not cleared as expected.

Here will check the PG_private bit instead of the folio->private.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/55421
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Luis Henriques <lhenriques@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-10 09:48:31 +02:00
Jeff Layton
620239d9a3 ceph: fix setting of xattrs on async created inodes
Currently when we create a file, we spin up an xattr buffer to send
along with the create request. If we end up doing an async create
however, then we currently pass down a zero-length xattr buffer.

Fix the code to send down the xattr buffer in req->r_pagelist. If the
xattrs span more than a page, however give up and don't try to do an
async create.

Cc: stable@vger.kernel.org
URL: https://bugzilla.redhat.com/show_bug.cgi?id=2063929
Fixes: 9a8d03ca2e ("ceph: attempt to do async create when possible")
Reported-by: John Fortin <fortinj66@gmail.com>
Reported-by: Sri Ramanujam <sri@ramanujam.io>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-10 09:48:31 +02:00
Paolo Abeni
827634531e Merge branch 'ptp-support-hardware-clocks-with-additional-free-running-cycle-counter'
Gerhard Engleder says:

====================
ptp: Support hardware clocks with additional free running cycle counter

ptp vclocks require a clock with free running time for the timecounter.
Currently only a physical clock forced to free running is supported.
If vclocks are used, then the physical clock cannot be synchronized
anymore. The synchronized time is not available in hardware in this
case. As a result, timed transmission with TAPRIO hardware support
is not possible anymore.

If hardware would support a free running time additionally to the
physical clock, then the physical clock does not need to be forced to
free running. Thus, the physical clocks can still be synchronized while
vclocks are in use.

The physical clock could be used to synchronize the time domain of the
TSN network and trigger TAPRIO. In parallel vclocks can be used to
synchronize other time domains.

One year ago I thought for two time domains within a TSN network also
two physical clocks are required. This would lead to new kernel
interfaces for asking for the second clock, ... . But actually for a
time triggered system like TSN there can be only one time domain that
controls the system itself. All other time domains belong to other
layers, but not to the time triggered system itself. So other time
domains can be based on a free running counter if similar mechanisms
like 2 step synchroisation are used.

Synchronisation was tested with two time domains between two directly
connected hosts. Each host run two ptp4l instances, the first used the
physical clock and the second used the virtual clock. I used my FPGA
based network controller as network device. ptp4l was used in
combination with the virtual clock support patches from Miroslav
Lichvar.

v4:
- if_index of 0 is invalid (Jonathan Lemon)
- set if_index to 0 in the SOF_TIMESTAMPING_RAW_HARDWARE block (Jonathan
  Lemon)
- add helper function for netdev_get_tstamp() call (Jonathan Lemon)
- update SKBTX_ANY_TSTAMP (Paolo Abeni)
- use separate bits for new tx_flags (Richard Cochran)

v3:
- optimize ptp_convert_timestamp (Richard Cochran)
- call dev_get_by_napi_id() only if needed (Richard Cochran)
- use non-negated logical test (Richard Cochran)
- add comment for skipped output (Richard Cochran)
- add comment for SKBTX_HW_TSTAMP_USE_CYCLES masking (Richard Cochran)

v2:
- rename ptp_clock cycles to has_cycles (Richard Cochran)
- call it free running cycle counter (Richard Cochran)
- update struct skb_shared_hwtstamps kdoc (Richard Cochran)
- optimize timestamp address/cookie processing path (Richard Cochran,
  Vinicius Costa Gomes)

v1:
- complete rework based on suggestions (Richard Cochran)
====================

Link: https://lore.kernel.org/r/20220506200142.3329-1-gerhard@engleder-embedded.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10 09:48:11 +02:00
Gerhard Engleder
0abb62b682 tsnep: Add free running cycle counter support
The TSN endpoint Ethernet MAC supports a free running counter
additionally to its clock. This free running counter can be read and
hardware timestamps are supported. As the name implies, this counter
cannot be set and its frequency cannot be adjusted.

Add free running cycle counter support based on this free running
counter to physical clock. This also requires hardware time stamps
based on that free running counter.

Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10 09:48:09 +02:00
Gerhard Engleder
fcf308e509 ptp: Speed up vclock lookup
ptp_convert_timestamp() is called in the RX path of network messages.
The current implementation takes ~5000ns on 1.2GHz A53. This is too much
for the hot path of packet processing.

Introduce hash table for fast vclock lookup in ptp_convert_timestamp().
The execution time of ptp_convert_timestamp() is reduced to ~700ns on
1.2GHz A53.

Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10 09:48:09 +02:00
Gerhard Engleder
97dc7cd92a ptp: Support late timestamp determination
If a physical clock supports a free running cycle counter, then
timestamps shall be based on this time too. For TX it is known in
advance before the transmission if a timestamp based on the free running
cycle counter is needed. For RX it is impossible to know which timestamp
is needed before the packet is received and assigned to a socket.

Support late timestamp determination by a network device. Therefore, an
address/cookie is stored within the new netdev_data field of struct
skb_shared_hwtstamps. This address/cookie is provided to a new network
device function called ndo_get_tstamp(), which returns a timestamp based
on the normal/adjustable time or based on the free running cycle
counter. If function is not supported, then timestamp handling is not
changed.

This mechanism is intended for RX, but TX use is also possible.

Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10 09:48:08 +02:00
Gerhard Engleder
d58809d854 ptp: Pass hwtstamp to ptp_convert_timestamp()
ptp_convert_timestamp() converts only the timestamp hwtstamp, which is
a field of the argument with the type struct skb_shared_hwtstamps *. So
a pointer to the hwtstamp field of this structure is sufficient.

Rework ptp_convert_timestamp() to use an argument of type ktime_t *.
This allows to add additional timestamp manipulation stages before the
call of ptp_convert_timestamp().

Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10 09:48:08 +02:00
Gerhard Engleder
51eb7492af ptp: Request cycles for TX timestamp
The free running cycle counter of physical clocks called cycles shall be
used for hardware timestamps to enable synchronisation.

Introduce new flag SKBTX_HW_TSTAMP_USE_CYCLES, which signals driver to
provide a TX timestamp based on cycles if cycles are supported.

Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10 09:48:08 +02:00
Gerhard Engleder
42704b26b0 ptp: Add cycles support for virtual clocks
ptp vclocks require a free running time for their timecounter.
Currently only a physical clock forced to free running is supported.
If vclocks are used, then the physical clock cannot be synchronized
anymore. The synchronized time is not available in hardware in this
case. As a result, timed transmission with TAPRIO hardware support
is not possible anymore.

If hardware would support a free running time additionally to the
physical clock, then the physical clock does not need to be forced to
free running. Thus, the physical clocks can still be synchronized
while vclocks are in use.

The physical clock could be used to synchronize the time domain of the
TSN network and trigger TAPRIO. In parallel vclocks can be used to
synchronize other time domains.

Introduce support for a free running cycle counter called cycles to
physical clocks. Rework ptp vclocks to use this free running cycle
counter. Default implementation is based on time of physical clock.
Thus, behavior of ptp vclocks based on physical clocks without free
running cycle counter is identical to previous behavior.

Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10 09:48:08 +02:00
Jakub Kicinski
b3552d6a3b eth: dpaa2-mac: remove a dead-code NULL check on fwnode parent
Since commit 4e30e98c4b ("dpaa2-mac: return -EPROBE_DEFER from dpaa2_mac_open in case the fwnode is not set")
@parent can't be NULL after the if. It's either the address
of the ->fwnode of @dpmacs or @fwnode in case of ACPI.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20220506200029.852310-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10 09:19:56 +02:00
Mark Bloch
7f46a0b732 net/mlx5: Lag, add debugfs to query hardware lag state
Lag state has become very complicated with many modes, flags, types and
port selections methods and future work will add additional features.

Add a debugfs to query the current lag state. A new directory named "lag"
will be created under the mlx5 debugfs directory. As the driver has
debugfs per pci function the location will be: <debugfs>/mlx5/<BDF>/lag

For example:
/sys/kernel/debug/mlx5/0000:08:00.0/lag

The following files are exposed:

- state: Returns "active" or "disabled". If "active" it means hardware
         lag is active.

- members: Returns the BDFs of all the members of lag object.

- type: Returns the type of the lag currently configured. Valid only
	if hardware lag is active.
	* "roce" - Members are bare metal PFs.
	* "switchdev" - Members are in switchdev mode.
	* "multipath" - ECMP offloads.

- port_sel_mode: Returns the egress port selection method, valid
		 only if hardware lag is active.
		 * "queue_affinity" - Egress port is selected by
		   the QP/SQ affinity.
		 * "hash" - Egress port is selected by hash done on
		   each packet. Controlled by: xmit_hash_policy of the
		   bond device.
- flags: Returns flags that are specific per lag @type. Valid only if
	 hardware lag is active.
	 * "shared_fdb" - "on" or "off", if "on" single FDB is used.

- mapping: Returns the mapping which is used to select egress port.
	   Valid only if hardware lag is active.
	   If @port_sel_mode is "hash" returns the active egress ports.
	   The hash result will select only active ports.
	   if @port_sel_mode is "queue_affinity" returns the mapping
	   between the configured port affinity of the QP/SQ and actual
	   egress port. For example:
	   * 1:1 - Mapping means if the configured affinity is port 1
	           traffic will egress via port 1.
	   * 1:2 - Mapping means if the configured affinity is port 1
		   traffic will egress via port 2. This can happen
		   if port 1 is down or in active/backup mode and port 1
		   is backup.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09 22:54:04 -07:00
Mark Bloch
352899f384 net/mlx5: Lag, use buckets in hash mode
When in hardware lag and the NIC has more than 2 ports when one port
goes down need to distribute the traffic between the remaining
active ports.

For better spread in such cases instead of using 1-to-1 mapping and only
4 slots in the hash, use many.

Each port will have many slots that point to it. When a port goes down
go over all the slots that pointed to that port and spread them between
the remaining active ports. Once the port comes back restore the default
mapping.

We will have number_of_ports * MLX5_LAG_MAX_HASH_BUCKETS slots.
Each MLX5_LAG_MAX_HASH_BUCKETS belong to a different port.
The native mapping is such that:

port 1: The first MLX5_LAG_MAX_HASH_BUCKETS slots are: [1, 1, .., 1]
which means if a packet is hased into one of this slots it will hit the
wire via port 1.

port 2: The second MLX5_LAG_MAX_HASH_BUCKETS slots are: [2, 2, .., 2]
which means if a packet is hased into one of this slots it will hit the
wire via port2.

and this mapping is the same of the rest of the ports.
On a failover, lets say port 2 goes down (port 1, 3, 4 are still up).
the new mapping for port 2 will be:

port 2: The second MLX5_LAG_MAX_HASH_BUCKETS are: [1, 3, 1, 4, .., 4]
which means the mapping was changed from the native mapping to a mapping
that consists of only the active ports.

With this if a port goes down the traffic will be split between the
active ports randomly

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09 22:54:03 -07:00
Mark Bloch
24b3599eff net/mlx5: Lag, refactor dmesg print
Combine dmesg lag prints into a single function.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09 22:54:03 -07:00
Mark Bloch
4cd14d44b1 net/mlx5: Support devices with more than 2 ports
Increase the define MLX5_MAX_PORTS to 4 as the driver is ready
to support NICs with 4 ports.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09 22:54:03 -07:00
Mark Bloch
7e978e7714 net/mlx5: Lag, use actual number of lag ports
Refactor the entire lag code to use ldev->ports instead of hard-coded
defines (like MLX5_MAX_PORTS) for its operations.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09 22:54:02 -07:00
Mark Bloch
cdf611d170 net/mlx5: Lag, use hash when in roce lag on 4 ports
Downstream patches will add support for lag over 4 ports.
In that mode we will only use hash as the uplink selection method.
Using hash instead of queue affinity (before this patch) offers key
advantages like:

- Align ports selection method with the method used by the bond device

- Better packets distribution where a single queue can transmit from
  multiple ports (with queue affinity a queue is bound to a single port
  regardless of the packet being sent).

- In case of failover we traffic is split between multiple ports and not
  a single one like in queue affinity.

Going forward it was decided that queue affinity will be deprecated
as using hash provides a better user experience which means on 4 ports
HCAs hash will always be used.

Future work will add hash support for 2 ports HCAs as well.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09 22:54:02 -07:00
Mark Bloch
e2c45931ff net/mlx5: Lag, support single FDB only on 2 ports
E-Switch currently doesn't support more than 2 E-Switch managers
being aggregated under a single hardware lag. Have specific checks
to disallow creating lag when the code doesn't support it.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09 22:54:02 -07:00
Mark Bloch
e9d5bb51c5 net/mlx5: Lag, store number of ports inside lag object
Store the number of lag ports inside the lag object. Lag object is a single
shared object managing the lag state of multiple mlx5 devices on the same
physical HCA.

Downstream patches will allow hardware lag to be created over devices with
more than 2 ports.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09 22:54:02 -07:00
Mark Bloch
bc4c2f2e01 net/mlx5: Lag, filter non compatible devices
When search for a peer lag device we can filter based on that
device's capabilities.

Downstream patch will be less strict when filtering compatible devices
and remove the limitation where we require exact MLX5_MAX_PORTS and
change it to a range.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09 22:54:01 -07:00
Mark Bloch
ec2fa47d7b net/mlx5: Lag, use lag lock
Use a lag specific lock instead of depending on external locks to
synchronise the lag creation/destruction.

With this, taking E-Switch mode lock is no longer needed for syncing
lag logic.

Cleanup any dead code that is left over and don't export functions that
aren't used outside the E-Switch core code.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09 22:54:01 -07:00
Mark Bloch
4202ea95a6 net/mlx5: Lag, move E-Switch prerequisite check into lag code
There is no need to expose E-Switch function for something that can be
checked with already present API inside lag code.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09 22:54:01 -07:00
Mark Bloch
8a6e75e5f5 net/mlx5: devcom only supports 2 ports
Devcom API is intended to be used between 2 devices only add this
implied assumption into the code and check when it's no true.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09 22:54:00 -07:00
Mark Bloch
34a30d7635 net/mlx5: Lag, expose number of lag ports
Downstream patches will add support for hardware lag with
more than 2 ports. Add a way for users to query the number of lag ports.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09 22:54:00 -07:00
Gavin Li
37ca95e62e net/mlx5: Increase FW pre-init timeout for health recovery
Currently, health recovery will reload driver to recover it from fatal
errors. During the driver's load process, it would wait for FW to set the
pre-init bit for up to 120 seconds, beyond this threshold it would abort
the load process. In some cases, such as a FW upgrade on the DPU, this
timeout period is insufficient, and the user has no way to recover the
host device.

To solve this issue, introduce a new FW pre-init timeout for health
recovery, which is set to 2 hours.

The timeout for devlink reload and probe will use the original one because
they are user triggered flows, and therefore should not have a
significantly long timeout, during which the user command would hang.

Signed-off-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09 22:54:00 -07:00
Gavin Li
8324a02c34 net/mlx5: Add exit route when waiting for FW
Currently, removing a device needs to get the driver interface lock before
doing any cleanup. If the driver is waiting in a loop for FW init, there
is no way to cancel the wait, instead the device cleanup waits for the
loop to conclude and release the lock.

To allow immediate response to remove device commands, check the TEARDOWN
flag while waiting for FW init, and exit the loop if it has been set.

Signed-off-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09 22:54:00 -07:00
Slark Xiao
13b9b814da bus: mhi: host: Add support for Foxconn T99W373 and T99W368
Product's enumeration align with previous Foxconn
SDX55, so T99W373(SDX62)/T99W368(SDX65) would use
 the same config as Foxconn SDX55.
Remove fw and edl for this new commit.

Signed-off-by: Slark Xiao <slark_xiao@163.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Link: https://lore.kernel.org/r/20220503024349.4486-1-slark_xiao@163.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2022-05-10 11:10:56 +05:30