Use proper tables and RST markup to document the atomic instructions
in a structured way.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220131183638.3934982-6-hch@lst.de
In addition to the normal 64-bit instruction encoding, eBPF also has
a single instruction that uses a second 64-bit bits for a second
immediate value. Instead of only documenting this format deep down
in the document mention it in the instruction encoding section.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220131183638.3934982-5-hch@lst.de
Use consistent terminology and structured RST elements to better document
these two oddball instructions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220131183638.3934982-4-hch@lst.de
Add a separate section and a little intro blurb for the regular load and
store instructions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220131183638.3934982-3-hch@lst.de
Pull cgroup fixes from Tejun Heo:
- Eric's fix for a long standing cgroup1 permission issue where it only
checks for uid 0 instead of CAP which inadvertently allows
unprivileged userns roots to modify release_agent userhelper
- Fixes for the fallout from Waiman's recent cpuset work
* 'for-5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning
cgroup-v1: Require capabilities to set release_agent
cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask()
cgroup/cpuset: Make child cpusets restrict parents on v1 hierarchy
Alex Elder says:
====================
net: ipa: enable register retention
With runtime power management in place, we sometimes need to issue
a command to enable retention of IPA register values before power
collapse. This requires a new Device Tree property, whose presence
will also be used to signal that the command is required.
====================
Link: https://lore.kernel.org/r/20220201150205.468403-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In some cases, the IPA hardware needs to request the always-on
subsystem (AOSS) to coordinate with the IPA microcontroller to
retain IPA register values at power collapse. This is done by
issuing a QMP request to the AOSS microcontroller. A similar
request ondoes that request.
We must get and hold the "QMP" handle early, because we might get
back EPROBE_DEFER for that. But the actual request should be sent
while we know the IPA clock is active, and when we know the
microcontroller is operational.
Fixes: 1aac309d32 ("net: ipa: use autosuspend")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For some systems, the IPA driver must make a request to ensure that
its registers are retained across power collapse of the IPA hardware.
On such systems, we'll use the existence of the "qcom,qmp" property
as a signal that this request is required.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It was found that a "suspicious RCU usage" lockdep warning was issued
with the rcu_read_lock() call in update_sibling_cpumasks(). It is
because the update_cpumasks_hier() function may sleep. So we have
to release the RCU lock, call update_cpumasks_hier() and reacquire
it afterward.
Also add a percpu_rwsem_assert_held() in update_sibling_cpumasks()
instead of stating that in the comment.
Fixes: 4716909cc5 ("cpuset: Track cpusets that use parent's effective_cpus")
Signed-off-by: Waiman Long <longman@redhat.com>
Tested-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
should not use fast commit log data directly, add le32_to_cpu().
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 0b5b5a62b9 ("ext4: use ext4_ext_remove_space() for fast commit replay delete range")
Cc: stable@kernel.org
Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20220126063146.2302-1-yinxin.x@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add the description of @shrink and @sc in jbd2_journal_shrink_scan() and
jbd2_journal_shrink_count() kernel-doc comment to remove warnings found
by running scripts/kernel-doc, which is caused by using 'make W=1'.
fs/jbd2/journal.c:1296: warning: Function parameter or member 'shrink'
not described in 'jbd2_journal_shrink_scan'
fs/jbd2/journal.c:1296: warning: Function parameter or member 'sc' not
described in 'jbd2_journal_shrink_scan'
fs/jbd2/journal.c:1320: warning: Function parameter or member 'shrink'
not described in 'jbd2_journal_shrink_count'
fs/jbd2/journal.c:1320: warning: Function parameter or member 'sc' not
described in 'jbd2_journal_shrink_count'
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220110132841.34531-1-yang.lee@linux.alibaba.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
By mistake we fail to return an error from ext4_fill_super() in case
that ext4_alloc_sbi() fails to allocate a new sbi. Instead we just set
the ret variable and allow the function to continue which will later
lead to a NULL pointer dereference. Fix it by returning -ENOMEM in the
case ext4_alloc_sbi() fails.
Fixes: cebe85d570 ("ext4: switch to the new mount api")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Link: https://lore.kernel.org/r/20220119130209.40112-1-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
No functionality change as such in this patch. This only refactors the
common piece of code which waits for t_updates to finish into a common
function named as jbd2_journal_wait_updates(journal_t *)
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/8c564f70f4b2591171677a2a74fccb22a7b6c3a4.1642416995.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Current code does not fully takes care of krealloc() error case, which
could lead to silent memory corruption or a kernel bug. This patch
fixes that.
Also it cleans up some duplicated error handling logic from various
functions in fast_commit.c file.
Reported-by: luo penghao <luo.penghao@zte.com.cn>
Suggested-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/62e8b6a1cce9359682051deb736a3c0953c9d1e9.1642416995.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
ext4_prepare_inline_data() already checks for ext4_get_max_inline_size()
and returns -ENOSPC. So there is no need to check it twice within
ext4_da_write_inline_data_begin(). This patch removes the extra check.
It also makes it more clean.
No functionality change in this patch.
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/cdd1654128d5105550c65fd13ca5da53b2162cc4.1642416995.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
While running "./check -I 200 generic/475" it sometimes gives below
kernel BUG(). Ideally we should not call ext4_write_inline_data() if
ext4_create_inline_data() has failed.
<log snip>
[73131.453234] kernel BUG at fs/ext4/inline.c:223!
<code snip>
212 static void ext4_write_inline_data(struct inode *inode, struct ext4_iloc *iloc,
213 void *buffer, loff_t pos, unsigned int len)
214 {
<...>
223 BUG_ON(!EXT4_I(inode)->i_inline_off);
224 BUG_ON(pos + len > EXT4_I(inode)->i_inline_size);
This patch handles the error and prints out a emergency msg saying potential
data loss for the given inode (since we couldn't restore the original
inline_data due to some previous error).
[ 9571.070313] EXT4-fs (dm-0): error restoring inline_data for inode -- potential data loss! (inode 1703982, error -30)
Reported-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/9f4cd7dfd54fa58ff27270881823d94ddf78dd07.1642416995.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
in the follow scenario:
1. jbd start transaction n
2. task A get new handle for transaction n+1
3. task A do some actions and add inode to FC_Q_MAIN fc_q
4. jbd complete transaction n and clear FC_Q_MAIN fc_q
5. task A call fsync
Fast commit will lost the file actions during a full commit.
we should also add updates to staging queue during a full commit.
and in ext4_fc_cleanup(), when reset a inode's fc track range, check
it's i_sync_tid, if it bigger than current transaction tid, do not
rest it, or we will lost the track range.
And EXT4_MF_FC_COMMITTING is not needed anymore, so drop it.
Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Link: https://lore.kernel.org/r/20220117093655.35160-3-yinxin.x@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
For the follow scenario:
1. jbd start commit transaction n
2. task A get new handle for transaction n+1
3. task A do some ineligible actions and mark FC_INELIGIBLE
4. jbd complete transaction n and clean FC_INELIGIBLE
5. task A call fsync
In this case fast commit will not fallback to full commit and
transaction n+1 also not handled by jbd.
Make ext4_fc_mark_ineligible() also record transaction tid for
latest ineligible case, when call ext4_fc_cleanup() check
current transaction tid, if small than latest ineligible tid
do not clear the EXT4_MF_FC_INELIGIBLE.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: Ritesh Harjani <riteshh@linux.ibm.com>
Suggested-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Link: https://lore.kernel.org/r/20220117093655.35160-2-yinxin.x@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
For now in ext4_mb_new_blocks_simple, if we found a block which
should be excluded then will switch to next group, this may
probably cause 'group' run out of range.
Change to check next block in the same group when get a block should
be excluded. Also change the search range to EXT4_CLUSTERS_PER_GROUP
and add error checking.
Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20220110035141.1980-3-yinxin.x@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
During fast commit replay procedure, we clear inode blocks bitmap in
ext4_ext_clear_bb(), this may cause ext4_mb_new_blocks_simple() allocate
blocks still in use.
Make ext4_fc_record_regions() also record physical disk regions used by
inodes during replay procedure. Then ext4_mb_new_blocks_simple() can
excludes these blocks in use.
Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Link: https://lore.kernel.org/r/20220110035141.1980-2-yinxin.x@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
If the copy back to userland fails for the FASTRPC_IOCTL_ALLOC_DMA_BUFF
ioctl(), we shouldn't assume that 'buf->dmabuf' is still valid. In fact,
dma_buf_fd() called fd_install() before, i.e. "consumed" one reference,
leaving us with none.
Calling dma_buf_put() will therefore put a reference we no longer own,
leading to a valid file descritor table entry for an already released
'file' object which is a straight use-after-free.
Simply avoid calling dma_buf_put() and rely on the process exit code to
do the necessary cleanup, if needed, i.e. if the file descriptor is
still valid.
Fixes: 6cffd79504 ("misc: fastrpc: Add support for dmabuf exporter")
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Link: https://lore.kernel.org/r/20220127130218.809261-1-minipli@grsecurity.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andrii Nakryiko says:
====================
Clean up remaining missed uses of deprecated libbpf APIs across samples/bpf,
selftests/bpf, libbpf, and bpftool.
Also fix uninit variable warning in bpftool.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Switch to using new bpf_xdp_*() APIs across all selftests. Take
advantage of a more straightforward and user-friendly semantics of
old_prog_fd (0 means "don't care") in few places.
This is a redo of 544356524d ("selftests/bpf: switch to new libbpf XDP
APIs"), which was previously reverted to minimize conflicts during bpf
and bpf-next tree merge.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220202225916.3313522-6-andrii@kernel.org
Newer GCC complains about capturing the address of unitialized variable.
While there is nothing wrong with the code (the variable is filled out
by the kernel), initialize the variable anyway to make compiler happy.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220202225916.3313522-4-andrii@kernel.org
libbpf 1.0 is not going to support passing ifindex to BPF
prog/map/helper feature probing APIs. Remove the support for BPF offload
feature probing.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220202225916.3313522-3-andrii@kernel.org
Open-code bpf_map__is_offload_neutral() logic in one place in
to-be-deprecated bpf_prog_load_xattr2.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220202225916.3313522-2-andrii@kernel.org
When building with 'make -s', there is some output from resolve_btfids:
$ make -sj"$(nproc)" oldconfig prepare
MKDIR .../tools/bpf/resolve_btfids/libbpf/
MKDIR .../tools/bpf/resolve_btfids//libsubcmd
LINK resolve_btfids
Silent mode means that no information should be emitted about what is
currently being done. Use the $(silent) variable from Makefile.include
to avoid defining the msg macro so that there is no information printed.
Fixes: fbbb68de80 ("bpf: Add resolve_btfids tool to resolve BTF IDs in ELF object")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220201212503.731732-1-nathan@kernel.org
Do not always copy ethernet header in mt{7615,7915,7921}_reverse_frag0_hdr_trans
and use a pointer instead.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Client mode on DFS channels was broken, because the old code was activating
the DFS detector on radar channels while leaving it in CAC state.
This was caused by making the decision based on the channel radar flag,
instead of hw->conf.radar_enabled.
In order to properly deal with the various corner cases, rip out the state
handling code and replace it with something that's much easier to reason
about.
Tested-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
The 2711 pixel valve can't produce odd horizontal timings, and
checks were added to vc4_hdmi_encoder_atomic_check and
vc4_hdmi_encoder_mode_valid to filter out/block selection of
such modes.
Modes with DRM_MODE_FLAG_DBLCLK double all the horizontal timing
values before programming them into the PV. The PV values,
therefore, can not be odd, and so the modes can be supported.
Amend the filtering appropriately.
Fixes: 57fb32e632 ("drm/vc4: hdmi: Block odd horizontal timings")
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220127135116.298278-1-maxime@cerno.tech
The code that set the scdc_enabled flag to ensure it was
disabled at boot time also ran on Pi0-3 where there is no
SCDC support. This lead to a warning in vc4_hdmi_encoder_post_crtc_disable
due to vc4_hdmi_disable_scrambling being called and trying to
read (and write) register HDMI_SCRAMBLER_CTL which doesn't
exist on those platforms.
Only set the flag should the interface be configured to support
more than HDMI 1.4.
Fixes: 1998646129 ("drm/vc4: hdmi: Introduce a scdc_enabled flag")
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220127134559.292778-1-maxime@cerno.tech
The existing logic was flawed in that it could try reading the
2711 specific registers for HPD on a CM1/3 where the HPD GPIO
hadn't been defined in DT.
Ensure we don't do the 2711 register read on invalid hardware,
and then
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220127131754.236074-1-maxime@cerno.tech
This reverts commit 54d516b1d6
That commit did a refactoring that effectively combined fast and slow
gup paths (again). And that was again incorrect, for two reasons:
a) Fast gup and slow gup get reference counts on pages in different
ways and with different goals: see Linus' writeup in commit
cd1adf1b63 ("Revert "mm/gup: remove try_get_page(), call
try_get_compound_head() directly""), and
b) try_grab_compound_head() also has a specific check for
"FOLL_LONGTERM && !is_pinned(page)", that assumes that the caller
can fall back to slow gup. This resulted in new failures, as
recently report by Will McVicker [1].
But (a) has problems too, even though they may not have been reported
yet. So just revert this.
Link: https://lore.kernel.org/r/20220131203504.3458775-1-willmcvicker@google.com [1]
Fixes: 54d516b1d6 ("mm/gup: small refactoring: simplify try_grab_page()")
Reported-and-tested-by: Will McVicker <willmcvicker@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Minchan Kim <minchan@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: stable@vger.kernel.org # 5.15
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King says:
====================
net: dsa: mv88e6xxx: convert to phylink_generic_validate()
The overall objective of this series is to convert the mv88e6xxx DSA
driver to use phylink_generic_validate().
Patch 1 adds a new helper mv88e6352_g2_scratch_port_has_serdes() which
indicates whether an 88e6352 port has a serdes associated with it. This
is necessary as ports 4 and 5 will normally be in automedia mode, where
the CMODE field in the port status register will change e.g. between 15
(internal PHY) and 9 (1000base-X) depending on whether the serdes has
link.
The existing code caches the cmode field, and depending whether the
serdes has link at probe time, determines whether we allow things such
as the serdes statistics to be accessed. This means if the link isn't
up at probe time, the serdes is essentially unavailable.
Patch 1 addresses this by reading the pin configuration to find out
whether the serdes is attached to port 4 or port 5.
Patch 2 is a joint effort between myself and Marek Behún, adding the
supported interfaces and MAC capabilities to all mv88e6xxx supported
switch devices. This is slightly more restrictive than the original
code as we didn't used to care too much about the interface mode, but
with this we do - which is why we must know if there's a serdes
associated now.
Patch 3 switches mv88e6xxx to use the generic validation by removing
the initialisation of the phylink_validate pointer in the dsa_ops
struct.
Patch 4 updates the statistics code to use the new helper in patch 1,
so the serdes statistics are available even if the link was down at
driver probe time.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The decision whether to report serdes statistics currently depends on
the cached C_Mode value for the port, read at probe time or updated by
configuration. However, port 4 can be in "automedia" mode when it is
used as a serdes port, meaning it switches between the internal PHY and
the serdes, changing the read-only C_Mode value depending on which
first gains link. Consequently, the C_Mode value read at probe does not
accurately reflect whether the port has the serdes associated with it.
In "net: dsa: mv88e6xxx: add mv88e6352_g2_scratch_port_has_serdes()",
we added a way to read the hardware configuration to determine which
port has the serdes associated with it. Use this to determine which
port reports the serdes statistics.
Reviewed-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that the mv88e6xxx chip drivers are supplying the supported
interfaces and MAC capabilities, switch the driver to use the generic
phylink validation implementation by removing our own validation
implementations. This causes DSA to call phylink_generic_validate()
on our behalf.
Reviewed-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Populate the supported interfaces and MAC capabilities for the
Marvell MV88E6xxx DSA switches in preparation to using these for the
validation functionality.
Patch co-authored by Marek.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marek Behún <kabel@kernel.org> [ fixed 6341 and 6393x ]
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Read the hardware configuration to determine which port is attached
to the serdes.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Waldekranz says:
====================
net: dsa: mv88e6xxx: Improve standalone port isolation
The ideal isolation between standalone ports satisfies two properties:
1. Packets from one standalone port must not be forwarded to any other
port.
2. Packets from a standalone port must be sent to the CPU port.
mv88e6xxx solves (1) by isolating standalone ports using the PVT. Up
to this point though, (2) has not guaranteed; as the ATU is still
consulted, there is a chance that incoming packets never reach the CPU
if its DA has previously been used as the SA of an earlier packet (see
1/5 for more details). This is typically not a problem, except for one
very useful setup in which switch ports are looped in order to run the
bridge kselftests in tools/testing/selftests/net/forwarding. This
series attempts to solve (2).
Ideally, we could simply use the "ForceMap" bit of more modern chips
(Agate and newer) to classify all incoming packets as MGMT. This is
not available on older silicon that is still widely used (Opal Plus
chips like the 6097 for example).
Instead, this series takes a two pronged approach:
1/5: Always clear MapDA on standalone ports to make sure that no ATU
entry can lead packets astray. This solves (2) for single-chip
systems.
2/5: Trivial prep work for 4/5.
3/5: Trivial prep work for 4/5.
4/5: On multi-chip systems though, this is not enough. On the incoming
chip, the packet will be forced out towards the CPU thanks to
1/5, but on any intermediate chips the ATU is still consulted. We
override this behavior by marking the reserved standalone VID (0)
as a policy VID, the DSA ports' VID policy is set to TRAP. This
will cause the packet to be reclassified as MGMT on the first
intermediate chip, after which it's a straight shot towards the
CPU.
Finally, we allow more tests to be run on mv88e6xxx:
5/5: The bridge_vlan{,un}aware suites sets an ageing_time of 10s on
the bridge it creates, but mv88e6xxx has a minimum supported time
of 15s. Allow this time to be overridden in forwarding.config.
With this series in place, mv88e6xxx passes the following kselftest
suites:
- bridge_port_isolation.sh
- bridge_sticky_fdb.sh
- bridge_vlan_aware.sh
- bridge_vlan_unaware.sh
v1 -> v2:
- Wording/spelling (Vladimir)
- Use standard iterator in dsa_switch_upstream_port (Vladimir)
- Limit enabling of VTU port policy to downstream DSA ports (Vladimir)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow the ageing timeout that is set on bridges to be customized from
forwarding.config. This allows the tests to be run on hardware which
does not support a 10s timeout (e.g. mv88e6xxx).
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>