Commit graph

1059881 commits

Author SHA1 Message Date
Hao Xu
8d4af6857c io_uring: return boolean value for io_alloc_async_data
boolean value is good enough for io_alloc_async_data.

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210922101522.9179-1-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:53 -06:00
Pavel Begunkov
68fe256aad io_uring: optimise io_req_init() sqe flags checks
IOSQE_IO_DRAIN is quite marginal and we don't care too much about
IOSQE_BUFFER_SELECT. Save to ifs and hide both of them under
SQE_VALID_FLAGS check. Now we first check whether it uses a "safe"
subset, i.e. without DRAIN and BUFFER_SELECT, and only if it's not
true we test the rest of the flags.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dccfb9ab2ab0969a2d8dc59af88fa0ce44eeb1d5.1631703764.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:53 -06:00
Pavel Begunkov
a3f349071e io_uring: remove ctx referencing from complete_post
Now completions are done from task context, that means that it's either
the task itself, task_work or io-wq worker. In all those cases the ctx
will be staying alive by mutexing, explicit referencing or req references
by iowq. Remove extra ctx pinning from io_req_complete_post().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/60a0e96434c16ab4fe587651448290d61ec9a113.1631703756.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:53 -06:00
Hao Xu
83f84356bc io_uring: add more uring info to fdinfo for debug
Developers may need some uring info to help themselves debug and address
issues in production. This includes sqring/cqring head/tail and the
detailed sqe/cqe info, which is very useful when an application is hung
on a ring.

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210913130854.38542-1-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:52 -06:00
Pavel Begunkov
d97ec6239a io_uring: kill extra wake_up_process in tw add
TWA_SIGNAL already wakes the thread, no need in wake_up_process() after
it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7e90cf643f633e857443e0c9e72471b221735c50.1631115443.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:52 -06:00
Pavel Begunkov
c450178d9b io_uring: dedup CQE flushing non-empty checks
We don't do io_submit_flush_completions() when there is no requests
enqueued, and every single caller checks for it. Hide that check into
the function not forgetting about inlining. That will make it much
easier for changing the empty check condition in the future.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d7ff8cef5da1b38e8ea648f5aad9a315ddfc7b57.1631115443.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:52 -06:00
Pavel Begunkov
d81499bfcd io_uring: inline linked part of io_req_find_next
Inline part of __io_req_find_next() that returns a request but doesn't
need io_disarm_next(). It's just two places, but makes links a bit
faster.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4126d13f23d0e91b39b3558e16bd86cafa7fcef2.1631115443.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:52 -06:00
Pavel Begunkov
6b639522f6 io_uring: inline io_dismantle_req
io_dismantle_req() is hot, and not _too_ huge. Inline it, there are 3
call sites, which hopefully will turn into 2 in the future.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bdd2dc30716cac270c2403e99bccd6286e4ae201.1631115443.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:52 -06:00
Pavel Begunkov
4b628aeb69 io_uring: kill off ios_left
->ios_left is only used to decide whether to plug or not, kill it to
avoid this extra accounting, just use the initial submission number.
There is no much difference in regards of enabling plugging, where this
one does it in a few more cases, but all major ones should be covered
well.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f13993bcf5b477f9a7d52881fc49f9457ea9870a.1631115443.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:52 -06:00
Bixuan Cui
71e1cef2d7 io-wq: Remove duplicate code in io_workqueue_create()
While task_work_add() in io_workqueue_create() is true,
then duplicate code is executed:

  -> clear_bit_unlock(0, &worker->create_state);
  -> io_worker_release(worker);
  -> atomic_dec(&acct->nr_running);
  -> io_worker_ref_put(wq);
  -> return false;

  -> clear_bit_unlock(0, &worker->create_state); // back to io_workqueue_create()
  -> io_worker_release(worker);
  -> kfree(worker);

The io_worker_release() and clear_bit_unlock() are executed twice.

Fixes: 3146cba99a ("io-wq: make worker creation resilient against signals")
Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
Link: https://lore.kernel.org/r/20210911085847.34849-1-cuibixuan@huawei.com
Reviwed-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:52 -06:00
Jens Axboe
a87acfde94 io_uring: dump sqe contents if issue fails
I recently had to look at a production problem where a request ended
up getting the dreaded -EINVAL error on submit. The most used and
hence useless of error codes, as it just tells you that something
was wrong with your request, but not more than that.

Let's dump the full sqe contents if we run into an issue failure,
that'll allow easier diagnosing of a wide variety of issues.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:52 -06:00
luo penghao
1bd297988b e1000e: Remove redundant statement
This assignment statement is meaningless, because the statement
will execute to the tag "set_itr_now".

The clang_analyzer complains as follows:

drivers/net/ethernet/intel/e1000e/netdev.c:2552:3 warning:

Value stored to 'current_itr' is never read.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:48:40 +01:00
Jens Axboe
e0d78afeb8 block: fix too broad elevator check in blk_mq_free_request()
We added RQF_ELV to tell whether there's an IO scheduler attached, and
RQF_ELVPRIV tells us whether there's an IO scheduler with private data
attached. Don't check RQF_ELV in blk_mq_free_request(), what we care
about here is just if we have scheduler private data attached.

This fixes a boot crash

Fixes: 2ff0682da6 ("block: store elevator state in request")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Reported-by: syzbot+eb8104072aeab6cc1195@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:48:15 -06:00
David S. Miller
f4e728ff94 Merge branch 'eth_hw_addr_gen-for-switches'
Jakub Kicinski says:

====================
ethernet: add eth_hw_addr_gen() for switches

While doing the last polishing of the drivers/ethernet
changes I realized we have a handful of drivers offsetting
some base MAC addr by an id. So I decided to add a helper
for it. The helper takes care of wrapping which is probably
not 100% necessary but seems like a good idea. And it saves
driver side LoC (the diffstat is actually negative if we
compare against the changes I'd have to make if I was to
convert all these drivers to not operate directly on
netdev->dev_addr).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:46:25 +01:00
Jakub Kicinski
07a7ec9bda ethernet: sparx5: use eth_hw_addr_gen()
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:46:25 +01:00
Jakub Kicinski
be7550549e ethernet: mlxsw: use eth_hw_addr_gen()
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:46:25 +01:00
Jakub Kicinski
ba3fdfe32b ethernet: fec: use eth_hw_addr_gen()
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:46:25 +01:00
Jakub Kicinski
8eb8192ea2 ethernet: prestera: use eth_hw_addr_gen()
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Vadym and Taras report that the current behavior of the driver
is not exactly expected and it's better to add the port id in
like other drivers do.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:46:24 +01:00
Jakub Kicinski
53fdcce6ab ethernet: ocelot: use eth_hw_addr_gen()
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:46:24 +01:00
Jakub Kicinski
e80094a473 ethernet: add a helper for assigning port addresses
We have 5 drivers which offset base MAC addr by port id.
Create a helper for them.

This helper takes care of overflows, which some drivers
did not do, please complain if that's going to break
anything!

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:46:24 +01:00
Aharon Landau
ae0579acde RDMA/mlx5: Attach ndescs to mlx5_ib_mkey
Generalize the use of ndescs by adding it to mlx5_ib_mkey.

Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-10-19 14:42:53 +03:00
David S. Miller
867a92846e Merge branch 'dev_addr-conversions-part-two'
Jakub Kicinski says:

====================
ethernet: manual netdev->dev_addr conversions (part 2)

Manual conversions of Ethernet drivers writing directly
to netdev->dev_addr (part 2 out of 3).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:41:48 +01:00
Jakub Kicinski
f15fef4c06 ethernet: smsc: use eth_hw_addr_set()
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Break the address up into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:41:48 +01:00
Jakub Kicinski
02bfb6beb6 ethernet: smc91x: use eth_hw_addr_set()
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Read the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:41:48 +01:00
Jakub Kicinski
74fad215ee ethernet: sis900: use eth_hw_addr_set()
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Read the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:41:48 +01:00
Jakub Kicinski
f60e8b06e0 ethernet: sis190: use eth_hw_addr_set()
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Read the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:41:48 +01:00
Jakub Kicinski
15fa05bf41 ethernet: sxgbe: use eth_hw_addr_set()
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Read the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:41:48 +01:00
Jakub Kicinski
298b0e0c5f ethernet: rocker: use eth_hw_addr_set()
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Read the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:41:47 +01:00
Jakub Kicinski
0b08956cd5 ethernet: renesas: use eth_hw_addr_set()
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Break the address up into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:41:47 +01:00
Jakub Kicinski
1c5d09d587 ethernet: r8169: use eth_hw_addr_set()
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Read the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:41:47 +01:00
Jakub Kicinski
88e102e877 ethernet: netxen: use eth_hw_addr_set()
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Invert the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:41:47 +01:00
Jakub Kicinski
b814d32869 ethernet: lpc: use eth_hw_addr_set()
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Read the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:41:47 +01:00
Jakub Kicinski
4789b57af3 ethernet: sky2/skge: use eth_hw_addr_set()
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Read the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:41:47 +01:00
Jakub Kicinski
15c343eb05 ethernet: mv643xx: use eth_hw_addr_set()
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Read the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:41:47 +01:00
Aharon Landau
4123bfb0b2 RDMA/mlx5: Move struct mlx5_core_mkey to mlx5_ib
Move mlx5_core_mkey struct to mlx5_ib, as the mlx5_core doesn't use it
at this point.

Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-10-19 14:35:28 +03:00
Aharon Landau
83fec3f12a RDMA/mlx5: Replace struct mlx5_core_mkey by u32 key
In mlx5_core and vdpa there is no use of mlx5_core_mkey members except
for the key itself.

As preparation for moving mlx5_core_mkey to mlx5_ib, the occurrences of
struct mlx5_core_mkey in all modules except for mlx5_ib are replaced by
a u32 key.

Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-10-19 14:34:12 +03:00
Aharon Landau
c64674168b RDMA/mlx5: Remove pd from struct mlx5_core_mkey
There is no read of mkey->pd, only write. Remove it.

Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-10-19 14:33:58 +03:00
Aharon Landau
062fd731e5 RDMA/mlx5: Remove size from struct mlx5_core_mkey
mkey->size is already stored in ibmr->length, no need to store it here.

Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-10-19 14:33:44 +03:00
Aharon Landau
cf6a8b1b24 RDMA/mlx5: Remove iova from struct mlx5_core_mkey
iova is already stored in ibmr->iova, no need to store it here.

Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-10-19 14:33:20 +03:00
David S. Miller
641a305b88 Merge branch 'mlxsw-multi-level-qdisc-offload'
Ido Schimmel says:

====================
mlxsw: Multi-level qdisc offload

Petr says:

Currently, mlxsw admits for offload a suitable root qdisc, and its
children. Thus up to two levels of hierarchy are offloaded. Often, this is
enough: one can configure TCs with RED and TCs with a shaper on, and can
even see counters for each TC by looking at a qdisc at a sufficiently
shallow position.

While simple, the system has obvious shortcomings. It is not possible to
configure both RED and shaping on one TC. It is not possible to place a
PRIO below root TBF, which would then be offloaded as port shaper. FIFOs
are only offloaded at root or directly below, which is confusing to users,
because RED and TBF of course have their own FIFO.

This patch set lifts assumptions that prevent offloading multi-level qdisc
trees.

In patch #1, offload of a graft operation is added to TBF. Grafts are
issued as another qdisc is linked to the qdisc in question, and give
drivers a chance to react to the linking. The absence of this event was not
a major issue so far, because TBF was not considered classful, which
changes with this patchset.

The codebase currently assumes that ETS and PRIO are the only classful
qdiscs. The following patches gradually lift this assumption.

In patch #2, calculation of traffic class and priomap of a qdisc is fixed.

Patch #3 fixes handling of future FIFOs. Child FIFO qdiscs may be created
and notified before their parent qdisc exists and therefore need special
handling.

Patches #4, #5 and #6 unify, respectively, child destruction, child
grafting, and cleanup of statistics.

Patch #7 adds a function that validates whether a given qdisc topology is
offloadable.

Finally in patch #8, TBF and RED become classful. At this point, FIFO
qdiscs grafted to an offloaded qdisc should always be offloaded.

Patch #9 adds a selftest to verify some offloadable and unoffloadable qdisc
trees.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:24:52 +01:00
Petr Machata
29c1eac2e6 selftests: mlxsw: Add a test for un/offloadable qdisc trees
This checks that various qdisc configurations either are or are not
offloaded.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:24:52 +01:00
Petr Machata
2a18c08d75 mlxsw: spectrum_qdisc: Make RED, TBF offloads classful
Permit offloading qdiscs below RED and TBF. In order to avoid having to
implement trivial propagating callbacks for get_prio_bitmap and
get_tclass_num, extend mlxsw_sp_qdisc_get_prio_bitmap() and
..._get_tclass_num() to handle the lack of the callback as a cue to forward
the request to the parent.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:24:52 +01:00
Petr Machata
c2792f38ca mlxsw: spectrum_qdisc: Validate qdisc topology
A following patch will enable offloading qdiscs that are deeper than
directly under root qdisc. Currently the topology validation consists of
demanding a root qdisc position for ETS and PRIO. Since RED and TBF are
considered classless, this is enough. In order to prevent some nonsensical
combinations when RED and TBF become classful, introduce a more general
topology validator.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:24:52 +01:00
Petr Machata
01164dda0a mlxsw: spectrum_qdisc: Clean stats recursively when priomap changes
On Spectrum, there are no per-TC TX counters. Instead, mlxsw uses per-prio
counters and aggregates them according to the priomap. Therefore when
priomap changes, the counter base values need to be reset to reflect the
change. Previously, this was only done for the sole child qdisc, but a
following patch makes RED and TBF classful. Thus apply the request to the
whole sub-tree.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:24:51 +01:00
Petr Machata
be7e2a5a58 mlxsw: spectrum_qdisc: Unify graft validation
Qdisc graft operations have so far been reported at PRIO, ETS and RED, with
RED events ignored, because RED was not considered a classful qdisc. A
following patch will make mlxsw recognize RED and TBF as classful qdiscs,
and thus it is necessary to validate grafting at these qdiscs as well.
Rename the existing graft validator to make it clear that it is a generic
function, and invoke for RED and TBF graft events as well. Drop the
unnecessary PRIO helper and invoke the graft validator directly for PRIO as
well.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:24:51 +01:00
Petr Machata
65626e0757 mlxsw: spectrum_qdisc: Destroy children in mlxsw_sp_qdisc_destroy()
Currently ETS and PRIO are the only offloaded classful qdiscs. Since they
are both similar, their destroy handler is the same, and it handles
children destruction itself. But now it is possible to do it generically
for any classful qdisc. Therefore promote the recursive destruction from
the ETS handler to mlxsw_sp_qdisc_destroy(), so that RED and TBF pick it up
in follow-up patches.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:24:51 +01:00
Petr Machata
91796f507a mlxsw: spectrum_qdisc: Extract two helpers for handling future FIFOs
Extract from __mlxsw_sp_qdisc_ets_replace() two helpers for handling of one
future FIFO resp. reinitializing the array of future FIFOs.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:24:51 +01:00
Petr Machata
76ff72a720 mlxsw: spectrum_qdisc: Query tclass / priomap instead of caching it
Currently when keeping track of qdiscs, mlxsw notes the TC and priomap
corresponding to each qdisc. That is fine currently, as there only ever is
one level of qdiscs to update: the direct children of ETS / PRIO. However
as deeper structures are made offloadable, ETS would need to update these
values for the complete subtree, and interim qdiscs would need to remember
to propagate the value.

Instead, reverse the responsibility: child qdiscs can ask their parent what
their TC and priomap are. ETS / PRIO know the answer right away, or there
are defaults for when the root qdisc does not assign them (e.g. when RED is
used as root qdisc). When RED and TBF become classful, they will simply
forward the request up to their parent.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:24:51 +01:00
Petr Machata
6b3efbfa4e net: sch_tbf: Add a graft command
As another qdisc is linked to the TBF, the latter should issue an event to
give drivers a chance to react to the grafting. In other qdiscs, this event
is called GRAFT, so follow suit with TBF as well.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:24:51 +01:00
David S. Miller
aaa5570612 mlx5-updates-2021-10-18
Maor Maor Gottlieb says:
 ========================
 Use hash to select the affinity port in VF LAG
 
 Current VF LAG architecture is based on QP association with a port.
 QP must be created after LAG is enabled to allow association with non-native port.
 VM Packets going on slow-path to eSwicth manager (SW path or hairpin) will be transmitted
 through a different QP than the VM. This means that Different packets of the same flow might
 egress from different physical ports.
 
 This patch-set solves this issue by moving the port selection to be based on the hash function
 defined by the bond.
 
 When the device is moved to VF LAG mode, the driver creates TTC (traffic type classifier) flow
 tables in order to classify the packet and steer it to the relevant hash function. Similar to what
 is done in the mlx5 RSS implementation.
 
 Each rule in the TTC table, forwards the packet to port selection flow table which has one hash
 split flow group which contains two "catch all" flow table entries. Each entry point to the
 relative uplink port. As shown below:
 
 		-------------------
 		| FT              |
 TTC rule ->	|     ----------- |
 		|   FG|   FTE --|-|-----> uplink of port #1
 		|     |   FTE --|-|-----> uplink of port #2
 		|     ----------- |
 		-------------------
 
 Hash split flow group is flow group that created as type of HASH_SPLIT and associated with match definer.
 The match definer define the fields which included in the hash calculation.
 
 The driver creates the match definer according to the xmit hash policy of the bond driver.
 
 Patches overview:
 ========================
 
 Minor E-Switch updates:
 - Patch #12, dynamic  allocation of dest array
 - Patch #13, increase number of forward destinations to 32
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmFuOPIACgkQSD+KveBX
 +j4Ejgf/ScZmLSMvPu8doQ+eLG6nSiA5EAXkJqx0dwZZzzB4hSJleYuTveab/rgA
 HNhSZiVI8YXrscqvBAWNVAE8wQ0DFgYtDFs5UfUc/Pd+dZsqk7+ecHlo+kBCkYSn
 3fNTKSkdzZsGz5hOu0eP3rteIvTf9JrtB07rfBLbma/nuTnSGxIFYQpDe7H52jW2
 pov9LEonara9kjJ7BFtaupQMwCpVwYuPkMPTnt/qO1IOE18GHnK5SXgdMSlLBdjY
 HKfBF6jXWooDlN9nAxvH+2RWsumng0pRyujw0uvHTQ6SDNoOKf1ucj8znOHCFC1h
 sTfzAX6jUdrRtJ6hWLo+p9YlkLtjSw==
 =TVAs
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2021-10-18' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

mlx5-updates-2021-10-18

Maor Maor Gottlieb says:
========================
Use hash to select the affinity port in VF LAG

Current VF LAG architecture is based on QP association with a port.
QP must be created after LAG is enabled to allow association with non-native port.
VM Packets going on slow-path to eSwicth manager (SW path or hairpin) will be transmitted
through a different QP than the VM. This means that Different packets of the same flow might
egress from different physical ports.

This patch-set solves this issue by moving the port selection to be based on the hash function
defined by the bond.

When the device is moved to VF LAG mode, the driver creates TTC (traffic type classifier) flow
tables in order to classify the packet and steer it to the relevant hash function. Similar to what
is done in the mlx5 RSS implementation.

Each rule in the TTC table, forwards the packet to port selection flow table which has one hash
split flow group which contains two "catch all" flow table entries. Each entry point to the
relative uplink port. As shown below:

		-------------------
		| FT              |
TTC rule ->	|     ----------- |
		|   FG|   FTE --|-|-----> uplink of port #1
		|     |   FTE --|-|-----> uplink of port #2
		|     ----------- |
		-------------------

Hash split flow group is flow group that created as type of HASH_SPLIT and associated with match definer.
The match definer define the fields which included in the hash calculation.

The driver creates the match definer according to the xmit hash policy of the bond driver.

Patches overview:
========================

Minor E-Switch updates:
- Patch #12, dynamic  allocation of dest array
- Patch #13, increase number of forward destinations to 32

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 12:16:34 +01:00