Commit graph

766102 commits

Author SHA1 Message Date
Prarit Bhargava
012350411b tools/power turbostat: Add Node in output
Output a Node column if there is more than one node/socket.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:47 -04:00
Prarit Bhargava
40f5cfe7b8 tools/power turbostat: add node information into turbostat calculations
The previous patches have added node information to turbostat, but the
counters code does not take it into account.

Add node information from cpu_topology calculations to turbostat
counters.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:47 -04:00
Prarit Bhargava
70a9c6e8ed tools/power turbostat: remove num_ from cpu_topology struct
Cleanup, remove num_ from num_nodes_per_pkg, num_cores_per_node, and
num_threads_per_node.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:47 -04:00
Prarit Bhargava
139dd0e07c tools/power turbostat: rename num_cores_per_pkg to num_cores_per_node
turbostat incorrectly assumes that there is one node per package.  As a
result num_cores_per_pkg is not correctly named and is actually
num_cores_per_node.

Rename num_cores_per_pkg to num_cores_per_node.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:46 -04:00
Prarit Bhargava
8cb48b32a5 tools/power turbostat: track thread ID in cpu_topology
The code can be simplified if the cpu_topology *cpus tracks the thread
IDs.  This removes an additional file lookup and simplifies the counter
initialization code.

Add thread ID to cpu_topology information and cleanup the counter
initialization code.

v2: prevent thread_id from being overwritten

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:46 -04:00
Prarit Bhargava
ef6057417a tools/power turbostat: Calculate additional node information for a package
The code currently assumes each package has exactly one node.  This is not
the case for AMD systems and Intel systems with COD.  AMD systems also
may re-enumerate each node's core IDs starting at 0 (for example, an AMD
processor may have two nodes, each with core IDs from 0 to 7).  In order
to properly enumerate the cores we need to track both the physical and
logical node IDs.

Add physical_node_id to track the node ID assigned by the kernel, and
logical_node_id used by turbostat to track the nodes per package ie) a
0-based count within the package.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:46 -04:00
Len Brown
0e2d8f058f tools/power turbostat: Fix node and siblings lookup data
The turbostat code only looks at thread_siblings_list to determine if
processing units/threads are on the same the core.  This works well on
Intel systems which have a shared L1 instruction and data cache.  This
does not work on AMD systems which have shared L1 instruction cache but
separate L1 data caches.  Other utilities also check sibling's core ID
to determine if the processing unit shares the same core.

Additionally, the cpu_topology *cpus list used in topology_probe() can
be used elsewhere in the code to simplify things.

Export *cpus to the entire turbostat code, and add Processing Unit/Thread
IDs information to each cpu_topology struct.  Confirm that the thread
is on the same core as indicated by thread_siblings_list.

[v2]: Fixup CPU_* usage that caused gcc malloc error.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:46 -04:00
Prarit Bhargava
843c57916d tools/power turbostat: set max_num_cpus equal to the cpumask length
Future fixes will use sysfs files that contain cpumask output.  The code
needs to know the length of the cpumask in order to determine which cpus
are set in a cpumask.  Currently topo.max_cpu_num is the maximum cpu
number.  It can be increased the the maximum value of cpus represented in
cpumasks.

Set max_num_cpus to the length of a cpumask.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:46 -04:00
Chen Yu
023fe0ac97 tools/power turbostat: if --num_iterations, print for specific number of iterations
There's a use case during test to only print specific round of iterations
if --num_iterations is specified, for example, with this patch applied:

turbostat -i 5 -n 4
will capture 4 samples with 5 seconds interval.

[lenb: renamed to --num_iterations from --iterations]

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:45 -04:00
Srinivas Pandruvada
997e53950e tools/power turbostat: Add Cannon Lake support
All MSRs related to turbostat are same as Kabylake.
Even though SDM claims that core C3 residency can be read from MSR 0x662,
the read on this MSR fails on CNL platform. Hence disabled C3 MSR read
and display.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:45 -04:00
Len Brown
9d4eab02a7 tools/power turbostat: delete duplicate #defines
The SNB_C1_AUTO_UNDEMOTE definition should have been deleted once
it was copied into msr-index.h.  One copy of the truth is better --
particularly when Matt needs to fix it:-)

Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:45 -04:00
Matt Turner
a00072a24a x86: msr-index.h: Correct SNB_C1/C3_AUTO_UNDEMOTE defines
According to the Intel Software Developers' Manual, Vol. 4, Order No.
335592, these macros have been reversed since they were added in the
initial turbostat commit. The reversed definitions were presumably
copied from turbostat.c to this file.

Fixes: 9c63a650bb ("tools/power/x86/turbostat: share kernel MSR #defines")
Signed-off-by: Matt Turner <mattst88@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:45 -04:00
Matt Turner
e0d34648b4 tools/power turbostat: Correct SNB_C1/C3_AUTO_UNDEMOTE defines
According to the Intel Software Developers' Manual, Vol. 4, Order No.
335592, these macros have been reversed since they were added.

Fixes: 889facbee3 ("tools/power turbostat: v3.0: monitor Watts and Temperature")
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:44 -04:00
Len Brown
0748eaf0cf tools/power turbostat: add POLL and POLL% column
Like the "C1" and "C1%" column, the new POLL and POLL% columns
show invocations and residency% during the measurement interval.

While it didn't seem important to track in the past,
we've recently found some Linux cpuidle bugs related to POLL%.

Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:44 -04:00
Len Brown
4bd1f8f21a tools/power turbostat: Fix --hide Pk%pc10
The column header for PC10 residency is "Pk%pc10"
This is missing the 'g' that others have, eg Pkg%pc6,
to allow tab-delimited columns to fit into 8-columns.

However, --hide Pk%pc10 did not work, it was still looking for the 'g'.
This was confusing, because --list shows the correct "Pk%pc10"

Reported-by: Wendy Wang <wendy.wang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:44 -04:00
Len Brown
be0e54c4eb tools/power turbostat: Build-in "Low Power Idle" counters support
Linux 4.15 exports the ACPI Low Power Idle Table's
counters in /sys/devices/system/cpu/cpuidle/

low_power_idle_cpu_residency_us

	Show this in the "CPU%LPI" column.

	Today this reflects the "North Complex"
	residency in PC10, so expect it to
	closely follow "Pk%pc10".

low_power_idle_system_residency_us

	Show this in the "SYS%LPI" column.

	Today, this reflects the North is in PC10,
	plus the PCH is sufficiently quiescent
	to save additional power via the "S0ix"
	system state, as measured by the
	PCH SLP_S0 counter.

Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:40 -04:00
Greg Kroah-Hartman
c1c2873df0 clk: remove clk_debugfs_add_file()
No one was using this api call, so remove it.  If it is ever needed in
the future, a "raw" debugfs call can be used.

Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2018-06-01 19:25:53 -07:00
Greg Kroah-Hartman
df500f22d7 clk: tegra: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

The return value of these functions were never checked in the end
anyway, so it is obvious this does not change any functionality :)

Cc: Peter De Schrijver <pdeschrijver@nvidia.com>
Cc: Prashant Gaikwad <pgaikwad@nvidia.com>
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Jonathan Hunter <jonathanh@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2018-06-01 19:25:51 -07:00
Greg Kroah-Hartman
bcee76731c clk: davinci: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Acked-by: David Lechner <david@lechnology.com>
Cc: Sekhar Nori <nsekhar@ti.com>
Cc: Michael Turquette <mturquette@baylibre.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2018-06-01 19:22:56 -07:00
Greg Kroah-Hartman
c0526a111a clk: bcm2835: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Cc: Michael Turquette <mturquette@baylibre.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: Stefan Wahren <stefan.wahren@i2se.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ray Jui <rjui@broadcom.com>
Cc: Scott Branden <sbranden@broadcom.com>
Cc: bcm-kernel-feedback-list@broadcom.com
Cc: Phil Elwell <phil@raspberrypi.org>
Cc: Boris Brezillon <boris.brezillon@bootlin.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Danilo Krummrich <danilokrummrich@dk-develop.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2018-06-01 19:22:29 -07:00
Greg Kroah-Hartman
8a26bbbb93 clk: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

This cleans up the init code a lot, and there's no need to return an
error value based on the debugfs calls, especially as it turns out no
one was even looking at that return value.  So it obviously wasn't that
important :)

Cc: Michael Turquette <mturquette@baylibre.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2018-06-01 19:22:26 -07:00
Colin Didier
b1569380a6 clk: imx6: add EPIT clock support
Add EPIT clock support to the i.MX6Q clocking infrastructure.

Signed-off-by: Colin Didier <colin.didier@devialet.com>
Signed-off-by: Clément Peron <clement.peron@devialet.com>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2018-06-01 19:19:30 -07:00
Christoph Hellwig
afd9d6a1df fs: use ->is_partially_uptodate in page_cache_seek_hole_data
This way the implementation doesn't depend on buffer_head internals.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01 18:37:33 -07:00
Christoph Hellwig
bd56b3e144 fs: remove the buffer_unwritten check in page_seek_hole_data
We only call into this function through the iomap iterators, so we already
know the buffer is unwritten.  In addition to that we always require the
uptodate flag that is ORed with the result anyway.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01 18:37:33 -07:00
Christoph Hellwig
8a78cb1f1b fs: move page_cache_seek_hole_data to iomap.c
This function is only used by the iomap code, depends on being called
from it, and will soon stop poking into buffer head internals.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01 18:37:33 -07:00
Christoph Hellwig
b84e772299 xfs: use iomap_bmap
Switch to the iomap based bmap implementation to get rid of one of the
last users of xfs_get_blocks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01 18:37:33 -07:00
Christoph Hellwig
89eb1906a9 iomap: add an iomap-based bmap implementation
This adds a simple iomap-based implementation of the legacy ->bmap
interface.  Note that we can't easily add checks for rt or reflink
files, so these will have to remain in the callers.  This interface
just needs to die..

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01 18:37:33 -07:00
Christoph Hellwig
57fc505d11 iomap: add a iomap_sector helper
Factor the repeated calculation of the on-disk sector for a given logical
block into a littler helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01 18:37:33 -07:00
Christoph Hellwig
6533b4e40d iomap: use __bio_add_page in iomap_dio_zero
We don't need any merging logic, and this also replaces a BUG_ON with a
WARN_ON_ONCE inside __bio_add_page for the impossible overflow condition.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01 18:37:33 -07:00
Christoph Hellwig
7ee66c03e4 iomap: move IOMAP_F_BOUNDARY to gfs2
Just define a range of fs specific flags and use that in gfs2 instead of
exposing this internal flag globally.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01 18:37:32 -07:00
Christoph Hellwig
9ecac0ef22 iomap: fix the comment describing IOMAP_NOWAIT
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01 18:37:32 -07:00
Christoph Hellwig
19319b5321 iomap: inline data should be an iomap type, not a flag
Inline data is fundamentally different from our normal mapped case in that
it doesn't even have a block address.  So instead of having a flag for it
it should be an entirely separate iomap range type.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01 18:37:32 -07:00
Christoph Hellwig
b3751e6ab4 mm: split ->readpages calls to avoid non-contiguous pages lists
That way file systems don't have to go spotting for non-contiguous pages
and work around them.  It also kicks off I/O earlier, allowing it to
finish earlier and reduce latency.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01 18:37:32 -07:00
Christoph Hellwig
c534aa3fdd mm: return an unsigned int from __do_page_cache_readahead
We never return an error, so switch to returning an unsigned int.  Most
callers already did implicit casts to an unsigned type, and the one that
didn't can be simplified now.

Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01 18:37:32 -07:00
Christoph Hellwig
836978b35f mm: give the 'ret' variable a better name __do_page_cache_readahead
It counts the number of pages acted on, so name it nr_pages to make that
obvious.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01 18:37:32 -07:00
Christoph Hellwig
0aa69fd32a block: add a lower-level bio_add_page interface
For the upcoming removal of buffer heads in XFS we need to keep track of
the number of outstanding writeback requests per page.  For this we need
to know if bio_add_page merged a region with the previous bvec or not.
Instead of adding additional arguments this refactors bio_add_page to
be implemented using three lower level helpers which users like XFS can
use directly if they care about the merge decisions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01 18:37:32 -07:00
Tariq Toukan
f65a59ffbc net/mlx5e: TX, Separate cachelines of xmit and completion stats
Avoid false sharing of cachelines by separating the cachelines of
TX stats that are dertied in xmit flow and in completion flow.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:15 -07:00
Tariq Toukan
5ffd81943d net/mlx5e: RX, Always prefer Linear SKB configuration
Prefer the linear SKB configuration of Legacy RQ over the
non-linear one of Striding RQ.

This implies that ConnectX-4 LX now uses legacy RQ by default,
as it does not support the linear configuration of Striding RQ.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:15 -07:00
Tariq Toukan
069d11465a net/mlx5e: RX, Enhance legacy Receive Queue memory scheme
Enhance the memory scheme of the legacy RQ, such that
only order-0 pages are used.

Whenever possible, prefer using a linear SKB, and build it
wrapping the WQE buffer.

Otherwise (for example, jumbo frames on x86), use non-linear SKB,
with as many frags as needed. In this case, multiple WQE
scatter entries are used, up to a maximum of 4 frags and 10KB of MTU.

This implied to remove support of HW LRO in legacy RQ, as it would
require large number of page allocations and scatter entries per WQE
on archs with PAGE_SIZE = 4KB, yielding bad performance.

In earlier patches, we guaranteed that all completions are in-order,
and that we use a cyclic WQ.
This creates an oppurtunity for a performance optimization:
The mapping between a "struct mlx5e_dma_info", and the
WQEs (struct mlx5e_wqe_frag_info) pointing to it, is constant
across different cycles of a WQ. This allows initializing
the mapping in the time of RQ creation, and not handle it
in datapath.

A struct mlx5e_dma_info that is shared between different WQEs
is allocated by the first WQE, and freed by the last one.
This implies an important requirement: WQEs that share the same
struct mlx5e_dma_info must be posted within the same NAPI.
Otherwise, upon completion, struct mlx5e_wqe_frag_info would mistakenly
point to the new struct mlx5e_dma_info, not the one that was posted
(and the HW wrote to).
This bulking requirement is actually good also for performance reasons,
hence we extend the bulk beyong the minimal requirement above.

With this memory scheme, the RQs memory footprint is reduce by a
factor of 2 on x86, and by a factor of 32 on PowerPC.
Same factors apply for the number of pages in a GRO session.

Performance tests:
ConnectX-4, single core, single RX ring, default MTU.

x86:
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

Packet rate (early drop in TC): no degradation
TCP streams: ~5% improvement

PowerPC:
CPU: POWER8 (raw), altivec supported

Packet rate (early drop in TC): 20% gain
TCP streams: 25% gain

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:15 -07:00
Tariq Toukan
99cbfa93a6 net/mlx5e: RX, Use cyclic WQ in legacy RQ
Now that LRO is not supported for Legacy RQ, there is no source of
out-of-order completions in the WQ, and we can use a cyclic one.
This has multiple advantages:
- reduces the WQE size (smaller PCI transactions).
- lower overhead in datapath (no handling of 'next' pointers).
- no reserved WQE for the WQ head (was need in linked-list).
- allows using a constant map between frag and dma_info struct, in downstream patch.

Performance tests:
ConnectX-4, single core, single RX ring.
Major gain in packet rate of single ring XDP drop.
Bottleneck is shifted form HW (at 16Mpps) to SW (at 20Mpps).

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:15 -07:00
Tariq Toukan
422d4c401e net/mlx5e: RX, Split WQ objects for different RQ types
Replace the common RQ WQ object with two separate ones for the
different RQ types.
This is in preparation for switching to using a cyclic WQ type
in Legacy RQ.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:15 -07:00
Tariq Toukan
6c3a823e1e net/mlx5e: RX, Remove HW LRO support in legacy RQ
Current LRO implementation in Legacy RQ uses high-order pages.
In downstream patches of this series we complete the transition
to using only order-0 pages in RX datapath (which was already done
in Striding RQ).

Unlike the more advanced Striding RQ, Legacy RQ does not make reuse
of any non-consumed buffers of non-full LRO sessions, and combining
it with order-0 pages has many performance drawbacks.

Hence, here we totally remove LRO support in Legacy RQ.
This guarantees having no out-of-order completions, which allows using
a cyclic work queue (instead of a linked-list) in a downstream patch.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:15 -07:00
Tariq Toukan
386471f16b net/mlx5e: RX, Dedicate a function for copying SKB header
Get the logic of copying the packet header into the SKB linear part
into a generic function. Function does copy length alignment
and dma buffer sync.

It is currently called only within the MPWQE flow.
In a downstream patch, it will be called within the legacy RQ flow
as well.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00
Tariq Toukan
fa698366b7 net/mlx5e: RX, Generalise function of SKB frag addition
Rename it and pass truesize as an extra argument, as it will be used also
in Legacy RQ in a downstream patch.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00
Tariq Toukan
75aa889fb9 net/mlx5e: RX, Generalise name of non-linear SKB head size
Make name more generic by dropping MPWRQ from it, as it will be
used also in Legacy RQ in a downstream patch.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00
Tariq Toukan
5e7d77a9c5 net/mlx5e: TX, Obsolete maintaining local copies of skb->len/data
Instead of maintaining a local copy of skb->len/data and updating
it upon every copy to the WQE inline part, just calculate it once
when needed, using the ihs.

This obsoletes the function mlx5e_tx_skb_pull_inline.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00
Ilan Tayari
98db16bab5 net/mlx5: FPGA, Handle QP error event
Add handlers for this event to perform graceful teardown of the device.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00
Adi Nissim
250a42b6a7 net/mlx5e: Support configurable MTU for vport representors
The representor MTU was hard coded to 1500 bytes.
Allow setting arbitrary MTU values up to the max supported by the FW.

Signed-off-by: Adi Nissim <adin@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00
Maor Gottlieb
93edcb3a75 net/mlx5e: Increase aRFS flow tables size
Increase the aRFS flow table size to 64k so it could contain up to 64k
different streams.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00
Eran Ben Elisha
6c63efe4cf net/mlx5e: Remove redundant active_channels indication
Now, when all channels stats are saved regardless of the channel's state
{open, closed}, we can safely remove this indication and the stats spin
lock which protects it.

Fixes: 76c3810bade3 ("net/mlx5e: Avoid reset netdev stats on configuration changes")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00