Commit graph

1059881 commits

Author SHA1 Message Date
Ariel Levkovich
b16eb3c81f net/mlx5: Support internal port as decap route device
When performing route device lookup for decap action, support
the case of ovs internal port as the lookup result.

In such case, an internal port struct is mapped and attached
to the flow attributes so that the source port matching of the
rule will match on the internal port's metadata value.

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29 13:53:31 -07:00
Ariel Levkovich
5e99427217 net/mlx5e: Term table handling of internal port rules
Adjust termination table logic to handle rules which
involve internal port as filter or forwarding device.

For cases where the rule forwards from internal port
to uplink, always choose to go via termination table.
This is because it is not known from where the packet
originally arrived to the internal port and it is possible
that it came from the uplink itself, in which case
a term table is required to perform hairpin.
If the packet arrived from a vport, going via term
table has no effect.

For cases where the rule forwards to an internal port
from uplink the rep pointer will point to the uplink rep,
avoid going via termination table as it is not required.

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29 13:53:31 -07:00
Ariel Levkovich
166f431ec6 net/mlx5e: Add indirect tc offload of ovs internal port
Register callbacks for tc blocks of ovs internal port devices.

This allows an indirect offloading rules that apply on
such devices as the filter device.

In case a rule is added to a tc block of an internal port,
the mlx5 driver will implicitly add a matching on the internal
port's unique vport metadata value to the rule's matching list.
Therefore, only packets that previously hit a rule that redirects
to an internal port and got the vport metadata overwritten to the
internal port's unique metadata, can match on such indirect rule.

Offloading of both ingress and egress tc blocks of internal ports
is supported as opposed to other devices where only ingress block
offloading is supported.

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29 13:53:30 -07:00
Ariel Levkovich
100ad4e2d7 net/mlx5e: Offload internal port as encap route device
When pefroming encap action, a route lookup is performed
to find the routing device the packet should be forwarded
to after the encapsulation. This is the device that has the
local tunnel ip address.

This change adds support to offload an encap rule where the
route device ends up being an ovs internal port.
In such case, the driver will add a HW rule that will encapsulate
the packet with the tunnel header and will overwrite the vport
metadata in reg_c0 to the internal port metadata value.
Finally, the packet will be forwarded to the root table to be
processed again with the indication that it came from an internal
port.

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29 13:53:30 -07:00
Ariel Levkovich
27484f7170 net/mlx5e: Offload tc rules that redirect to ovs internal port
Allow offloading rules that redirect to ovs internal port
ingress and egress.

To support redirect to ingress device, offloading of REDIRECT_INGRESS
action is added.

When a tc rule redirects to ovs internal port, the hw rule will
overwrite the input vport value in reg_c0 with a new vport metadata
value that is mapped for this internal port using the internal
port mapping api that is introduce in previous patches.
After that the hw rule will redirect the packet to the root table
to continue processing with the new vport metadata value.

The new vport metadata value indicates that this packet is now
arriving through an internal port and therefore should be processed
using rules that apply on the same internal port as the filter device.
Therefore, following rules that apply on this internal port will have
to match on the same vport metadata value as part of their matching
keys to make sure the packet belongs to the internal port.

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29 13:53:30 -07:00
Ariel Levkovich
dbac71f229 net/mlx5e: Accept action skbedit in the tc actions list
Setting the skb packet type field to host is usually
done when performing forwarding to ingress device.

This is required since the receive handling that is used
by the redirect to ingress action checks whether the packet
doesn't belong to this host and drops the packet in such case.

In order to be able to offload action redirect ingress, tc offload
code needs to accept the skbedit ptype action as well.

There's no special handling in HW for such action since it will
be followed by a redirect action and therefore, this code
only allows us to accept such action in the actions list but
not performing anything specific in HW for it.

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29 13:53:29 -07:00
Ariel Levkovich
4f4edcc2b8 net/mlx5: E-Switch, Add ovs internal port mapping to metadata support
Adding infrastructure to map ovs internal port device to vport
match metadata to support offload of rules with internal port as
the filter device or as the destination device.

The infrastructure allows adding and removing internal port device
to an eswitch database and getting a unique vport metadata value to
be placed and match on in reg_c0 when offloading rules that are coming
from or going to an internal port.

The new int port metadata can be written to the source port register
in HW to indicate that current source port of the packet is the
internal port and not one of the actual HW vports (uplink or VF).
Using this method, it is possible to offload TC rules with an OVS
internal port as their destination port (overwriting the src vport
register) or as the filter port (matching on the value of the src
vport register and making sure it matches to the internal port's
value).

There is also a need to handle a miss case where the packet's
src port value was changed in HW to an internal port but a following
rule which matches on this new src port value wasn't found in HW.

In such case, the packet will be forwarded to the driver with
metadata which allows driver to restore the info of the internal
port's netdevice. Once this info is restored, the uplink driver
can forward the packet to the relevant netdevice in SW.

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29 13:53:29 -07:00
Ariel Levkovich
189ce08ebf net/mlx5e: Use generic name for the forwarding dev pointer
Rename tun_dev to fwd_dev within mlx5e_tc_update_priv struct
since future implementation may introduce other device types
which the handler is forwarding to.

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29 13:53:29 -07:00
Ariel Levkovich
28e7606fa8 net/mlx5e: Refactor rx handler of represetor device
Move the ownership of skb forwarding to network stack to the
tc update_skb handler as different cases will require different
handling of the skb.

While the tc handler will take care of the various cases and
properly handle the handover of the skb to the network stack
and freeing the skb, the main rx handler will be kept clean
from branches and usage of flags.

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29 13:53:29 -07:00
Muhammad Sammar
941f19798a net/mlx5: DR, Add check for unsupported fields in match param
When a matcher is being built, we "consume" (clear) mask fields one by one,
and to verify that we do support all the required fields we check if the
whole mask was consumed, else the matching request includes unsupported
fields.

Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
2021-10-29 13:53:28 -07:00
Paul Blakey
504e157248 net/mlx5: Allow skipping counter refresh on creation
CT creates a counter for each CT rule, and for each such counter,
fs_counters tries to queue mlx5_fc_stats_work() work again via
mod_delayed_work(0) call to refresh all counters. This call has a
large performance impact when reaching high insertion rate and
accounts for ~8% of the insertion time when using software steering.

Allow skipping the refresh of all counters during counter creation.
Change CT to use this refresh skipping for it's counters.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29 13:53:28 -07:00
Raed Salem
428ffea071 net/mlx5e: IPsec: Refactor checksum code in tx data path
Part of code that is related solely to IPsec is always compiled in the
driver code regardless if the IPsec functionality is enabled or disabled
in the driver code, this will add unnecessary branch in case IPsec is
disabled at Tx data path.

Move IPsec related code to IPsec related file such that in case of IPsec
is disabled and because of unlikely macro the compiler should be able to
optimize and omit the checksum IPsec code all together from Tx data path

Signed-off-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Emeel Hakim <ehakim@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29 13:53:28 -07:00
Paul Blakey
ae2ee3be99 net/mlx5: CT: Remove warning of ignore_flow_level support for VFs
ignore_flow_level isn't supported for VFs, and so it causes
post_act and ct to warn about it.

Instead of disabling CT for VFs, and a driver update will be need
to enable CT again once firmware support this, remove this warning
specifically for VFs. This way, it could be automatically enabled on
future firmwares where VFs support ignore_flow_level capability.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29 13:53:27 -07:00
Nathan Chancellor
1aec85974a net/mlx5: Add esw assignment back in mlx5e_tc_sample_unoffload()
Clang warns:

drivers/net/ethernet/mellanox/mlx5/core/en/tc/sample.c:635:34: error: variable 'esw' is uninitialized when used here [-Werror,-Wuninitialized]
        mlx5_eswitch_del_offloaded_rule(esw, sample_flow->pre_rule, sample_flow->pre_attr);
                                        ^~~
drivers/net/ethernet/mellanox/mlx5/core/en/tc/sample.c:626:26: note: initialize the variable 'esw' to silence this warning
        struct mlx5_eswitch *esw;
                                ^
                                 = NULL
1 error generated.

It appears that the assignment should have been shuffled instead of
removed outright like in mlx5e_tc_sample_offload(). Add it back so there
is no use of esw uninitialized.

Fixes: a64c5edbd2 ("net/mlx5: Remove unnecessary checks for slow path flag")
Link: https://github.com/ClangBuiltLinux/linux/issues/1494
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-29 13:53:27 -07:00
Asmaa Mnebhi
6c2a6ddca7 net: mellanox: mlxbf_gige: Replace non-standard interrupt handling
Since the GPIO driver (gpio-mlxbf2.c) supports interrupt handling,
replace the custom routine with simple IRQ request.

Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2021-10-29 22:34:03 +02:00
Asmaa Mnebhi
2b725265cb gpio: mlxbf2: Introduce IRQ support
Introduce standard IRQ handling in the gpio-mlxbf2.c
driver.

Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2021-10-29 22:33:40 +02:00
Przemyslaw Patynowski
605ca7c5c6 iavf: Fix kernel BUG in free_msi_irqs
Fix driver not freeing VF's traffic irqs, prior to calling
pci_disable_msix in iavf_remove.
There were possible 2 erroneous states in which, iavf_close would
not be called.
One erroneous state is fixed by allowing netdev to register, when state
is already running. It was possible for VF adapter to enter state loop
from running to resetting, where iavf_open would subsequently fail.
If user would then unload driver/remove VF pci, iavf_close would not be
called, as the netdev was not registered, leaving traffic pcis still
allocated.
Fixed this by breaking loop, allowing netdev to open device when adapter
state is __IAVF_RUNNING and it is not explicitily downed.
Other possiblity is entering to iavf_remove from __IAVF_RESETTING state,
where iavf_close would not free irqs, but just return 0.
Fixed this by checking for last adapter state and then removing irqs.

Kernel panic:
[ 2773.628585] kernel BUG at drivers/pci/msi.c:375!
...
[ 2773.631567] RIP: 0010:free_msi_irqs+0x180/0x1b0
...
[ 2773.640939] Call Trace:
[ 2773.641572]  pci_disable_msix+0xf7/0x120
[ 2773.642224]  iavf_reset_interrupt_capability.part.41+0x15/0x30 [iavf]
[ 2773.642897]  iavf_remove+0x12e/0x500 [iavf]
[ 2773.643578]  pci_device_remove+0x3b/0xc0
[ 2773.644266]  device_release_driver_internal+0x103/0x1f0
[ 2773.644948]  pci_stop_bus_device+0x69/0x90
[ 2773.645576]  pci_stop_and_remove_bus_device+0xe/0x20
[ 2773.646215]  pci_iov_remove_virtfn+0xba/0x120
[ 2773.646862]  sriov_disable+0x2f/0xe0
[ 2773.647531]  ice_free_vfs+0x2f8/0x350 [ice]
[ 2773.648207]  ice_sriov_configure+0x94/0x960 [ice]
[ 2773.648883]  ? _kstrtoull+0x3b/0x90
[ 2773.649560]  sriov_numvfs_store+0x10a/0x190
[ 2773.650249]  kernfs_fop_write+0x116/0x190
[ 2773.650948]  vfs_write+0xa5/0x1a0
[ 2773.651651]  ksys_write+0x4f/0xb0
[ 2773.652358]  do_syscall_64+0x5b/0x1a0
[ 2773.653075]  entry_SYSCALL_64_after_hwframe+0x65/0xca

Fixes: 22ead37f8a ("i40evf: Add longer wait after remove module")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-29 13:11:53 -07:00
Karen Sornek
247aa001b7 iavf: Add helper function to go from pci_dev to adapter
Add helper function to go from pci_dev to adapter to make work simple -
to go from a pci_dev to the adapter structure and make netdev assignment
instead of having to go to the net_device then the adapter.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Karen Sornek <karen.sornek@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-29 13:11:53 -07:00
Brett Creeley
4a15022f82 virtchnl: Use the BIT() macro for capability/offload flags
Currently raw hex values are used to define specific bits for each
capability/offload in virtchnl.h. Using raw hex values makes it
unclear which bits are used/available. Fix this by using the BIT()
macro so it's immediately obvious which bits are used/available.

Also, move the VIRTCHNL_VF_CAP_ADV_LINK_SPEED define in the correct
place to line up with the other bit values and add a comment for its
purpose.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-29 13:11:53 -07:00
Brett Creeley
5bf84b2993 virtchnl: Remove unused VIRTCHNL_VF_OFFLOAD_RSVD define
Remove unused define that is currently marked as reserved. This will
open up space for a new feature if/when it's introduced. Also, there is
no reason to keep unused defines around.

Suggested-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-29 13:11:31 -07:00
Christophe JAILLET
7f98960c04 i2c: xlr: Fix a resource leak in the error handling path of 'xlr_i2c_probe()'
A successful 'clk_prepare()' call should be balanced by a corresponding
'clk_unprepare()' call in the error handling path of the probe, as already
done in the remove function.

More specifically, 'clk_prepare_enable()' is used, but 'clk_disable()' is
also already called. So just the unprepare step has still to be done.

Update the error handling path accordingly.

Fixes: 75d31c2372 ("i2c: xlr: add support for Sigma Designs controller variant")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2021-10-29 22:04:35 +02:00
Tian Tao
5fe058b04d i2c: qup: move to use request_irq by IRQF_NO_AUTOEN flag
disable_irq() after request_irq() still has a time gap in which
interrupts can come. request_irq() with IRQF_NO_AUTOEN flag will
disable IRQ auto-enable because of requesting.

Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2021-10-29 21:57:33 +02:00
Randy Dunlap
8e98c4f5c3 i2c: qup: fix a trivial typo
Correct the typo of "reamining" to "remaining".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2021-10-29 21:53:32 +02:00
Dmitry Osipenko
ef3fe574d4 i2c: tegra: Ensure that device is suspended before driver is removed
Tegra I2C device isn't guaranteed to be suspended after removal of
the driver since driver uses pm_runtime_put() that is asynchronous and
pm_runtime_disable() cancels pending power-change requests. This means
that potentially refcount of the clocks may become unbalanced after
removal of the driver. This a very minor problem which unlikely to
happen in practice and won't cause any visible problems, nevertheless
let's replace pm_runtime_disable() with pm_runtime_force_suspend() and
use pm_runtime_put_sync() which disables RPM of the device and puts it
into suspend before driver is removed.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2021-10-29 21:51:31 +02:00
Wolfram Sang
e4f2647585 Merge tag 'at24-updates-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into i2c/for-mergewindow
at24 updates for v5.16

- add two new compatible entries to the DT bindings
2021-10-29 21:33:33 +02:00
Eric W. Biederman
e21294a7aa signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV)
Now that force_fatal_sig exists it is unnecessary and a bit confusing
to use force_sigsegv in cases where the simpler force_fatal_sig is
wanted.  So change every instance we can to make the code clearer.

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Link: https://lkml.kernel.org/r/877de7jrev.fsf@disp2133
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-10-29 14:31:34 -05:00
Eric W. Biederman
0fdc0c4279 exit/r8188eu: Replace the macro thread_exit with a simple return 0
The macro thread_exit is called is at the end of functions started
with kthread_run.  The code in kthread_run has arranged things so a
kernel thread can just return and do_exit will be called.

So just have rtw_cmd_thread and mp_xmit_packet_thread return instead
of calling complete_and_exit.

Link: https://lkml.kernel.org/r/20211020174406.17889-20-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-29 14:31:34 -05:00
Eric W. Biederman
99d7ef1e47 exit/rtl8712: Replace the macro thread_exit with a simple return 0
The macro thread_exit is called is at the end of a function started
with kthread_run.  The code in kthread_run has arranged things so a
kernel thread can just return and do_exit will be called.

So just have the cmd_thread return instead of calling complete_and_exit.

Link: https://lkml.kernel.org/r/20211020174406.17889-19-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-29 14:31:34 -05:00
Eric W. Biederman
501c887227 exit/rtl8723bs: Replace the macro thread_exit with a simple return 0
Every place thread_exit is called is at the end of a function started
with kthread_run.  The code in kthread_run has arranged things so a
kernel thread can just return and do_exit will be called.

So just have the threads return instead of calling complete_and_exit.

Link: https://lkml.kernel.org/r/20211020174406.17889-18-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-29 14:31:34 -05:00
Eric W. Biederman
695dd0d634 signal/x86: In emulate_vsyscall force a signal instead of calling do_exit
Directly calling do_exit with a signal number has the problem that
all of the side effects of the signal don't happen, such as
killing all of the threads of a process instead of just the
calling thread.

So replace do_exit(SIGSYS) with force_fatal_sig(SIGSYS) which
causes the signal handling to take it's normal path and work
as expected.

Cc: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20211020174406.17889-17-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-29 14:31:34 -05:00
Eric W. Biederman
086ec444f8 signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig
Modify the 32bit version of setup_rt_frame and setup_frame to act
similar to the 64bit version of setup_rt_frame and fail with a signal
instead of calling do_exit.

Replacing do_exit(SIGILL) with force_fatal_signal(SIGILL) ensures that
the process will be terminated cleanly when the stack frame is
invalid, instead of just killing off a single thread and leaving the
process is a weird state.

Cc: David Miller <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Link: https://lkml.kernel.org/r/20211020174406.17889-16-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-29 14:31:34 -05:00
Eric W. Biederman
c317d306d5 signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails
The function try_to_clear_window_buffer is only called from
rtrap_32.c.  After it is called the signal pending state is retested,
and signals are handled if TIF_SIGPENDING is set.  This allows
try_to_clear_window_buffer to call force_fatal_signal and then rely on
the signal being delivered to kill the process, without any danger of
returning to userspace, or otherwise using possible corrupt state on
failure.

The functional difference between force_fatal_sig and do_exit is that
do_exit will only terminate a single thread, and will never trigger a
core-dump.  A multi-threaded program for which a single thread
terminates unexpectedly is hard to reason about.  Calling force_fatal_sig
does not give userspace a chance to catch the signal, but otherwise
is an ordinary fatal signal exit, and it will trigger a coredump
of the offending process if core dumps are enabled.

Cc: David Miller <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Link: https://lkml.kernel.org/r/20211020174406.17889-15-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-29 14:31:33 -05:00
Eric W. Biederman
941edc5bf1 exit/syscall_user_dispatch: Send ordinary signals on failure
Use force_fatal_sig instead of calling do_exit directly.  This ensures
the ordinary signal handling path gets invoked, core dumps as
appropriate get created, and for multi-threaded processes all of the
threads are terminated not just a single thread.

When asked Gabriel Krisman Bertazi <krisman@collabora.com> said [1]:
> ebiederm@xmission.com (Eric W. Biederman) asked:
>
> > Why does do_syscal_user_dispatch call do_exit(SIGSEGV) and
> > do_exit(SIGSYS) instead of force_sig(SIGSEGV) and force_sig(SIGSYS)?
> >
> > Looking at the code these cases are not expected to happen, so I would
> > be surprised if userspace depends on any particular behaviour on the
> > failure path so I think we can change this.
>
> Hi Eric,
>
> There is not really a good reason, and the use case that originated the
> feature doesn't rely on it.
>
> Unless I'm missing yet another problem and others correct me, I think
> it makes sense to change it as you described.
>
> > Is using do_exit in this way something you copied from seccomp?
>
> I'm not sure, its been a while, but I think it might be just that.  The
> first prototype of SUD was implemented as a seccomp mode.

If at some point it becomes interesting we could relax
"force_fatal_sig(SIGSEGV)" to instead say
"force_sig_fault(SIGSEGV, SEGV_MAPERR, sd->selector)".

I avoid doing that in this patch to avoid making it possible
to catch currently uncatchable signals.

Cc: Gabriel Krisman Bertazi <krisman@collabora.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
[1] https://lkml.kernel.org/r/87mtr6gdvi.fsf@collabora.com
Link: https://lkml.kernel.org/r/20211020174406.17889-14-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-29 14:31:33 -05:00
Eric W. Biederman
26d5badbcc signal: Implement force_fatal_sig
Add a simple helper force_fatal_sig that causes a signal to be
delivered to a process as if the signal handler was set to SIG_DFL.

Reimplement force_sigsegv based upon this new helper.  This fixes
force_sigsegv so that when it forces the default signal handler
to be used the code now forces the signal to be unblocked as well.

Reusing the tested logic in force_sig_info_to_task that was built for
force_sig_seccomp this makes the implementation trivial.

This is interesting both because it makes force_sigsegv simpler and
because there are a couple of buggy places in the kernel that call
do_exit(SIGILL) or do_exit(SIGSYS) because there is no straight
forward way today for those places to simply force the exit of a
process with the chosen signal.  Creating force_fatal_sig allows
those places to be implemented with normal signal exits.

Link: https://lkml.kernel.org/r/20211020174406.17889-13-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-29 14:31:33 -05:00
Eric W. Biederman
111e70490d exit/kthread: Have kernel threads return instead of calling do_exit
In 2009 Oleg reworked[1] the kernel threads so that it is not
necessary to call do_exit if you are not using kthread_stop().  Remove
the explicit calls of do_exit and complete_and_exit (with a NULL
completion) that were previously necessary.

[1] 63706172f3 ("kthreads: rework kthread_stop()")
Link: https://lkml.kernel.org/r/20211020174406.17889-12-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-29 14:31:33 -05:00
Eric W. Biederman
9bc508cf07 signal/s390: Use force_sigsegv in default_trap_handler
Reading the history it is unclear why default_trap_handler calls
do_exit.  It is not even menthioned in the commit where the change
happened.  My best guess is that because it is unknown why the
exception happened it was desired to guarantee the process never
returned to userspace.

Using do_exit(SIGSEGV) has the problem that it will only terminate one
thread of a process, leaving the process in an undefined state.

Use force_sigsegv(SIGSEGV) instead which effectively has the same
behavior except that is uses the ordinary signal mechanism and
terminates all threads of a process and is generally well defined.

Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Fixes: ca2ab03237ec ("[PATCH] s390: core changes")
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Link: https://lkml.kernel.org/r/20211020174406.17889-11-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-29 14:31:05 -05:00
Jarkko Nikula
1ad5dc3540 i2c: i801: Fix incorrect and needless software PEC disabling
Commit a6b8bb6a81 ("i2c: i801: Fix handling SMBHSTCNT_PEC_EN")
attempts to disable software PEC by clearing the SMBHSTCNT_PEC_EN (bit 7)
in the SMBus Host Control register (I/O SMBHSTCNT) but incorrectly
clears it in the PCI Host Configuration register (PCI SMBHSTCFG).

This clearing is actually needless since after above commit the
SMBHSTCNT_PEC_EN is never set and the register is initialized with known
values.

Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Acked-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2021-10-29 21:30:26 +02:00
Shuah Khan
f35dcaa0a8 selftests/core: fix conflicting types compile error for close_range()
close_range() test type conflicts with close_range() library call in
x86_64-linux-gnu/bits/unistd_ext.h. Fix it by changing the name to
core_close_range().

gcc -g -I../../../../usr/include/    close_range_test.c  -o ../tools/testing/selftests/core/close_range_test
In file included from close_range_test.c:16:
close_range_test.c:57:6: error: conflicting types for ‘close_range’; have ‘void(struct __test_metadata *)’
   57 | TEST(close_range)
      |      ^~~~~~~~~~~
../kselftest_harness.h:181:21: note: in definition of macro ‘__TEST_IMPL’
  181 |         static void test_name(struct __test_metadata *_metadata); \
      |                     ^~~~~~~~~
close_range_test.c:57:1: note: in expansion of macro ‘TEST’
   57 | TEST(close_range)
      | ^~~~
In file included from /usr/include/unistd.h:1204,
                 from close_range_test.c:13:
/usr/include/x86_64-linux-gnu/bits/unistd_ext.h:56:12: note: previous declaration of ‘close_range’ with type ‘int(unsigned int,  unsigned int,  int)’
   56 | extern int close_range (unsigned int __fd, unsigned int __max_fd,
      |            ^~~~~~~~~~~

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-10-29 13:09:42 -06:00
Daniel Latypov
52a5d80a22 kunit: tool: fix typecheck errors about loading qemu configs
Currently, we have these errors:
$ mypy ./tools/testing/kunit/*.py
tools/testing/kunit/kunit_kernel.py:213: error: Item "_Loader" of "Optional[_Loader]" has no attribute "exec_module"
tools/testing/kunit/kunit_kernel.py:213: error: Item "None" of "Optional[_Loader]" has no attribute "exec_module"
tools/testing/kunit/kunit_kernel.py:214: error: Module has no attribute "QEMU_ARCH"
tools/testing/kunit/kunit_kernel.py:215: error: Module has no attribute "QEMU_ARCH"

exec_module
===========

pytype currently reports no errors, but that's because there's a comment
disabling it on 213.

This is due to https://github.com/python/typeshed/pull/2626.
The fix is to assert the loaded module implements the ABC
(abstract base class) we want which has exec_module support.

QEMU_ARCH
=========

pytype is fine with this, but mypy is not:
https://github.com/python/mypy/issues/5059

Add a check that the loaded module does indeed have QEMU_ARCH.
Note: this is not enough to appease mypy, so we also add a comment to
squash the warning.

Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-10-29 13:05:47 -06:00
Ben Widawsky
c6d7e1341c ocxl: Use pci core's DVSEC functionality
Reduce maintenance burden of DVSEC query implementation by using the
centralized PCI core implementation.

There are two obvious places to simply drop in the new core
implementation. There remains find_dvsec_from_pos() which would benefit
from using a core implementation. As that change is less trivial it is
reserved for later.

Cc: linuxppc-dev@lists.ozlabs.org
Cc: Andrew Donnellan <ajd@linux.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.ibm.com> (v1)
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Link: https://lore.kernel.org/r/163379789065.692348.7117946955275586530.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-10-29 11:53:52 -07:00
Ben Widawsky
55006a2c94 cxl/pci: Use pci core's DVSEC functionality
Reduce maintenance burden of DVSEC query implementation by using the
centralized PCI core implementation.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
[djbw: kill cxl_pci_dvsec()]
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/163379788528.692348.11581080806976608802.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-10-29 11:53:52 -07:00
Ben Widawsky
ee12203746 PCI: Add pci_find_dvsec_capability to find designated VSEC
Add pci_find_dvsec_capability to locate a Designated Vendor-Specific
Extended Capability with the specified Vendor ID and Capability ID.

The Designated Vendor-Specific Extended Capability (DVSEC) allows one or
more "vendor" specific capabilities that are not tied to the Vendor ID
of the PCI component. Where the DVSEC Vendor may be a standards body
like CXL.

Cc: David E. Box <david.e.box@linux.intel.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-pci@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Andrew Donnellan <ajd@linux.ibm.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/163379787943.692348.6814373487017444007.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-10-29 11:53:51 -07:00
Ben Widawsky
85afc3175a cxl/pci: Split cxl_pci_setup_regs()
In preparation for moving parts of register mapping to cxl_core, split
cxl_pci_setup_regs() into a helper that finds register blocks,
(cxl_find_regblock()), and a generic wrapper that probes the precise
register sets within a block (cxl_setup_regs()).

Move the actual mapping (cxl_map_regs()) of the only register-set that
cxl_pci cares about (memory device registers) up a level from the former
cxl_pci_setup_regs() into cxl_pci_probe().

With this change the unused component registers are no longer mapped,
but the helpers are primed to move into the core.

[djbw: drop cxl_map_regs() for component registers]

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
[djbw: rebase on the cxl_register_map refactor]
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/163434053788.914258.18412599112859205220.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-10-29 11:53:51 -07:00
Dan Williams
a261e9a157 cxl/pci: Add @base to cxl_register_map
In addition to carrying @barno, @block_offset, and @reg_type, add @base
to keep all map/unmap parameters in one object. The helpers
cxl_{map,unmap}_regblock() handle adjusting @base to the @block_offset
at map and unmap time.

Document that @base incorporates @block_offset so that downstream
consumers of a mapped cxl_register_map instance do not need perform any
fixups / can use @base directly.

Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/163433497228.889435.11271988238496181536.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-10-29 11:53:51 -07:00
Ben Widawsky
7dc7a64de2 cxl/pci: Make more use of cxl_register_map
The structure exists to pass around information about register mapping.
Use it for passing @barno and @block_offset, and eliminate duplicate
local variables.

The helpers that use @map do not care about @cxlm, so just pass them a
pdev instead.

[djbw: reorder before cxl_pci_setup_regs() refactor to improver readability]

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
[djbw: separate @base conversion]
Link: https://lore.kernel.org/r/163416901172.806743.10056306321247850914.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-10-29 11:53:51 -07:00
Ben Widawsky
84e36a9d1b cxl/pci: Remove pci request/release regions
Quoting Dan, "... the request + release regions should probably just be
dropped. It's not like any of the register enumeration would collide
with someone else who already has the registers mapped. The collision
only comes when the registers are mapped for their final usage, and that
will have more precision in the request."

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Link: https://lore.kernel.org/r/163379785872.692348.8981679111988251260.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-10-29 11:53:51 -07:00
Dan Williams
ca76a3a805 cxl/pci: Fix NULL vs ERR_PTR confusion
cxl_pci_map_regblock() may return an ERR_PTR(), but cxl_pci_setup_regs()
is only prepared for NULL as the error case. Pick the minimal fix for
-stable backport purposes and just have cxl_pci_map_regblock() return
NULL for errors.

Fixes: f8a7e8c29b ("cxl/pci: Reserve all device regions at once")
Cc: <stable@vger.kernel.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/163433325724.834522.17809774578178224149.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-10-29 11:53:51 -07:00
Ben Widawsky
d22fed9c2b cxl/pci: Remove dev_dbg for unknown register blocks
While interesting to driver developers, the dev_dbg message doesn't do
much except clutter up logs. This information should be attainable
through sysfs, and someday lspci like utilities. This change
additionally helps reduce the LOC in a subsequent patch to refactor some
of cxl_pci register mapping.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Link: https://lore.kernel.org/r/163379784717.692348.3478221381958300790.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-10-29 11:53:51 -07:00
Ben Widawsky
cdcce47cb3 cxl/pci: Convert register block identifiers to an enum
In preparation for passing around the Register Block Indicator (RBI) as
a parameter, it is desirable to convert the type to an enum so that the
interface can use a well defined parameter.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
[djbw: changelog fixups ]
Link: https://lore.kernel.org/r/163379784199.692348.4366131432595112771.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-10-29 11:53:51 -07:00
Marcin Szycik
bfaaba99e6 ice: Hide bus-info in ethtool for PRs in switchdev mode
Disable showing bus-info information for port representors in switchdev
mode. This fixes a bug that caused displaying wrong netdev descriptions in
lshw tool - one port representor displayed PF branding string, and in turn
one PF displayed a "generic" description. The bug occurs when many devices
show the same bus-info in ethtool, which was the case in switchdev mode (PF
and its port representors displayed the same bus-info). The bug occurs only
if a port representor netdev appears before PF netdev in /proc/net/dev.

In the examples below:
ens6fX is PF
ens6fXvY is VF
ethX is port representor
One irrelevant column was removed from output

Before:
$ sudo lshw -c net -businfo
Bus info          Device      Description
=========================================
pci@0000:02:00.0  eth102       Ethernet Controller E810-XXV for SFP
pci@0000:02:00.1  ens6f1       Ethernet Controller E810-XXV for SFP
pci@0000:02:01.0  ens6f0v0     Ethernet Adaptive Virtual Function
pci@0000:02:01.1  ens6f0v1     Ethernet Adaptive Virtual Function
pci@0000:02:01.2  ens6f0v2     Ethernet Adaptive Virtual Function
pci@0000:02:00.0  ens6f0       Ethernet interface

Notice that eth102 and ens6f0 have the same bus-info and their descriptions
are swapped.

After:
$ sudo lshw -c net -businfo
Bus info          Device      Description
=========================================
pci@0000:02:00.0  ens6f0      Ethernet Controller E810-XXV for SFP
pci@0000:02:00.1  ens6f1      Ethernet Controller E810-XXV for SFP
pci@0000:02:01.0  ens6f0v0    Ethernet Adaptive Virtual Function
pci@0000:02:01.1  ens6f0v1    Ethernet Adaptive Virtual Function
pci@0000:02:01.2  ens6f0v2    Ethernet Adaptive Virtual Function

Fixes: 7aae80cef7 ("ice: add port representor ethtool ops and stats")
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-29 11:43:15 -07:00