Commit graph

1042671 commits

Author SHA1 Message Date
Fuad Tabba
dabb1667d8 KVM: arm64: Fix names of config register fields
Change the names of hcr_el2 register fields to match the Arm
Architecture Reference Manual. Easier for cross-referencing and
for grepping.

Also, change the name of CPTR_EL2_RES1 to CPTR_NVHE_EL2_RES1,
because res1 bits are different for VHE.

No functional change intended.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210817081134.2918285-5-tabba@google.com
2021-08-20 11:12:17 +01:00
Fuad Tabba
d6c850dd6c KVM: arm64: MDCR_EL2 is a 64-bit register
Fix the places in KVM that treat MDCR_EL2 as a 32-bit register.
More recent features (e.g., FEAT_SPEv1p2) use bits above 31.

No functional change intended.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210817081134.2918285-4-tabba@google.com
2021-08-20 11:12:17 +01:00
Fuad Tabba
e6bc555c96 KVM: arm64: Remove trailing whitespace in comment
Remove trailing whitespace from comment in trap_dbgauthstatus_el1().

No functional change intended.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210817081134.2918285-3-tabba@google.com
2021-08-20 11:12:16 +01:00
Fuad Tabba
2ea7f65580 KVM: arm64: placeholder to check if VM is protected
Add a function to check whether a VM is protected (under pKVM).
Since the creation of protected VMs isn't enabled yet, this is a
placeholder that always returns false. The intention is for this
to become a check for protected VMs in the future (see Will's RFC).

No functional change intended.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/kvmarm/20210603183347.1695-1-will@kernel.org/
Link: https://lore.kernel.org/r/20210817081134.2918285-2-tabba@google.com
2021-08-20 11:12:05 +01:00
Mark Brown
83e5dcbece kselftest/arm64: mte: Fix misleading output when skipping tests
When skipping the tests due to a lack of system support for MTE we
currently print a message saying FAIL which makes it look like the test
failed even though the test did actually report KSFT_SKIP, creating some
confusion. Change the error message to say SKIP instead so things are
clearer.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20210819172902.56211-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-20 11:11:05 +01:00
Luke D. Jones
c63d44ae60 asus-wmi: Add support for platform_profile
Add initial support for platform_profile where the support is
based on availability of ASUS_THROTTLE_THERMAL_POLICY.

Because throttle_thermal_policy is used by platform_profile and is
writeable separately to platform_profile any userspace changes to
throttle_thermal_policy need to notify platform_profile.

In future throttle_thermal_policy sysfs should be removed so that
only one method controls the laptop power profile.

Signed-off-by: Luke D. Jones <luke@ljones.dev>
Link: https://lore.kernel.org/r/20210818190731.19170-2-luke@ljones.dev
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-08-20 12:09:42 +02:00
Matan Ziv-Av
ae26278829 platform/x86: lg-laptop: Use correct event for keyboard backlight FN-key
Use led_classdev_notify_brightness_hw_changed() instead of F16 key.

Signed-off-by: Matan Ziv-Av <matan@svgalib.org>
Link: https://lore.kernel.org/r/2196990f167efe6a42d51fb85f4db4cdf4d9e80e.1629291912.git.matan@svgalib.org
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-08-20 12:09:42 +02:00
Matan Ziv-Av
85973bf4c1 platform/x86: lg-laptop: Use correct event for touchpad toggle FN-key
Send F21 which is the standard for this key, instead of F13.

Signed-off-by: Matan Ziv-Av <matan@svgalib.org>
Link: https://lore.kernel.org/r/b847895c1f170e2e59df5757a4d603d28149f648.1629291912.git.matan@svgalib.org
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-08-20 12:09:41 +02:00
Matan Ziv-Av
8983bfd58d platform/x86: lg-laptop: Support for battery charge limit on newer models
Add support for the difference between various models:

- Use dmi to detect laptop model.
- 2019 and newer models use _wmbb method to set battery charge limit.

Signed-off-by: Matan Ziv-Av <matan@svgalib.org>
Link: https://lore.kernel.org/r/bd6922a412e50c2dcfb7ce24fc8687f577181d65.1629291912.git.matan@svgalib.org
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-08-20 12:09:41 +02:00
Shravan S
dcfbd31ef4 platform/x86: BIOS SAR driver for Intel M.2 Modem
Dynamic BIOS SAR driver exposing dynamic SAR information from BIOS

The Dynamic SAR (Specific Absorption Rate) driver uses ACPI DSM
(Device Specific Method) to communicate with BIOS and retrieve
dynamic SAR information and change notifications. The driver uses
sysfs to expose this data to userspace via read and notify.

Sysfs interface is documented in detail under:
Documentation/ABI/testing/sysfs-driver-intc_sar

Signed-off-by: Shravan S <s.shravan@intel.com>
Link: https://lore.kernel.org/r/20210723211452.27995-2-s.shravan@intel.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-08-20 12:09:41 +02:00
David E. Box
ef195e8a7f platform/x86: intel_pmt_telemetry: Ignore zero sized entries
Some devices may expose non-functioning entries that are reserved for
future use. These entries have zero size. Ignore them during probe.

Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20210817224018.1013192-5-david.e.box@linux.intel.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-08-20 12:09:41 +02:00
Meng Dong
3ae86d2d47 platform/x86: ideapad-laptop: Fix Legion 5 Fn lock LED
This patch fixes the bug 212671.
Althrough the Fn lock (Fn + Esc) works on Legion 5 (R7000P), its LED
light does not change with the state. This modification sets the Fn lock
state to its current value on receiving the wmi event
8FC0DE0C-B4E4-43FD-B0F3-8871711C1294 to update the LED state.

Signed-off-by: Meng Dong <whenov@gmail.com>
Link: https://lore.kernel.org/r/20210817171203.12855-1-whenov@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-08-20 12:09:37 +02:00
Thomas Weißschuh
30f64e2066 platform/x86: gigabyte-wmi: add support for B450M S2H V2
Reported as working here:
https://github.com/t-8ch/linux-gigabyte-wmi-driver/issues/1#issuecomment-901207693

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20210818164435.99821-1-linux@weissschuh.net
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-08-20 12:09:37 +02:00
Eddie James
239f32b4f1 leds: pca955x: Switch to i2c probe_new
The deprecated i2c probe functionality doesn't work with OF
compatible strings, as it only checks for the i2c device id. Switch
to the new way of probing and grab the match data to select the
chip type.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
2021-08-20 11:00:08 +02:00
Eddie James
7c48159292 leds: pca955x: Let the core process the fwnode
Much of the fwnode processing in the PCA955x driver is now in the
LEDs core driver, so pass the fwnode in the init data when
registering the LED device. In order to preserve the existing naming
scheme, check for an empty name and set it to the LED number.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
2021-08-20 11:00:08 +02:00
Eddie James
e46cb6d0c7 leds: pca955x: Implement the default-state property
In order to retain the LED state after a system reboot, check the
documented default-state device tree property during initialization.
Modify the behavior of the probe according to the property.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
2021-08-20 11:00:07 +02:00
Eddie James
7086625fde leds: pca955x: Add brightness_get function
Add a function to fetch the state of the hardware LED.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
2021-08-20 11:00:07 +02:00
Eddie James
2420ae02ce leds: pca955x: Clean up code formatting
Format the code. Add some variables to help shorten lines.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
2021-08-20 11:00:06 +02:00
Eddie James
419066324e leds: leds-core: Implement the retain-state-shutdown property
Read the retain-state-shutdown device tree property to set the
existing LED_RETAIN_AT_SHUTDOWN flag. Then check the flag when
unregistering, and if set, don't set the brightness to OFF. This
is useful for systems that want to keep the HW state of the LED
across reboots.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
2021-08-20 11:00:06 +02:00
Eddie James
5d823d6d69 dt-bindings: leds: Add retain-state-shutdown boolean
Document the retain-state-shutdown property that indicates that a LED
should not be turned off or changed during system shutdown.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
2021-08-20 11:00:05 +02:00
Pavel Machek
09f1273064 Documentation: leds: standartizing LED names
We have a list of valid functions, but LED names in sysfs are still
far from being consistent. Create list of "well known" LED names so we
nudge people towards using same LED names (except color) for same
functionality.

Signed-off-by: Pavel Machek <pavel@ucw.cz>
2021-08-20 10:26:24 +02:00
Marc Zyngier
cf364e08ea KVM: arm64: Upgrade VMID accesses to {READ,WRITE}_ONCE
Since TLB invalidation can run in parallel with VMID allocation,
we need to be careful and avoid any sort of load/store tearing.
Use {READ,WRITE}_ONCE consistently to avoid any surprise.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jade Alglave <jade.alglave@arm.com>
Cc: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20210806113109.2475-6-will@kernel.org
2021-08-20 09:12:24 +01:00
Marc Zyngier
4efc0ede4f KVM: arm64: Unify stage-2 programming behind __load_stage2()
The protected mode relies on a separate helper to load the
S2 context. Move over to the __load_guest_stage2() helper
instead, and rename it to __load_stage2() to present a unified
interface.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jade Alglave <jade.alglave@arm.com>
Cc: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210806113109.2475-5-will@kernel.org
2021-08-20 09:11:28 +01:00
Marc Zyngier
923a547d71 KVM: arm64: Move kern_hyp_va() usage in __load_guest_stage2() into the callers
It is a bit awkward to use kern_hyp_va() in __load_guest_stage2(),
specially as the helper is shared between VHE and nVHE.

Instead, move the use of kern_hyp_va() in the nVHE code, and
pass a pointer to the kvm->arch structure instead. Although
this may look a bit awkward, it allows for some further simplification.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jade Alglave <jade.alglave@arm.com>
Cc: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210806113109.2475-4-will@kernel.org
2021-08-20 09:03:42 +01:00
Marc Zyngier
3134cc8beb KVM: arm64: vgic: Resample HW pending state on deactivation
When a mapped level interrupt (a timer, for example) is deactivated
by the guest, the corresponding host interrupt is equally deactivated.
However, the fate of the pending state still needs to be dealt
with in SW.

This is specially true when the interrupt was in the active+pending
state in the virtual distributor at the point where the guest
was entered. On exit, the pending state is potentially stale
(the guest may have put the interrupt in a non-pending state).

If we don't do anything, the interrupt will be spuriously injected
in the guest. Although this shouldn't have any ill effect (spurious
interrupts are always possible), we can improve the emulation by
detecting the deactivation-while-pending case and resample the
interrupt.

While we're at it, move the logic into a common helper that can
be shared between the two GIC implementations.

Fixes: e40cc57bac ("KVM: arm/arm64: vgic: Support level-triggered mapped interrupts")
Reported-by: Raghavendra Rao Ananta <rananta@google.com>
Tested-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210819180305.1670525-1-maz@kernel.org
2021-08-20 08:53:22 +01:00
Steve French
e7a10ed7d7
Merge pull request #66 from namjaejeon/cifsd-for-next
ksmbd-fixes
2021-08-20 02:07:30 -05:00
Kajol Jain
f9addd85fb powerpc/perf/hv-gpci: Fix counter value parsing
H_GetPerformanceCounterInfo (0xF080) hcall returns the counter data in
the result buffer. Result buffer has specific format defined in the PAPR
specification. One of the fields is counter offset and width of the
counter data returned.

Counter data are returned in a unsigned char array in big endian byte
order. To get the final counter data, the values must be left shifted
byte at a time. But commit 220a0c609a ("powerpc/perf: Add support for
the hv gpci (get performance counter info) interface") made the shifting
bitwise and also assumed little endian order. Because of that, hcall
counters values are reported incorrectly.

In particular this can lead to counters go backwards which messes up the
counter prev vs now calculation and leads to huge counter value
reporting:

  #: perf stat -e hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
           -C 0 -I 1000
        time             counts unit events
     1.000078854 18,446,744,073,709,535,232      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     2.000213293                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     3.000320107                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     4.000428392                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     5.000537864                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     6.000649087                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     7.000760312                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     8.000865218             16,448      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     9.000978985 18,446,744,073,709,535,232      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
    10.001088891             16,384      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
    11.001201435                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
    12.001307937 18,446,744,073,709,535,232      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/

Fix the shifting logic to correct match the format, ie. read bytes in
big endian order.

Fixes: e4f226b158 ("powerpc/perf/hv-gpci: Increase request buffer size")
Cc: stable@vger.kernel.org # v4.6+
Reported-by: Nageswara R Sastry<rnsastry@linux.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Tested-by: Nageswara R Sastry<rnsastry@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210813082158.429023-1-kjain@linux.ibm.com
2021-08-20 17:00:54 +10:00
Finn Thain
6cd717fe9b powerpc/tau: Add 'static' storage qualifier to 'tau_work' definition
This patch prevents the following sparse warning.

arch/powerpc/kernel/tau_6xx.c:199:1: sparse: sparse: symbol 'tau_work'
was not declared. Should it be static?

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Finn Thain <fthain@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/44ab381741916a51e783c4a50d0b186abdd8f280.1629334014.git.fthain@linux-m68k.org
2021-08-20 17:00:53 +10:00
Namjae Jeon
a9a27d4ab3 ksmbd: don't set FILE DELETE and FILE_DELETE_CHILD in access mask by default
When there is no dacl in request, ksmbd send dacl that coverted by using
file permission. This patch don't set FILE DELETE and FILE_DELETE_CHILD
in access mask by default.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-08-20 15:45:44 +09:00
Linus Lüssing
a006aa51ea batman-adv: bcast: remove remaining skb-copy calls
We currently have two code paths for broadcast packets:

A) self-generated, via batadv_interface_tx()->
   batadv_send_bcast_packet().
B) received/forwarded, via batadv_recv_bcast_packet()->
   batadv_forw_bcast_packet().

For A), self-generated broadcast packets:

The only modifications to the skb data is the ethernet header which is
added/pushed to the skb in
batadv_send_broadcast_skb()->batadv_send_skb_packet(). However before
doing so, batadv_skb_head_push() is called which calls skb_cow_head() to
unshare the space for the to be pushed ethernet header. So for this
case, it is safe to use skb clones.

For B), received/forwarded packets:

The same applies as in A) for the to be forwarded packets. Only the
ethernet header is added. However after (queueing for) forwarding the
packet in batadv_recv_bcast_packet()->batadv_forw_bcast_packet(), a
packet is additionally decapsulated and is sent up the stack through
batadv_recv_bcast_packet()->batadv_interface_rx().

Protocols higher up the stack are already required to check if the
packet is shared and create a copy for further modifications. When the
next (protocol) layer works correctly, it cannot happen that it tries to
operate on the data behind the skb clone which is still queued up for
forwarding.

Co-authored-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2021-08-20 08:17:10 +02:00
Sven Eckelmann
a2b7b148d9 batman-adv: Drop NULL check before dropping references
The check if a batman-adv related object is NULL or not is now directly in
the batadv_*_put functions. It is not needed anymore to perform this check
outside these function:

The changes were generated using a coccinelle semantic patch:

  @@
  expression E;
  @@
  - if (likely(E != NULL))
  (
  batadv_backbone_gw_put
  |
  batadv_claim_put
  |
  batadv_dat_entry_put
  |
  batadv_gw_node_put
  |
  batadv_hardif_neigh_put
  |
  batadv_hardif_put
  |
  batadv_nc_node_put
  |
  batadv_nc_path_put
  |
  batadv_neigh_ifinfo_put
  |
  batadv_neigh_node_put
  |
  batadv_orig_ifinfo_put
  |
  batadv_orig_node_put
  |
  batadv_orig_node_vlan_put
  |
  batadv_softif_vlan_put
  |
  batadv_tp_vars_put
  |
  batadv_tt_global_entry_put
  |
  batadv_tt_local_entry_put
  |
  batadv_tt_orig_list_entry_put
  |
  batadv_tt_req_node_put
  |
  batadv_tvlv_container_put
  |
  batadv_tvlv_handler_put
  )(E);

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2021-08-20 08:17:10 +02:00
Sven Eckelmann
e78783da56 batman-adv: Check ptr for NULL before reducing its refcnt
The commit b37a466837 ("netdevice: add the case if dev is NULL") changed
the way how the NULL check for net_devices have to be handled when trying
to reduce its reference counter. Before this commit, it was the
responsibility of the caller to check whether the object is NULL or not.
But it was changed to behave more like kfree. Now the callee has to handle
the NULL-case.

The batman-adv code was scanned via cocinelle for similar places. These
were changed to use the paradigm

  @@
  identifier E, T, R, C;
  identifier put;
  @@
   void put(struct T *E)
   {
  +	if (!E)
  +		return;
  	kref_put(&E->C, R);
   }

Functions which were used in other sources files were moved to the header
to allow the compiler to inline the NULL check and the kref_put call.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2021-08-20 08:17:10 +02:00
Sven Eckelmann
5520722718 batman-adv: Switch to kstrtox.h for kstrtou64
The commit 4c52729377 ("kernel.h: split out kstrtox() and simple_strtox()
to a separate header") moved the kstrtou64 function to a new header called
linux/kstrtox.h.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2021-08-20 08:17:10 +02:00
Sven Eckelmann
3baa9f522a batman-adv: Move IRC channel to hackint.org
Due to recent developments around the Freenode.org IRC network, the
opinions about the usage of this service shifted dramatically. The majority
of the still active users of the #batman channel prefers a move to the
hackint.org network.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2021-08-20 08:17:05 +02:00
Dave Airlie
daa7772d47 Merge tag 'amd-drm-fixes-5.14-2021-08-18' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.14-2021-08-18:

amdgpu:
- vega10 SMU workload fix
- DCN VM fix
- DCN 3.01 watermark fix

amdkfd:
- SVM fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210818225137.4070-1-alexander.deucher@amd.com
2021-08-20 15:13:56 +10:00
Dmytro Linkin
3202ea65f8 net/mlx5: E-switch, Add QoS tracepoints
Add tracepoints to log QoS enabling/disabling/configuration for vports
and rate groups.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:41 -07:00
Dmytro Linkin
0fe132eac3 net/mlx5: E-switch, Allow to add vports to rate groups
Implement eswitch API that allows updating rate groups. If group
pointer is NULL, then move the vport to internal unlimited group zero.

Implement devlink_ops->rate_parent_node_set() callback in the terms of
the new eswitch group update API.

Enable QoS for all group's elements if a group has allocated BW share.

Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:40 -07:00
Dmytro Linkin
f47e04eb96 net/mlx5: E-switch, Allow setting share/max tx rate limits of rate groups
Provide eswitch API to allow controlling group rate limits. Use it to
implement devlink_ops->mlx5_devlink_rate_node_tx_{share|max}_set().

The share rate will create relative bandwidth share on the groups level
while within the group the user can set shared rate on the member vports
of that group and this rate will be relative to the group's share rate.
The group with the highest shared rate will get a BW share of 100 and
the rest of the groups will get a value that reflects the ratio between
their share rate and the maximum share rate.

Example:
Created four rate groups with tx_share limits:

$ devlink port function rate add \
    pci/0000:06:00.0/group_1 tx_share 30gbit
$ devlink port function rate add \
    pci/0000:06:00.0/group_2 tx_share 20gbit
$ devlink port function rate add \
    pci/0000:06:00.0/group_3 tx_share 20gbit
$ devlink port function rate add \
    pci/0000:06:00.0/group_4 tx_share 10gbit

Assuming link speed is 50 Gbit/sec ratio divider will be
50 / (30+20+20+10) = 0.625. Normalized rate values for the groups:

<group_1> 30 * 0.625 = 18.75 Gbit/sec
<group_2> 20 * 0.625 = 12.5 Gbit/sec
<group_3> 20 * 0.625 = 12.5 Gbit/sec
<group_4> 10 * 0.625 = 6.25 Gbit/sec

Rate group with unlimited tx_share rate will receive minimum BW value
(1Mbit/sec) if presented any group with tx_share rate limit. This allow
to not drop all packets in case of heavy traffic.

Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:40 -07:00
Dmytro Linkin
1ae258f8b3 net/mlx5: E-switch, Introduce rate limiting groups API
Extend eswitch API with rate limiting groups:

- Define new struct mlx5_esw_rate_group that is used to hold all
  internal group data.

- Implement functions that allow creation, destruction and cleanup of
  groups.

- Assign all vports to internal unlimited zero group by default.

This commit lays the groundwork for group rate limiting by implementing
devlink_ops->rate_node_{new|del}() callbacks to support creating and
deleting groups through devlink rate node objects. APIs that allows
setting rates and adding/removing members are implemented in following
patches.

Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:40 -07:00
Dmytro Linkin
ad34f02fe2 net/mlx5: E-switch, Enable devlink port tx_{share|max} rate control
Register devlink rate leaf object for every eswitch vport.
Implement devlink ops that enable setting shared and max tx rates
through devlink API.
Extract common eswitch code from existing tx rate set function that is
accessed through NDO to be reused for the devlink. Values configured
with NDO API are not visible for the devlink API, therefore shouldn't be
used simultaneously.

When normalizing the BW share value, dividing the desired minimum rate
by the common divider results in losing information since the quotient
is rounded down. This has a significant affect on configurations of low
rate where the round down eliminates a large percentage of the total
rate. To improve the formula, round up the division result to make sure
that the BW share is at least the value it was supposed to be and won't
lost a significant amount of the expected value.

Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:39 -07:00
Dmytro Linkin
2d116e3e7e net/mlx5: E-switch, Move QoS related code to dedicated file
Move eswitch QoS related code into dedicated file. Provide eswitch API
to access this code meaning it is isolated and restricted to be used
only by eswitch.c. Exception is legacy NDO vf set rate, which moved to
esw/legacy.c.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:39 -07:00
Chris Mi
2741f22309 net/mlx5e: TC, Support sample offload action for tunneled traffic
Currently the sample offload actions send the encapsulated packet
to software. This commit decapsulates the packet before performing
the sampling and set the tunnel properties on the skb metadata
fields to make the behavior consistent with OVS sFlow.

If decapsulating first, we can't use the same match like before in
default table. So instantiate a post action instance to continue
processing the action list. If HW can preserve reg_c, also use the
post action instance.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:38 -07:00
Chris Mi
ee950e5db1 net/mlx5e: TC, Restore tunnel info for sample offload
Currently the sample offload actions send the encapsulated packet
to software. sFlow expects tunneled packets to be decapsulated while
having the tunnel properties on the skb metadata fields.

Reuse the functions used by connection tracking to map the outer
header properties to a unique id. The next patch  will use that id
to restore the tunnel information of decapsulated packets onto the
skb.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:38 -07:00
Chris Mi
d12e20ac06 net/mlx5e: TC, Remove CONFIG_NET_TC_SKB_EXT dependency when restoring tunnel
CONFIG_NET_TC_SKB_EXT controls the SKB extension support for
restoring chain ids. SKB extension is not required for tunnel
restoration.

Remove the CONFIG_NET_TC_SKB_EXT dependency as a pre-step for
using the tunnel restore methods for sample offload use cases.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:38 -07:00
Chris Mi
f0da4daa34 net/mlx5e: Refactor ct to use post action infrastructure
Move post action table management to common library providing
add/del/get API. Refactor the ct action offload to use the common
API.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:37 -07:00
Chris Mi
6f0b692a5a net/mlx5e: Introduce post action infrastructure
Some tc actions are modeled in hardware using multiple tables
causing a tc action list split. For example, CT action is modeled
by jumping to a ct table which is controlled by nf flowtable.
sFlow jumps in hardware to a sample table, which continues to a
"default table" where it should continue processing the action list.

Multi table actions are modeled in hardware using a unique fte_id.
The fte_id is set before jumping to a table. Split actions continue
to a post-action table where the matched fte_id value continues the
execution the tc action list.

Currently the post-action design is implemented only by the ct
action. Introduce post action infrastructure as a pre-step for
reusing it with the sFlow offload feature. Init and destroy the
common post action table. Refactor the ct offload to use the
common post table infrastructure in the next patch.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:37 -07:00
Chris Mi
2799797845 net/mlx5e: CT, Use xarray to manage fte ids
IDR is deprecated. Use xarray instead.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:37 -07:00
Chris Mi
bcd6740c6b net/mlx5e: Move sample attribute to flow attribute
Currently it is in eswitch attribute. Move it to flow attribute to
reflect the change in previous patch.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:36 -07:00
Chris Mi
0027d70c73 net/mlx5e: Move esw/sample to en/tc/sample
Module sample belongs to en/tc instead of esw. Move it and rename
accordingly.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:36 -07:00
Saeed Mahameed
5024fa95a1 net/mlx5e: Remove mlx5e dependency from E-Switch sample
mlx5/esw/sample.c doesn't really need mlx5e_priv object, we can remove
this redundant dependency by passing the eswitch object directly to
the sample object constructor.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
2021-08-19 21:50:35 -07:00