Since commit f11a04464a ("i2c: gpio: Enable working over slow
can_sleep GPIOs"), probing the i2c RTC connected to an i2c-gpio bus on
r8a7740/armadillo fails with:
rtc-s35390a 0-0030: error resetting chip
rtc-s35390a: probe of 0-0030 failed with error -5
More debug code reveals:
i2c i2c-0: master_xfer[0] R, addr=0x30, len=1
i2c i2c-0: NAK from device addr 0x30 msg #0
s35390a_get_reg: ret = -6
Commit 02e479808b ("gpio: Alter semantics of *raw* operations to
actually be raw") moved open drain/source handling from
gpiod_set_raw_value_commit() to gpiod_set_value(), but forgot to take
into account that gpiod_set_value_cansleep() also needs this handling.
The i2c protocol mandates that i2c signals are open drain, hence i2c
communication fails.
Fix this by adding the missing handling to gpiod_set_value_cansleep(),
using a new common helper gpiod_set_value_nocheck().
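As a rough illustration of the handling described above, here is a minimal user-space model, not the gpiolib code itself. The struct, the enum, and every model_* name are hypothetical; only the idea of routing both the atomic and the can-sleep setter through one common "nocheck" helper comes from this commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one GPIO line; not the kernel's struct gpio_desc. */
enum line_state { LINE_DRIVEN_LOW, LINE_DRIVEN_HIGH, LINE_HIGH_Z };

struct model_gpio {
	bool open_drain;       /* line may only be pulled low, never driven high */
	bool open_source;      /* line may only be driven high, never pulled low */
	enum line_state state; /* what the pin is doing electrically */
};

/* Raw write: always drives the line, ignoring the open drain/source flags. */
void model_set_raw(struct model_gpio *g, bool value)
{
	assert(g);
	g->state = value ? LINE_DRIVEN_HIGH : LINE_DRIVEN_LOW;
}

/* Common helper shared by the atomic and the sleeping setter. */
void model_set_value_nocheck(struct model_gpio *g, bool value)
{
	assert(g);
	if (g->open_drain && value)
		g->state = LINE_HIGH_Z;   /* release the line; a pull-up makes it high */
	else if (g->open_source && !value)
		g->state = LINE_HIGH_Z;   /* release the line; a pull-down makes it low */
	else
		model_set_raw(g, value);
}

void model_set_value(struct model_gpio *g, bool value)
{
	model_set_value_nocheck(g, value);
}

void model_set_value_cansleep(struct model_gpio *g, bool value)
{
	/* The regression amounted to this path doing the raw write directly. */
	model_set_value_nocheck(g, value);
}
```

With an open drain line, writing 1 through either setter releases the line instead of driving it high, which is what an i2c bus expects.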
Fixes: 02e479808b ("gpio: Alter semantics of *raw* operations to actually be raw")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
[removed underscore syntax, added kerneldoc]
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Jakub Kicinski says:
====================
This series starts with a fix to Jesper's recent work, somehow I forgot
about control rings during review. Second patch is cleaning up a vNIC
header, in kdoc we should not use @ for #define constants. Aligning of
the top of the stack as well as bottom (last bytes will be unused) helps
the performance. We should check offload datapath's max MTU when program
is loaded and we can allow TC hw offload flag to be changed freely while
XDP offload is active.
Next group of patches adds more fully featured relocation support. Due
to limited amount of code space we only load the image to NIC's memory
when program is attached. Since we can't predict which programs are
loaded later, we should translate as if image was to be loaded at offset
zero and only apply relocations at load time. Many more advanced features
(e.g. tail calls, subprograms, dynamic allocation of program space and
sharing it between ports) will depend on this.
Nic adds support for signed comparison instructions.
Quentin makes use of the verifier log in our driver, the verifier print
function (verbose()) has to be renamed and exported.
v2:
- replace #define by function aliasing for verbose() in patch 13
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Now that `bpf_verifier_log_write()` is exported from the verifier and
makes it possible to reuse the verifier log to print messages to the
standard output, use this instead of the kernel logs in the nfp driver
for printing error messages occurring at verification time.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Rename the BPF verifier `verbose()` to `bpf_verifier_log_write()` and
export it, so that other components (in particular, drivers for BPF
offload) can reuse the user buffer log to dump error messages at
verification time.
Renaming `verbose()` was necessary in order to avoid a name so generic
to be exported to the global namespace. However to prevent too much pain
for backports, the calls to `verbose()` in the kernel BPF verifier were
not changed. Instead, use function aliasing to make `verbose` point to
`bpf_verifier_log_write`. Another solution could consist in making a
wrapper around `verbose()`, but since it is a variadic function, I don't
see a clean way without creating two identical wrappers, one for the
verifier and one to export.
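The aliasing approach can be sketched in plain C as follows. This is a stand-alone illustration that assumes GCC or Clang on an ELF target; struct demo_env and demo_log_write() are hypothetical stand-ins for the verifier environment and the exported function, and the alias is declared non-static here for simplicity.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the verifier environment and its log buffer. */
struct demo_env {
	char   log[128];
	size_t len;
};

/* The exported, descriptively named variadic function. */
void demo_log_write(struct demo_env *env, const char *fmt, ...)
{
	va_list args;
	int n;

	assert(env && env->len < sizeof(env->log));
	va_start(args, fmt);
	n = vsnprintf(env->log + env->len, sizeof(env->log) - env->len, fmt, args);
	va_end(args);
	if (n > 0) {
		/* vsnprintf() reports the untruncated length; clamp to what fit. */
		size_t room = sizeof(env->log) - env->len - 1;

		env->len += (size_t)n < room ? (size_t)n : room;
	}
}

/* Short name for existing callers: same symbol, no wrapper, no va_list copy. */
void verbose(struct demo_env *env, const char *fmt, ...)
	__attribute__((alias("demo_log_write"), format(printf, 2, 3)));
```

Because the alias is just a second name for the same symbol, the variadic arguments never have to be forwarded, which is the problem a wrapper would run into.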
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch adds signed jump instructions (jsgt, jsge, jslt, jsle)
to the nfp jit, and adds the additional raw assembler branch
mask they require to nfp_asm.h.
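For readers less familiar with the distinction, the difference between the plain and the signed jump conditions can be modelled in C as below. This is only a sketch of the comparison semantics on 64-bit register values, not the NFP code emitted by the jit; the cond_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The plain conditional jumps compare the 64-bit registers as unsigned. */
bool cond_jgt(uint64_t dst, uint64_t src) { return dst > src; }
bool cond_jge(uint64_t dst, uint64_t src) { return dst >= src; }

/* The signed variants reinterpret the same bits as two's complement. */
bool cond_jsgt(uint64_t dst, uint64_t src) { return (int64_t)dst >  (int64_t)src; }
bool cond_jsge(uint64_t dst, uint64_t src) { return (int64_t)dst >= (int64_t)src; }
bool cond_jslt(uint64_t dst, uint64_t src) { return (int64_t)dst <  (int64_t)src; }
bool cond_jsle(uint64_t dst, uint64_t src) { return (int64_t)dst <= (int64_t)src; }

/* Self-check: all-ones is the largest unsigned value but -1 when signed. */
void demo_signed_vs_unsigned(void)
{
	const uint64_t minus_one = UINT64_MAX;

	assert(cond_jgt(minus_one, 1));   /* unsigned: 0xffff...ffff > 1 */
	assert(cond_jslt(minus_one, 1));  /* signed: -1 < 1 */
}
```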
Signed-off-by: Nic Viljoen <nick.viljoen@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Instead of having an app callback per message type, hand off
all offload-related handling to apps with one "rest of ndo_bpf"
callback.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
To make absolute relocated branches (branches which will be completely
rewritten with br_set_offset()) distinguishable in user space dumps
from normal jumps, add a large offset to them.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The translator pre-allocates a buffer of maximal program size.
Due to HW/FW limitations the program buffer can't currently be
longer than 128Kb, so we used to kmalloc() it, and then map for
DMA directly.
Now that the late branch resolution is copying the program image
anyway, we can just kvmalloc() the buffer. While at it, after
translation reallocate the buffer to save space.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Don't translate the program assuming it will be loaded at a given
address. This will be required for sharing programs between ports
of the same NIC, tail calls and subprograms. It will also make the
jump targets easier to understand when dumping the program to user
space.
Translate the program as if it was going to be loaded at address
zero. When load happens add the load offset in and set addresses
of special branches.
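The idea of translating for address zero and rebasing at load time can be sketched like this. The instruction encoding below (a flag bit plus a 16-bit target field) is invented purely for illustration and does not match the real NFP encoding; all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: bit 63 marks a branch, the low 16 bits hold its
 * target address. The real NFP instruction format is different. */
#define DEMO_BR_FLAG     (1ULL << 63)
#define DEMO_BR_ADDR_MSK 0xffffULL

uint64_t demo_br_get_offset(uint64_t insn)
{
	return insn & DEMO_BR_ADDR_MSK;
}

uint64_t demo_br_set_offset(uint64_t insn, uint16_t off)
{
	return (insn & ~DEMO_BR_ADDR_MSK) | off;
}

/* Copy an image that was translated for address zero and rebase its
 * branches to the address it is actually loaded at. */
void demo_relocate(uint64_t *dst, const uint64_t *src, size_t n,
		   uint16_t load_off)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t insn = src[i];

		if (insn & DEMO_BR_FLAG) {
			uint64_t target = demo_br_get_offset(insn) + load_off;

			assert(target <= DEMO_BR_ADDR_MSK); /* must fit the field */
			insn = demo_br_set_offset(insn, (uint16_t)target);
		}
		dst[i] = insn;
	}
}
```

The translated image stays untouched, so in this model the same image could be loaded again at a different offset.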
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
In preparation for better handling of relocations move existing
helper for setting branch offset to nfp_asm.c and add two more.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jump target resolution should be in jit.c not offload.c.
No functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
TC BPF offload was added first, so we used to assume that
the ethtool TC HW offload flag cannot be touched whenever
any BPF program is loaded on the NIC. This unnecessarily
limits changes to the TC flag when the offloaded program is XDP.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When BPF offload is active we may need to restrict the MTU
changes more than just to the limitation of the kernel XDP datapath.
Allow the BPF code to veto a MTU change.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Kernel enforces the alignment of the bottom of the stack, NFP
deals with positive offsets better so we should align the top
of the stack. Round the stack size to NFP word size (4B).
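The rounding itself is simple; a minimal sketch, assuming only the 4-byte word size stated above (the helper name is hypothetical and this is not the driver's code):

```c
#include <assert.h>

#define DEMO_NFP_WORD 4u /* NFP word size in bytes, per the text above */

/* Round a stack depth up to the next multiple of the word size, so that
 * the top of the stack is word aligned and the extra bytes stay unused. */
unsigned int demo_round_stack(unsigned int depth)
{
	unsigned int rounded = (depth + DEMO_NFP_WORD - 1) & ~(DEMO_NFP_WORD - 1);

	assert(rounded >= depth && rounded % DEMO_NFP_WORD == 0);
	return rounded;
}
```

For example, a program using 13 bytes of stack would get a 16-byte stack in this model, with the last 3 bytes never touched.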
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
We should use % instead of @ for documenting preprocessor defines.
Add missing documentation of __NFP_REPR_TYPE_MAX. This gets rid
of all remaining kdoc warnings in the driver.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Some RX rings are used for control messages, those will not have
a netdev pointer in dp. Skip XDP rxq handling on those rings.
Fixes: 7f1c684a89 ("nfp: setup xdp_rxq_info")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Currently the wakeup_path status flag is propagated from a child
device to its parent device at __device_suspend(). This allows a driver
dealing with a parent device to act on the flag from its ->suspend()
callback.
However, in situations when the wakeup_path status flag needs to be set
from a ->suspend_late() callback, its value doesn't get propagated to the
parent by the PM core. Let's address this limitation, by also propagating
the flag at __device_suspend_late().
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
To make the code more consistent, let's clear the parent's direct_complete
flag along with clearing it for suppliers, instead of as currently, when
propagating the wakeup_path flag to parents.
While changing this, let's take the opportunity to rename the affected
internal functions, to make them self-explanatory. Like this:
dpm_clear_suppliers_direct_complete -> dpm_clear_superiors_direct_complete
dpm_propagate_to_parent -> dpm_propagate_wakeup_to_parent
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The SOR0 found on Tegra124 and Tegra210 only supports eDP and LVDS and
therefore has a slightly different clock tree than the SOR1 which does
not support eDP, but HDMI and DP instead.
Commit e1335e2f0c ("drm/tegra: sor: Reimplement pad clock") breaks
setups with eDP because the sor->clk_out clock is uninitialized and
therefore setting the parent clock (either the safe clock or either of
the display PLLs) fails, which can cause hangs later on since there is
no clock driving the module.
Fix this by falling back to the module clock for sor->clk_out on those
setups. This guarantees that the module will always be clocked by an
enabled clock and hence prevents those hangs.
Fixes: e1335e2f0c ("drm/tegra: sor: Reimplement pad clock")
Reported-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Use aligned DMA on Tegra30, and USB Ethernet gadget now works on it.
Merge tag 'usb-ci-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/peter.chen/usb into usb-next
Peter writes:
Just one small update:
Use aligned DMA on Tegra30, and USB Ethernet gadget now works on it.
It is possible that more than one legacy IRQ may be set at the same
time, therefore iterate and handle all the pending INTx interrupts
before clearing the status and exiting the IRQ handler. Otherwise, some
interrupts would be lost.
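The handling pattern described above can be sketched as a loop over a snapshot of the status bits. This is a simplified model, not the pci-dra7xx handler: the status layout (bit 0 = INTA ... bit 3 = INTD), the callback, and all demo_* names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NUM_INTX 4 /* INTA..INTD */

uint32_t demo_seen; /* bitmask of hwirqs the recording callback has seen */

/* Hypothetical per-interrupt handler that just records what it was given. */
void demo_record(unsigned int hwirq)
{
	demo_seen |= 1u << hwirq;
}

/* Dispatch every pending legacy interrupt found in a status snapshot,
 * rather than only the first one. Returns how many handlers ran. */
int demo_handle_intx(uint32_t status, void (*handle)(unsigned int hwirq))
{
	int handled = 0;

	assert(handle);
	for (unsigned int bit = 0; bit < DEMO_NUM_INTX; bit++) {
		if (!(status & (1u << bit)))
			continue;
		handle(bit);
		handled++;
	}
	return handled;
}
```

Only after the loop has run would the status be cleared, so an interrupt that was pending together with another one is not dropped.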
Signed-off-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Kishon Vijay Abraham I <kishon@ti.com>
Legacy INTD IRQ handling is broken on dra7xx because the driver
uses hwirqs in the range 1-4 for INTA..INTD, whereas the IRQ domain has
size 4 and is numbered 0-3. Therefore, when the INTD IRQ line is used with
the pci-dra7xx driver, the following warning is seen:
WARNING: CPU: 0 PID: 1 at kernel/irq/irqdomain.c:342 irq_domain_associate+0x12c/0x1c4
error: hwirq 0x4 is too large for dummy
Fix this by using the pci_irqd_intx_xlate() helper to translate the INTx 1-4
range into 0-3, as done in other PCIe drivers.
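The translation amounts to a range check and a subtraction. The sketch below is a user-space model of that mapping with a simplified, hypothetical signature; it is not the kernel helper and omits the irq_domain plumbing.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_INTA 1u /* PCI numbers legacy interrupts INTA..INTD as 1..4 */
#define DEMO_INTD 4u

/* Map a PCI INTx number (1-4) to a zero-based hwirq (0-3) that fits an
 * IRQ domain of size 4. Returns 0 on success or -EINVAL if out of range. */
int demo_intx_xlate(unsigned int intx, unsigned long *out_hwirq)
{
	assert(out_hwirq);
	if (intx < DEMO_INTA || intx > DEMO_INTD)
		return -EINVAL;
	*out_hwirq = intx - DEMO_INTA;
	return 0;
}
```

With this mapping INTD becomes hwirq 3, which is inside a domain of size 4, so the "hwirq 0x4 is too large" condition no longer arises.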
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Reported-by: Chris Welch <Chris.Welch@viavisolutions.com>
Signed-off-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Kishon Vijay Abraham I <kishon@ti.com>
We need to run xfrm_resolve_and_create_bundle() with
bottom halves off. Otherwise we may reuse an already
released dst_entry when the xfrm lookup functions are
called from process context.
Fixes: c30d78c14a813db39a647b6a348b428 ("xfrm: add xdst pcpu cache")
Reported-by: Darius Ski <darius.ski@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Jakub Kicinski says:
====================
Two more trivial fixes to the recent XDP RXQ series.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Looks like commit e817f85652 ("xdp: generic XDP handling of
xdp_rxq_info") replaced kvfree(dev->_rx) in free_netdev() with
a call to netif_free_rx_queues() which doesn't actually free
the rings?
While at it remove the unnecessary temporary variable.
Fixes: e817f85652 ("xdp: generic XDP handling of xdp_rxq_info")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
kvzalloc'ed memory should be kvfree'd.
Fixes: e817f85652 ("xdp: generic XDP handling of xdp_rxq_info")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The micbias1/2 widgets are connected in the routes as supplies. This did
not take effect since they were of MICBIAS type. To keep the same register
settings, we have to remove it once the micbias1/2 widgets are converted
to SUPPLY type.
Signed-off-by: Bard Liao <bardliao@realtek.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
This codec was used in MFLD systems in the PMIC chip. We no longer have
users for this, so remove it.
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
mfld_machine was not getting compiled due to missed Makefile changes.
Since no one complained it is safe to assume that it is not being used,
so remove it.
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Add driver to handle DAI interface for PDM microphones connected
to Digital Filter for Sigma Delta Modulators IP.
Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@st.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Add bindings that describe audio settings to support
Digital Filter for pulse density modulation(PDM) microphone.
Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@st.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Since commit aef9a7bd9b ("serial/uart/8250: Add tunable RX interrupt
trigger I/F of FIFO buffers"), the port's default FCR value isn't used
in serial8250_do_set_termios anymore, but copied over once in
serial8250_config_port and then modified as needed.
Unfortunately, serial8250_config_port will never be called if the port
is shared between kernel and userspace, and the port's flag doesn't have
UPF_BOOT_AUTOCONF, which would trigger a serial8250_config_port as well.
This causes garbled output from userspace:
[ 5.220000] random: procd urandom read with 49 bits of entropy available
ers
[kee
Fix this by forcing it to be configured on boot, resulting in the
expected output:
[ 5.250000] random: procd urandom read with 50 bits of entropy available
Press the [f] key and hit [enter] to enter failsafe mode
Press the [1], [2], [3] or [4] key and hit [enter] to select the debug level
Fixes: aef9a7bd9b ("serial/uart/8250: Add tunable RX interrupt trigger I/F of FIFO buffers")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Nicolas Schichan <nschichan@freebox.fr>
Cc: linux-mips@linux-mips.org
Cc: linux-serial@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/17544/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This patch adds the possibility of getting the delivery of a SIGXCPU
signal whenever there is a runtime overrun. The request is done through
the sched_flags field within the sched_attr structure.
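From user space, a request of this kind could look roughly like the sketch below, which only fills in the attribute structure. The struct is a local mirror of the first-version sched_attr layout under a different name, and the flag name and value (DL_OVERRUN, 0x04) are stated here as assumptions rather than taken from this message.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the kernel's first-version sched_attr layout, renamed
 * so that it cannot clash with a libc definition. */
struct demo_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;  /* ns */
	uint64_t sched_deadline; /* ns */
	uint64_t sched_period;   /* ns */
};

#define DEMO_SCHED_DEADLINE        6    /* SCHED_DEADLINE policy number */
#define DEMO_SCHED_FLAG_DL_OVERRUN 0x04 /* assumed value of the overrun flag */

/* Describe a deadline task that asks for SIGXCPU on runtime overrun. */
void demo_fill_dl_attr(struct demo_sched_attr *attr, uint64_t runtime_ns,
		       uint64_t deadline_ns, uint64_t period_ns)
{
	assert(attr && runtime_ns <= deadline_ns && deadline_ns <= period_ns);
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->sched_policy = DEMO_SCHED_DEADLINE;
	attr->sched_flags = DEMO_SCHED_FLAG_DL_OVERRUN;
	attr->sched_runtime = runtime_ns;
	attr->sched_deadline = deadline_ns;
	attr->sched_period = period_ns;
}
```

A task would then install a SIGXCPU handler and pass the filled structure to the sched_setattr() system call; that step needs the appropriate privileges and is left out of the sketch.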
Forward port of https://lkml.org/lkml/2009/10/16/170
Tested-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Claudio Scordino <claudio@evidence.eu.com>
Signed-off-by: Luca Abeni <luca.abeni@santannapisa.it>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
Link: http://lkml.kernel.org/r/1513077024-25461-1-git-send-email-claudio@evidence.eu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If waking from an idle CPU due to an interrupt then it's possible that
the waker task will be pulled to wake on the current CPU. Unfortunately,
depending on the type of interrupt and IRQ configuration, there may not
be a strong relationship between the CPU an interrupt was delivered on
and the CPU a task was running on. For example, the interrupts could all
be delivered to CPUs on one particular node due to the machine topology
or IRQ affinity configuration. Another example is an interrupt for an IO
completion which can be delivered to any CPU where there is no guarantee
the data is either cache hot or even local.
This patch was motivated by the observation that an IO workload was
being pulled cross-node on a frequent basis when IO completed. From a
wakeup latency perspective, it's still useful to know that an idle CPU is
immediately available for use but let's only consider an automatic migration
if the CPUs share cache to limit damage due to NUMA migrations. Migrations
may still occur if wake_affine_weight determines it's appropriate.
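The decision described above can be modelled with a toy topology. This is a sketch of the policy only, not the scheduler code: the four-CPU layout and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NR_CPUS 4

/* Hypothetical topology: CPUs 0-1 share one last-level cache (node 0),
 * CPUs 2-3 share another (node 1). */
static const int demo_llc_id[DEMO_NR_CPUS] = { 0, 0, 1, 1 };

bool demo_cpus_share_cache(int a, int b)
{
	assert(a >= 0 && a < DEMO_NR_CPUS && b >= 0 && b < DEMO_NR_CPUS);
	return demo_llc_id[a] == demo_llc_id[b];
}

/* Should a task that last ran on prev_cpu be pulled to the waking CPU
 * purely because the waking CPU is idle? Only if the two share cache. */
bool demo_wake_affine_idle(int this_cpu, int prev_cpu, bool this_cpu_idle)
{
	return this_cpu_idle && demo_cpus_share_cache(this_cpu, prev_cpu);
}
```

In this model an interrupt landing on idle CPU 0 pulls a task from CPU 1 but not from CPU 2; the cross-node case is left to the weight-based check.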
These are the throughput results for dbench running on ext4 comparing
4.15-rc3 and this patch on a 2-socket machine where interrupts due to IO
completions can happen on any CPU.
                      4.15.0-rc3            4.15.0-rc3
                         vanilla           lessmigrate
Hmean    1      854.64 (  0.00%)      865.01 (  1.21%)
Hmean    2     1229.60 (  0.00%)     1274.44 (  3.65%)
Hmean    4     1591.81 (  0.00%)     1628.08 (  2.28%)
Hmean    8     1845.04 (  0.00%)     1831.80 ( -0.72%)
Hmean   16     2038.61 (  0.00%)     2091.44 (  2.59%)
Hmean   32     2327.19 (  0.00%)     2430.29 (  4.43%)
Hmean   64     2570.61 (  0.00%)     2568.54 ( -0.08%)
Hmean  128     2481.89 (  0.00%)     2499.28 (  0.70%)
Stddev   1       14.31 (  0.00%)        5.35 ( 62.65%)
Stddev   2       21.29 (  0.00%)       11.09 ( 47.92%)
Stddev   4        7.22 (  0.00%)        6.80 (  5.92%)
Stddev   8       26.70 (  0.00%)        9.41 ( 64.76%)
Stddev  16       22.40 (  0.00%)       20.01 ( 10.70%)
Stddev  32       45.13 (  0.00%)       44.74 (  0.85%)
Stddev  64       93.10 (  0.00%)       93.18 ( -0.09%)
Stddev 128      184.28 (  0.00%)      177.85 (  3.49%)
Note the small increase in throughput for low thread counts but also
note that the standard deviation for each sample during the test run is
lower. The throughput figures for dbench can be misleading so the benchmark
is actually modified to time the latency of the processing of one load
file with many samples taken. The difference in latency is
                      4.15.0-rc3            4.15.0-rc3
                         vanilla           lessmigrate
Amean    1       21.71 (  0.00%)       21.47 (  1.08%)
Amean    2       30.89 (  0.00%)       29.58 (  4.26%)
Amean    4       47.54 (  0.00%)       46.61 (  1.97%)
Amean    8       82.71 (  0.00%)       82.81 ( -0.12%)
Amean   16      149.45 (  0.00%)      145.01 (  2.97%)
Amean   32      265.49 (  0.00%)      248.43 (  6.42%)
Amean   64      463.23 (  0.00%)      463.55 ( -0.07%)
Amean  128      933.97 (  0.00%)      935.50 ( -0.16%)
Stddev   1        1.58 (  0.00%)        1.54 (  2.26%)
Stddev   2        2.84 (  0.00%)        2.95 ( -4.15%)
Stddev   4        6.78 (  0.00%)        6.85 ( -0.99%)
Stddev   8       16.85 (  0.00%)       16.37 (  2.85%)
Stddev  16       41.59 (  0.00%)       41.04 (  1.32%)
Stddev  32      111.05 (  0.00%)      105.11 (  5.35%)
Stddev  64      285.94 (  0.00%)      288.01 ( -0.72%)
Stddev 128      803.39 (  0.00%)      809.73 ( -0.79%)
It's a small improvement which is not surprising given that migrations that
migrate to a different node are not that common. However, it is noticeable
in the CPU migration statistics which are reduced by 24%.
There was a query for v1 of this patch about NAS so here are the results
for C-class using MPI for parallelisation on the same machine
nas-mpi
                      4.15.0-rc3            4.15.0-rc3
                         vanilla                 noirq
Time cg.C        24.25 (  0.00%)       23.17 (  4.45%)
Time ep.C         8.22 (  0.00%)        8.29 ( -0.85%)
Time ft.C        22.67 (  0.00%)       20.34 ( 10.28%)
Time is.C         1.42 (  0.00%)        1.47 ( -3.52%)
Time lu.C        55.62 (  0.00%)       54.81 (  1.46%)
Time mg.C         7.93 (  0.00%)        7.91 (  0.25%)
            4.15.0-rc3  4.15.0-rc3
               vanilla  noirq-v1r1
User           3799.96     3748.34
System          672.10      626.15
Elapsed          91.91       79.49
lu.C sees a small gain, ft.C a large gain and ep.C and is.C see small
regressions but in terms of absolute time, the difference is small and
likely within run-to-run variance. System CPU usage is slightly reduced.
schbench from Facebook was also requested. This is a bit of a mixed bag but
it's important to note that this workload should not be heavily impacted
by wakeups from interrupt context.
                               4.15.0-rc3           4.15.0-rc3
                                  vanilla           noirq-v1r1
Lat 50.00th-qrtle-1       41.00 (  0.00%)      41.00 (  0.00%)
Lat 75.00th-qrtle-1       42.00 (  0.00%)      42.00 (  0.00%)
Lat 90.00th-qrtle-1       43.00 (  0.00%)      44.00 ( -2.33%)
Lat 95.00th-qrtle-1       44.00 (  0.00%)      46.00 ( -4.55%)
Lat 99.00th-qrtle-1       57.00 (  0.00%)      58.00 ( -1.75%)
Lat 99.50th-qrtle-1       59.00 (  0.00%)      59.00 (  0.00%)
Lat 99.90th-qrtle-1       67.00 (  0.00%)      78.00 (-16.42%)
Lat 50.00th-qrtle-2       40.00 (  0.00%)      51.00 (-27.50%)
Lat 75.00th-qrtle-2       45.00 (  0.00%)      56.00 (-24.44%)
Lat 90.00th-qrtle-2       53.00 (  0.00%)      59.00 (-11.32%)
Lat 95.00th-qrtle-2       57.00 (  0.00%)      61.00 ( -7.02%)
Lat 99.00th-qrtle-2       67.00 (  0.00%)      71.00 ( -5.97%)
Lat 99.50th-qrtle-2       69.00 (  0.00%)      74.00 ( -7.25%)
Lat 99.90th-qrtle-2       83.00 (  0.00%)      77.00 (  7.23%)
Lat 50.00th-qrtle-4       51.00 (  0.00%)      51.00 (  0.00%)
Lat 75.00th-qrtle-4       57.00 (  0.00%)      56.00 (  1.75%)
Lat 90.00th-qrtle-4       60.00 (  0.00%)      59.00 (  1.67%)
Lat 95.00th-qrtle-4       62.00 (  0.00%)      62.00 (  0.00%)
Lat 99.00th-qrtle-4       73.00 (  0.00%)      72.00 (  1.37%)
Lat 99.50th-qrtle-4       76.00 (  0.00%)      74.00 (  2.63%)
Lat 99.90th-qrtle-4       85.00 (  0.00%)      78.00 (  8.24%)
Lat 50.00th-qrtle-8       54.00 (  0.00%)      58.00 ( -7.41%)
Lat 75.00th-qrtle-8       59.00 (  0.00%)      62.00 ( -5.08%)
Lat 90.00th-qrtle-8       65.00 (  0.00%)      66.00 ( -1.54%)
Lat 95.00th-qrtle-8       67.00 (  0.00%)      70.00 ( -4.48%)
Lat 99.00th-qrtle-8       78.00 (  0.00%)      79.00 ( -1.28%)
Lat 99.50th-qrtle-8       81.00 (  0.00%)      80.00 (  1.23%)
Lat 99.90th-qrtle-8      116.00 (  0.00%)      83.00 ( 28.45%)
Lat 50.00th-qrtle-16      65.00 (  0.00%)      64.00 (  1.54%)
Lat 75.00th-qrtle-16      77.00 (  0.00%)      71.00 (  7.79%)
Lat 90.00th-qrtle-16      83.00 (  0.00%)      82.00 (  1.20%)
Lat 95.00th-qrtle-16      87.00 (  0.00%)      87.00 (  0.00%)
Lat 99.00th-qrtle-16      95.00 (  0.00%)      96.00 ( -1.05%)
Lat 99.50th-qrtle-16      99.00 (  0.00%)     103.00 ( -4.04%)
Lat 99.90th-qrtle-16     104.00 (  0.00%)     122.00 (-17.31%)
Lat 50.00th-qrtle-32      71.00 (  0.00%)      73.00 ( -2.82%)
Lat 75.00th-qrtle-32      91.00 (  0.00%)      92.00 ( -1.10%)
Lat 90.00th-qrtle-32     108.00 (  0.00%)     107.00 (  0.93%)
Lat 95.00th-qrtle-32     118.00 (  0.00%)     115.00 (  2.54%)
Lat 99.00th-qrtle-32     134.00 (  0.00%)     129.00 (  3.73%)
Lat 99.50th-qrtle-32     138.00 (  0.00%)     133.00 (  3.62%)
Lat 99.90th-qrtle-32     149.00 (  0.00%)     146.00 (  2.01%)
Lat 50.00th-qrtle-39      83.00 (  0.00%)      81.00 (  2.41%)
Lat 75.00th-qrtle-39     105.00 (  0.00%)     102.00 (  2.86%)
Lat 90.00th-qrtle-39     120.00 (  0.00%)     119.00 (  0.83%)
Lat 95.00th-qrtle-39     129.00 (  0.00%)     128.00 (  0.78%)
Lat 99.00th-qrtle-39     153.00 (  0.00%)     149.00 (  2.61%)
Lat 99.50th-qrtle-39     166.00 (  0.00%)     156.00 (  6.02%)
Lat 99.90th-qrtle-39   12304.00 (  0.00%)   12848.00 ( -4.42%)
When heavily loaded (e.g. 99.50th-qrtle-39 indicates 39 threads), there
are small gains in many cases. Otherwise it depends on the quartile used
where it can be bad -- e.g. 75.00th-qrtle-2. However, even these results
are probably a coincidence. For this workload, much depends on what node
the threads get placed on and their relative locality and not wakeups from
interrupt context. A larger component of how it behaves would be automatic
NUMA balancing where a fault incurred to measure locality would be a much
larger contributor to latency than the wakeup path.
These are the results from an almost identical machine that happened to run
the same test. They only differ in terms of storage which is irrelevant
for this test.
                               4.15.0-rc3           4.15.0-rc3
                                  vanilla           noirq-v1r1
Lat 50.00th-qrtle-1       41.00 (  0.00%)      41.00 (  0.00%)
Lat 75.00th-qrtle-1       42.00 (  0.00%)      42.00 (  0.00%)
Lat 90.00th-qrtle-1       44.00 (  0.00%)      43.00 (  2.27%)
Lat 95.00th-qrtle-1       53.00 (  0.00%)      45.00 ( 15.09%)
Lat 99.00th-qrtle-1       59.00 (  0.00%)      58.00 (  1.69%)
Lat 99.50th-qrtle-1       60.00 (  0.00%)      59.00 (  1.67%)
Lat 99.90th-qrtle-1       86.00 (  0.00%)      61.00 ( 29.07%)
Lat 50.00th-qrtle-2       52.00 (  0.00%)      41.00 ( 21.15%)
Lat 75.00th-qrtle-2       57.00 (  0.00%)      46.00 ( 19.30%)
Lat 90.00th-qrtle-2       60.00 (  0.00%)      53.00 ( 11.67%)
Lat 95.00th-qrtle-2       62.00 (  0.00%)      57.00 (  8.06%)
Lat 99.00th-qrtle-2       73.00 (  0.00%)      68.00 (  6.85%)
Lat 99.50th-qrtle-2       74.00 (  0.00%)      71.00 (  4.05%)
Lat 99.90th-qrtle-2       90.00 (  0.00%)      75.00 ( 16.67%)
Lat 50.00th-qrtle-4       57.00 (  0.00%)      52.00 (  8.77%)
Lat 75.00th-qrtle-4       60.00 (  0.00%)      58.00 (  3.33%)
Lat 90.00th-qrtle-4       62.00 (  0.00%)      62.00 (  0.00%)
Lat 95.00th-qrtle-4       65.00 (  0.00%)      65.00 (  0.00%)
Lat 99.00th-qrtle-4       76.00 (  0.00%)      75.00 (  1.32%)
Lat 99.50th-qrtle-4       77.00 (  0.00%)      77.00 (  0.00%)
Lat 99.90th-qrtle-4       87.00 (  0.00%)      81.00 (  6.90%)
Lat 50.00th-qrtle-8       59.00 (  0.00%)      57.00 (  3.39%)
Lat 75.00th-qrtle-8       63.00 (  0.00%)      62.00 (  1.59%)
Lat 90.00th-qrtle-8       66.00 (  0.00%)      67.00 ( -1.52%)
Lat 95.00th-qrtle-8       68.00 (  0.00%)      70.00 ( -2.94%)
Lat 99.00th-qrtle-8       79.00 (  0.00%)      80.00 ( -1.27%)
Lat 99.50th-qrtle-8       80.00 (  0.00%)      84.00 ( -5.00%)
Lat 99.90th-qrtle-8       84.00 (  0.00%)      90.00 ( -7.14%)
Lat 50.00th-qrtle-16      65.00 (  0.00%)      65.00 (  0.00%)
Lat 75.00th-qrtle-16      77.00 (  0.00%)      75.00 (  2.60%)
Lat 90.00th-qrtle-16      84.00 (  0.00%)      83.00 (  1.19%)
Lat 95.00th-qrtle-16      88.00 (  0.00%)      87.00 (  1.14%)
Lat 99.00th-qrtle-16      97.00 (  0.00%)      96.00 (  1.03%)
Lat 99.50th-qrtle-16     100.00 (  0.00%)     104.00 ( -4.00%)
Lat 99.90th-qrtle-16     110.00 (  0.00%)     126.00 (-14.55%)
Lat 50.00th-qrtle-32      70.00 (  0.00%)      71.00 ( -1.43%)
Lat 75.00th-qrtle-32      92.00 (  0.00%)      94.00 ( -2.17%)
Lat 90.00th-qrtle-32     110.00 (  0.00%)     110.00 (  0.00%)
Lat 95.00th-qrtle-32     121.00 (  0.00%)     118.00 (  2.48%)
Lat 99.00th-qrtle-32     135.00 (  0.00%)     137.00 ( -1.48%)
Lat 99.50th-qrtle-32     140.00 (  0.00%)     146.00 ( -4.29%)
Lat 99.90th-qrtle-32     150.00 (  0.00%)     160.00 ( -6.67%)
Lat 50.00th-qrtle-39      80.00 (  0.00%)      71.00 ( 11.25%)
Lat 75.00th-qrtle-39     102.00 (  0.00%)      91.00 ( 10.78%)
Lat 90.00th-qrtle-39     118.00 (  0.00%)     108.00 (  8.47%)
Lat 95.00th-qrtle-39     128.00 (  0.00%)     117.00 (  8.59%)
Lat 99.00th-qrtle-39     149.00 (  0.00%)     133.00 ( 10.74%)
Lat 99.50th-qrtle-39     160.00 (  0.00%)     139.00 ( 13.12%)
Lat 99.90th-qrtle-39   13808.00 (  0.00%)    4920.00 ( 64.37%)
Despite being nearly identical, it showed a variety of major gains so
I'm not convinced that heavy emphasis should be placed on this particular
workload in terms of evaluating this particular patch. Further evidence of
this is the fact that testing on a UMA machine showed small gains/losses
even though the patch should be a no-op on UMA.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171219085947.13136-2-mgorman@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since the remote cpufreq callback work, the cpufreq_update_util() call can happen
from remote CPUs. The comment about local CPUs is thus obsolete. Update it
accordingly.
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Android Kernel <kernel-team@android.com>
Cc: Atish Patra <atish.patra@oracle.com>
Cc: Chris Redpath <Chris.Redpath@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: EAS Dev <eas-dev@lists.linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Ramussen <morten.rasmussen@arm.com>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Rohit Jain <rohit.k.jain@oracle.com>
Cc: Saravana Kannan <skannan@quicinc.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vikram Mulukutla <markivx@codeaurora.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: http://lkml.kernel.org/r/20171215153944.220146-2-joelaf@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
find_idlest_group_cpu() goes through CPUs of a group previously selected by
find_idlest_group(). find_idlest_group() returns NULL if the local group is the
selected one, so find_idlest_group_cpu() is not executed if the group to which
'cpu' belongs is chosen. So we're always guaranteed to call
find_idlest_group_cpu() with a group to which 'cpu' is non-local.
This makes one of the conditions in find_idlest_group_cpu() an impossible one,
which we can get rid of.
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Brendan Jackman <brendan.jackman@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Android Kernel <kernel-team@android.com>
Cc: Atish Patra <atish.patra@oracle.com>
Cc: Chris Redpath <Chris.Redpath@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: EAS Dev <eas-dev@lists.linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Ramussen <morten.rasmussen@arm.com>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Rohit Jain <rohit.k.jain@oracle.com>
Cc: Saravana Kannan <skannan@quicinc.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vikram Mulukutla <markivx@codeaurora.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/20171215153944.220146-3-joelaf@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Initializing sg_cpu->flags to SCHED_CPUFREQ_RT has no obvious benefit.
The flags field wouldn't be used until the utilization update handler is
called for the first time, and once that is called we will overwrite
flags anyway.
Initialize it to 0.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael Wysocki <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: dietmar.eggemann@arm.com
Cc: joelaf@google.com
Cc: morten.rasmussen@arm.com
Cc: tkjos@android.com
Link: http://lkml.kernel.org/r/763feda6424ced8486b25a0c52979634e6104478.1513158452.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
capacity_spare_wake() in the slow path influences choice of idlest groups,
as we search for groups with maximum spare capacity. In scenarios where
RT pressure is high, a sub optimal group can be chosen and hurt
performance of the task being woken up.
Fix this by using capacity_of() instead of capacity_orig_of() in capacity_spare_wake().
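The effect of the change can be seen in a simplified model of the spare capacity calculation. The struct, the figures, and the demo_* names are hypothetical, and the real capacity_of() is derived differently; the sketch only shows why subtracting RT pressure changes which CPU looks most spare.

```c
#include <assert.h>

/* Hypothetical per-CPU figures on the scheduler's 0..1024 capacity scale. */
struct demo_cpu {
	unsigned long capacity_orig; /* full capacity of the CPU */
	unsigned long rt_pressure;   /* share consumed by RT activity */
	unsigned long cfs_util;      /* utilization from fair tasks */
};

/* Capacity left for fair tasks once RT pressure is taken out. */
unsigned long demo_capacity_of(const struct demo_cpu *c)
{
	assert(c && c->rt_pressure <= c->capacity_orig);
	return c->capacity_orig - c->rt_pressure;
}

/* Spare capacity, clamped at zero. */
unsigned long demo_spare(unsigned long capacity, unsigned long util)
{
	return capacity > util ? capacity - util : 0;
}

/* Before the fix: RT pressure is ignored. */
unsigned long demo_spare_before(const struct demo_cpu *c)
{
	return demo_spare(c->capacity_orig, c->cfs_util);
}

/* After the fix: RT pressure reduces the capacity that can be spare. */
unsigned long demo_spare_after(const struct demo_cpu *c)
{
	return demo_spare(demo_capacity_of(c), c->cfs_util);
}
```

In this model a CPU under heavy RT load with little fair utilization looks like the best target before the fix and a poor one after it.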
Test results showing improvements with this change are below. More tests
were also done by myself and Matt Fleming to ensure no degradation in
different benchmarks.
1) Rohit ran the barrier.c test (details below) with the following improvements:
--------------------------------------------------------------------------------
This was Rohit's original use case for a patch he posted at [1]. However,
his recent tests show that my patch can replace his slow path
changes [1], and there is no need to selectively scan/skip CPUs in
find_idlest_group_cpu() in the slow path to get the improvement he sees.
barrier.c (OpenMP code) is used as a micro-benchmark. It does a number of
iterations with a barrier sync at the end of each for loop.
Here barrier.c is running along with ping on CPU 0 and 1 as:
'ping -l 10000 -q -s 10 -f hostX'
barrier.c can be found at:
http://www.spinics.net/lists/kernel/msg2506955.html
Following are the results for the iterations per second with this
micro-benchmark (higher is better), on a 44-core, 2-socket, 88-thread
Intel x86 machine:
+--------+------------------+---------------------------+
|Threads | Without patch    | With patch                |
|        |                  |                           |
+--------+--------+---------+-----------------+---------+
|        | Mean   | Std Dev | Mean            | Std Dev |
+--------+--------+---------+-----------------+---------+
|1       | 539.36 | 60.16   | 572.54 (+6.15%) | 40.95   |
|2       | 481.01 | 19.32   | 530.64 (+10.32%)| 56.16   |
|4       | 474.78 | 22.28   | 479.46 (+0.99%) | 18.89   |
|8       | 450.06 | 24.91   | 447.82 (-0.50%) | 12.36   |
|16      | 436.99 | 22.57   | 441.88 (+1.12%) | 7.39    |
|32      | 388.28 | 55.59   | 429.4  (+10.59%)| 31.14   |
|64      | 314.62 | 6.33    | 311.81 (-0.89%) | 11.99   |
+--------+--------+---------+-----------------+---------+
2) ping+hackbench test on bare-metal server (by Rohit)
------------------------------------------------------
Here hackbench is running in threaded mode, along
with ping running on CPU 0 and 1 as:
'ping -l 10000 -q -s 10 -f hostX'
This test is running on a 2-socket, 20-core, 40-thread Intel x86
machine.
Number of loops is 10000 and runtime is in seconds (lower is better):
+--------------+-----------------+--------------------------+
|Task Groups   | Without patch   | With patch               |
|              +-------+---------+----------------+---------+
|(Groups of 40)| Mean  | Std Dev | Mean           | Std Dev |
+--------------+-------+---------+----------------+---------+
|1             | 0.851 | 0.007   | 0.828 (+2.77%) | 0.032   |
|2             | 1.083 | 0.203   | 1.087 (-0.37%) | 0.246   |
|4             | 1.601 | 0.051   | 1.611 (-0.62%) | 0.055   |
|8             | 2.837 | 0.060   | 2.827 (+0.35%) | 0.031   |
|16            | 5.139 | 0.133   | 5.107 (+0.63%) | 0.085   |
|25            | 7.569 | 0.142   | 7.503 (+0.88%) | 0.143   |
+--------------+-------+---------+----------------+---------+
[1] https://patchwork.kernel.org/patch/9991635/
Matt Fleming also ran several different hackbench tests and cyclictest
to sanity-check that the patch doesn't harm other use cases.
Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
Tested-by: Rohit Jain <rohit.k.jain@oracle.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Atish Patra <atish.patra@oracle.com>
Cc: Brendan Jackman <brendan.jackman@arm.com>
Cc: Chris Redpath <Chris.Redpath@arm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Saravana Kannan <skannan@quicinc.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vikram Mulukutla <markivx@codeaurora.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/20171214212158.188190-1-joelaf@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Utilization and capacity are tracked as 'unsigned long', however some
functions using them return an 'int' which is ultimately assigned back to
'unsigned long' variables.
Since there is no reason to use a different, signed type,
consolidate the signatures of functions returning utilization to always
use the native type.
This change improves code consistency, and it also benefits
code paths where utilizations should be clamped by avoiding
further type conversions or ugly type casts.
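A minimal sketch of the pattern, under stated assumptions: util_model() and clamp_util_model() are hypothetical userspace helpers, not the functions touched by this patch. They only illustrate that when every helper returns 'unsigned long', results can be clamped and assigned without casts.

```c
/*
 * Hypothetical utilization helper returning the native type.
 * The value is capped at the CPU capacity.
 */
static unsigned long util_model(unsigned long util_avg, unsigned long capacity)
{
	return util_avg >= capacity ? capacity : util_avg;
}

/*
 * Clamp a utilization between two bounds. All operands share one
 * type, so no casts or signed/unsigned conversions are needed.
 */
static unsigned long clamp_util_model(unsigned long util,
				      unsigned long lo, unsigned long hi)
{
	if (util < lo)
		return lo;
	if (util > hi)
		return hi;
	return util;
}
```

Had util_model() returned 'int', each comparison against an 'unsigned long' bound would involve an implicit conversion or an explicit cast at the call site.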
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Chris Redpath <chris.redpath@arm.com>
Reviewed-by: Brendan Jackman <brendan.jackman@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@android.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/20171205171018.9203-2-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>