Commit 51c3c62b58 ("powerpc: Avoid code patching freed init
sections") accesses 'init_mem_is_free' flag too early, before the
kernel is relocated. This provokes early boot failure (before the
console is active).
As it is not necessary to do this verification that early, this
patch moves the test into patch_instruction() instead of
__patch_instruction().
This modification also has the advantage of avoiding unnecessary
remappings.
Fixes: 51c3c62b58 ("powerpc: Avoid code patching freed init sections")
Cc: stable@vger.kernel.org # 4.13+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
CON_PRINTBUFFER console registration requires us to do several
preparation steps:
- Rollback console_seq to replay logbuf messages which were already
seen on other consoles;
- Set exclusive_console flag so console_unlock() will ->write() logbuf
messages only to the exclusive_console driver.
The way we do it, however, is a bit racy
logbuf_lock_irqsave(flags);
console_seq = syslog_seq;
console_idx = syslog_idx;
logbuf_unlock_irqrestore(flags);
<< preemption enabled
<< irqs enabled
exclusive_console = newcon;
console_unlock();
We rollback console_seq under logbuf_lock with IRQs disabled, but
we set exclusive_console with local IRQs enabled and logbuf unlocked.
If the system oops-es or panic-s before we set exclusive_console - and
given that we have IRQs and preemption enabled there is such a
possibility - we will re-play all logbuf messages to every registered
console, which may be a bit annoying and time consuming.
Move exclusive_console assignment to the same IRQs-disabled and
logbuf_lock-protected section where we rollback console_seq.
Link: http://lkml.kernel.org/r/20180928095304.9972-1-sergey.senozhatsky@gmail.com
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
commit efdfeb079c
("regulator: fixed: Convert to use GPIO descriptor only")
switched to use gpiod_get() to look up the regulator from the
gpiolib core whether that is device tree or boardfile.
This meant that we activate the code in
a603a2b8d8 ("gpio: of: Add special quirk to parse regulator flags")
which means the descriptors coming from the device tree already
have the right inversion and open drain semantics set up from
the gpiolib core.
As the fixed regulator was inspected again we got the
inverted inversion and things broke.
Fix it by ignoring the config in the device tree for now: the
later patches in the series will push all inversion handling
over to the gpiolib core and set it up properly in the
boardfiles for legacy devices, but I did not finish that
for this kernel cycle.
Fixes: commit efdfeb079c ("regulator: fixed: Convert to use GPIO descriptor only")
Reported-by: Leonard Crestez <leonard.crestez@nxp.com>
Reported-by: Fabio Estevam <festevam@gmail.com>
Reported-by: John Stultz <john.stultz@linaro.org>
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
The variable "exclusive_console" is used to reply all existing messages
on a newly registered console. It is cleared when all messages are out.
The problem is that new messages might appear in the meantime. These
are then visible only on the exclusive console.
The obvious solution is to clear "exclusive_console" after we replay
all messages that were already proceed before we started the reply.
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Link: http://lkml.kernel.org/r/20180913123406.14378-1-pmladek@suse.com
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: linux-kernel@vger.kernel.org
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
We return H_TOO_HARD from TCE update handlers when we think that
the next handler (realmode -> virtual mode -> user mode) has a chance to
handle the request; H_HARDWARE/H_CLOSED otherwise.
This changes the handlers to return H_TOO_HARD on every error giving
the userspace an opportunity to handle any request or at least log
them all.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The KVM TCE handlers are written in a way so they fail when either
something went horribly wrong or the userspace did some obvious mistake
such as passing a misaligned address.
We are going to enhance the TCE checker to fail on attempts to map bigger
IOMMU page than the underlying pinned memory so let's valitate TCE
beforehand.
This should cause no behavioral change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Some commits we'd like to share between the powerpc and kvm-ppc tree for
next have dependencies on commits that went into 4.19 via the
kvm-ppc-fixes branch and weren't merged before 4.19-rc3, which is our
base commit.
So merge the kvm-ppc-fixes branch into topic/ppc-kvm.
Explicitly forbid creating cgroup local storage maps with zero value
size, as it makes no sense and might even cause a panic.
Reported-by: syzbot+18628320d3b14a5c459c@syzkaller.appspotmail.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski says:
====================
This series makes the control message parsing for interacting
with BPF maps more flexible. Up until now we had a hard limit
in the ABI for key and value size to be 64B at most. Using
TLV capability allows us to support large map entries.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
In current ABI the size of the messages carrying map elements was
statically defined to at most 16 words of key and 16 words of value
(NFP word is 4 bytes). We should not make this assumption and use
the max key and value sizes from the BPF capability instead.
To make sure old kernels don't get surprised with larger (or smaller)
messages bump the FW ABI version to 3 when key/value size is different
than 16 words.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Some apps may want to have higher MTU on the control vNIC/queue.
Allow them to set the requested MTU at init time.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Up until now we only had per-vNIC BPF ABI version capabilities,
which are slightly awkward to use because bulk of the resources
and configuration does not relate to any particular vNIC. Add
a new capability for global ABI version and check the per-vNIC
version are equal to it. Assume the ABI version 2 if no explicit
version capability is present.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
- Fixup conversion of debounce time to/from ms/us
MMC host:
- sdhi: Fixup whitelisting for Gen3 types
-----BEGIN PGP SIGNATURE-----
iQJLBAABCgA1FiEEugLDXPmKSktSkQsV/iaEJXNYjCkFAluzG3YXHHVsZi5oYW5z
c29uQGxpbmFyby5vcmcACgkQ/iaEJXNYjCmUIA//aO8Itm9cnbQQPtcO09Wca1Ht
JlgRc3ZP0Tackpe8d3eH9MzQ5pRJ9lBTMUR0cqEF/HHpjqktPLA66IkalsceEq+E
eMXLOqGUzKLqEJGKslggKJzSRuqFxtNNATPPqNkaOXZ/UK7cGgSsWHnavgUWnEd4
h+hcGSu4R4J/ftrNPlc0TB8tCzo/ER7Xs3JNV/GzP5FNpMs+LzjSN1Vcp/9CJKkK
wZmmVSnLH0oY5U/VM1EUSMjc8p2pe7K2j9OENt3aFpMXsW5HTeHcwmMxbWUFGVAs
iuGzk6CBhVujWPrujcWf+7IzlTHgkDAaH1WB5YJJ32r6xpgiHxiP7HuGtHsvDpml
S5rCV3gHRhxI6BIyaldZ94bRzqcDQ1hoCidKOdEGRfFjtnvIvFAajERz2HkF8KiX
I6ErmnFrrLA3EQTDd0DueldaZ+R2OD/h43awlj67834LofE8l6zj1vCOSqXcnWqk
03Qm2I5xrkicfncWLi6vxcpiWLIil2CLP7porovOuzuY0aZp3adRNwykQqI7E0oo
Fz5Vch2iiQgkwzIMFqUciiXuKDOXKZO6qGQLczARhgzU2YhlcN0y8aGC5ecXWTqn
4JS4grJstR1Hp9ri6RYj20FXMqtwESePyLhg5b7C0Ru8aWVfLDFNVlOg+tdzv4Ti
QnHF/N5ymxkAxztC1MU=
=AYff
-----END PGP SIGNATURE-----
Merge tag 'mmc-v4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Ulf writes:
"MMC core:
- Fixup conversion of debounce time to/from ms/us
MMC host:
- sdhi: Fixup whitelisting for Gen3 types"
* tag 'mmc-v4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: slot-gpio: Fix debounce time to use miliseconds again
mmc: core: Fix debounce time to use microseconds
mmc: sdhi: sys_dmac: check for all Gen3 types when whitelisting
If the "workaround_for_vbus" is true, the driver will not call
usb_disconnect(). So, since the controller keeps some registers'
value, the driver doesn't re-enumarate suitable speed after
the b-device mode is disabled. To fix the issue, this patch
adds usb_disconnect() calling in renesas_usb3_b_device_write()
if workaround_for_vbus is true.
Fixes: 43ba968b00 ("usb: gadget: udc: renesas_usb3: add debugfs to set the b-device mode")
Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Describe the System Error Interrupt (SEI) controller. It aggregates two
types of interrupts, wired and MSIs from respectively the AP and the
CPs, into a single SPI interrupt.
Suggested-by: Haim Boot <hayim@marvell.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Change the documentation to reflect the new bindings used for Marvell
ICU. This involves describing each interrupt group as a subnode of the
ICU node. Each of them having their own compatible.
The DT binding documentation still documents the legacy binding, where
there was a single node with no subnode.
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
So far the ICU only handled NSR interrupts through GICP. An SEI driver
provides an MSI domain through which it is possible to raise SEI, so
let's add SEI support to the ICU driver.
Handle the NSR probe function in a more generic way to support other
type of interrupts.
Each interrupt domain is a tree domain to avoid allocation the 207
entries each time.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Enable the newly introduced Marvell SEI driver for the 64-bit Marvell
EBU platforms.
Suggested-by: Haim Boot <hayim@marvell.com>
Reviewed-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This is a cascaded interrupt controller in the AP806 GIC that collapses
SEIs (System Error Interrupt) coming from the AP and the CPs (through
the ICU).
The SEI handles up to 64 interrupts. The first 21 interrupts are wired
from the AP. The next 43 interrupts are from the CPs and are triggered
through MSI messages. To handle this complexity, the driver has to
declare to the upper layer: one IRQ domain for the wired interrupts,
one IRQ domain for the MSIs; and acts as a MSI controller ('parent')
by declaring an MSI domain.
Suggested-by: Haim Boot <hayim@marvell.com>
Suggested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The ICU can handle several type of interrupt, each of them being handled
differently on AP side. On CP side, the ICU should be able to make the
distinction between each interrupt group by pointing to the right parent.
This is done through the introduction of new bindings, presenting the ICU
node as the parent of multiple ICU sub-nodes, each of them being an
interrupt type with a different interrupt parent. ICU interrupt 'clients'
now directly point to the right sub-node, avoiding the need for the extra
ICU_GRP_* parameter.
ICU subnodes are probed automatically with devm_platform_populate(). If
the node as no child, the probe function for NSRs will still be called
'manually' in order to preserve backward compatibility with DT using the
old binding.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
NSR (non-secure interrupts) are handled in the ICU driver like if there
was only this type of interrupt in the ICU. Change this behavior to
prepare the introduction of SEI (System Error Interrupts) support by
moving the NSR code in a separate function. This is done under the form
of a 'probe' function to ease future migration to NSR/SEI being platform
devices part of the ICU. The 'icu' structure is passed as driver data
and not as a parameter for the same reason.
Reviewed-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Rewrite a small section to clarify the reset operation of interrupts
already configured by ATF that we want to handle in the driver. This
will simplify the introduction of System Error Interrupts support.
Reviewed-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The irq_domain structure has an host_data pointer that just stores
private data. It is meant to not be touched by the IRQ core. However,
when it comes to MSI, the MSI layer adds its own private data there
with a structure that also has a host_data pointer.
Because this IRQ domain is an MSI domain, to access private data we
should do a d->host_data->host_data, also wrapped as
'platform_msi_get_host_data()'.
This bug was lying there silently because the 'icu' structure retrieved
this way was just called by dev_err(), only producing a
'(NULL device *):' output on the console.
Reviewed-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
ICU size in CP110 is not 0x10 but at least 0x440 bytes long (from the
specification).
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Add pins, groups, and function for the Interrupt Controller for
External Devices (INTC-EX) on the R-Car E3 SoC.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
This provides a pinctrl driver for the Renesas RZ/N1 device family.
Based on a patch originally written by Michel Pollet at Renesas.
Signed-off-by: Phil Edworthy <phil.edworthy@renesas.com>
Reviewed-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
The Renesas RZ/N1 device family PINCTRL node description.
Based on a patch originally written by Michel Pollet at Renesas.
Signed-off-by: Phil Edworthy <phil.edworthy@renesas.com>
Reviewed-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
platform_msi_create_device_domain() always creates a revmap-based
irqdomain, which has the drawback of requiring the number of MSIs
that can be allocated ahead of time. This is not always possible,
and we sometimes need to use a tree-based irqdomain instead.
Add a new platform_msi_create_device_tree_domain() helper to
that effect.
Reported-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Document the R-Car E3 (R8A77990) SoC in the R-Car PCIe bindings.
Signed-off-by: Tho Vu <tho.vu.wh@rvc.renesas.com>
Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Phil Edworthy <phil.edworthy@renesas.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Cc: Simon Horman <horms+renesas@verge.net.au>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: linux-renesas-soc@vger.kernel.org
Document support for the Interrupt Controller for External Devices
(INTC-EX) in the Renesas E3 (r8a77990) SoC.
No driver update is needed.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The PDC irqchp can convert a falling edge or level low interrupt to a
rising edge or level high interrupt at the GIC. We just need to setup
the GIC correctly. Set up the interrupt type for the IRQ_TYPE_EDGE_BOTH
as IRQ_TYPE_EDGE_RISING at the GIC.
Fixes: f55c73aef8 ("irqchip/pdc: Add PDC interrupt controller for QCOM SoCs")
Reported-by: Evan Green <evgreen@chromium.org>
Reviewed-by: Evan Green <evgreen@chromium.org>
Signed-off-by: Lina Iyer <ilina@codeaurora.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
If the LPI tables have been reserved with the EFI reservation
mechanism, we assume that these tables are safe to use even
when we find the redistributors to have LPIs enabled at
boot time, meaning that kexec can now work with GICv3.
You're welcome.
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Upon enabling a redistributor, let's register the allocated tables
with the EFI table that tracks the memory reservations.
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
If booting with LPIs enabled, all the redistributors must have the
exact same property table. No ifs, no buts.
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
If using a kdump kernel, and that we cannot disable LPIs to install
our own tables, let's switch to using the already allocated tables.
This means that we'll change some of the initial kernel's memory,
but at least we'll be able to have LPIs in this secondary kernel.
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In order to cope with kexec and GICv3, let's try and spot when
we're booting with LPIs already enabled, and the tables already
programmed into the redistributors.
This code is currently guarded by a predicate that is always false,
meaning this is not functionnal just yet.
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We're currently only tracking the page allocated to contain the
property table by its struct page. In the future, it is going to
be convenient to track both PA and VA for that page instead. Let's
do that.
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Pending tables for the redistributors are currently allocated
one at a time as each CPU boots. This is causing some grief
for Linux/RT (allocation from within a CPU hotplug notifier is
frown upon).
Let's move this allocation to take place at init time, when we
only have a single CPU. It means we're allocating memory for CPUs
that are not online yet, but most system will boot all of their
CPUs anyway, so that's not completely wasted.
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
As we're going to reuse some pre-allocated memory for the property
table, split out the zeroing of that table into a separate function
for later use.
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
LPI_PENDING_SZ is always used in conjunction with a max(), which doesn't
make much sense, since we're guaranteed that LPI_PENDING_SZ is already
aligned to 64K. Let's remove it.
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We currently initialize the LPIs (and the ITS) fairly early, even
before the SMP support and the CPU interface. This is a bit odd
(as LPIs are not exactly crutial for the early boot process),
and is going to cause issues when reorganizing the probing code.
Let's move this initialization later.
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Automatic NUMA Balancing uses a multi-stage pass to decide whether a page
should migrate to a local node. This filter avoids excessive ping-ponging
if a page is shared or used by threads that migrate cross-node frequently.
Threads inherit both page tables and the preferred node ID from the
parent. This means that threads can trigger hinting faults earlier than
a new task which delays scanning for a number of seconds. As it can be
load balanced very early in its lifetime there can be an unnecessary delay
before it starts migrating thread-local data. This patch migrates private
pages faster early in the lifetime of a thread using the sequence counter
as an identifier of new tasks.
With this patch applied, STREAM performance is the same as 4.17 even though
processes are not spread cross-node prematurely. Other workloads showed
a mix of minor gains and losses. This is somewhat expected most workloads
are not very sensitive to the starting conditions of a process.
4.19.0-rc5 4.19.0-rc5 4.17.0
numab-v1r1 fastmigrate-v1r1 vanilla
MB/sec copy 43298.52 ( 0.00%) 47335.46 ( 9.32%) 47219.24 ( 9.06%)
MB/sec scale 30115.06 ( 0.00%) 32568.12 ( 8.15%) 32527.56 ( 8.01%)
MB/sec add 32825.12 ( 0.00%) 36078.94 ( 9.91%) 35928.02 ( 9.45%)
MB/sec triad 32549.52 ( 0.00%) 35935.94 ( 10.40%) 35969.88 ( 10.51%)
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Rik van Riel <riel@surriel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jirka Hladky <jhladky@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux-MM <linux-mm@kvack.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181001100525.29789-3-mgorman@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Rate limiting of page migrations due to automatic NUMA balancing was
introduced to mitigate the worst-case scenario of migrating at high
frequency due to false sharing or slowly ping-ponging between nodes.
Since then, a lot of effort was spent on correctly identifying these
pages and avoiding unnecessary migrations and the safety net may no longer
be required.
Jirka Hladky reported a regression in 4.17 due to a scheduler patch that
avoids spreading STREAM tasks wide prematurely. However, once the task
was properly placed, it delayed migrating the memory due to rate limiting.
Increasing the limit fixed the problem for him.
Currently, the limit is hard-coded and does not account for the real
capabilities of the hardware. Even if an estimate was attempted, it would
not properly account for the number of memory controllers and it could
not account for the amount of bandwidth used for normal accesses. Rather
than fudging, this patch simply eliminates the rate limiting.
However, Jirka reports that a STREAM configuration using multiple
processes achieved similar performance to 4.16. In local tests, this patch
improved performance of STREAM relative to the baseline but it is somewhat
machine-dependent. Most workloads show little or not performance difference
implying that there is not a heavily reliance on the throttling mechanism
and it is safe to remove.
STREAM on 2-socket machine
4.19.0-rc5 4.19.0-rc5
numab-v1r1 noratelimit-v1r1
MB/sec copy 43298.52 ( 0.00%) 44673.38 ( 3.18%)
MB/sec scale 30115.06 ( 0.00%) 31293.06 ( 3.91%)
MB/sec add 32825.12 ( 0.00%) 34883.62 ( 6.27%)
MB/sec triad 32549.52 ( 0.00%) 34906.60 ( 7.24%
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Rik van Riel <riel@surriel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jirka Hladky <jhladky@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux-MM <linux-mm@kvack.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181001100525.29789-2-mgorman@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fix incorrect line number in example output
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Cc: Jiri Kosina <trivial@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1538391663-54524-1-git-send-email-andrew.murray@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This code needs to select function #0, which is the first int in the
array of functions, not the number 0 which may or may not be the
function for "GPIO mode" per the enum mapping. We were getting lucky on
SDM845, where this was tested, because the function 0 matched the enum
value for "GPIO mode". On other platforms, e.g. MSM8996, the gpio enum
value is the last one in the list so this code doesn't work and we see a
warning at boot. Fix it by grabbing the first element out of the array
of functions.
Cc: Doug Anderson <dianders@chromium.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Niklas Cassel <niklas.cassel@linaro.org>
Reported-by: Niklas Cassel <niklas.cassel@linaro.org>
Fixes: 1de7ddb3a1 ("pinctrl: msm: Mux out gpio function with gpio_request()")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Tested-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Since 890658b7ab ("locking/mutex: Kill arch specific code"), there
are no mutex header files under arch/, so we can remove the redundant
entry from MAINTAINERS.
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Jason Low <jason.low2@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181001142856.GC9716@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Amend the changes in commit:
1f03e8d291 ("locking/barriers: Replace smp_cond_acquire() with smp_cond_load_acquire()")
... by updating the documentation accordingly.
Also remove some obsolete information related to the implementation.
Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Akira Yokosawa <akiyks@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Daniel Lustig <dlustig@nvidia.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jade Alglave <j.alglave@ucl.ac.uk>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luc Maranget <luc.maranget@inria.fr>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-arch@vger.kernel.org
Cc: parri.andrea@gmail.com
Link: http://lkml.kernel.org/r/20180926182920.27644-5-paulmck@linux.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>