The __diag260() inline asm temporarily changes the program check new
psw to redirect a potential program check on the diag instruction.
Restoring of the program check new psw is done in C code behind the
inline asm.
This can be problematic, especially if the function is inlined, since
the compiler can reorder instructions in such a way that a different
instruction, which may result in a program check, might be executed
before the program check new psw has been restored.
To avoid such a scenario move restoring into the inline asm. For
consistency reasons move also saving of the original program check new
psw into the inline asm.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The __diag308() inline asm temporarily changes the program check new
psw to redirect a potential program check on the diag instruction.
Restoring of the program check new psw is done in C code behind the
inline asm.
This can be problematic, especially if the function is inlined, since
the compiler can reorder instructions in such a way that a different
instruction, which may result in a program check, might be executed
before the program check new psw has been restored.
To avoid such a scenario move restoring into the inline asm. For
consistency reasons move also saving of the original program check new
psw into the inline asm.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Power management support was removed for s390 with
commit 394216275c ("s390: remove broken hibernate / power management
support").
Remove the hvc_iucv_driver, but keep the device struct around so that
it can continue to provide the hvc_iucv_dev_attr_groups attributes.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Enable zstd support in s390/boot to allow booting a zstd-compressed
kernel when configured with CONFIG_KERNEL_ZSTD=y.
BOOT_HEAP_SIZE is defined to 0x30000 in this case. Actual decompressor
memory usage with allyesconfig is currently 0x26150.
BugLink: https://bugs.launchpad.net/bugs/1931725
Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
cc: Heiko Carstens <hca@linux.ibm.com>
cc: Vasily Gorbik <gor@linux.ibm.com>
cc: Christian Borntraeger <borntraeger@de.ibm.com>
cc: linux-s390@vger.kernel.org
Link: https://lore.kernel.org/r/20210615114150.325080-1-dimitri.ledkov@canonical.com
[gor: added BOOT_HEAP_SIZE for zstd]
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Currently BOOT_HEAP_SIZE is always defined as 0x400000 due to a
bogus condition. Use CONFIG_KERNEL_BZIP2 instead of
CONFIG_HAVE_KERNEL_BZIP2 to correct that.
BOOT_HEAP_SIZE of 0x10000 is still good enough for every decompressor
algorithm but bzip2. Actual decompressor memory usage with allyesconfig
is the following:
  gzip   0xbc28
  bzip2  0x379518
  xz     0x7410
  lzma   0x3e6c
  lzo    0
  lz4    0
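
The difference between the two symbols can be sketched in plain C. This
is an illustration only, not the s390 boot header: the _BOGUS/_FIXED
macro names and the heap_fits() helper are hypothetical, and the single
CONFIG_ define stands in for a Kconfig result where the architecture
supports bzip2 but a different compressor was selected.

```c
#include <assert.h>

/*
 * Assumed Kconfig result for this sketch: the architecture is able to
 * use bzip2 (capability symbol set), but another compressor was chosen,
 * so CONFIG_KERNEL_BZIP2 stays undefined.
 */
#define CONFIG_HAVE_KERNEL_BZIP2 1

/* Bogus condition: tests the capability symbol, which is always set. */
#ifdef CONFIG_HAVE_KERNEL_BZIP2
#define BOOT_HEAP_SIZE_BOGUS 0x400000
#else
#define BOOT_HEAP_SIZE_BOGUS 0x10000
#endif

/* Corrected condition: tests the compressor that was actually selected. */
#ifdef CONFIG_KERNEL_BZIP2
#define BOOT_HEAP_SIZE_FIXED 0x400000
#else
#define BOOT_HEAP_SIZE_FIXED 0x10000
#endif

/* Hypothetical helper: does a measured peak usage fit into a heap? */
static int heap_fits(unsigned long usage, unsigned long heap_size)
{
	assert(heap_size != 0);
	return usage <= heap_size;
}
```

With the numbers listed above, gzip (0xbc28) fits into 0x10000 while
bzip2 (0x379518) only fits into 0x400000.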
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Remove register asm usage from diag8_noresponse() since it wasn't
needed at all. There is no requirement for even/odd register pairs for
diag 0x8.
For diag_response() use register pairs to fulfill the rx+1 and ry+1
requirements as required if a response buffer is specified. Also
change the inline asm to return the condition code of the diagnose
instruction and do the conditional handling of response length
calculation in C.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Get rid of register asm statement and use a register pair.
This allows the compiler to allocate registers on its own.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Introduce a register pair union, which is supposed to be used for
inline assemblies where instructions require parameters in even/odd
numbered register pairs.
This is more or less the same register pair construct which was
available for 31 bit builds which was removed with commit 5a79859ae0
("s390: remove 31 bit support").
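
A minimal sketch of what such a union could look like is given below.
The layout and the names register_pair_sketch and make_pair() are
assumptions for illustration and are not quoted from the patch; the
sketch needs GCC or Clang on a 64-bit target where unsigned long is
64 bits wide.

```c
#include <assert.h>

/*
 * One 128-bit member that can be handed to an inline assembly as a
 * single operand, plus two 64-bit members so that C code can fill in
 * the values meant for the even and the odd register.
 */
union register_pair_sketch {
	unsigned __int128 pair;
	struct {
		unsigned long even;	/* value for the even numbered register */
		unsigned long odd;	/* value for the odd numbered register */
	};
};

/* The two halves must exactly cover the 128-bit member. */
static_assert(sizeof(union register_pair_sketch) == 2 * sizeof(unsigned long),
	      "register pair must be two machine words");

/* Hypothetical helper: bundle two values, e.g. an address and a length. */
static union register_pair_sketch make_pair(unsigned long even,
					    unsigned long odd)
{
	union register_pair_sketch rp;

	rp.even = even;
	rp.odd = odd;
	return rp;
}
```

The point of the construct is that the compiler picks a suitable
register pair for the 128-bit member on its own, so no fixed register
numbers have to be written down in C code.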
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Power management support was removed for s390 with
commit 394216275c ("s390: remove broken hibernate / power management
support").
Remove leftover vmlogrdr-related power management code.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Power management support was removed for s390 with
commit 394216275c ("s390: remove broken hibernate / power management
support").
Remove leftover sclp base-related power management code. Note that we
keep the registration of the sclp platform driver since it is used to
externalize non-PM related attributes in sysfs.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Power management support was removed for s390 with
commit 394216275c ("s390: remove broken hibernate / power management
support").
Remove leftover sclp quiesce-related power management code.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Power management support was removed for s390 with
commit 394216275c ("s390: remove broken hibernate / power management
support").
Remove leftover sclp memory hotplug-related power management code.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Power management support was removed for s390 with
commit 394216275c ("s390: remove broken hibernate / power management
support").
Remove leftover sclp vt220-related power management code.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Power management support was removed for s390 with
commit 394216275c ("s390: remove broken hibernate / power management
support").
Remove leftover sclp console-related power management code.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Power management support was removed for s390 with
commit 394216275c ("s390: remove broken hibernate / power management
support").
Remove leftover monwriter-related power management code.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Power management support was removed for s390 with
commit 394216275c ("s390: remove broken hibernate / power management
support").
Remove leftover monreader-related power management code.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Power management support was removed for s390 with
commit 394216275c ("s390: remove broken hibernate / power management
support").
Remove leftover xpram-related power management code.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Power management support was removed for s390 with
commit 394216275c ("s390: remove broken hibernate / power management
support").
Remove leftover dcssblk-related power management code.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Currently css_wait_for_slow_path() gets called inside the chp->lock.
The path-verification-loop of slowpath inside this lock could lead to
deadlock as reported by the lockdep validator.
ccw_device_get_chp_desc(), when called while a device is being set
online, tries to acquire the same 'chp->lock' to read chp->desc.
This function can be called in multiple scenarios, such as probing or
manually setting a device online. In some corner cases this could lead
to a deadlock.
The lockdep validator reported this as:
        CPU0                    CPU1
        ----                    ----
   lock(&chp->lock);
                                lock(kn->active#43);
                                lock(&chp->lock);
   lock((wq_completion)cio);
The chp->lock was introduced to serialize the access of struct
channel_path. This lock is not needed for the css_wait_for_slow_path()
function, so invoke the slow-path function outside this lock.
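
The reordering can be sketched with a toy lock that only records
whether it is held. All names below (toy_lock, chp_sketch, the
set_state_* functions) are hypothetical stand-ins and not the cio code;
the sketch only shows that the wait moves from inside to outside the
locked region.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock: tracks the held state only, it is not a real mutex. */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Hypothetical stand-in for struct channel_path. */
struct chp_sketch {
	struct toy_lock lock;
	int state;
	bool slow_path_saw_lock_held;
};

/* Stand-in for the slow-path wait: notes if the lock was still held. */
static void wait_for_slow_path_sketch(struct chp_sketch *chp)
{
	chp->slow_path_saw_lock_held = chp->lock.held;
}

/* Old ordering: the wait happens inside the locked region. */
static void set_state_locked_wait(struct chp_sketch *chp, int state)
{
	toy_lock_acquire(&chp->lock);
	chp->state = state;
	wait_for_slow_path_sketch(chp);
	toy_lock_release(&chp->lock);
}

/* New ordering: the lock only covers the update of the structure. */
static void set_state_unlocked_wait(struct chp_sketch *chp, int state)
{
	toy_lock_acquire(&chp->lock);
	chp->state = state;
	toy_lock_release(&chp->lock);
	wait_for_slow_path_sketch(chp);
}
```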
Fixes: b730f3a933 ("[S390] cio: add lock to struct channel_path")
Cc: <stable@vger.kernel.org>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
All s390 irqflags functions are very small and should be always inlined.
Therefore mark them __always_inline. This also allows to get rid of the
rather odd notrace attribute for these small functions, which was only
added to prevent tracing iff any of these functions would not be inlined.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
s390 is the only architecture which makes use of the __no_kasan_or_inline
attribute for two functions. Given that both stap() and __load_psw_mask()
are very small functions they can and should be always inlined anyway.
Therefore get rid of __no_kasan_or_inline and always inline these
functions.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
When read via debugfs, s390dbf debug-views print the kernel address of
the call-site that created a trace entry. The kernel's %p pointer
hashing feature obfuscates this address, and commit 860ec7c6e2
("s390/debug: use pK for kernel pointers") made this obfuscation
configurable via the kptr_restrict sysctl.
Obfuscation of kernel address data printed via s390dbf debug-views does
not add any additional protection since the associated debugfs files are
only accessible to the root user that typically has enough other means
to obtain kernel address data.
Also trace payload data may contain binary representations of kernel
addresses as part of logged data structures. Requiring such payload data
to be obfuscated as well would be impractical and greatly diminish the
use of s390dbf.
Therefore completely remove pointer obfuscation from s390dbf
debug-views.
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Since OLDMEM_BASE/OLDMEM_SIZE is already taken into consideration and is
reflected in ident_map_size, reserve/remove_oldmem() is no longer needed
and can be removed.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Currently there are two separate places where kernel memory layout has
to be known and adjusted:
1. early kasan setup.
2. paging setup later.
Those two places had to be kept in sync and adjusted to reflect peculiar
technical details of one another. With additional factors which influence
the kernel memory layout, like the ultravisor secure storage limit, the
complexity of keeping the two in sync grew even more.
Besides that if we look forward towards creating identity mapping and
enabling DAT before jumping into uncompressed kernel - that would also
require full knowledge of and control over kernel memory layout.
So, de-duplicate and move kernel memory layout setup logic into
the decompressor.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Merge tag 'devm-helpers-v5.14-1' into review-hans
Signed tag for the immutable devm-helpers branch for merging
into the extcon and pdx86 trees.
There is a problem in mapping CPU to a PCI device instance when the
bus numbers are reused in different packages. This was observed on
some Sapphire Rapids systems.
The current implementation reads the bus number assigned to a CPU package
via MSR 0x128. This establishes the relationship between a CPU and a PCI
device, which allows power related parameters to be written to an MMIO
offset in a PCI device space which is unique to a CPU. But if two
packages use the same bus number, this mapping is no longer unique.
When a bus number is reused, the PCI device will use a different domain
number or segment number. So we need to be aware of this domain
information while matching a CPU to a PCI bus number. This domain
information is not available via any MSR, so ACPI numa node information
has to be used instead.
There is an interface already available in Linux to read the numa
node for a CPU and a PCI device. This change uses this interface
to check the numa node of a PCI device that matches the bus number.
If the bus number and numa node match the CPU's assigned
bus number and numa node, the matched PCI device instance is
returned to the caller.
It is possible that before Sapphire Rapids, the numa node is not
defined for the Speed Select PCI device in some OEM systems. In this
case, to restore the old behavior, return the last matched PCI device
for domain 0 unless there is more than one match.
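
The matching rule described above can be sketched in user space as
follows. The struct, the NO_NODE constant and find_dev_for_cpu() are
hypothetical and do not use the kernel PCI interfaces. As a further
assumption, the fallback is taken whenever exactly one device matched
the bus number; the domain is not checked separately.

```c
#include <assert.h>
#include <stddef.h>

#define NO_NODE (-1)	/* stand-in for "numa node not defined" */

/* Hypothetical, simplified view of a PCI device instance. */
struct pci_dev_sketch {
	int domain;
	int bus;
	int numa_node;
};

/*
 * Return the device whose bus number and numa node both match the CPU.
 * If no device has a matching numa node, fall back to the only device
 * with a matching bus number, or NULL if the choice is ambiguous.
 */
static const struct pci_dev_sketch *
find_dev_for_cpu(const struct pci_dev_sketch *devs, size_t n,
		 int cpu_bus, int cpu_node)
{
	const struct pci_dev_sketch *last_match = NULL;
	int matches = 0;
	size_t i;

	assert(devs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (devs[i].bus != cpu_bus)
			continue;
		matches++;
		last_match = &devs[i];
		if (devs[i].numa_node != NO_NODE &&
		    devs[i].numa_node == cpu_node)
			return &devs[i];
	}

	return matches == 1 ? last_match : NULL;
}
```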
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20210616221329.1909276-2-srinivas.pandruvada@linux.intel.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
It was observed that some high performance benchmarks spend more time
in the kernel depending on which CPU package they are executing on.
The difference is significant and benchmark scores vary by more than 10%.
These benchmarks adjust the class of service to improve the performance
of threads which run in parallel. This class of service change causes
access to the MMIO region of Intel Speed Select PCI devices depending on
the CPU package they are executing on.
This mapping from CPU to PCI device instance uses a standard Linux PCI
interface "pci_get_domain_bus_and_slot()". This function does a linear
search to get to a PCI device. Since these platforms have 100+ PCI
devices, this search can be expensive in fast path for benchmarks.
Since the device and function of PCI device is fixed for Intel
Speed Select PCI devices, the CPU to PCI device information can be cached
at the same time when bus number for the CPU is read. In this way during
runtime the cached information can be used. This improves performance
of these benchmarks significantly.
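
The caching idea can be sketched as follows. The types, the fixed CPU
count and the search counter are hypothetical and only serve the
illustration; slow_find() stands in for the linear PCI device search.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4

/* Hypothetical, simplified PCI device. */
struct pci_dev_sketch {
	int bus;
};

static int linear_searches;	/* counts slow lookups, for illustration */

/* Stand-in for the linear search over all PCI devices. */
static const struct pci_dev_sketch *
slow_find(const struct pci_dev_sketch *devs, size_t n, int bus)
{
	size_t i;

	linear_searches++;
	for (i = 0; i < n; i++)
		if (devs[i].bus == bus)
			return &devs[i];
	return NULL;
}

struct cpu_info_sketch {
	int bus;
	const struct pci_dev_sketch *cached_dev;
};

static struct cpu_info_sketch cpu_info[NR_CPUS_SKETCH];

/* Setup: resolve each CPU's device once, when its bus number is read. */
static void cache_devs(const struct pci_dev_sketch *devs, size_t n,
		       const int *cpu_bus)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		cpu_info[cpu].bus = cpu_bus[cpu];
		cpu_info[cpu].cached_dev = slow_find(devs, n, cpu_bus[cpu]);
	}
}

/* Fast path: a plain array lookup, no search. */
static const struct pci_dev_sketch *get_dev(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	return cpu_info[cpu].cached_dev;
}
```

The search cost is paid once per CPU during setup; repeated get_dev()
calls at runtime do not search again.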
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20210616221329.1909276-1-srinivas.pandruvada@linux.intel.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
This release adds the following change:
- Fix reporting of memory frequency
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
The uncore memory frequency value from the mailbox command
CONFIG_TDP_GET_MEM_FREQ needs to be scaled based on the platform for
display. There is no single constant multiplier.
This change introduces a CPU model specific memory frequency multiplier.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Fixes for the PMUv3 emulation of PMCR_EL0:
- Don't spuriously reset the cycle counter when resetting other counters
- Force PMCR_EL0 to become effective after having restored it
* kvm-arm64/pmu-fixes:
KVM: arm64: Restore PMU configuration on first run
KVM: arm64: Don't zero the cycle count register when PMCR_EL0.P is set
The assignment of iommu from info->iommu occurs before info is null checked
hence leading to a potential null pointer dereference issue. Fix this by
assigning iommu and checking if iommu is null after null checking info.
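
A minimal sketch of the corrected ordering, using hypothetical types
and a hypothetical function name rather than the VT-d code:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures. */
struct iommu_sketch {
	int id;
};

struct dev_info_sketch {
	struct iommu_sketch *iommu;
};

static int enable_iopf_sketch(const struct dev_info_sketch *info)
{
	struct iommu_sketch *iommu;

	/*
	 * Initializing iommu from info->iommu in the declaration above
	 * would dereference info before this check has run.
	 */
	if (!info)
		return -EINVAL;

	iommu = info->iommu;
	if (!iommu)
		return -EINVAL;

	/* Ids are assumed to be non-negative in this sketch. */
	assert(iommu->id >= 0);
	return iommu->id;
}
```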
Addresses-Coverity: ("Dereference before null check")
Fixes: 4c82b88696 ("iommu/vt-d: Allocate/register iopf queue for sva devices")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210611135024.32781-1-colin.king@canonical.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Restoring a guest with an active virtual PMU results in no perf
counters being instantiated on the host side. Not quite what
you'd expect from a restore.
In order to fix this, force a writeback of PMCR_EL0 on the first
run of a vcpu (using a new request so that it happens once the
vcpu has been loaded). This will in turn create all the host-side
counters that were missing.
Reported-by: Jinank Jain <jinankj@amazon.de>
Tested-by: Jinank Jain <jinankj@amazon.de>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/87wnrbylxv.wl-maz@kernel.org
Link: https://lore.kernel.org/r/b53dfcf9bbc4db7f96154b1cd5188d72b9766358.camel@amazon.de
trace_clock_global() tries to make sure the events between CPUs are
somewhat in order. A global value is used and updated by the latest read
of a clock. If one CPU is ahead by a little, and is read by another CPU, a
lock is taken, and if the timestamp of the other CPU is behind, it will
simply use the other CPU's timestamp.
The lock is also only taken with a "trylock" due to tracing, and strange
recursions can happen. The lock is not taken at all in NMI context.
In the case where the lock is not able to be taken, the non synced
timestamp is returned. But it will not be less than the saved global
timestamp.
The problem arises because when the time goes "backwards" the time
returned is the saved timestamp plus 1. If the lock is not taken, and the
plus one to the timestamp is returned, there's a small race that can cause
the time to go backwards!
  CPU0                          CPU1
  ----                          ----
 trace_clock_global() {
   ts = clock() [ 1000 ]
   trylock(clock_lock) [ success ]
   global_ts = ts; [ 1000 ]

   <interrupted by NMI>
   trace_clock_global() {
     ts = clock() [ 999 ]
     if (ts < global_ts)
       ts = global_ts + 1 [ 1001 ]

     trylock(clock_lock) [ fail ]

     return ts [ 1001 ]
   }
   unlock(clock_lock);
   return ts; [ 1000 ]
 }

 trace_clock_global() {
   ts = clock() [ 1000 ]
   if (ts < global_ts) [ false 1000 == 1000 ]

   trylock(clock_lock) [ success ]
   global_ts = ts; [ 1000 ]
   unlock(clock_lock)

   return ts; [ 1000 ]
 }
The above case shows two reads of trace_clock_global() on the same CPU, but
the second read returns one less than the first read. That is, time went
backwards, and this is not what is allowed by trace_clock_global().
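
The sequence can be replayed in a small single-threaded sketch.
adjust_ts() and replay_goes_backwards() are hypothetical helpers, not
the kernel function, and the locking is left out because the NMI read
fails its trylock and never updates the global value. A bump of 0 is
included only to show that the sequence stays monotonic without the
increment; that this is the chosen remedy is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Adjustment applied to a local clock reading that lags behind the
 * saved global timestamp. 'bump' is the amount added on top of it.
 */
static uint64_t adjust_ts(uint64_t ts, uint64_t global_ts, uint64_t bump)
{
	if (ts < global_ts)
		return global_ts + bump;
	return ts;
}

/*
 * Replay the three reads from the diagram in the order in which they
 * return. Returns 1 if any returned timestamp is smaller than the one
 * returned before it.
 */
static int replay_goes_backwards(uint64_t bump)
{
	uint64_t global_ts = 1000;	/* stored by the interrupted read */
	uint64_t nmi, outer, next;

	nmi = adjust_ts(999, global_ts, bump);	 /* NMI read, trylock fails */
	outer = 1000;				 /* interrupted read returns */
	next = adjust_ts(1000, global_ts, bump); /* following read */

	assert(global_ts == 1000);	/* never changed by the NMI read */
	return outer < nmi || next < outer;
}
```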
This was triggered by heavy tracing and the ring buffer checker that tests
for the clock going backwards:
Ring buffer clock went backwards: 20613921464 -> 20613921463
------------[ cut here ]------------
WARNING: CPU: 2 PID: 0 at kernel/trace/ring_buffer.c:3412 check_buffer+0x1b9/0x1c0
Modules linked in:
[..]
[CPU: 2]TIME DOES NOT MATCH expected:20620711698 actual:20620711697 delta:6790234 before:20613921463 after:20613921463
[20613915818] PAGE TIME STAMP
[20613915818] delta:0
[20613915819] delta:1
[20613916035] delta:216
[20613916465] delta:430
[20613916575] delta:110
[20613916749] delta:174
[20613917248] delta:499
[20613917333] delta:85
[20613917775] delta:442
[20613917921] delta:146
[20613918321] delta:400
[20613918568] delta:247
[20613918768] delta:200
[20613919306] delta:538
[20613919353] delta:47
[20613919980] delta:627
[20613920296] delta:316
[20613920571] delta:275
[20613920862] delta:291
[20613921152] delta:290
[20613921464] delta:312
[20613921464] delta:0 TIME EXTEND
[20613921464] delta:0
This happened more than once, and always for an off by one result. It also
started happening after commit aafe104aa9 was added.
Cc: stable@vger.kernel.org
Fixes: aafe104aa9 ("tracing: Restructure trace_clock_global() to never block")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
A while ago, when the "trace" file was opened, tracing was stopped, and
code was added to stop recording the comms to saved_cmdlines, for mapping
of the pids to the task name.
Code has been added that only records the comm if a trace event occurred,
and there's no reason to not trace it if the trace file is opened.
Cc: stable@vger.kernel.org
Fixes: 7ffbd48d5c ("tracing: Cache comms only after an event occurred")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The saved_cmdlines is used to map pids to the task name, such that the
output of the tracing does not just show pids, but also gives a human
readable name for the task.
If the name is not mapped, the output looks like this:
<...>-1316 [005] ...2 132.044039: ...
Instead of this:
gnome-shell-1316 [005] ...2 132.044039: ...
The names are updated when tracing is running, but are skipped if tracing
is stopped. Unfortunately, this stops the recording of the names whenever
the top level tracer is stopped, even if there are other tracers active.
The recording of a name only happens when a new event is written into a
ring buffer, so there is no need to test if tracing is on or not. If
tracing is off, then no event is written and no need to test if tracing is
off or not.
Remove the check, as it hides the names of tasks for events in the
instance buffers.
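
The pid to name mapping can be pictured with a small sketch. The table
size, the slot layout and the function names are assumptions made for
illustration and do not mirror the saved_cmdlines implementation.

```c
#include <assert.h>
#include <string.h>

#define COMM_LEN 16
#define NR_SLOTS 128

/* One cached pid -> task name entry. */
struct cmdline_slot {
	int pid;
	char comm[COMM_LEN];
};

static struct cmdline_slot slots[NR_SLOTS];

/* Record the name of a task, as done when an event is written. */
static void save_cmdline(int pid, const char *comm)
{
	struct cmdline_slot *s;

	assert(pid > 0);
	s = &slots[pid % NR_SLOTS];
	s->pid = pid;
	strncpy(s->comm, comm, COMM_LEN - 1);
	s->comm[COMM_LEN - 1] = '\0';
}

/* Look up a name for output; unmapped pids show up as "<...>". */
static const char *find_cmdline(int pid)
{
	const struct cmdline_slot *s;

	if (pid <= 0)
		return "<...>";
	s = &slots[pid % NR_SLOTS];
	return s->pid == pid ? s->comm : "<...>";
}
```

If save_cmdline() is skipped for an event, the later lookup for that
pid falls back to "<...>", which is the symptom shown above.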
Cc: stable@vger.kernel.org
Fixes: 7ffbd48d5c ("tracing: Cache comms only after an event occurred")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>