Add a test case that does some basic verification of the SVE ptrace
interface, forking off a child with known values in the registers and
then using ptrace to inspect and manipulate the SVE registers of the
child, including in FPSIMD mode to account for sharing between the SVE
and FPSIMD registers.
This program was written by Dave Martin and modified for kselftest by
me.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20200819114837.51466-3-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Add a test case that verifies that we can enumerate the SVE vector lengths
on systems where we detect SVE, and that those SVE vector lengths are
valid. This program was written by Dave Martin and adapted to kselftest by
me.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20200819114837.51466-2-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
PAuth adds 5 different keys that can be used to sign addresses.
Add a test that verifies that the kernel initializes them to different
values and preserves them across context switches.
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
Reviewed-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Reviewed-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200918104715.182310-5-boian4o1@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
Kernel documentation states that it will change PAuth keys on exec() calls.
Verify that all keys are correctly switched to new ones.
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
Reviewed-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Reviewed-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200918104715.182310-4-boian4o1@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
PAuth adds sign/verify controls to enable and disable groups of
instructions in hardware for compatibility with libraries that do not
implement PAuth. The kernel always enables them if it detects PAuth.
Add a test that checks that each group of instructions is enabled, if the
kernel reports PAuth as detected.
Note: For groups, for the purpose of this patch, we intend instructions
that use a certain key.
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
Reviewed-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Reviewed-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200918104715.182310-3-boian4o1@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
PAuth signs and verifies return addresses on the stack. It does so by
inserting a Pointer Authentication code (PAC) into some of the unused top
bits of an address. This is achieved by adding paciasp/autiasp instructions
at the beginning and end of a function.
This feature is partially backwards compatible with earlier versions of the
ARM architecture. To coerce the compiler into emitting fully backwards
compatible code the main file is compiled to target an earlier ARM version.
This allows the tests to check for the feature and print meaningful error
messages instead of crashing.
Add a test to verify that corrupting the return address results in a
SIGSEGV on return.
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
Reviewed-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Reviewed-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200918104715.182310-2-boian4o1@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
This change exposes write-combine mappings under sysfs for
prefetchable PCI resources on arm64.
Originally, the usage of "write combine" here was driven by the x86
definition of write combine. This definition is specific to x86 and
does not generalize to other architectures. However, the usage of WC
has mutated to "write combine" semantics, which is implemented
differently on each arch.
Generally, prefetchable BARs are accepted to allow speculative
accesses, write combining, and re-ordering-- from the PCI perspective,
this means there are no read side effects. (This contradicts the PCI
spec which allows prefetchable BARs to have read side effects, but
this definition is ill-advised as it is impossible to meet.) On x86,
prefetchable BARs are mapped as WC as originally defined (with some
conditionals on arch features). On arm64, WC is taken to mean normal
non-cacheable memory.
In practice, write combine semantics are used to minimize write
operations. A common usage of this is minimizing PCI TLPs which can
significantly improve performance with PCI devices. In order to
provide the same benefits to userspace, we need to allow userspace to
map prefetchable BARs with write combine semantics. The resourceX_wc
mapping is used today by userspace programs and libraries.
While this model is flawed as "write combine" is very ill-defined, it
is already used by multiple non-x86 archs to expose write combine
semantics to user space. We enable this on arm64 to give userspace on
arm64 an equivalent mechanism for utilizing write combining with PCI
devices.
Signed-off-by: Clint Sbisa <csbisa@amazon.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200918033312.ddfpibgfylfjpex2@amazon.com
Signed-off-by: Will Deacon <will@kernel.org>
Add support for probing device from ACPI node.
Each DSU ACPI node and its associated cpus are inside a cluster node.
Signed-off-by: Tuan Phan <tuanphan@os.amperecomputing.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/1600106656-9542-1-git-send-email-tuanphan@os.amperecomputing.com
Signed-off-by: Will Deacon <will@kernel.org>
The function __{pgd, pud, pmd, pte}_error() are introduced so that
they can be called by {pgd, pud, pmd, pte}_ERROR(). However, some
of the functions could never be called when the corresponding page
table level isn't enabled. For example, __{pud, pmd}_error() are
unused when PUD and PMD are folded to PGD.
This removes __{pgd, pud, pmd, pte}_error() and call pr_err() from
{pgd, pud, pmd, pte}_ERROR() directly, similar to what x86/powerpc
are doing. With this, the code looks a bit simplified either.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20200913234730.23145-1-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
The existing comment about steppable hint instruction is not complete
and only describes NOP instructions as steppable. As the function
aarch64_insn_is_steppable_hint allows all white-listed instruction
to be probed so the comment is updated to reflect this.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Link: https://lore.kernel.org/r/20200914083656.21428-7-amit.kachhap@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
With the addition of ARMv8.3-FPAC feature, the probe of authenticate
ptrauth instructions (AUT*) may cause ptrauth fault exception in case of
authenticate failure so they cannot be safely single stepped.
Hence the probe of authenticate instructions is disallowed but the
corresponding pac ptrauth instruction (PAC*) is not affected and they can
still be probed. Also AUTH* instructions do not make sense at function
entry points so most realistic probes would be unaffected by this change.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Link: https://lore.kernel.org/r/20200914083656.21428-6-amit.kachhap@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
The current address authentication cpufeature levels are set as LOWER_SAFE
which is not compatible with the different configurations added for Armv8.3
ptrauth enhancements as the different levels have different behaviour and
there is no tunable to enable the lower safe versions. This is rectified
by setting those cpufeature type as EXACT.
The current cpufeature framework also does not interfere in the booting of
non-exact secondary cpus but rather marks them as tainted. As a workaround
this is fixed by replacing the generic match handler with a new handler
specific to ptrauth.
After this change, if there is any variation in ptrauth configurations in
secondary cpus from boot cpu then those mismatched cpus are parked in an
infinite loop.
Following ptrauth crash log is observed in Arm fastmodel with simulated
mismatched cpus without this fix,
CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64ISAR1_EL1. Boot CPU: 0x11111110211402, CPU4: 0x11111110211102
CPU features: Unsupported CPU feature variation detected.
GICv3: CPU4: found redistributor 100 region 0:0x000000002f180000
CPU4: Booted secondary processor 0x0000000100 [0x410fd0f0]
Unable to handle kernel paging request at virtual address bfff800010dadf3c
Mem abort info:
ESR = 0x86000004
EC = 0x21: IABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
[bfff800010dadf3c] address between user and kernel address ranges
Internal error: Oops: 86000004 [#1] PREEMPT SMP
Modules linked in:
CPU: 4 PID: 29 Comm: migration/4 Tainted: G S 5.8.0-rc4-00005-ge658591d66d1-dirty #158
Hardware name: Foundation-v8A (DT)
pstate: 60000089 (nZCv daIf -PAN -UAO BTYPE=--)
pc : 0xbfff800010dadf3c
lr : __schedule+0x2b4/0x5a8
sp : ffff800012043d70
x29: ffff800012043d70 x28: 0080000000000000
x27: ffff800011cbe000 x26: ffff00087ad37580
x25: ffff00087ad37000 x24: ffff800010de7d50
x23: ffff800011674018 x22: 0784800010dae2a8
x21: ffff00087ad37000 x20: ffff00087acb8000
x19: ffff00087f742100 x18: 0000000000000030
x17: 0000000000000000 x16: 0000000000000000
x15: ffff800011ac1000 x14: 00000000000001bd
x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000000 x10: 71519a147ddfeb82
x9 : 825d5ec0fb246314 x8 : ffff00087ad37dd8
x7 : 0000000000000000 x6 : 00000000fffedb0e
x5 : 00000000ffffffff x4 : 0000000000000000
x3 : 0000000000000028 x2 : ffff80086e11e000
x1 : ffff00087ad37000 x0 : ffff00087acdc600
Call trace:
0xbfff800010dadf3c
schedule+0x78/0x110
schedule_preempt_disabled+0x24/0x40
__kthread_parkme+0x68/0xd0
kthread+0x138/0x160
ret_from_fork+0x10/0x34
Code: bad PC value
After this fix, the mismatched CPU4 is parked as,
CPU features: CPU4: Detected conflict for capability 39 (Address authentication (IMP DEF algorithm)), System: 1, CPU: 0
CPU4: will not boot
CPU4: failed to come online
CPU4: died during early boot
[Suzuki: Introduce new matching function for address authentication]
Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200914083656.21428-5-amit.kachhap@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Some Armv8.3 Pointer Authentication enhancements have been introduced
which are mandatory for Armv8.6 and optional for Armv8.3. These features
are,
* ARMv8.3-PAuth2 - An enhanced PAC generation logic is added which hardens
finding the correct PAC value of the authenticated pointer.
* ARMv8.3-FPAC - Fault is generated now when the ptrauth authentication
instruction fails in authenticating the PAC present in the address.
This is different from earlier case when such failures just adds an
error code in the top byte and waits for subsequent load/store to abort.
The ptrauth instructions which may cause this fault are autiasp, retaa
etc.
The above features are now represented by additional configurations
for the Address Authentication cpufeature and a new ESR exception class.
The userspace fault received in the kernel due to ARMv8.3-FPAC is treated
as Illegal instruction and hence signal SIGILL is injected with ILL_ILLOPN
as the signal code. Note that this is different from earlier ARMv8.3
ptrauth where signal SIGSEGV is issued due to Pointer authentication
failures. The in-kernel PAC fault causes kernel to crash.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Link: https://lore.kernel.org/r/20200914083656.21428-4-amit.kachhap@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Some error signal need to pass proper ARM esr error code to userspace to
better identify the cause of the signal. So the function
force_signal_inject is extended to pass this as a parameter. The
existing code is not affected by this change.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Link: https://lore.kernel.org/r/20200914083656.21428-3-amit.kachhap@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Currently the ARMv8.3-PAuth combined branch instructions (braa, retaa
etc.) are not simulated for out-of-line execution with a handler. Hence the
uprobe of such instructions leads to kernel warnings in a loop as they are
not explicitly checked and fall into INSN_GOOD categories. Other combined
instructions like LDRAA and LDRBB can be probed.
The issue of the combined branch instructions is fixed by adding
group definitions of all such instructions and rejecting their probes.
The instruction groups added are br_auth(braa, brab, braaz and brabz),
blr_auth(blraa, blrab, blraaz and blrabz), ret_auth(retaa and retab) and
eret_auth(eretaa and eretab).
Warning log:
WARNING: CPU: 0 PID: 156 at arch/arm64/kernel/probes/uprobes.c:182 uprobe_single_step_handler+0x34/0x50
Modules linked in:
CPU: 0 PID: 156 Comm: func Not tainted 5.9.0-rc3 #188
Hardware name: Foundation-v8A (DT)
pstate: 804003c9 (Nzcv DAIF +PAN -UAO BTYPE=--)
pc : uprobe_single_step_handler+0x34/0x50
lr : single_step_handler+0x70/0xf8
sp : ffff800012af3e30
x29: ffff800012af3e30 x28: ffff000878723b00
x27: 0000000000000000 x26: 0000000000000000
x25: 0000000000000000 x24: 0000000000000000
x23: 0000000060001000 x22: 00000000cb000022
x21: ffff800012065ce8 x20: ffff800012af3ec0
x19: ffff800012068d50 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000000 x14: 0000000000000000
x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000
x9 : ffff800010085c90 x8 : 0000000000000000
x7 : 0000000000000000 x6 : ffff80001205a9c8
x5 : ffff80001205a000 x4 : ffff80001233db80
x3 : ffff8000100a7a60 x2 : 0020000000000003
x1 : 0000fffffffff008 x0 : ffff800012af3ec0
Call trace:
uprobe_single_step_handler+0x34/0x50
single_step_handler+0x70/0xf8
do_debug_exception+0xb8/0x130
el0_sync_handler+0x138/0x1b8
el0_sync+0x158/0x180
Fixes: 74afda4016 ("arm64: compile the kernel with ptrauth return address signing")
Fixes: 04ca3204fa ("arm64: enable pointer authentication")
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Link: https://lore.kernel.org/r/20200914083656.21428-2-amit.kachhap@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Besides disabling MMU, HVC_SOFT_RESTART also clears I+D bits. These behaviors
are what kexec-reboot expects, so describe it more precisely.
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: linux-doc@vger.kernel.org
Cc: kvmarm@lists.cs.columbia.edu
Link: https://lore.kernel.org/r/1598621998-20563-2-git-send-email-kernelfans@gmail.com
To: linux-arm-kernel@lists.infradead.org
Signed-off-by: Will Deacon <will@kernel.org>
Kernel startup entry point requires disabling MMU and D-cache.
As for kexec-reboot, taking a close look at "msr sctlr_el1, x12" in
__cpu_soft_restart as the following:
-1. booted at EL1
The instruction is enough to disable MMU and I/D cache for
EL1 regime.
-2. booted at EL2, using VHE
Access to SCTLR_EL1 is redirected to SCTLR_EL2 in EL2. So the instruction
is enough to disable MMU and clear I+C bits for EL2 regime.
-3. booted at EL2, not using VHE
The instruction itself can not affect EL2 regime. But The hyp-stub doesn't
enable the MMU and I/D cache for EL2 regime. And KVM also disable them for EL2
regime when its unloaded, or execute a HVC_SOFT_RESTART call. So when
kexec-reboot, the code in KVM has prepare the requirement.
As a conclusion, disabling MMU and clearing I+C bits in
SYM_CODE_START(arm64_relocate_new_kernel) is redundant, and can be removed
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: James Morse <james.morse@arm.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Remi Denis-Courmont <remi.denis.courmont@huawei.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: kvmarm@lists.cs.columbia.edu
Link: https://lore.kernel.org/r/1598621998-20563-1-git-send-email-kernelfans@gmail.com
To: linux-arm-kernel@lists.infradead.org
Signed-off-by: Will Deacon <will@kernel.org>
Similar to how CONT_PTE_SHIFT is determined, this introduces a new
kernel option (CONFIG_CONT_PMD_SHIFT) to determine CONT_PMD_SHIFT.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200910095936.20307-3-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
CONT_PTE_SHIFT actually depends on CONFIG_ARM64_CONT_SHIFT. It's
reasonable to reflect the dependency:
* This renames CONFIG_ARM64_CONT_SHIFT to CONFIG_ARM64_CONT_PTE_SHIFT,
so that we can introduce CONFIG_ARM64_CONT_PMD_SHIFT later.
* CONT_{SHIFT, SIZE, MASK}, defined in page-def.h are removed as they
are not used by anyone.
* CONT_PTE_SHIFT is determined by CONFIG_ARM64_CONT_PTE_SHIFT.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200910095936.20307-2-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
The macro was introduced by commit <ecf35a237a> ("arm64: PTE/PMD
contiguous bit definition") at the beginning. It's only used by
commit <348a65cdcb> ("arm64: Mark kernel page ranges contiguous"),
which was reverted later by commit <667c27597c>. This makes the
macro unused.
This removes the unused macro (CONT_RANGE_OFFSET).
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200910095936.20307-1-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
HWCAP name arrays (hwcap_str, compat_hwcap_str, compat_hwcap2_str) that are
scanned for /proc/cpuinfo are detached from their bit definitions making it
vulnerable and difficult to correlate. It is also bit problematic because
during /proc/cpuinfo dump these arrays get traversed sequentially assuming
they reflect and match actual HWCAP bit sequence, to test various features
for a given CPU. This redefines name arrays per their HWCAP bit definitions
. It also warns after detecting any feature which is not expected on arm64.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1599630535-29337-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
In certain page migration situations, a THP page can be migrated without
being split into it's constituent subpages. This saves time required to
split a THP and put it back together when required. But it also saves an
wider address range translation covered by a single TLB entry, reducing
future page fault costs.
A previous patch changed platform THP helpers per generic memory semantics,
clearing the path for THP migration support. This adds two more THP helpers
required to create PMD migration swap entries. Now enable THP migration via
ARCH_ENABLE_THP_MIGRATION.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1599627183-14453-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
pmd_present() and pmd_trans_huge() are expected to behave in the following
manner during various phases of a given PMD. It is derived from a previous
detailed discussion on this topic [1] and present THP documentation [2].
pmd_present(pmd):
- Returns true if pmd refers to system RAM with a valid pmd_page(pmd)
- Returns false if pmd refers to a migration or swap entry
pmd_trans_huge(pmd):
- Returns true if pmd refers to system RAM and is a trans huge mapping
-------------------------------------------------------------------------
| PMD states | pmd_present | pmd_trans_huge |
-------------------------------------------------------------------------
| Mapped | Yes | Yes |
-------------------------------------------------------------------------
| Splitting | Yes | Yes |
-------------------------------------------------------------------------
| Migration/Swap | No | No |
-------------------------------------------------------------------------
The problem:
PMD is first invalidated with pmdp_invalidate() before it's splitting. This
invalidation clears PMD_SECT_VALID as below.
PMD Split -> pmdp_invalidate() -> pmd_mkinvalid -> Clears PMD_SECT_VALID
Once PMD_SECT_VALID gets cleared, it results in pmd_present() return false
on the PMD entry. It will need another bit apart from PMD_SECT_VALID to re-
affirm pmd_present() as true during the THP split process. To comply with
above mentioned semantics, pmd_trans_huge() should also check pmd_present()
first before testing presence of an actual transparent huge mapping.
The solution:
Ideally PMD_TYPE_SECT should have been used here instead. But it shares the
bit position with PMD_SECT_VALID which is used for THP invalidation. Hence
it will not be there for pmd_present() check after pmdp_invalidate().
A new software defined PMD_PRESENT_INVALID (bit 59) can be set on the PMD
entry during invalidation which can help pmd_present() return true and in
recognizing the fact that it still points to memory.
This bit is transient. During the split process it will be overridden by a
page table page representing normal pages in place of erstwhile huge page.
Other pmdp_invalidate() callers always write a fresh PMD value on the entry
overriding this transient PMD_PRESENT_INVALID bit, which makes it safe.
[1]: https://lkml.org/lkml/2018/10/17/231
[2]: https://www.kernel.org/doc/Documentation/vm/transhuge.txt
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1599627183-14453-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
In the absence of ACPI or DT topology data, we fallback to haphazardly
decoding *something* out of MPIDR. Sadly, the contents of that register are
mostly unusable due to the implementation leniancy and things like Aff0
having to be capped to 15 (despite being encoded on 8 bits).
Consider a simple system with a single package of 32 cores, all under the
same LLC. We ought to be shoving them in the same core_sibling mask, but
MPIDR is going to look like:
| CPU | 0 | ... | 15 | 16 | ... | 31 |
|------+---+-----+----+----+-----+----+
| Aff0 | 0 | ... | 15 | 0 | ... | 15 |
| Aff1 | 0 | ... | 0 | 1 | ... | 1 |
| Aff2 | 0 | ... | 0 | 0 | ... | 0 |
Which will eventually yield
core_sibling(0-15) == 0-15
core_sibling(16-31) == 16-31
NUMA woes
=========
If we try to play games with this and set up NUMA boundaries within those
groups of 16 cores via e.g. QEMU:
# Node0: 0-9; Node1: 10-19
$ qemu-system-aarch64 <blah> \
-smp 20 -numa node,cpus=0-9,nodeid=0 -numa node,cpus=10-19,nodeid=1
The scheduler's MC domain (all CPUs with same LLC) is going to be built via
arch_topology.c::cpu_coregroup_mask()
In there we try to figure out a sensible mask out of the topology
information we have. In short, here we'll pick the smallest of NUMA or
core sibling mask.
node_mask(CPU9) == 0-9
core_sibling(CPU9) == 0-15
MC mask for CPU9 will thus be 0-9, not a problem.
node_mask(CPU10) == 10-19
core_sibling(CPU10) == 0-15
MC mask for CPU10 will thus be 10-19, not a problem.
node_mask(CPU16) == 10-19
core_sibling(CPU16) == 16-19
MC mask for CPU16 will thus be 16-19... Uh oh. CPUs 16-19 are in two
different unique MC spans, and the scheduler has no idea what to make of
that. That triggers the WARN_ON() added by commit
ccf74128d6 ("sched/topology: Assert non-NUMA topology masks don't (partially) overlap")
Fixing MPIDR-derived topology
=============================
We could try to come up with some cleverer scheme to figure out which of
the available masks to pick, but really if one of those masks resulted from
MPIDR then it should be discarded because it's bound to be bogus.
I was hoping to give MPIDR a chance for SMT, to figure out which threads are
in the same core using Aff1-3 as core ID, but Sudeep and Robin pointed out
to me that there are systems out there where *all* cores have non-zero
values in their higher affinity fields (e.g. RK3288 has "5" in all of its
cores' MPIDR.Aff1), which would expose a bogus core ID to userspace.
Stop using MPIDR for topology information. When no other source of topology
information is available, mark each CPU as its own core and its NUMA node
as its LLC domain.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20200829130016.26106-1-valentin.schneider@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Kernel virtual region [BPF_JIT_REGION_START..BPF_JIT_REGION_END] is missing
from address_markers[], hence relevant page table entries are not displayed
with /sys/kernel/debug/kernel_page_tables. This adds those missing markers.
While here, also rename arch/arm64/mm/dump.c which sounds bit ambiguous, as
arch/arm64/mm/ptdump.c instead.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1599208259-11191-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
event_idx is obtained from armv8pmu_get_event_idx(), and this idx must be
between ARMV8_IDX_CYCLE_COUNTER and cpu_pmu->num_events. So it's unnecessary
to do this check. Let's remove it.
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Link: https://lore.kernel.org/r/1599213458-28394-1-git-send-email-liuqi115@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
TEXT_OFFSET serves no purpose, and for this reason, it was redefined
as 0x0 in the v5.8 timeframe. Since this does not appear to have caused
any issues that require us to revisit that decision, let's get rid of the
macro entirely, along with any references to it.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20200825135440.11288-1-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Since commit 8212688600 ("ACPI/IORT: Fix build error when IOMMU_SUPPORT
is disabled"), iort_fwspec_iommu_ops() and iort_add_device_replay() are not
needed anymore when CONFIG_IOMMU_API is not selected. Let's remove them.
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Hanjun Guo <guohanjun@huawei.com>
Link: https://lore.kernel.org/r/20200818063625.980-3-yuzenghui@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Since commit d2e1a003af ("ACPI/IORT: Don't call iommu_ops->add_device
directly"), we use the IOMMU core API to replace a direct invoke of the
specified callback. The parameter @ops has therefore became unused. Let's
drop it.
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Hanjun Guo <guohanjun@huawei.com>
Link: https://lore.kernel.org/r/20200818063625.980-2-yuzenghui@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
MODULE_*** is used in HiSilicon uncore PMU drivers and is provided by
linux/module.h, but the header file is not directly included. Add the
missing include.
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/1599186097-18599-1-git-send-email-zhangshaokun@hisilicon.com
Signed-off-by: Will Deacon <will@kernel.org>
This patch is to add the general hardware last level cache (LLC) events
for PMUv3: one event is for LLC access and another is for LLC miss.
With this change, perf tool can support last level cache profiling,
below is an example to demonstrate the usage on Arm64:
$ perf stat -e LLC-load-misses -e LLC-loads -- \
perf bench mem memcpy -s 1024MB -l 100 -f default
[...]
Performance counter stats for 'perf bench mem memcpy -s 1024MB -l 100 -f default':
35,534,262 LLC-load-misses # 2.16% of all LL-cache hits
1,643,946,443 LLC-loads
[...]
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Link: https://lore.kernel.org/r/20200811053505.21223-1-leo.yan@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
Currently, there are different description strings in die() such as
die("Oops",,), die("Oops - BUG",,). And panic() called by die() will
always show "Fatal exception" or "Fatal exception in interrupt".
Note that panic() will run any panic handler via panic_notifier_list.
And the string above will be formatted and placed in static buf[]
which will be passed to handler.
So panic handler can not distinguish which Oops it is from the buf if
we want to do some things like reserve the string in memory or panic
statistics. It's not benefit to debug. We need to add more codes to
troubleshoot. Let's utilize existing resource to make debug much simpler.
Signed-off-by: Yue Hu <huyue2@yulong.com>
Link: https://lore.kernel.org/r/20200804085347.10720-1-zbestahu@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
There's really no need to put every parameter on a new line when calling
a function with a long name, so reformat the *setup_additional_pages()
functions in the vDSO setup code to follow the usual conventions.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Most of the compat vDSO code can be built and guarded using IS_ENABLED,
so drop the unnecessary #ifdefs.
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Pull crypto fixes from Herbert Xu:
- fix regression in af_alg that affects iwd
- restore polling delay in qat
- fix double free in ingenic on error path
- fix potential build failure in sa2ul due to missing Kconfig dependency
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: af_alg - Work around empty control messages without MSG_MORE
crypto: sa2ul - add Kconfig selects to fix build error
crypto: ingenic - Drop kfree for memory allocated with devm_kzalloc
crypto: qat - add delay before polling mailbox
- Move disabling of the local APIC after invoking fixup_irqs() to ensure
that interrupts which are incoming are noted in the IRR and not ignored.
- Unbreak affinity setting. The rework of the entry code reused the
regular exception entry code for device interrupts. The vector number is
pushed into the errorcode slot on the stack which is then lifted into an
argument and set to -1 because that's regs->orig_ax which is used in
quite some places to check whether the entry came from a syscall. But it
was overlooked that orig_ax is used in the affinity cleanup code to
validate whether the interrupt has arrived on the new target. It turned
out that this vector check is pointless because interrupts are never
moved from one vector to another on the same CPU. That check is a
historical leftover from the time where x86 supported multi-CPU
affinities, but not longer needed with the now strict single CPU
affinity. Famous last words ...
- Add a missing check for an empty cpumask into the matrix allocator. The
affinity change added a warning to catch the case where an interrupt is
moved on the same CPU to a different vector. This triggers because a
condition with an empty cpumask returns an assignment from the allocator
as the allocator uses for_each_cpu() without checking the cpumask for
being empty. The historical inconsistent for_each_cpu() behaviour of
ignoring the cpumask and unconditionally claiming that CPU0 is in the
mask striked again. Sigh.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl9L6WYTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoRV5D/9dRq/4pn5g1esnzm4GhIr2To3Qp6cl
s7VswTdN8FmWBqVz79ZVYqj663UpL3pPY1np01ctrxRQLeDVfWcI2BMR5irnny8h
otORhFysuDUl+yfuomWVbzfQQNJ+VeQVeWKD3cIhD1I3sXqDX5Wpa8n086hYKQXx
eutVC3+JdzJZFm68xarlLW7h2f1au1eZZFgVnyY+J5KO9Dwm63a4RITdDVk7KV4t
uKEDza5P9SY+kE9LAGNq8BAEObf9FeMXw0mRM7atRKVsJQQGVk6bgiuaRr01w1+W
hQCPx/3g6PHFnGgx/KQgHf1jgrZFhXOyIDo6ZeFy+SJGIZRB3n8o5Kjns2l8Pa+K
2qy1TRoZIsGkwGCi/BM6viLzBikbh/gnGYy/8KTEJdKs8P3ZKHUZVSAB1dpapOWX
4n+rKoVPnvxgRSeZZo+tgLkvUdh+/9Huyr9vHiYjtbbB8tFvjlkOmrZ6sirHByDy
jg6TjOJVb1CC/PoW4M7JNfmeKvHQnTACwH6djdVGDLPJspuUsYkPI0Uk0CX21SA3
45Tuylvl9jT6+vq95Av2RbAiipmSpZ/O1NHV8Paf466SKmhUgG3lv5PHh3xTm1U2
Be/RbJ75x4Muuw42ttU1LcpcLPcOZRQNEREoNd5UysgYYgWRekBvU+ZRQNW4g2nw
3JDgJgm0iBUN9w==
=zIi4
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"Three interrupt related fixes for X86:
- Move disabling of the local APIC after invoking fixup_irqs() to
ensure that interrupts which are incoming are noted in the IRR and
not ignored.
- Unbreak affinity setting.
The rework of the entry code reused the regular exception entry
code for device interrupts. The vector number is pushed into the
errorcode slot on the stack which is then lifted into an argument
and set to -1 because that's regs->orig_ax which is used in quite
some places to check whether the entry came from a syscall.
But it was overlooked that orig_ax is used in the affinity cleanup
code to validate whether the interrupt has arrived on the new
target. It turned out that this vector check is pointless because
interrupts are never moved from one vector to another on the same
CPU. That check is a historical leftover from the time where x86
supported multi-CPU affinities, but not longer needed with the now
strict single CPU affinity. Famous last words ...
- Add a missing check for an empty cpumask into the matrix allocator.
The affinity change added a warning to catch the case where an
interrupt is moved on the same CPU to a different vector. This
triggers because a condition with an empty cpumask returns an
assignment from the allocator as the allocator uses for_each_cpu()
without checking the cpumask for being empty. The historical
inconsistent for_each_cpu() behaviour of ignoring the cpumask and
unconditionally claiming that CPU0 is in the mask struck again.
Sigh.
plus a new entry into the MAINTAINER file for the HPE/UV platform"
* tag 'x86-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/matrix: Deal with the sillyness of for_each_cpu() on UP
x86/irq: Unbreak interrupt affinity setting
x86/hotplug: Silence APIC only after all interrupts are migrated
MAINTAINERS: Add entry for HPE Superdome Flex (UV) maintainers
- Revert the platform driver conversion of interrupt chip drivers as it
turned out to create more problems than it solves.
- Fix a trivial typo in the new module helpers which made probing reliably
fail.
- Small fixes in the STM32 and MIPS Ingenic drivers
- The TI firmware rework which had badly managed dependencies and had to
wait post rc1.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl9L4wITHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYobWiEACmMdJl1ZHC+IcKzAVngX82rJUQ/4dC
zGBRnddkPYjobXpK+w+W28vdPHYcGZ4de89X1N0l6zocSRQh2NdmJ080ivH8LD/V
AaT82Sf8mjuulUPolyfiF09+e4OeBUC94CsyM2LkGmicoMLeWyWv4EEh95anh3s9
GsaZ6cW6Cvq1whK2SSCoa5V1pkCGLIMxw2gV4DDw/s2RuqwrIGeleq9bcMNA8RVS
eBF91N6nUpXI9PMS68p72ZcRbpK0cKTARt99lS4cDEOS11b0jsRxZ525Xem736V/
5zgXvYOdErZaBRO3WeSaxlYTwwRiLKTRNOL4ZFgI1xQEK4Pcv9S+80zaXSvDXSsn
s6LzXYifgPCE6edIXKAMwcNcbRXcGdn4PZpSTwvtNJpbVskRQgPmtEVDUD+J5upU
7NN/myosEkA5JUQSWdaaaa4nLe4H8qmxnQrn+B41NzzikekZuN+ai9OAOTTdsv1j
W+hNhX78qgVbYtm4/8mOQp4BAIOIGPDTgw3WfXYW3FbecEM6EQBEQgT9E1RCivTX
cETaV34jJY7XGm3iVQ97jLsKBhK0lcIfJwltuY1Ck+4o9fqZkJ0iZ045EArJMms6
sG+tp6+0MWPkLf2P31qdLKTMt/9HZCHao7F1K6Fz218k4mwGHa12V/OUgPkdr8vJ
hb57EF+zXOaFag==
=guRc
-----END PGP SIGNATURE-----
Merge tag 'irq-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A set of fixes for interrupt chip drivers:
- Revert the platform driver conversion of interrupt chip drivers as
it turned out to create more problems than it solves.
- Fix a trivial typo in the new module helpers which made probing
reliably fail.
- Small fixes in the STM32 and MIPS Ingenic drivers
- The TI firmware rework which had badly managed dependencies and had
to wait post rc1"
* tag 'irq-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/ingenic: Leave parent IRQ unmasked on suspend
irqchip/stm32-exti: Avoid losing interrupts due to clearing pending bits by mistake
irqchip: Revert modular support for drivers using IRQCHIP_PLATFORM_DRIVER helperse
irqchip: Fix probing deferal when using IRQCHIP_PLATFORM_DRIVER helpers
arm64: dts: k3-am65: Update the RM resource types
arm64: dts: k3-am65: ti-sci-inta/intr: Update to latest bindings
arm64: dts: k3-j721e: ti-sci-inta/intr: Update to latest bindings
irqchip/ti-sci-inta: Add support for INTA directly connecting to GIC
irqchip/ti-sci-inta: Do not store TISCI device id in platform device id field
dt-bindings: irqchip: Convert ti, sci-inta bindings to yaml
dt-bindings: irqchip: ti, sci-inta: Update docs to support different parent.
irqchip/ti-sci-intr: Add support for INTR being a parent to INTR
dt-bindings: irqchip: Convert ti, sci-intr bindings to yaml
dt-bindings: irqchip: ti, sci-intr: Update bindings to drop the usage of gic as parent
firmware: ti_sci: Add support for getting resource with subtype
firmware: ti_sci: Drop unused structure ti_sci_rm_type_map
firmware: ti_sci: Drop the device id to resource type translation
- Make is_idle_task() __always_inline to prevent the compiler from putting
it out of line into the wrong section because it's used inside noinstr
sections.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl9L5woTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoYoDD/4hjGLSH3A1YwxnE/tUmLzIZhQKWveD
s25DHX2dIn+ZBSF96qWHVvvZoPNZo1zS3FZWoM6wEHT7qPkUm/61Zlme/1ZUlwP9
jVkBXiUHhOlemH9VM6udAOYG6whVguCkFixUhI1rLN8cd3Ko7v3jSax+6nV2g762
PrEnVM9Ps0YGNnoLBWl9G0zsfaLHfRJpWUHY6kqYTmmjN1NWJEUnmO9IxIytTG/A
3wOokwJJSfMMSB8WG+VAAHMtYmLirUrpoXFfhschsEiw+SBqNI8I/KieGG8P+i4W
P1am4sJe0vPapEvwyNKHcR8iQSUEHQ5ZmUiBvCWu+2HDYn6KLSioKaPVh45/+UPh
EeSXutPFXIvX9WwbNnQctwl/xX7BSpGb0tBiegiqlwk2i8QFEWvKr8HrjznLS+kW
7cCaQdM5a46eAoo/4JP4izm4JjqFD2LuNt32f9lizRkFim0zrMswmA01Fog5A3j6
FSQmNhqOPSLpC24/rWhW1sk2dnOefrBxcE67yFmsbEtf+BmoFGVYkulI36P/trmY
nTua0v2Okmjj0zIh23VUOzhBGDzD+KQJFEj/8nyFJShlBbv6yxRQiqtzWUmsuhfu
o+B/WEAllBnSOjohbp4cBmTM9VOkNAgBfRJs26ELuBucEgMdzPByVNygKzdUcW/j
6vH1bbgu+FXA1Q==
=oKWA
-----END PGP SIGNATURE-----
Merge tag 'sched-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
"A single fix for the scheduler:
- Make is_idle_task() __always_inline to prevent the compiler from
putting it out of line into the wrong section because it's used
inside noinstr sections"
* tag 'sched-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Use __always_inline on is_idle_task()
- Prevent recursion by using raw_cpu_* operations
- Fixup the interrupt state in the cpu idle code to be consistent
- Push rcu_idle_enter/exit() invocations deeper into the idle path so
that the lock operations are inside the RCU watching sections
- Move trace_cpu_idle() into generic code so it's called before RCU goes
idle.
- Handle raw_local_irq* vs. local_irq* operations correctly
- Move the tracepoints out from under the lockdep recursion handling
which turned out to be fragile and inconsistent.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl9L5qETHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoV/NEADG+h02tj2I4gP7IQ3nVodEzS1+odPI
orabY5ggH0kn4YIhPB4UtOd5zKZjr3FJs9wEhyhQpV6ZhvFfgaIKiYqfg+Q81aMO
/BXrfh6jBD2Hu7gaPBnVdkKeh1ehl+w0PhTeJhPBHEEvbGeLUYWwyPNlaKz//VQl
XCWl7e7o/Uw2UyJ469SCx3z+M2DMNqwdMys/zcqvTLiBdLNCwp4TW5ACzEA0rfHh
Pepu3eIKnMURyt82QanrOATvT2io9pOOaUh59zeKi2WM8ikwKd/Eho2kXYng6GvM
GzX4Kn13MsNobZXf9BhqEGICdRkaJqLsXlmBNmbJdSTCn5W2lLZqu2wCEp5VZHCc
XwMbey8ek+BRskJMqAV4oq2GA8Om9KEYWOOdixyOG0UJCiW5qDowuDYBXTLV7FWj
XhzLGuHpUF9eKLKokJ7ideLaDcpzwYjHr58pFLQrqPwmjVKWguLeYMg5BhhTiEuV
wNfiLIGdMNsCpYKhnce3o9paV8+hy1ZveWhNy+/4HaDLoEwI2T62i8R7xxbrcWMg
sgdAiQG+kVLwSJ13bN+Cz79uLYTIbqGaZHtOXmeIT3jSxBjx5RlXfzocwTHSYrNk
GuLYHd7+QaemN49Rrf4bPR16Db7ifL32QkUtLBTBLcnos9jM+fcl+BWyqYRxhgDv
xzDS+vfK8DvRiA==
=Hgt6
-----END PGP SIGNATURE-----
Merge tag 'locking-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
"A set of fixes for lockdep, tracing and RCU:
- Prevent recursion by using raw_cpu_* operations
- Fixup the interrupt state in the cpu idle code to be consistent
- Push rcu_idle_enter/exit() invocations deeper into the idle path so
that the lock operations are inside the RCU watching sections
- Move trace_cpu_idle() into generic code so it's called before RCU
goes idle.
- Handle raw_local_irq* vs. local_irq* operations correctly
- Move the tracepoints out from under the lockdep recursion handling
which turned out to be fragile and inconsistent"
* tag 'locking-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lockdep,trace: Expose tracepoints
lockdep: Only trace IRQ edges
mips: Implement arch_irqs_disabled()
arm64: Implement arch_irqs_disabled()
nds32: Implement arch_irqs_disabled()
locking/lockdep: Cleanup
x86/entry: Remove unused THUNKs
cpuidle: Move trace_cpu_idle() into generic code
cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic
sched,idle,rcu: Push rcu_idle deeper into the idle path
cpuidle: Fixup IRQ state
lockdep: Use raw_cpu_*() for per-cpu variables
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAl9Lw74ACgkQiiy9cAdy
T1HOMwv/WwqctX4SN4kA97C4HQFJwan5kPf1bBYdp3zEm45umxkZRKI7i8NN+4Z7
a7m3n9Kwm5CP0pHICJ6PLhYNs5J9ZSEx89J2GOmyl1SIbjNUHKGftrf75BCMceGT
6dcEoMLAFw8Z9D39n1mkLa09IOI7XAlHt48VUis2qnLIZc1WDA5wzZ8dW+EqSFWX
itg/P8I/4QUWf+IzXw3Hj9WiiIJMVkaIkz7lccrS8VzQD2KYNyDPl9+xNr8Q54Uu
n0sTiHXQenPrH+tubrKrdQ1b9OwxL41kfCeE0PfC/BatSJ5rBk4x+zw5EWvfM6Mz
y/llDLqtShfycKNGOChfrA2Dv3VvH7P0TDYu/Nl0x09KbZLRiswJ1iGn0WAMI8gG
0POaEWKHLkBGesItE9vMi7RZEb1wB8z6pFgwr6xadHx1RIWztv80rIcUF+qywJvZ
paVIPCFWyuahbbzWltxCmCLGLLn3j+Qm57md8PtLdSu5vOJu8kF35F6xP+DxHF1n
E/WgF59O
=0PxR
-----END PGP SIGNATURE-----
Merge tag '5.9-rc2-smb-fix' of git://git.samba.org/sfrench/cifs-2.6
Pull cfis fix from Steve French:
"DFS fix for referral problem when using SMB1"
* tag '5.9-rc2-smb-fix' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix check of tcon dfs in smb1
Revert our removal of PROT_SAO, at least one user expressed an interest in using
it on Power9. Instead don't allow it to be used in guests unless enabled
explicitly at compile time.
A fix for a crash introduced by a recent change to FP handling.
Revert a change to our idle code that left Power10 with no idle support.
One minor fix for the new scv system call path to set PPR.
Fix a crash in our "generic" PMU if branch stack events were enabled.
A fix for the IMC PMU, to correctly identify host kernel samples.
The ADB_PMU powermac code was found to be incompatible with VMAP_STACK, so make
them incompatible in Kconfig until the code can be fixed.
A build fix in drivers/video/fbdev/controlfb.c, and a documentation fix.
Thanks to:
Alexey Kardashevskiy, Athira Rajeev, Christophe Leroy, Giuseppe Sacco,
Madhavan Srinivasan, Milton Miller, Nicholas Piggin, Pratik Rajesh Sampat,
Randy Dunlap, Shawn Anastasio, Vaidyanathan Srinivasan.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl9LlF8THG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgEwJD/4nEkp9id7bZyiGruoawqxdpmc9viIp
JFRH3+eHWbE5rfoXn7fwM1zTE9SsHxCd0q09cHk2rtAwKMXcJW83/pXNuWEjIzcy
7Ra8Zq2jRl6qgWAx84VKoZVg+W40yNFex0M0akMQV55SjYOTN8gpGe+algi+wPaH
44oYBYctDi3B9X8CsaUQEdov1EZdWT6TxcN9xIJiIdr53VXMER6C+ytYV8VgkGHW
Qt+Ardyvp6eNq9+foGegRSk3OmNcmj+CJZYzhkp5+1k9ko9GQ8wg9NzxTV4ZoSJ9
g5rgD4ztBfLGyUDu6oUypzOnSVbfzJh9JPH/h1zaSOjSv9MnJ20zqvqjD7QXFNbs
j960PiylTfVWdnOoUUkvON0UOYZM9XiZP63i8z/mBsMJ5BFaLB1TonZ+lDwXc1vK
MHXhjahP2qP0LnJZ/M5gT3zfLPyrKoeIlmLTOkLjrM5C9mcSxpPnagq+AHacfYpG
sGrg2LGLfBo/9PomUNHseQhBfsc2uYwM924si9MpNWN6BT+TNgTJYeNPDOnvRCbG
ivDQ7HFZ6aiOj+b5iTZI2RV3EOaBKZgo+VEryNDnqd7etjyDr5PNbooGaHJDgsnz
mNFxUNusxzv0vMI3zyFtLMTe/99/NlRSYyMXPL8SL7MvlRt624ngrrxYv+2+dBRt
aIpxSpgdqTVXSw==
=t+yB
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Revert our removal of PROT_SAO, at least one user expressed an
interest in using it on Power9. Instead don't allow it to be used in
guests unless enabled explicitly at compile time.
- A fix for a crash introduced by a recent change to FP handling.
- Revert a change to our idle code that left Power10 with no idle
support.
- One minor fix for the new scv system call path to set PPR.
- Fix a crash in our "generic" PMU if branch stack events were enabled.
- A fix for the IMC PMU, to correctly identify host kernel samples.
- The ADB_PMU powermac code was found to be incompatible with
VMAP_STACK, so make them incompatible in Kconfig until the code can
be fixed.
- A build fix in drivers/video/fbdev/controlfb.c, and a documentation
fix.
Thanks to Alexey Kardashevskiy, Athira Rajeev, Christophe Leroy,
Giuseppe Sacco, Madhavan Srinivasan, Milton Miller, Nicholas Piggin,
Pratik Rajesh Sampat, Randy Dunlap, Shawn Anastasio, Vaidyanathan
Srinivasan.
* tag 'powerpc-5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/32s: Disable VMAP stack which CONFIG_ADB_PMU
Revert "powerpc/powernv/idle: Replace CPU feature check with PVR check"
powerpc/perf: Fix reading of MSR[HV/PR] bits in trace-imc
powerpc/perf: Fix crashes with generic_compat_pmu & BHRB
powerpc/64s: Fix crash in load_fp_state() due to fpexc_mode
powerpc/64s: scv entry should set PPR
Documentation/powerpc: fix malformed table in syscall64-abi
video: fbdev: controlfb: Fix build for COMPILE_TEST=y && PPC_PMAC=n
selftests/powerpc: Update PROT_SAO test to skip ISA 3.1
powerpc/64s: Disallow PROT_SAO in LPARs by default
Revert "powerpc/64s: Remove PROT_SAO support"
Let's try this again... Here are some USB fixes for 5.9-rc3.
This differs from the previous pull request for this release in that:
- the usb gadget patch now does not break some systems, and
actually does what it was intended to do. Many thanks to
Marek Szyprowski for quickly noticing and testing the patch
from Andy Shevchenko to resolve this issue.
- some more new USB quirks have been added to get some new
devices to work properly based on user reports.
Other than that, the original pull request patches are all here, and
they contain:
- usb gadget driver fixes
- xhci driver fixes
- typec fixes
- new quirks and ids
- fixes for USB patches that went into 5.9-rc1.
All of these have been tested in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCX0t32g8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylFMACeLEQgeN1rmfQfLyo2NHROQEeDhnIAniLMhchZ
p9dXWJ8aNeyI5OrNjD5b
=Vd05
-----END PGP SIGNATURE-----
Merge tag 'usb-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Let's try this again... Here are some USB fixes for 5.9-rc3.
This differs from the previous pull request for this release in that
the usb gadget patch now does not break some systems, and actually
does what it was intended to do. Many thanks to Marek Szyprowski for
quickly noticing and testing the patch from Andy Shevchenko to resolve
this issue.
Additionally, some more new USB quirks have been added to get some new
devices to work properly based on user reports.
Other than that, the patches are all here, and they contain:
- usb gadget driver fixes
- xhci driver fixes
- typec fixes
- new quirks and ids
- fixes for USB patches that went into 5.9-rc1.
All of these have been tested in linux-next with no reported issues"
* tag 'usb-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (33 commits)
usb: storage: Add unusual_uas entry for Sony PSZ drives
USB: Ignore UAS for JMicron JMS567 ATA/ATAPI Bridge
usb: host: ohci-exynos: Fix error handling in exynos_ohci_probe()
USB: gadget: u_f: Unbreak offset calculation in VLAs
USB: quirks: Ignore duplicate endpoint on Sound Devices MixPre-D
usb: typec: tcpm: Fix Fix source hard reset response for TDA 2.3.1.1 and TDA 2.3.1.2 failures
USB: PHY: JZ4770: Fix static checker warning.
USB: gadget: f_ncm: add bounds checks to ncm_unwrap_ntb()
USB: gadget: u_f: add overflow checks to VLA macros
xhci: Always restore EP_SOFT_CLEAR_TOGGLE even if ep reset failed
xhci: Do warm-reset when both CAS and XDEV_RESUME are set
usb: host: xhci: fix ep context print mismatch in debugfs
usb: uas: Add quirk for PNY Pro Elite
tools: usb: move to tools buildsystem
USB: Fix device driver race
USB: Also match device drivers using the ->match vfunc
usb: host: xhci-tegra: fix tegra_xusb_get_phy()
usb: host: xhci-tegra: otg usb2/usb3 port init
usb: hcd: Fix use after free in usb_hcd_pci_remove()
usb: typec: ucsi: Hold con->lock for the entire duration of ucsi_register_port()
...
a subsequent load can probe the system properly; by Shiju Jose.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl9LTFoACgkQEsHwGGHe
VUpAxQ//YXxtNRrGxTQa6TGcHE0wRif/EkA20O54zWGPt+kolW9mXmsUFweDi0PH
U6EW1To9H7vukfynCxc+TFGNG8oh3B6XXhaMILsNi691m418Z52a+wOHGaf/qLpX
fqJe3WIIvo97cFtJJJvdHW90NaswtIo0TUspYqUiyThjwI3a9XXmY6Vh7odtVQaP
M/tYt90By60KAR57zROifo6binSOO45zCOJiEO6Qm8X9UphXhShv6oL4TQvlEEwf
ydWSz5NZwPnnVPk4MATMIQoptrixNaDXAfmh2kLBuNNmeNLCHKGous9C7ErI+Pqo
ZH5oECXGOvJELRL3CJ/6fgb9lKaUg9z0IILsRhmoSqj6AYNcDAreWyi4wjJTaRTs
rGNzYGHgB1tlWIpFxHBoxl3u/0EaRatkfvHYnY/i9LlmbIcNQgS+vBxV8LNyCRnk
uxzJDIc8adPx+4RwmzQrnjwN6A52d50JfhVXakQykTWxTaKFg2qkEDUgpEJ4bMCr
i77H1ytr+nGuR4BL9mferTfZqSWZULiQkeDV+skKAx8sSryOrKloucDXJYUzWFDU
6MfKi4iqfNH+w0LQxDE4cRGfOID1tEz7WWI1ROflI17sD2zW+/WkgYoHj7nD2Vwy
xm5DD91Jst4rdDd53PqJhiB3sz/V9VGlDL7o+BsYzYagFP9pyfc=
=WMYq
-----END PGP SIGNATURE-----
Merge tag 'edac_urgent_for_v5.9_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fix from Borislav Petkov:
"A fix to properly clear ghes_edac driver state on driver remove so
that a subsequent load can probe the system properly (Shiju Jose)"
* tag 'edac_urgent_for_v5.9_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/ghes: Fix NULL pointer dereference in ghes_edac_register()
Most of the CPU mask operations behave the same way, but for_each_cpu() and
it's variants ignore the cpumask argument and claim that CPU0 is always in
the mask. This is historical, inconsistent and annoying behaviour.
The matrix allocator uses for_each_cpu() and can be called on UP with an
empty cpumask. The calling code does not expect that this succeeds but
until commit e027fffff7 ("x86/irq: Unbreak interrupt affinity setting")
this went unnoticed. That commit added a WARN_ON() to catch cases which
move an interrupt from one vector to another on the same CPU. The warning
triggers on UP.
Add a check for the cpumask being empty to prevent this.
Fixes: 2f75d9e1c9 ("genirq: Implement bitmap matrix allocator")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org