Commit graph

1058139 commits

Author SHA1 Message Date
Paul E. McKenney
147f04b14a rcu: Prevent expedited GP from enabling tick on offline CPU
If an RCU expedited grace period starts just when a CPU is in the process
of going offline, so that the outgoing CPU has completed its pass through
stop-machine but has not yet completed its final dive into the idle loop,
RCU will attempt to enable that CPU's scheduling-clock tick via a call
to tick_dep_set_cpu().  For this to happen, that CPU has to have been
online when the expedited grace period completed its CPU-selection phase.

This is pointless:  The outgoing CPU has interrupts disabled, so it cannot
take a scheduling-clock tick anyway.  In addition, the tick_dep_set_cpu()
function's eventual call to irq_work_queue_on() will splat as follows:

smpboot: CPU 1 is now offline
WARNING: CPU: 6 PID: 124 at kernel/irq_work.c:95
+irq_work_queue_on+0x57/0x60
Modules linked in:
CPU: 6 PID: 124 Comm: kworker/6:2 Not tainted 5.15.0-rc1+ #3
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
+rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
Workqueue: rcu_gp wait_rcu_exp_gp
RIP: 0010:irq_work_queue_on+0x57/0x60
Code: 8b 05 1d c7 ea 62 a9 00 00 f0 00 75 21 4c 89 ce 44 89 c7 e8
+9b 37 fa ff ba 01 00 00 00 89 d0 c3 4c 89 cf e8 3b ff ff ff eb ee <0f> 0b eb b7
+0f 0b eb db 90 48 c7 c0 98 2a 02 00 65 48 03 05 91
 6f
RSP: 0000:ffffb12cc038fe48 EFLAGS: 00010282
RAX: 0000000000000001 RBX: 0000000000005208 RCX: 0000000000000020
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9ad01f45a680
RBP: 000000000004c990 R08: 0000000000000001 R09: ffff9ad01f45a680
R10: ffffb12cc0317db0 R11: 0000000000000001 R12: 00000000fffecee8
R13: 0000000000000001 R14: 0000000000026980 R15: ffffffff9e53ae00
FS:  0000000000000000(0000) GS:ffff9ad01f580000(0000)
+knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000000de0c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 tick_nohz_dep_set_cpu+0x59/0x70
 rcu_exp_wait_wake+0x54e/0x870
 ? sync_rcu_exp_select_cpus+0x1fc/0x390
 process_one_work+0x1ef/0x3c0
 ? process_one_work+0x3c0/0x3c0
 worker_thread+0x28/0x3c0
 ? process_one_work+0x3c0/0x3c0
 kthread+0x115/0x140
 ? set_kthread_struct+0x40/0x40
 ret_from_fork+0x22/0x30
---[ end trace c5bf75eb6aa80bc6 ]---

This commit therefore avoids invoking tick_dep_set_cpu() on offlined
CPUs to limit both futility and false-positive splats.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-07 16:22:22 -08:00
Paul E. McKenney
5401cc5264 rcu: Mark sync_sched_exp_online_cleanup() ->cpu_no_qs.b.exp load
The sync_sched_exp_online_cleanup() is called from rcutree_online_cpu(),
which can be invoked with interrupts enabled.  This means that
the ->cpu_no_qs.b.exp field is subject to data races from the
rcu_exp_handler() IPI handler, so this commit marks the load from
that field.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-07 16:22:21 -08:00
Frederic Weisbecker
6120b72e25 rcu: Remove rcu_data.exp_deferred_qs and convert to rcu_data.cpu no_qs.b.exp
Having two fields for the same purpose with subtle differences on
different RCU flavours is confusing, especially when both fields always
exist on both RCU flavours.

Fortunately, it is now safe for preemptible RCU to rely on the rcu_data
structure's ->cpu_no_qs.b.exp field, just like non-preemptible RCU.
This commit therefore removes the ad-hoc ->exp_deferred_qs field.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-07 16:22:21 -08:00
Frederic Weisbecker
6e16b0f7ba rcu: Move rcu_data.cpu_no_qs.b.exp reset to rcu_export_exp_rdp()
On non-preemptible RCU, move clearing of the rcu_data structure's
->cpu_no_qs.b.exp filed to the actual expedited quiescent state report
function, matching hw preemptible RCU handles the ->exp_deferred_qs field.

This prepares for removing ->exp_deferred_qs in favor of ->cpu_no_qs.b.exp
for both preemptible and non-preemptible RCU.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-07 16:22:21 -08:00
Frederic Weisbecker
a438265948 rcu: Ignore rdp.cpu_no_qs.b.exp on preemptible RCU's rcu_qs()
Preemptible RCU does not use the rcu_data structure's ->cpu_no_qs.b.exp,
instead using a separate ->exp_deferred_qs field to record the need for
an expedited quiescent state.

In fact ->cpu_no_qs.b.exp should never be set in preemptible RCU because
preemptible RCU's expedited grace periods use other mechanisms to record
quiescent states.

This commit therefore removes the implicit rcu_qs() reference to
->cpu_no_qs.b.exp in favor of a direct reference to ->cpu_no_qs.b.norm.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-07 16:20:59 -08:00
Paul E. McKenney
90b21bcfb2 torture: Properly redirect kvm-remote.sh "echo" commands
The echo commands following initialization of the "oldrun" variable need
to be "tee"d to $oldrun/remote-log.  This commit fixes several stragglers.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:30:29 -08:00
Paul E. McKenney
b6c9dbf04f torture: Fix incorrectly redirected "exit" in kvm-remote.sh
The "exit 4" in kvm-remote.sh is pointlessly redirected, so this commit
removes the redirection.

Fixes: 0092eae4cb ("torture: Add kvm-remote.sh script for distributed rcutorture test runs")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:30:29 -08:00
Paul E. McKenney
a959ed627a rcutorture: Test RCU Tasks lock-contention detection
This commit adjusts the TRACE02 scenario to use a pair of callback-flood
kthreads.  This in turn forces lock contention on the single RCU Tasks
Trace callback queue, which forces use of all CPUs' queues, thus testing
this transition.  (No, there is not yet any way to transition back.

Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:30:29 -08:00
Paul E. McKenney
4ead4e3319 rcutorture: Cause TREE02 and TREE10 scenarios to do more callback flooding
This commit enables two callback-flood kthreads for the TREE02 scenario
and 28 for the TREE10 scenario.

Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:30:29 -08:00
Paul E. McKenney
f61537009e torture: Retry download once before giving up
Currently, a transient network error can kill a run if it happens while
downloading the tarball to one of the target systems.  This commit
therefore does a 60-second wait and then a retry.  If further experience
indicates, a more elaborate mechanism might be used later.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:30:29 -08:00
Paul E. McKenney
c06354a121 torture: Make kvm-find-errors.sh report link-time undefined symbols
This commit makes kvm-find-errors.sh check for and report undefined
symbols that are detected at link time.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:30:28 -08:00
Paul E. McKenney
b6a4fd35d2 torture: Catch kvm.sh help text up with actual options
This commit brings the kvm.sh script's help text up to date with recently
(and some not-so-recently) added parameters.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:30:28 -08:00
Li Zhijian
9880eb878c refscale: Prevent buffer to pr_alert() being too long
0Day/LKP observed that the refscale results fail to complete when larger
values of nrun (such as 300) are specified.  The problem is that printk()
can accept at most a 1024-byte buffer.  This commit therefore prints
the buffer whenever its length exceeds 800 bytes.

CC: Philip Li <philip.li@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:29:50 -08:00
Li Zhijian
c30c876312 refscale: Simplify the errexit checkpoint
There is only the one OOM error case in main_func(), so this commit
eliminates the errexit local variable in favor of a branch to cleanup
code.

Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:29:50 -08:00
Paul E. McKenney
340170fef0 rcutorture: Suppress pi-lock-across read-unlock testing for Tiny SRCU
Because Tiny srcu_read_unlock() directly calls swake_up_one(), lockdep
complains when a pi lock is held across that srcu_read_unlock().
Although this is a lockdep false positive (there is no other CPU to
complete the deadlock cycle), lockdep is what it is at the moment.
This commit therefore prevents rcutorture from holding pi lock across
a Tiny srcu_read_unlock().

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:29:50 -08:00
Paul E. McKenney
1c3d53986f rcutorture: More thoroughly test nested readers
Currently, nested readers occur only when a timer handler interrupts a
reader.  This is rare, and is thus insufficient testing of the transition
between nesting levels.  This commit therefore causes rcutorture nested
readers to be the rule rather than the exception.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:29:50 -08:00
Paul E. McKenney
902d82e629 rcutorture: Sanitize RCUTORTURE_RDR_MASK
RCUTORTURE_RDR_MASK is currently not the bit indicated by
RCUTORTURE_RDR_SHIFT, but is instead all the bits less significant than
that one.  This is an accident waiting to happen, so this commit makes
RCUTORTURE_RDR_MASK be that one bit and adjusts uses accordingly.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:29:49 -08:00
Paul E. McKenney
f5dbc594b5 rcu-tasks: Don't remove tasks with pending IPIs from holdout list
Currently, the check_all_holdout_tasks_trace() function removes all tasks
marked with ->trc_reader_checked from the holdout list, including those
with IPIs pending.  This means that the IPI handler might arrive at
a task that has already been removed from the list, which is at best
an accident waiting to happen.

This commit therefore avoids removing tasks with IPIs pending from
the holdout list.  This in turn means that the "if" condition in the
for_each_online_cpu() loop in rcu_tasks_trace_postgp() should always
evaluate to false, so a WARN_ON_ONCE() is added to check that.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:29:06 -08:00
Paul E. McKenney
1f8da406a9 srcu: Prevent redundant __srcu_read_unlock() wakeup
Tiny SRCU readers can appear at task level, but also in interrupt and
softirq handlers.  Because Tiny SRCU is selected only in kernels built
with CONFIG_SMP=n and CONFIG_PREEMPTION=n, it is not possible for a grace
period to start while there is a non-task-level SRCU reader executing.
This means that it does not make sense for __srcu_read_unlock() to awaken
the Tiny SRCU grace period, because that can only happen when the grace
period is waiting for one value of ->srcu_idx and __srcu_read_unlock()
is ending the last reader for some other value of ->srcu_idx.  After all,
any such wakeup will be redundant.

Worse yet, in some cases, such wakeups generate lockdep splats:

	======================================================
	WARNING: possible circular locking dependency detected
	5.15.0-rc1+ #3758 Not tainted
	------------------------------------------------------
	rcu_torture_rea/53 is trying to acquire lock:
	ffffffff9514e6a8 (srcu_ctl.srcu_wq.lock){..-.}-{2:2}, at:
	xa/0x30

	but task is already holding lock:
	ffff95c642479d80 (&p->pi_lock){-.-.}-{2:2}, at:
	_extend+0x370/0x400

	which lock already depends on the new lock.

	the existing dependency chain (in reverse order) is:

	-> #1 (&p->pi_lock){-.-.}-{2:2}:
	       _raw_spin_lock_irqsave+0x2f/0x50
	       try_to_wake_up+0x50/0x580
	       swake_up_locked.part.7+0xe/0x30
	       swake_up_one+0x22/0x30
	       rcutorture_one_extend+0x1b6/0x400
	       rcu_torture_one_read+0x290/0x5d0
	       rcu_torture_timer+0x1a/0x70
	       call_timer_fn+0xa6/0x230
	       run_timer_softirq+0x493/0x4c0
	       __do_softirq+0xc0/0x371
	       irq_exit+0x73/0x90
	       sysvec_apic_timer_interrupt+0x63/0x80
	       asm_sysvec_apic_timer_interrupt+0x12/0x20
	       default_idle+0xb/0x10
	       default_idle_call+0x5e/0x170
	       do_idle+0x18a/0x1f0
	       cpu_startup_entry+0xa/0x10
	       start_kernel+0x678/0x69f
	       secondary_startup_64_no_verify+0xc2/0xcb

	-> #0 (srcu_ctl.srcu_wq.lock){..-.}-{2:2}:
	       __lock_acquire+0x130c/0x2440
	       lock_acquire+0xc2/0x270
	       _raw_spin_lock_irqsave+0x2f/0x50
	       swake_up_one+0xa/0x30
	       rcutorture_one_extend+0x387/0x400
	       rcu_torture_one_read+0x290/0x5d0
	       rcu_torture_reader+0xac/0x200
	       kthread+0x12d/0x150
	       ret_from_fork+0x22/0x30

	other info that might help us debug this:

	 Possible unsafe locking scenario:

	       CPU0                    CPU1
	       ----                    ----
	  lock(&p->pi_lock);
				       lock(srcu_ctl.srcu_wq.lock);
				       lock(&p->pi_lock);
	  lock(srcu_ctl.srcu_wq.lock);

	 *** DEADLOCK ***

	1 lock held by rcu_torture_rea/53:
	 #0: ffff95c642479d80 (&p->pi_lock){-.-.}-{2:2}, at:
	_extend+0x370/0x400

	stack backtrace:
	CPU: 0 PID: 53 Comm: rcu_torture_rea Not tainted 5.15.0-rc1+

	Hardware name: Red Hat KVM/RHEL-AV, BIOS
	e_el8.5.0+746+bbd5d70c 04/01/2014
	Call Trace:
	 check_noncircular+0xfe/0x110
	 ? find_held_lock+0x2d/0x90
	 __lock_acquire+0x130c/0x2440
	 lock_acquire+0xc2/0x270
	 ? swake_up_one+0xa/0x30
	 ? find_held_lock+0x72/0x90
	 _raw_spin_lock_irqsave+0x2f/0x50
	 ? swake_up_one+0xa/0x30
	 swake_up_one+0xa/0x30
	 rcutorture_one_extend+0x387/0x400
	 rcu_torture_one_read+0x290/0x5d0
	 rcu_torture_reader+0xac/0x200
	 ? rcutorture_oom_notify+0xf0/0xf0
	 ? __kthread_parkme+0x61/0x90
	 ? rcu_torture_one_read+0x5d0/0x5d0
	 kthread+0x12d/0x150
	 ? set_kthread_struct+0x40/0x40
	 ret_from_fork+0x22/0x30

This is a false positive because there is only one CPU, and both locks
are raw (non-preemptible) spinlocks.  However, it is worthwhile getting
rid of the redundant wakeup, which has the side effect of breaking
the theoretical deadlock cycle.  This commit therefore eliminates the
redundant wakeups.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:28:16 -08:00
Mark Brown
b0fe9dec66 tools/nolibc: Implement gettid()
Allow test programs to determine their thread ID.

Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:26:42 -08:00
Ammar Faizi
7bdc0e7a39 tools/nolibc: x86-64: Use mov $60,%eax instead of mov $60,%rax
Note that mov to 32-bit register will zero extend to 64-bit register.
Thus `mov $60,%eax` has the same effect with `mov $60,%rax`. Use the
shorter opcode to achieve the same thing.
```
  b8 3c 00 00 00       	mov    $60,%eax (5 bytes) [1]
  48 c7 c0 3c 00 00 00 	mov    $60,%rax (7 bytes) [2]
```
Currently, we use [2]. Change it to [1] for shorter code.

Signed-off-by: Ammar Faizi <ammar.faizi@students.amikom.ac.id>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:26:42 -08:00
Ammar Faizi
bf91666959 tools/nolibc: x86: Remove r8, r9 and r10 from the clobber list
Linux x86-64 syscall only clobbers rax, rcx and r11 (and "memory").

  - rax for the return value.
  - rcx to save the return address.
  - r11 to save the rflags.

Other registers are preserved.

Having r8, r9 and r10 in the syscall clobber list is harmless, but this
results in a missed-optimization.

As the syscall doesn't clobber r8-r10, GCC should be allowed to reuse
their value after the syscall returns to userspace. But since they are
in the clobber list, GCC will always miss this opportunity.

Remove them from the x86-64 syscall clobber list to help GCC generate
better code and fix the comment.

See also the x86-64 ABI, section A.2 AMD64 Linux Kernel Conventions,
A.2.1 Calling Conventions [1].

Extra note:
Some people may think it does not really give a benefit to remove r8,
r9 and r10 from the syscall clobber list because the impression of
syscall is a C function call, and function call always clobbers those 3.

However, that is not the case for nolibc.h, because we have a potential
to inline the "syscall" instruction (which its opcode is "0f 05") to the
user functions.

All syscalls in the nolibc.h are written as a static function with inline
ASM and are likely always inline if we use optimization flag, so this is
a profit not to have r8, r9 and r10 in the clobber list.

Here is the example where this matters.

Consider the following C code:
```
  #include "tools/include/nolibc/nolibc.h"
  #define read_abc(a, b, c) __asm__ volatile("nop"::"r"(a),"r"(b),"r"(c))

  int main(void)
  {
  	int a = 0xaa;
  	int b = 0xbb;
  	int c = 0xcc;

  	read_abc(a, b, c);
  	write(1, "test\n", 5);
  	read_abc(a, b, c);

  	return 0;
  }
```

Compile with:
    gcc -Os test.c -o test -nostdlib

With r8, r9, r10 in the clobber list, GCC generates this:

0000000000001000 <main>:
    1000:	f3 0f 1e fa          	endbr64
    1004:	41 54                	push   %r12
    1006:	41 bc cc 00 00 00    	mov    $0xcc,%r12d
    100c:	55                   	push   %rbp
    100d:	bd bb 00 00 00       	mov    $0xbb,%ebp
    1012:	53                   	push   %rbx
    1013:	bb aa 00 00 00       	mov    $0xaa,%ebx
    1018:	90                   	nop
    1019:	b8 01 00 00 00       	mov    $0x1,%eax
    101e:	bf 01 00 00 00       	mov    $0x1,%edi
    1023:	ba 05 00 00 00       	mov    $0x5,%edx
    1028:	48 8d 35 d1 0f 00 00 	lea    0xfd1(%rip),%rsi
    102f:	0f 05                	syscall
    1031:	90                   	nop
    1032:	31 c0                	xor    %eax,%eax
    1034:	5b                   	pop    %rbx
    1035:	5d                   	pop    %rbp
    1036:	41 5c                	pop    %r12
    1038:	c3                   	ret

GCC thinks that syscall will clobber r8, r9, r10. So it spills 0xaa,
0xbb and 0xcc to callee saved registers (r12, rbp and rbx). This is
clearly extra memory access and extra stack size for preserving them.

But syscall does not actually clobber them, so this is a missed
optimization.

Now without r8, r9, r10 in the clobber list, GCC generates better code:

0000000000001000 <main>:
    1000:	f3 0f 1e fa          	endbr64
    1004:	41 b8 aa 00 00 00    	mov    $0xaa,%r8d
    100a:	41 b9 bb 00 00 00    	mov    $0xbb,%r9d
    1010:	41 ba cc 00 00 00    	mov    $0xcc,%r10d
    1016:	90                   	nop
    1017:	b8 01 00 00 00       	mov    $0x1,%eax
    101c:	bf 01 00 00 00       	mov    $0x1,%edi
    1021:	ba 05 00 00 00       	mov    $0x5,%edx
    1026:	48 8d 35 d3 0f 00 00 	lea    0xfd3(%rip),%rsi
    102d:	0f 05                	syscall
    102f:	90                   	nop
    1030:	31 c0                	xor    %eax,%eax
    1032:	c3                   	ret

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Acked-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Ammar Faizi <ammar.faizi@students.amikom.ac.id>
Link: https://gitlab.com/x86-psABIs/x86-64-ABI/-/wikis/x86-64-psABI [1]
Link: https://lore.kernel.org/lkml/20211011040344.437264-1-ammar.faizi@students.amikom.ac.id/
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:26:42 -08:00
Willy Tarreau
de0244ae40 tools/nolibc: fix incorrect truncation of exit code
Ammar Faizi reported that our exit code handling is wrong. We truncate
it to the lowest 8 bits but the syscall itself is expected to take a
regular 32-bit signed integer, not an unsigned char. It's the kernel
that later truncates it to the lowest 8 bits. The difference is visible
in strace, where the program below used to show exit(255) instead of
exit(-1):

  int main(void)
  {
        return -1;
  }

This patch applies the fix to all archs. x86_64, i386, arm64, armv7 and
mips were all tested and confirmed to work fine now. Risc-v was not
tested but the change is trivial and exactly the same as for other archs.

Reported-by: Ammar Faizi <ammar.faizi@students.amikom.ac.id>
Cc: stable@vger.kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:26:42 -08:00
Willy Tarreau
ebbe0d8a44 tools/nolibc: i386: fix initial stack alignment
After re-checking in the spec and comparing stack offsets with glibc,
The last pushed argument must be 16-byte aligned (i.e. aligned before the
call) so that in the callee esp+4 is multiple of 16, so the principle is
the 32-bit equivalent to what Ammar fixed for x86_64. It's possible that
32-bit code using SSE2 or MMX could have been affected. In addition the
frame pointer ought to be zero at the deepest level.

Link: https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI
Cc: Ammar Faizi <ammar.faizi@students.amikom.ac.id>
Cc: stable@vger.kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:26:42 -08:00
Ammar Faizi
937ed91c71 tools/nolibc: x86-64: Fix startup code bug
Before this patch, the `_start` function looks like this:
```
0000000000001170 <_start>:
    1170:	pop    %rdi
    1171:	mov    %rsp,%rsi
    1174:	lea    0x8(%rsi,%rdi,8),%rdx
    1179:	and    $0xfffffffffffffff0,%rsp
    117d:	sub    $0x8,%rsp
    1181:	call   1000 <main>
    1186:	movzbq %al,%rdi
    118a:	mov    $0x3c,%rax
    1191:	syscall
    1193:	hlt
    1194:	data16 cs nopw 0x0(%rax,%rax,1)
    119f:	nop
```
Note the "and" to %rsp with $-16, it makes the %rsp be 16-byte aligned,
but then there is a "sub" with $0x8 which makes the %rsp no longer
16-byte aligned, then it calls main. That's the bug!

What actually the x86-64 System V ABI mandates is that right before the
"call", the %rsp must be 16-byte aligned, not after the "call". So the
"sub" with $0x8 here breaks the alignment. Remove it.

An example where this rule matters is when the callee needs to align
its stack at 16-byte for aligned move instruction, like `movdqa` and
`movaps`. If the callee can't align its stack properly, it will result
in segmentation fault.

x86-64 System V ABI also mandates the deepest stack frame should be
zero. Just to be safe, let's zero the %rbp on startup as the content
of %rbp may be unspecified when the program starts. Now it looks like
this:
```
0000000000001170 <_start>:
    1170:	pop    %rdi
    1171:	mov    %rsp,%rsi
    1174:	lea    0x8(%rsi,%rdi,8),%rdx
    1179:	xor    %ebp,%ebp                # zero the %rbp
    117b:	and    $0xfffffffffffffff0,%rsp # align the %rsp
    117f:	call   1000 <main>
    1184:	movzbq %al,%rdi
    1188:	mov    $0x3c,%rax
    118f:	syscall
    1191:	hlt
    1192:	data16 cs nopw 0x0(%rax,%rax,1)
    119d:	nopl   (%rax)
```

Cc: Bedirhan KURT <windowz414@gnuweeb.org>
Cc: Louvian Lyndal <louvianlyndal@gmail.com>
Reported-by: Peter Cordes <peter@cordes.ca>
Signed-off-by: Ammar Faizi <ammar.faizi@students.amikom.ac.id>
[wt: I did this on purpose due to a misunderstanding of the spec, other
     archs will thus have to be rechecked, particularly i386]
Cc: stable@vger.kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:26:42 -08:00
Jun Miao
300c0c5e72 rcu: Avoid alloc_pages() when recording stack
The default kasan_record_aux_stack() calls stack_depot_save() with GFP_NOWAIT,
which in turn can then call alloc_pages(GFP_NOWAIT, ...).  In general, however,
it is not even possible to use either GFP_ATOMIC nor GFP_NOWAIT in certain
non-preemptive contexts/RT kernel including raw_spin_locks (see gfp.h and ab00db216c).
Fix it by instructing stackdepot to not expand stack storage via alloc_pages()
in case it runs out by using kasan_record_aux_stack_noalloc().

Jianwei Hu reported:
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:969
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 15319, name: python3
INFO: lockdep is turned off.
irq event stamp: 0
  hardirqs last  enabled at (0): [<0000000000000000>] 0x0
  hardirqs last disabled at (0): [<ffffffff856c8b13>] copy_process+0xaf3/0x2590
  softirqs last  enabled at (0): [<ffffffff856c8b13>] copy_process+0xaf3/0x2590
  softirqs last disabled at (0): [<0000000000000000>] 0x0
  CPU: 6 PID: 15319 Comm: python3 Tainted: G        W  O 5.15-rc7-preempt-rt #1
  Hardware name: Supermicro SYS-E300-9A-8C/A2SDi-8C-HLN4F, BIOS 1.1b 12/17/2018
  Call Trace:
    show_stack+0x52/0x58
    dump_stack+0xa1/0xd6
    ___might_sleep.cold+0x11c/0x12d
    rt_spin_lock+0x3f/0xc0
    rmqueue+0x100/0x1460
    rmqueue+0x100/0x1460
    mark_usage+0x1a0/0x1a0
    ftrace_graph_ret_addr+0x2a/0xb0
    rmqueue_pcplist.constprop.0+0x6a0/0x6a0
     __kasan_check_read+0x11/0x20
     __zone_watermark_ok+0x114/0x270
     get_page_from_freelist+0x148/0x630
     is_module_text_address+0x32/0xa0
     __alloc_pages_nodemask+0x2f6/0x790
     __alloc_pages_slowpath.constprop.0+0x12d0/0x12d0
     create_prof_cpu_mask+0x30/0x30
     alloc_pages_current+0xb1/0x150
     stack_depot_save+0x39f/0x490
     kasan_save_stack+0x42/0x50
     kasan_save_stack+0x23/0x50
     kasan_record_aux_stack+0xa9/0xc0
     __call_rcu+0xff/0x9c0
     call_rcu+0xe/0x10
     put_object+0x53/0x70
     __delete_object+0x7b/0x90
     kmemleak_free+0x46/0x70
     slab_free_freelist_hook+0xb4/0x160
     kfree+0xe5/0x420
     kfree_const+0x17/0x30
     kobject_cleanup+0xaa/0x230
     kobject_put+0x76/0x90
     netdev_queue_update_kobjects+0x17d/0x1f0
     ... ...
     ksys_write+0xd9/0x180
     __x64_sys_write+0x42/0x50
     do_syscall_64+0x38/0x50
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

Links: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/kasan.h?id=7cb3007ce2da27ec02a1a3211941e7fe6875b642
Fixes: 84109ab585 ("rcu: Record kvfree_call_rcu() call stack for KASAN")
Fixes: 26e760c9a7 ("rcu: kasan: record and print call_rcu() call stack")
Reported-by: Jianwei Hu <jianwei.hu@windriver.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Marco Elver <elver@google.com>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Jun Miao <jun.miao@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:25:20 -08:00
Zqiang
c2cf0767e9 rcu: Avoid running boost kthreads on isolated CPUs
When the boost kthreads are created on systems with nohz_full CPUs,
the cpus_allowed_ptr is set to housekeeping_cpumask(HK_FLAG_KTHREAD).
However, when the rcu_boost_kthread_setaffinity() is called, the original
affinity will be changed and these kthreads can subsequently run on
nohz_full CPUs.  This commit makes rcu_boost_kthread_setaffinity()
restrict these boost kthreads to housekeeping CPUs.

Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:25:20 -08:00
Zhouyi Zhou
17ea371882 rcu: Improve tree_plugin.h comments and add code cleanups
This commit cleans up some comments and code in kernel/rcu/tree_plugin.h.

Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:25:20 -08:00
Changbin Du
2407a64f80 rcu: in_irq() cleanup
This commit replaces the obsolete and ambiguous macro in_irq() with its
shiny new in_hardirq() equivalent.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:25:20 -08:00
Chun-Hung Tseng
24ba53017e rcu: Replace ________p1 and _________p1 with __UNIQUE_ID(rcu)
This commit replaces both ________p1 and _________p1 with __UNIQUE_ID(rcu),
and also adjusts the callers of the affected macros.

__UNIQUE_ID(rcu) will generate unique variable names during compilation,
which eliminates the need of ________p1 and _________p1 (both having 4
occurrences prior to the code change).  This also avoids the variable
name shadowing issue, or at least makes those wishing to cause shadowing
problems work much harder to do so.

The same idea is used for the min/max macros (commit 589a978 and commit
e9092d0).

Signed-off-by: Jim Huang <jserv@ccns.ncku.edu.tw>
Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:25:20 -08:00
Paul E. McKenney
bc849e9192 rcu: Move rcu_needs_cpu() to tree.c
Now that RCU_FAST_NO_HZ is no more, there is but one implementation of
the rcu_needs_cpu() function.  This commit therefore moves this function
from kernel/rcu/tree_plugin.c to kernel/rcu/tree.c.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:24:47 -08:00
Paul E. McKenney
e2c73a6860 rcu: Remove the RCU_FAST_NO_HZ Kconfig option
All of the uses of CONFIG_RCU_FAST_NO_HZ=y that I have seen involve
systems with RCU callbacks offloaded.  In this situation, all that this
Kconfig option does is slow down idle entry/exit with an additional
allways-taken early exit.  If this is the only use case, then this
Kconfig option nothing but an attractive nuisance that needs to go away.

This commit therefore removes the RCU_FAST_NO_HZ Kconfig option.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:24:47 -08:00
Paul E. McKenney
24eab6e1ff torture: Remove RCU_FAST_NO_HZ from rcu scenarios
All of the rcu scenarios that mentioning CONFIG_RCU_FAST_NO_HZ disable it.
But this Kconfig option is disabled by default, so this commit removes
the pointless "CONFIG_RCU_FAST_NO_HZ=n" lines from these scenarios.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:24:47 -08:00
Paul E. McKenney
f04cbe651b torture: Remove RCU_FAST_NO_HZ from rcuscale and refscale scenarios
All of the rcuscale and refscale scenarios that mention the Kconfig option
CONFIG_RCU_FAST_NO_HZ disable it.  But this Kconfig option is disabled by
default, so this commit removes the pointless "CONFIG_RCU_FAST_NO_HZ=n"
lines from these scenarios.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:24:46 -08:00
Akira Yokosawa
5861dad198 doc: RCU: Avoid 'Symbol' font-family in SVG figures
On Ubuntu Focal, strings in some of SVG files under
Documentation/RCU/Design can not be rendered properly when
converted to PDF.

Ubuntu releases since Focal and Debian bullseye have trouble
with "Symbol" font-family in SVG files.

As those strings are mostly API names such as "READ_ONCE()",
"WRITE_ONCE(), "rcu_read_lock()", and so on, using a generic
monospace font-family should be a good alternative.

Substitute the font-family name by a simple sed pattern:

    's/Symbol/monospace/g'

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:23:14 -08:00
NeilBrown
7c0be9f890 doc: Add refcount analogy to What is RCU
The reader-writer-lock analogy is a useful way to think about RCU, but
it is not always applicable.  It is useful to have other analogies to
work with, and particularly to emphasise that no single analogy is
perfect.

This patch add a "RCU as reference count" to the "what is RCU" document.

See https://lwn.net/Articles/872559/

[ paulmck: Apply Akira Yokosawa feedback. ]
Reviewed-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:23:06 -08:00
Paul E. McKenney
db4cb76861 doc: Remove obsolete kernel-per-CPU-kthreads RCU_FAST_NO_HZ advice
This document advises building with both CONFIG_NO_HZ=y and
CONFIG_RCU_FAST_NO_HZ=y.  However, CONFIG_NO_HZ=y offloads callbacks from
all nohz_full CPUs, and CPUs with offloaded callbacks do not benefit from
CONFIG_RCU_FAST_NO_HZ=y.  Quite the opposite: CONFIG_RCU_FAST_NO_HZ=y
simply adds a bit of idle entry/exit overhead.

This commit therefore changes that advice to only CONFIG_NO_HZ=y.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:23:06 -08:00
Paul E. McKenney
8c0abfd6d2 rcutorture: Add CONFIG_PREEMPT_DYNAMIC=n to tiny scenarios
With CONFIG_PREEMPT_DYNAMIC=y, the kernel builds with CONFIG_PREEMPTION=y
because preemption can be enabled at runtime.  This prevents any tests
of Tiny RCU or Tiny SRCU from running correctly.  This commit therefore
explicitly sets CONFIG_PREEMPT_DYNAMIC=n for those scenarios.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:20:58 -08:00
Linus Torvalds
fa55b7dcdc Linux 5.16-rc1 2021-11-14 13:56:52 -08:00
Gustavo A. R. Silva
dee2b702bc kconfig: Add support for -Wimplicit-fallthrough
Add Kconfig support for -Wimplicit-fallthrough for both GCC and Clang.

The compiler option is under configuration CC_IMPLICIT_FALLTHROUGH,
which is enabled by default.

Special thanks to Nathan Chancellor who fixed the Clang bug[1][2]. This
bugfix only appears in Clang 14.0.0, so older versions still contain
the bug and -Wimplicit-fallthrough won't be enabled for them, for now.

This concludes a long journey and now we are finally getting rid
of the unintentional fallthrough bug-class in the kernel, entirely. :)

Link: 9ed4a94d64 [1]
Link: https://bugs.llvm.org/show_bug.cgi?id=51094 [2]
Link: https://github.com/KSPP/linux/issues/115
Link: https://github.com/ClangBuiltLinux/linux/issues/236
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-14 13:27:30 -08:00
Linus Torvalds
ce49bfc8d0 Minor tweaks for 5.16:
* Clean up open-coded swap() calls.
  * A little bit of #ifdef golf to complete the reunification of the
    kernel and userspace libxfs source code.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmGNT1IACgkQ+H93GTRK
 tOucKA//Qk2NX3QBm/8pCrFE5V+eqooPANhZmzeviJCN/6++jcNOy0f+YK6JXVRC
 U2WdotHFH5fF6lsDkzNtMPHZ8JMZmOfEiPx5CGFiWT5iUW7FbLkROHm7GFtbwMoH
 qm3Lt7PbdSzJqTuOTvaGCw1xWkjDXMLdsdFM7mx3JO5zT9a/fCqjjmyR2Kl0qcSP
 RzfruVe20wUka2BeaXfZzSasgfLswratkU4xsiNiwA37yQaldzhrg8fg6uP3OSYi
 dkWFXi6WdWwQzARnjWNPwigUwA3xVaYgV+I6+ME0DYsUBvywZzUg3pkowhRAHyA9
 kv86L5Zt5K7kQcVqyd+lIvIuAcGrOZ9hA18PIXnwahLBqmjcqAJoF9XhTTZDMD4J
 LfujGMrf7DSDcf0vH8G9wlQQthsPGUOoFia5rr8MhdVVNee/b1Qvwsh7kmyg0DOK
 9WuNQxGPd7s+X+kwdmGrK7E6fqyPwEfC43l8wtCiBIyGz6QcorwD7kH9DGzv5xGF
 NX7WQeKvcaoXn1XVfonb0YgdVOnbyqK4AiY3Po1Ood3IxGyiLGCgDnusvYu+C9/r
 T0rRMbljkX1lUKqfzGkg2egOKPR+8RFgFKrKNSXUkDxl8TFLRd3ZObowPPlohq1I
 9lIIirip5UFYRv+7srDU1oZPWkvwkpJmaMFgagD3w+OWdo6zwao=
 =wsFu
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs cleanups from Darrick Wong:
 "The most 'exciting' aspect of this branch is that the xfsprogs
  maintainer and I have worked through the last of the code
  discrepancies between kernel and userspace libxfs such that there are
  no code differences between the two except for #includes.

  IOWs, diff suffices to demonstrate that the userspace tools behave the
  same as the kernel, and kernel-only bits are clearly marked in the
  /kernel/ source code instead of just the userspace source.

  Summary:

   - Clean up open-coded swap() calls.

   - A little bit of #ifdef golf to complete the reunification of the
     kernel and userspace libxfs source code"

* tag 'xfs-5.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: sync xfs_btree_split macros with userspace libxfs
  xfs: #ifdef out perag code for userspace
  xfs: use swap() to make dabtree code cleaner
2021-11-14 12:18:22 -08:00
Linus Torvalds
c3b68c27f5 parisc architecture build-, trace-, backtrace- and page table fixes
Fix a build error in stracktrace.c, fix resolving of addresses to
 function names in backtraces, fix single-stepping in assembly code
 and flush userspace pte's when using set_pte_at().
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCYZAyIAAKCRD3ErUQojoP
 X8zUAQCV4fijXlUEOnZorH42QsSvs1SowHXu2YdHU8CmauWVawEAt14Zj3fmMBcS
 +uSacieZROU9maQtGgJAYHZVJwgPyQY=
 =OQRw
 -----END PGP SIGNATURE-----

Merge tag 'for-5.16/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull more parisc fixes from Helge Deller:
 "Fix a build error in stracktrace.c, fix resolving of addresses to
  function names in backtraces, fix single-stepping in assembly code and
  flush userspace pte's when using set_pte_at()"

* tag 'for-5.16/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc/entry: fix trace test in syscall exit path
  parisc: Flush kernel data mapping in set_pte_at() when installing pte for user page
  parisc: Fix implicit declaration of function '__kernel_text_address'
  parisc: Fix backtrace to always include init funtion names
2021-11-14 11:53:59 -08:00
Linus Torvalds
24318ae80d arch/sh updates for 5.16
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJhkUvoAAoJELcQ+SIFb8HaQG8H/1pF/zhT+xYAK7dMOYfdyl9R
 UGgnNGR2kjPDhsAkMHIYWVopP2KFV4kA4TCe8UcEt4GPeWZgGjV8lIaHJa/kdUsH
 5tyRYtC8ZtVhJMVJQ8jZYZcTwOqYFFGtCksPq6/j0dHE0Gf+nQStSXNZqi8/XxE9
 c3jDg+l7PVwdl7LWFedPVXcflhD5JLrRhyA/17V0INWrqP/MboXGRyq/Yluksw7x
 C89ZXrgRE+5tN9icfRT4pl/XBpezWQkjuvsyHM9DWOjqICXfng8yeB5MZ/+R2uyF
 qbXRSGqICOFhRRV9m/L0kypCYYiaZ8bBiYvLeGNggmpUJVMosIDFpS7kiYQavuA=
 =dnw0
 -----END PGP SIGNATURE-----

Merge tag 'sh-for-5.16' of git://git.libc.org/linux-sh

Pull arch/sh updates from Rich Felker.

* tag 'sh-for-5.16' of git://git.libc.org/linux-sh:
  sh: pgtable-3level: Fix cast to pointer from integer of different size
  sh: fix READ/WRITE redefinition warnings
  sh: define __BIG_ENDIAN for math-emu
  sh: math-emu: drop unused functions
  sh: fix kconfig unmet dependency warning for FRAME_POINTER
  sh: Cleanup about SPARSE_IRQ
  sh: kdump: add some attribute to function
  maple: fix wrong return value of maple_bus_init().
  sh: boot: avoid unneeded rebuilds under arch/sh/boot/compressed/
  sh: boot: add intermediate vmlinux.bin* to targets instead of extra-y
  sh: boards: Fix the cacography in irq.c
  sh: check return code of request_irq
  sh: fix trivial misannotations
2021-11-14 11:37:49 -08:00
Linus Torvalds
6ea45c57dc ARM fixes for 5.16-rc1:
- Fix early_iounmap
 - Drop cc-option fallbacks for architecture selection
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEuNNh8scc2k/wOAE+9OeQG+StrGQFAmGRL48ACgkQ9OeQG+St
 rGQyXA//ZuMjsA615TncG5686glPds6y+SoiWAPosz9ElGawL5Ke9vybIBPxYrZj
 gSaR4aiyoCy4B22tSQcAuyd8Ai8+cX5jUv7fbgZcCWWQto0OiD3S1jpaGwaPN+3I
 1fwbUvUjrdzzDfGHJbFwBYykkQ8bPWrnjtZ9EK8g9t7QuetqkZrihw/gfqHeflnu
 Ad9WqEiHWDguOU9kdT6uGZf5RpfjwoLuxYNMSQVktQw81+Lk0GGfEroR2ZqUZAf4
 YIrhaKTxjXfkm4Hv6bfLtSYxWY0sw1ziO0qhtEJHKQyE5hbMOy5Dhdq9Ntefkin1
 /+OaNZzLiKuv6ZqWJjg5ViXc0pVgDNfLKbj8aicGW3966F13OISq9MIy9pkivK+2
 R7ROhcdkJ4Tz6sIbAZBxmyFbgY+sjZ0F2OoOkjcLUCfyMGqfl8FohNd6EvSs0yP8
 6zfnUOg4vBkLg80d/lg6RjSszqOSdFjW4Qbi8HQoDYk9bemLPtVxQnjYsd2RMZhM
 oKTkNTcA17MUCYncw6xu2lL3ahz15XMxBf/L2mfFwFCIplKvpTIYbP5BWSisjiyz
 JhbjLj9PaCOmubu+n2JKK4dm0GeEngWhu5CW0xN0eQ0at8CJa9dmt5y3V1V8//jj
 F3Xltms1d67nHr+6pkE+EYp4YMTMBO1L7opsktTs1vCIJ2trTWM=
 =Ksk/
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:

 - Fix early_iounmap

 - Drop cc-option fallbacks for architecture selection

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 9156/1: drop cc-option fallbacks for architecture selection
  ARM: 9155/1: fix early early_iounmap()
2021-11-14 11:30:50 -08:00
Linus Torvalds
0d1503d8d8 Devicetree fixes for v5.16, take 1:
- 2 fixes due to DT node name changes on Arm, Ltd. boards
 
 - Treewide rename of Ingenic CGU headers
 
 - Update ST email addresses
 
 - Remove Netlogic DT bindings
 
 - Dropping few more cases of redundant 'maxItems' in schemas
 
 - Convert toshiba,tc358767 bridge binding to schema
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmGRN/wQHHJvYmhAa2Vy
 bmVsLm9yZwAKCRD6+121jbxhw0EXD/0byDq/gx0BOgSY18wOWp0W3tnJiudRrKNk
 8+TqfphpridGSzZryboBPQ3U06eOT1pMV9EBzHfqeb87nIneMxZ26KhgMPRFf7qt
 87G2tyfq4Itd59HATC/8xsq/5uiCONksbPojhjn6SrI4wLBzTegIYG5ZiocQw2tP
 vcW9wDAOfcRVfXBtQwmYD7nkeShaoTrv+EBcAFhg4XB43TWyezCCBqvAz0qDSl7q
 N/9xomgsB/fG1ImL/UWjPuPix64I9mrop7d8+C17980RJM10e5wL/eDWpbvWxqy6
 R7nTHT9HWnbcV5ywgBTh3W6iQw9EGyytD0Uzw3v+Meoga1/zCBkF48mEin61nY1c
 i44os73B9mwSZo34qysrFkir+NBNRBAwdP7z+GrarB9twlcIdfjfInQYWlNP73Yb
 zJ/XIVrfn6GsEZRdgXXK5VHL0ffDYzwoGEHSU6m0cTJI+iNQT4WQ89HkhMHG1q0d
 pRwKCE75mhMrwvniBYinL8I8LuMAJPZZKDpc6ZtBJdUdF6MFgXccJFdIBoho1TVL
 oHkaXWwUjkGXliTdWX7UJk4zS4zNAyB1jLT9uHMrLvX7uVM1Dsy9bYWaU4ABB0xr
 viT/gj/nawCbGbniytZEfAAe1bsnZOLqxmqe3XGUgYuOPQLB2xSSjkrr7HYmuLmx
 1AskvHxO0w==
 =ukBC
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-fixes-for-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

 - Two fixes due to DT node name changes on Arm, Ltd. boards

 - Treewide rename of Ingenic CGU headers

 - Update ST email addresses

 - Remove Netlogic DT bindings

 - Dropping few more cases of redundant 'maxItems' in schemas

 - Convert toshiba,tc358767 bridge binding to schema

* tag 'devicetree-fixes-for-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: watchdog: sunxi: fix error in schema
  bindings: media: venus: Drop redundant maxItems for power-domain-names
  dt-bindings: Remove Netlogic bindings
  clk: versatile: clk-icst: Ensure clock names are unique
  of: Support using 'mask' in making device bus id
  dt-bindings: treewide: Update @st.com email address to @foss.st.com
  dt-bindings: media: Update maintainers for st,stm32-hwspinlock.yaml
  dt-bindings: media: Update maintainers for st,stm32-cec.yaml
  dt-bindings: mfd: timers: Update maintainers for st,stm32-timers
  dt-bindings: timer: Update maintainers for st,stm32-timer
  dt-bindings: i2c: imx: hardware do not restrict clock-frequency to only 100 and 400 kHz
  dt-bindings: display: bridge: Convert toshiba,tc358767.txt to yaml
  dt-bindings: Rename Ingenic CGU headers to ingenic,*.h
2021-11-14 11:11:51 -08:00
Linus Torvalds
622c72b651 A single fix for POSIX CPU timers to address a problem where POSIX CPU
timer delivery stops working for a new child task because copy_process()
 copies state information which is only valid for the parent task.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmGRDVUTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYocOFD/42NOdli73N+Jdq7APHUIHXzu+6DVT6
 CI5toLQw+0KPoF0s1wg4+J0YCDt2k0Pu4lOabF3Ze/+c6RlR5zfCXESqsXdHaCjh
 E91Vs57u0ataRMEHo6KB6eBIutuF8hyxfY6vVXfkTRNAreUIWiwWYrlB0G64JVOG
 +/l1W7adovjLcLwcW+ArrnLJwkBKtXunK6PVv2IrdRHwpMHbwoNRCCCFvzkqnWmQ
 4Yy2/NaB/PEBK5kezP1/j9EMcGCTWk1JJIm+l/PEwCCcbIgIdUahpW3XHAaqms6R
 oukqCvE5ukfmVzBFYBhCamhF8heyEeBVRqGU+Yyk48+I+DQFBCqaqa1NKSuEUdNL
 Nycy6Rp1yn7CHVSB461shMS6NJGOSNDBjv7vxer3WjV3HPJu7y0RrN7jXbkSfQnm
 hVKjkmbDEYwylgzFE5+T857NqD5MEXeuIBtTO08hNRnpd61aB3x+qq+8ElE6ST8Y
 pm6rMzw0AZ5buPK8QdGVDk0dD4WKObj1LzmRZvBtYeWynO6sxyKUl6B2CgAxrvn5
 D1Li2/arkJMCVeIuIL5uE6DPoxSh8J7OuEC4KeWX8M8xQSEDImqfZ+tDL2Esv6jv
 xDmymq584hiCBc1CJjCOA9kZYe6KNXC7lkVOns6GaKKzLhkrcvUR3dUGhMyzxAMO
 t9QIAinR6JwRRA==
 =EBbc
 -----END PGP SIGNATURE-----

Merge tag 'timers-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
 "A single fix for POSIX CPU timers to address a problem where POSIX CPU
  timer delivery stops working for a new child task because
  copy_process() copies state information which is only valid for the
  parent task"

* tag 'timers-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-cpu-timers: Clear task::posix_cputimers_work in copy_process()
2021-11-14 10:43:38 -08:00
Linus Torvalds
c36e33e2f4 A set of fixes for the interrupt subsystem:
- Core code:
 
     A regression fix for the Open Firmware interrupt mapping code where a
     interrupt controller property in a node caused a map property in the
     same node to be ignored.
 
   - Interrupt chip drivers:
 
     - Workaround a limitation in SiFive PLIC interrupt chip which silently
       ignores an EOI when the interrupt line is masked.
 
     - Provide the missing mask/unmask implementation for the CSKY MP
       interrupt controller.
 
   - PCI/MSI:
 
     - Prevent a use after free when PCI/MSI interrupts are released by
       destroying the sysfs entries before freeing the memory which is
       accessed in the sysfs show() function.
 
     - Implement a mask quirk for the Nvidia ION AHCI chip which does not
       advertise masking capability despite implementing it. Even worse the
       chip comes out of reset with all MSI entries masked, which due to the
       missing masking capability never get unmasked.
 
     - Move the check which prevents accessing the MSI[X] masking for XEN
       back into the low level accessors. The recent consolidation missed
       that these accessors can be invoked from places which do not have
       that check which broke XEN. Move them back to he original place
       instead of sprinkling tons of these checks all over the code.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmGRDCsTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoTL5D/4n7CUudohHPckr0Rl3LbnSUfyY9g3H
 irTKur71AT392YerJtQp+WBp3AKYMDD8wPTgydfpWe95ouIjx5jhb/co7uSifG6k
 ZssXYS10bkvjqyS8E2s5FnA5xbnagunK/R981qju14Ec39xqx1JzlUnO/Pra0Kcr
 5rBV7br9jJMBleBI4OFuS9fS8dVL1MH/yushkuDNfIKEnaElnaxaYUk/ZdzkMMAW
 lt1B+dPhK24t1hXQvZKp/iVQUGrJWdzzy9aDiUYPv1IZP+V5nbLMgmFvEv8jNdNa
 6kkfp0l30nXM9rgvcp2KkasVUPVhurVEwitzz9+tT6LRA+/kSwi2yx8/FwCVUcL6
 xD0AgKQgxOj/WwGJTZswvPu3afsLuw3rGmx5uH1IV40P9mPX0AiHWgvoaInHjzlJ
 QKFQ7mJEuUcC6cJ36RGqX9njhKvPIcUENGCTjGSffcXsWltPrOCg2mQFcsDa9fSH
 qPfXDVv4YINI+0MAlOULh6TLWQ07xy37HiskJu/AgILOfipoDi8pXdqNJRfvxB1S
 D3O8vB+SH3lPj69w4dtj7539SdNZn8CCyN3RbNlstl2vHV5Bus3cVk0CcOhG8qNW
 KwK/tSH8O0ZYHAsUu8OqBipXy6qOPi/10MJQn3NOpvvOmS4oDd+82bq+jp5qJpsG
 42WNuzEoBdaUiA==
 =LBQL
 -----END PGP SIGNATURE-----

Merge tag 'irq-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
 "A set of fixes for the interrupt subsystem

  Core code:

   - A regression fix for the Open Firmware interrupt mapping code where
     a interrupt controller property in a node caused a map property in
     the same node to be ignored.

  Interrupt chip drivers:

   - Workaround a limitation in SiFive PLIC interrupt chip which
     silently ignores an EOI when the interrupt line is masked.

   - Provide the missing mask/unmask implementation for the CSKY MP
     interrupt controller.

  PCI/MSI:

   - Prevent a use after free when PCI/MSI interrupts are released by
     destroying the sysfs entries before freeing the memory which is
     accessed in the sysfs show() function.

   - Implement a mask quirk for the Nvidia ION AHCI chip which does not
     advertise masking capability despite implementing it. Even worse
     the chip comes out of reset with all MSI entries masked, which due
     to the missing masking capability never get unmasked.

   - Move the check which prevents accessing the MSI[X] masking for XEN
     back into the low level accessors. The recent consolidation missed
     that these accessors can be invoked from places which do not have
     that check which broke XEN. Move them back to he original place
     instead of sprinkling tons of these checks all over the code"

* tag 'irq-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  of/irq: Don't ignore interrupt-controller when interrupt-map failed
  irqchip/sifive-plic: Fixup EOI failed when masked
  irqchip/csky-mpintc: Fixup mask/unmask implementation
  PCI/MSI: Destroy sysfs before freeing entries
  PCI: Add MSI masking quirk for Nvidia ION AHCI
  PCI/MSI: Deal with devices lying about their MSI mask capability
  PCI/MSI: Move non-mask check back into low level accessors
2021-11-14 10:38:27 -08:00
Linus Torvalds
218cc8b860 A single fix for static calls to make the trampoline patching more robust
by placing explicit signature bytes after the call trampoline to prevent
 patching random other jumps like the CFI jump table entries.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmGRDKsTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoZeNEADEFTbUJKd8812O9vkY9we1GDAtH7bY
 z6sYkh0/rPvYjdPfHuwqW8tUAl+CO2ne2X8FRPKgEdRLg44BY4HaMHmujdbGh3fh
 zpqynUBPoOIgtWxAPGdF+JxjrKlzjFd+WwjG3qBXOF3pjKgCc5knyjTucsl6ced3
 wF293rSYrIJ6uRv2TTNbM5hWJdC0arWbdMFnwQTxeZR54WLpu7Wfm+CCK41w0fAU
 nrfSsv73WEwpmAZNh04wsZsf7h6yCO7dCrIJD/3mpJtrUVBZXuZAKDzUzJPvHJal
 T8LcKwxZQAgPv0ubmOCrolj98Qp6PAPSdDJbzNsCJUYEbBqaB2inJ0PeHcZPspy9
 YyW00EHXD2UKm/GNF/DIlhoiNxOSh8Wn4b6H5ZRML50bS7jsMp8YVbticWEjItL6
 N4/61c45/uPILBS+Lysj0aqyj4TvagiuffJFWjw3YAQ+Gp/pzlJwRNjrw7/4DxAx
 KdpM881IKCR8UowBz3gIiA9FrJv2dGMqq31Rs1fjuauxkIX0gV3c64tAIRWrVscT
 k6GKGvHSis5cT97K3yhmNH0BUND+Skeku8G/SnTkefvcB85aU/7HBkLLJpw0w84F
 F6PTCaCJOEHrl3ADkilsi3z0sKWrph6aAzDEgp6Q6cmo9ulFAGw0bjuJb59xsvVK
 flIvTLUY3n76FA==
 =dgiF
 -----END PGP SIGNATURE-----

Merge tag 'locking-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 static call update from Thomas Gleixner:
 "A single fix for static calls to make the trampoline patching more
  robust by placing explicit signature bytes after the call trampoline
  to prevent patching random other jumps like the CFI jump table
  entries"

* tag 'locking-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  static_call,x86: Robustify trampoline patching
2021-11-14 10:30:17 -08:00
Linus Torvalds
fc661f2dcb - Avoid touching ~100 config files in order to be able to select
the preemption model
 
 - clear cluster CPU masks too, on the CPU unplug path
 
 - prevent use-after-free in cfs
 
 - Prevent a race condition when updating CPU cache domains
 
 - Factor out common shared part of smp_prepare_cpus() into a common
 helper which can be called by both baremetal and Xen, in order to fix a
 booting of Xen PV guests
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmGQ8HcACgkQEsHwGGHe
 VUouoA//WAZ/dZu7IiM06JhZWswa2yNsdU8qQHys81lEqstaBqiWuZdg1qJTVIir
 2d0aN0keiPcsLyAsp1UJ2g/K/7D5vSJWDzsHKfEAToiAm8Tntai2LlSocWWfeSQm
 10grDHWpEHbj0hTHTA6HYOr2WbY4/LnR4cdL0WobIzivIrRTx49d0XUOUfWLP5KX
 60uM6dSjwpJrQUnvzk+bhGiHVmutFrEJy+UU/0o+nxkdhwraNiSbLi0007BGRCof
 6dokRRvLLR09dl1LMG51gVjQch4j/lCx6EWWUhYOFeV3I3gibSCNkmu7dpmMCBTR
 QWO01cR9gyFN4xQ2is4I36M5L0/8T+sbGvvXIXNDT/XWr0/p+g6p2mx0cd2XiYIr
 ZthGRcxxV/KGmxfPaygKS9tpQseMEIrdd6VjAnGfZ3OS6CtUvYt8d0B2Soj8FALQ
 N9fMXDIEP3uUZim8UvCT6HBKlj9LR5uI5n+dAQ6uzsenO9WqeGeldc/N26/+osdN
 vo4lNYTqiXJPhJvunYW5t4j5JnUa3grDHioAPWaQRJlWtEZBGKs9SXTcweg/KURb
 mNfe1RfSlGJt28RD3E18gXeSS7xWdKgpcVX1rmW/9tUjX04NNDWjq4sAzOj7c+Ir
 4sr78XgCY0pUxFaFYxvQWFUy7wcm0zAczo1RGUhcDTf1edDEvjo=
 =s2MX
 -----END PGP SIGNATURE-----

Merge tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Borislav Petkov:

 - Avoid touching ~100 config files in order to be able to select the
   preemption model

 - clear cluster CPU masks too, on the CPU unplug path

 - prevent use-after-free in cfs

 - Prevent a race condition when updating CPU cache domains

 - Factor out common shared part of smp_prepare_cpus() into a common
   helper which can be called by both baremetal and Xen, in order to fix
   a booting of Xen PV guests

* tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  preempt: Restore preemption model selection configs
  arch_topology: Fix missing clear cluster_cpumask in remove_cpu_topology()
  sched/fair: Prevent dead task groups from regaining cfs_rq's
  sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()
  x86/smp: Factor out parts of native_smp_prepare_cpus()
2021-11-14 09:39:03 -08:00
Linus Torvalds
f7018be292 - Prevent unintentional page sharing by checking whether a page
reference to a PMU samples page has been acquired properly before that
 
 - Make sure the LBR_SELECT MSR is saved/restored too
 
 - Reset the LBR_SELECT MSR when resetting the LBR PMU to clear any
 residual data left
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmGQ5z8ACgkQEsHwGGHe
 VUqqdQ/+JIV6t0yIj7aNADaakwAe+i9zFzzUuvb5KT0zPzZirswkz6xeZ4g8S8PZ
 lSjqKk8M2Yt3SJiqi/s3KNIOev52wtKGmeOFz1I+DUNpgk0wGHkRtVHV/iSptB61
 Kp/fJvOVppY5grs5B0fRYkM5e477RPyZo+E0COKnff1bQ+k+z2ItMLCVxFCxQS6k
 HmgPW7CBye811YcEg28lSwgS1OXiMZ19gACIsqnQ6kQP2Puo8+HT1/V1n+0grejb
 OeYxURuYSRPd6Ft76qz0YlRIe1dgKllUBr7b0AaM11ADBMtWBTxqJcQvq/mOIHmT
 9to0dVB/xFySR57iaL7BRuZFOrt8MRqJniEedMO99Dm9sxEVfHs1iXC9r7wZxQAf
 /HcvVkcyOJD92Kv+4LS5tKjowCByOYEJW2YQIgXEbA6oIhRuM9/fdxEW6lHwgdwc
 BPnOR6rtYuq+I+merBIIijAuf8OsIGY7ap2B+f7DkiOtA9+SHZsrU22J8T7CED/w
 gmrAC3+3KGt7YDs6WZTbvkXminZQyu5WpHe+2K6dlCIPmJLqEsYUx8TeXa/okyvb
 8ZXy/CfJNbHUrk6GZw7RFoeannwSPv9ZJO3Mfy5PDvwDk0Fj0J+/G92mR2Zucxpo
 siNyBCivPY5vBPqk+x6eUPev/C3wPS+dNrs4HOyr1N2gZwgTk40=
 =Ciqw
 -----END PGP SIGNATURE-----

Merge tag 'perf_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Prevent unintentional page sharing by checking whether a page
   reference to a PMU samples page has been acquired properly before
   that

 - Make sure the LBR_SELECT MSR is saved/restored too

 - Reset the LBR_SELECT MSR when resetting the LBR PMU to clear any
   residual data left

* tag 'perf_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Avoid put_page() when GUP fails
  perf/x86/vlbr: Add c->flags to vlbr event constraints
  perf/x86/lbr: Reset LBR_SELECT during vlbr reset
2021-11-14 09:33:12 -08:00