This patch fixes a bug where the sequence numbers of a socket created using
TCP repair functionality are lower than set after connect is called.
This occurs when the repair socket overlaps with a TIME-WAIT socket and
triggers the re-use code. The amount lower is equal to the number of times
that a particular IP/port set is re-used and then put back into TIME-WAIT.
Re-using the first time the sequence number is 1 lower, closing that socket
and then re-opening (with repair) a new socket with the same addresses/ports
puts the sequence number 2 lower than set via setsockopt. The third time is
3 lower, etc. I have not tested what the limit of this acrewal is, if any.
The fix is, if a socket is in repair mode, to respect the already set
sequence number and timestamp when it would have already re-used the
TIME-WAIT socket.
Signed-off-by: Stefan Baranoff <sbaranoff@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SRCU has long used ->srcu_gp_seq, and now RCU uses ->gp_seq. This
commit therefore moves the rcutorture_get_gp_data() function from
a ->gpnum / ->completed pair to ->gp_seq.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes the RCU CPU stall-warning code in print_other_cpu_stall(),
print_cpu_stall(), and check_cpu_stall() use ->gp_seq instead of ->gpnum
and ->completed.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit converts the grace-period request code paths from ->completed
and ->gpnum to ->gp_seq. The need_future_gp_element() macro encapsulates
the shift operation required to use ->gp_seq as an index to the
->need_future_gp[] array. The rcu_cbs_completed() function is removed
in favor of the rcu_seq_snap() function. The rcu_start_this_gp()
gets some temporary consistency checks and uses rcu_seq_done(),
rcu_seq_current(), rcu_seq_state(), and rcu_gp_in_progress() in place
of the earlier open-coded comparisons of ->gpnum and ->completed.
The rcu_future_gp_cleanup() function replaces use of ->completed
with ->gp_seq. The rcu_accelerate_cbs() function replaces a call to
rcu_cbs_completed() with one to rcu_seq_snap(). The rcu_advance_cbs()
function replaces an access to >completed with one to ->gp_seq and adds
some temporary warnings. The rcu_nocb_wait_gp() function replaces a
call to rcu_cbs_completed() with one to rcu_seq_snap() and an open-coded
comparison with rcu_seq_done().
The temporary warnings will be removed when the various ->gpnum and
->completed fields are removed. Their purpose is to locate code who
might still be using ->gpnum and ->completed. (Much easier that way
than trying to trace down the causes of too-short grace periods and
grace-period hangs!)
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit switches the quiescent-state no-backtracking checks from
->gpnum and ->completed to ->gp_seq.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit switches the interrupt-disabled detection mechanism to
->gp_seq. This mechanism is used as part of RCU CPU stall warnings,
and detects cases where the stall is due to a CPU having interrupts
disabled.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes rcu_gp_in_progress() use ->gp_seq instead of
->completed and ->gpnum. The READ_ONCE() invocations are buried
in rcu_seq_current().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes rcu_try_advance_all_cbs() use ->gp_seq. It uses
rcu_seq_ctr() in order to shift away the state bits, so that the
low-order bits of the result may safely be used to index ->nocb_gp_wq[].
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes rcu_try_advance_all_cbs() use ->gp_seq, with the
exception of tracing, which will be converted later.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes rcu_implicit_dynticks_qs() use ->gp_seq, with the
exception of tracing, which will be converted later.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit converts rcu_gpnum_ovf() to use ->gp_seq instead of ->gpnum.
Same size unsigned long, so same approach.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit moves __note_gp_changes(), note_gp_changes(), and
__rcu_pending() to ->gp_seq, creating new rcu_seq_completed_gp() and
rcu_seq_new_gp() functions for this purpose.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Reinstate "cpuend: trace as suggested by Joel Fernandes. ]
This commit converts get_state_synchronize_rcu(), cond_synchronize_rcu(),
get_state_synchronize_sched(), and cond_synchronize_sched() from ->gpnum
and ->completed to ->gp_seq. Note that this also introduces a full
memory barrier in the already-done paths off cond_synchronize_rcu() and
cond_synchronize_sched(), as work with LKMM indicates that the earlier
smp_load_acquire() were insufficiently strong in some situations where
these two functions were called just as the grace period ended. In such
cases, these two functions would not gain the benefit of memory ordering
at the end of the grace period.
Please note that the performance impact is negligible, as you shouldn't
be using either function anywhere near a fastpath in any case.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit switches the functions reporting quiescent states from
use of ->gpnum to ->gp_seq. In either case, the point is to handle
races where a given grace period ends before a quiescent state can
be reported. Failing to catch these races would result in too-short
grace periods, hence the checking.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit switches rcu_check_gp_kthread_starvation() from printing
->gpnum and ->completed to printing ->gp_seq upon detecting a starving
RCU grace-period kthread during an RCU CPU stall warning.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The rcutorture test invokes rcu_batches_started(),
rcu_batches_completed(), rcu_batches_started_bh(),
rcu_batches_completed_bh(), rcu_batches_started_sched(), and
rcu_batches_completed_sched() to do grace-period consistency checks,
and rcuperf uses the _completed variants for statistics.
These functions use ->gpnum and ->completed. This commit therefore
replaces them with rcu_get_gp_seq(), rcu_bh_get_gp_seq(), and
rcu_sched_get_gp_seq(), adjusting rcutorture and rcuperf to make
use of them.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit moves rcu_gp_slow() to ->gp_seq. This function only uses
the grace-period number to modulate delay, so rcu_seq_ctr(rsp->gp_seq)
gets the same effect, at least in cases where the delay is to happen
more than four times per wrap of an unsigned long.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit adds grace-period sequence numbers (->gp_seq) to the
rcu_state, rcu_node, and rcu_data structures, and updates them.
It also checks for consistency between rsp->gpnum and rsp->gp_seq.
These ->gp_seq counters will eventually replace the existing ->gpnum
and ->completed counters, allowing a single memory access to determine
whether or not a grace period is in progress and if so, which one.
This in turn will enable changes that will reduce ->lock contention on
the leaf rcu_node structures.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
At the end of rcu_gp_cleanup(), if another grace period is needed, but
not via rcu_accelerate_cbs(), the ->gp_flags field is written twice,
once when making the new grace-period request, and once when clearing
all other types of requests. This commit therefore adds an else-clause
to avoid this double write.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit causes a splat if RCU is idle and a request for a new grace
period is ignored for more than one second. This splat normally indicates
that some code path asked for a new grace period, but failed to wake up
the RCU grace-period kthread.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix bug located by Dan Carpenter and his static checker. ]
[ paulmck: Fix self-deadlock bug located 0day test robot. ]
[ paulmck: Disable unless CONFIG_PROVE_RCU=y. ]
Use the existing %pad printk format to print dma_addr_t values.
This avoids the following warnings when compiling on the parisc64 platform:
drivers/block/skd_main.c: In function 'skd_preop_sg_list':
drivers/block/skd_main.c:660:4: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 6 has type 'dma_addr_t {aka unsigned int}' [-Wformat=]
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
syzkaller managed to trigger the following bug through fault injection:
[...]
[ 141.043668] verifier bug. No program starts at insn 3
[ 141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
[ 141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
fixup_call_args kernel/bpf/verifier.c:5587 [inline]
[ 141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
[ 141.047355] CPU: 3 PID: 4072 Comm: a.out Not tainted 4.18.0-rc4+ #51
[ 141.048446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS 1.10.2-1 04/01/2014
[ 141.049877] Call Trace:
[ 141.050324] __dump_stack lib/dump_stack.c:77 [inline]
[ 141.050324] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
[ 141.050950] ? dump_stack_print_info.cold.2+0x52/0x52 lib/dump_stack.c:60
[ 141.051837] panic+0x238/0x4e7 kernel/panic.c:184
[ 141.052386] ? add_taint.cold.5+0x16/0x16 kernel/panic.c:385
[ 141.053101] ? __warn.cold.8+0x148/0x1ba kernel/panic.c:537
[ 141.053814] ? __warn.cold.8+0x117/0x1ba kernel/panic.c:530
[ 141.054506] ? get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
[ 141.054506] ? fixup_call_args kernel/bpf/verifier.c:5587 [inline]
[ 141.054506] ? bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
[ 141.055163] __warn.cold.8+0x163/0x1ba kernel/panic.c:538
[ 141.055820] ? get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
[ 141.055820] ? fixup_call_args kernel/bpf/verifier.c:5587 [inline]
[ 141.055820] ? bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
[...]
What happens in jit_subprogs() is that kcalloc() for the subprog func
buffer is failing with NULL where we then bail out. Latter is a plain
return -ENOMEM, and this is definitely not okay since earlier in the
loop we are walking all subprogs and temporarily rewrite insn->off to
remember the subprog id as well as insn->imm to temporarily point the
call to __bpf_call_base + 1 for the initial JIT pass. Thus, bailing
out in such state and handing this over to the interpreter is troublesome
since later/subsequent e.g. find_subprog() lookups are based on wrong
insn->imm.
Therefore, once we hit this point, we need to jump to out_free path
where we undo all changes from earlier loop, so that interpreter can
work on unmodified insn->{off,imm}.
Another point is that should find_subprog() fail in jit_subprogs() due
to a verifier bug, then we also should not simply defer the program to
the interpreter since also here we did partial modifications. Instead
we should just bail out entirely and return an error to the user who is
trying to load the program.
Fixes: 1c2a088a66 ("bpf: x64: add JIT support for multi-function programs")
Reported-by: syzbot+7d427828b2ea6e592804@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull timekeeping updates from John Stultz:
- Make the timekeeping update more precise when NTP frequency is set
directly by updating the multiplier.
- Adjust selftests
Defining `ATA_DEBUG` there are a lof of messages like below in the log.
[ 16.345472] ata_sg_setup: 1 sg elements mapped
As that is too verbose, only output these messages in verbose debug.
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Defining `ATA_DEBUG` nothing can be really seen, as the log is spammed
with CDB messages.
Therefore, guard the print by `ATA_VERBOSE_DEBUG`.
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Like phys, target_pwrs could be allocated with devm_ function
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The documentation about parameter for ahci_platform_shutdown has a typo.
This fix the following build warning:
drivers/ata/libahci_platform.c:693: warning: Function parameter or member 'pdev' not described in 'ahci_platform_shutdown'
drivers/ata/libahci_platform.c:693: warning: Excess function parameter 'dev' description in 'ahci_platform_shutdow
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
This patch fixes checkpatch.pl warnings:
WARNING: Block comments should align the * on each line
Signed-off-by: Felix Siegel <felix.siegel@posteo.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move open open-curly brace to the next line following function
definition to match the kernel's coding style
Signed-off-by: Felix Siegel <felix.siegel@posteo.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To separate data members and the comment for better readability.
Signed-off-by: Roman Kiryanov <rkir@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linux kernel coding style: spaces are never used for
indentation.
Signed-off-by: Roman Kiryanov <rkir@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix to return a negative error code from the kthread_run() error
handling case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now that the files are all properly tagged with SPDX lines, and the
boilerplate license text is gone, remove the TODO item.
Cc: Rob Springer <rspringer@google.com>
Cc: John Joseph <jnjoseph@google.com>
Cc: Ben Chan <benchan@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now that the SPDX tag is in all gasket files, that identifies the
license in a specific and legally-defined manner. So the extra GPL text
wording can be removed as it is no longer needed at all.
This is done on a quest to remove the 700+ different ways that files in
the kernel describe the GPL license text. And there's unneeded stuff
like the address (sometimes incorrect) for the FSF which is never
needed.
Cc: Rob Springer <rspringer@google.com>
Cc: John Joseph <jnjoseph@google.com>
Cc: Ben Chan <benchan@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It's good to have SPDX identifiers in all files to make it easier to
audit the kernel tree for correct licenses.
Fix up the all of the staging gasket files to have a proper SPDX
identifier, based on the license text in the file itself. The SPDX
identifier is a legally binding shorthand, which can be used instead of
the full boiler plate text.
Cc: Rob Springer <rspringer@google.com>
Cc: John Joseph <jnjoseph@google.com>
Cc: Ben Chan <benchan@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, the parallelized initialization of expedited grace periods uses
the workqueue associated with each rcu_node structure's ->grplo field.
This works fine unless that CPU is offline. This commit therefore uses
the CPU corresponding to the lowest-numbered online CPU, or just queues
the work on WORK_CPU_UNBOUND if there are no online CPUs corresponding
to this rcu_node structure.
Note that this patch uses cpu_is_offline() instead of the usual approach
of checking bits in the rcu_node structure's ->qsmaskinitnext field. This
is safe because preemption is disabled across both the cpu_is_offline()
check and the call to queue_work_on().
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
[ paulmck: Disable preemption to close offline race window. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Apply Peter Zijlstra feedback on CPU selection. ]
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Using ktime_to_ns() is nice to help backports to stable kernels.
Having a typesafe function instead of a macro avoid stupid typos
and waste of time tracking these typos.
Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20180711181641.10369-1-edumazet@google.com
Extend tc tunnel_key action unit tests with geneve options. Tests
include testing single and multiple geneve options, as well as
testing geneve options that are expected to fail.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lockdep is reporting a possible circular locking dependency:
======================================================
WARNING: possible circular locking dependency detected
4.18.0-rc1-test-test+ #4 Not tainted
------------------------------------------------------
user_example/766 is trying to acquire lock:
0000000073479a0f (rdtgroup_mutex){+.+.}, at: pseudo_lock_dev_mmap
but task is already holding lock:
000000001ef7a35b (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0x9f/0x
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&mm->mmap_sem){++++}:
_copy_to_user+0x1e/0x70
filldir+0x91/0x100
dcache_readdir+0x54/0x160
iterate_dir+0x142/0x190
__x64_sys_getdents+0xb9/0x170
do_syscall_64+0x86/0x200
entry_SYSCALL_64_after_hwframe+0x49/0xbe
-> #1 (&sb->s_type->i_mutex_key#3){++++}:
start_creating+0x60/0x100
debugfs_create_dir+0xc/0xc0
rdtgroup_pseudo_lock_create+0x217/0x4d0
rdtgroup_schemata_write+0x313/0x3d0
kernfs_fop_write+0xf0/0x1a0
__vfs_write+0x36/0x190
vfs_write+0xb7/0x190
ksys_write+0x52/0xc0
do_syscall_64+0x86/0x200
entry_SYSCALL_64_after_hwframe+0x49/0xbe
-> #0 (rdtgroup_mutex){+.+.}:
__mutex_lock+0x80/0x9b0
pseudo_lock_dev_mmap+0x2f/0x170
mmap_region+0x3d6/0x610
do_mmap+0x387/0x580
vm_mmap_pgoff+0xcf/0x110
ksys_mmap_pgoff+0x170/0x1f0
do_syscall_64+0x86/0x200
entry_SYSCALL_64_after_hwframe+0x49/0xbe
other info that might help us debug this:
Chain exists of:
rdtgroup_mutex --> &sb->s_type->i_mutex_key#3 --> &mm->mmap_sem
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&mm->mmap_sem);
lock(&sb->s_type->i_mutex_key#3);
lock(&mm->mmap_sem);
lock(rdtgroup_mutex);
*** DEADLOCK ***
1 lock held by user_example/766:
#0: 000000001ef7a35b (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0x9f/0x110
rdtgroup_mutex is already being released temporarily during pseudo-lock
region creation to prevent the potential deadlock between rdtgroup_mutex
and mm->mmap_sem that is obtained during device_create(). Move the
debugfs creation into this area to avoid the same circular dependency.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/fffb57f9c6b8285904c9a60cc91ce21591af17fe.1531332480.git.reinette.chatre@intel.com
- Join split message for easier grepping,
- Use pr_*() instead of printk*(),
- Use %u to format unsigned cpu numbers.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180712144118.8819-1-geert+renesas@glider.be
Pointer 'name' is being assigned but is never used hence it is redundant
and can be removed.
Cleans up clang warning:
warning: variable 'name' set but not used [-Wunused-but-set-variable]
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Russell King says:
====================
This series improves the ARM BPF JIT compiler by:
- enumerating the stack layout rather than using constants that happen
to be multiples of four
- rejig the BPF "register" accesses to use negative numbers instead of
positive, which could be confused with register numbers in the bpf2a32
array.
- since we maintain the ARM FP register as a pointer to the top of our
scratch space (or, with frame pointers enabled, a valid ARM frame
pointer register), we can access our scratch space using FP, which is
constant across all BPF programs, including tail-called programs.
- use immediate forms of ARM instructions where possible, rather than
first loading the immediate into an ARM register.
- use load-with-shift instruction rather than seperate shift instruction
followed by load
- avoid reloading index and array in the tail-call code
- use double-word load/store instructions where available
Version 2:
- Fix ARMv5 test pointed out by Olof
- Fix build error found by 0-day (adding an additional patch)
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Use double-word load and stores where support for this instruction is
supported by the CPU architecture.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Always use an odd/even register pair for our 64-bit registers, so that
we're able to use the double-word load/store instructions in the future.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Rearranging the order of the initial tail call code a little allows is
to avoid reloading the 'array' pointer.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Avoid reloading 'index' after we have validated it - it remains in
tmp2[1] up to the point that we begin the code to index the pointer
array, so with a little rearrangement of the registers, we can use
the already loaded value.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>