hrtimer_force_reprogram() needs to be available unconditionally for softirq
based hrtimers. Move the function and all required struct members out of
the CONFIG_HIGH_RES_TIMERS #ifdef.
There is no functional change because hrtimer_force_reprogram() is only
invoked when hrtimer_cpu_base.hres_active is true and
CONFIG_HIGH_RES_TIMERS=y.
Making it unconditional increases the text size for the
CONFIG_HIGH_RES_TIMERS=n case slightly, but avoids replication of that code
for the upcoming softirq based hrtimers support. Most of the code gets
eliminated in the CONFIG_HIGH_RES_TIMERS=n case by the compiler.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-19-anna-maria@linutronix.de
[ Made it build on !CONFIG_HIGH_RES_TIMERS ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Bpftool doesn't recognize BPF_PROG_TYPE_CGROUP_DEVICE programs,
so the prog show command prints the numeric type value:
$ bpftool prog show
1: type 15 name bpf_prog1 tag ac9f93dbfd6d9b74
loaded_at Jan 15/07:58 uid 0
xlated 96B jited 105B memlock 4096B
This patch defines the corresponding textual representation:
$ bpftool prog show
1: cgroup_device name bpf_prog1 tag ac9f93dbfd6d9b74
loaded_at Jan 15/07:58 uid 0
xlated 96B jited 105B memlock 4096B
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Quentin Monnet <quentin.monnet@netronome.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
hrtimer_reprogram() needs to be available unconditionally for softirq based
hrtimers. Move the function and all required struct members out of the
CONFIG_HIGH_RES_TIMERS #ifdef.
There is no functional change because hrtimer_reprogram() is only invoked
when hrtimer_cpu_base.hres_active is true. Making it unconditional
increases the text size for the CONFIG_HIGH_RES_TIMERS=n case, but avoids
replication of that code for the upcoming softirq based hrtimers support.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-18-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
hrtimer_cpu_base.next_timer stores the pointer to the next expiring timer
in a CPU base.
This pointer cannot be dereferenced and is solely used to check whether a
hrtimer which is removed is the hrtimer which is the first to expire in the
CPU base. If this is the case, then the timer hardware needs to be
reprogrammed to avoid an extra interrupt for nothing.
Again, this is conditional functionality, but there is no compelling reason
to make this conditional. As a preparation, hrtimer_cpu_base.next_timer
needs to be available unconditonally.
Aside of that the upcoming support for softirq based hrtimers requires access
to this pointer unconditionally as well, so our motivation is not entirely
simplicity based.
Make the update of hrtimer_cpu_base.next_timer unconditional and remove the
#ifdef cruft. The impact on CONFIG_HIGH_RES_TIMERS=n && CONFIG_NOHZ=n is
marginal as it's just a store on an already dirtied cacheline.
No functional change.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-17-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
hrtimer_cpu_base.expires_next is used to cache the next event armed in the
timer hardware. The value is used to check whether an hrtimer can be
enqueued remotely. If the new hrtimer is expiring before expires_next, then
remote enqueue is not possible as the remote hrtimer hardware cannot be
accessed for reprogramming to an earlier expiry time.
The remote enqueue check is currently conditional on
CONFIG_HIGH_RES_TIMERS=y and hrtimer_cpu_base.hres_active. There is no
compelling reason to make this conditional.
Move hrtimer_cpu_base.expires_next out of the CONFIG_HIGH_RES_TIMERS=y
guarded area and remove the conditionals in hrtimer_check_target().
The check is currently a NOOP for the CONFIG_HIGH_RES_TIMERS=n and the
!hrtimer_cpu_base.hres_active case because in these cases nothing updates
hrtimer_cpu_base.expires_next yet. This will be changed with later patches
which further reduce the #ifdef zoo in this code.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-16-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
__hrtimer_hres_active() is now available unconditionally, so replace open
coded direct accesses to hrtimer_cpu_base.hres_active.
No functional change.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-15-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The hrtimer_cpu_base::hres_active_member field depends on CONFIG_HIGH_RES_TIMERS=y
currently, and all related functions to this member are conditional as well.
To simplify the code make it unconditional and set it to zero during initialization.
(This will also help with the upcoming softirq based hrtimers code.)
The conditional code sections can be avoided by adding IS_ENABLED(HIGHRES)
conditionals into common functions, which ensures dead code elimination.
There is no functional change.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-14-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The upcoming softirq based hrtimers support requires an additional field in
the hrtimer_cpu_base struct, which would grow the struct size beyond a
cache line.
The hrtimer_cpu_base::nr_retries and ::nr_hangs members are solely
used for diagnostic output and have no requirement to be 'unsigned int'.
Make them 'unsigned short' to create room for the new struct member.
No functional change.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-13-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The pointer to the currently running timer is stored in hrtimer_cpu_base
before the base lock is dropped and the callback is invoked.
This results in two levels of indirections and the upcoming support for
softirq based hrtimer requires splitting the "running" storage into soft
and hard IRQ context expiry.
Storing both in the cpu base would require conditionals in all code paths
accessing that information.
It's possible to have a per clock base sequence count and running pointer
without changing the semantics of the related mechanisms because the timer
base pointer cannot be changed while a timer is running the callback.
Unfortunately this makes cpu_clock base larger than 32 bytes on 32-bit
kernels. Instead of having huge gaps due to alignment, remove the alignment
and let the compiler pack CPU base for 32-bit kernels. The resulting cache access
patterns are fortunately not really different from the current
behaviour. On 64-bit kernels the 64-byte alignment stays and the behaviour is
unchanged. This was determined by analyzing the resulting layout and
looking at the number of cache lines involved for the frequently used
clocks.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-12-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Looping over all clock bases to find active bits is suboptimal if not all
bases are active.
Avoid this by converting it to a __ffs() evaluation. The functionallity is
outsourced into its own function and is called via a macro as suggested by
Peter Zijlstra.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-11-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The 'hrtimer_start' tracepoint lacks the mode information. The mode is
important because consecutive starts can switch from ABS to REL or from
PINNED to non PINNED.
Append the mode field.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-10-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
So far only CLOCK_MONOTONIC and CLOCK_REALTIME were taken into account as
well as HRTIMER_MODE_ABS/REL in the hrtimer_init tracepoint. The query for
detecting the ABS or REL timer modes is not valid anymore, it got broken
by the introduction of HRTIMER_MODE_PINNED.
HRTIMER_MODE_PINNED is not evaluated in the hrtimer_init() call, but for the
sake of completeness print all given modes.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-9-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It's not obvious that the HRTIMER_MODE variants are bit combinations,
because all modes are hard coded constants currently.
Change it so the bit meanings are clear; and use the symbols for creating
modes which combine bits.
While at it get rid of the ugly tail comments as well.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-8-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The POSIX specification defines that relative CLOCK_REALTIME timers are not
affected by clock modifications. Those timers have to use CLOCK_MONOTONIC
to ensure POSIX compliance.
The introduction of the additional HRTIMER_MODE_PINNED mode broke this
requirement for pinned timers.
There is no user space visible impact because user space timers are not
using pinned mode, but for consistency reasons this needs to be fixed.
Check whether the mode has the HRTIMER_MODE_REL bit set instead of
comparing with HRTIMER_MODE_ABS.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Fixes: 597d027573 ("timers: Framework for identifying pinned timers")
Link: http://lkml.kernel.org/r/20171221104205.7269-7-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The hrtimer_start[_range_ns]() functions start a timer reliably on this CPU only
when HRTIMER_MODE_PINNED is set.
Furthermore the HRTIMER_MODE_PINNED mode is not considered when a hrtimer is initialized.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-6-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
schedule_hrtimeout_range_clock() uses an 'int clock' parameter for the
clock ID, instead of the customary predefined "clockid_t" type.
In hrtimer coding style the canonical variable name for the clock ID is
'clock_id', therefore change the name of the parameter here as well
to make it all consistent.
While at it, clean up the description for the 'clock_id' and 'mode'
function parameters. The clock modes and the clock IDs are not
restricted as the comment suggests.
Fix the mode description as well for the callers of schedule_hrtimeout_range_clock().
No functional changes intended.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-5-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The '/**' sequence marks the start of a structure description. Add the
missing second asterisk. While at it adapt the ordering of the struct
members to the struct definition and document the purpose of
expires_next more precisely.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-4-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The protection of a hrtimer which runs its callback against migration to a
different CPU has nothing to do with hard interrupt context.
The protection against migration of a hrtimer running the expiry callback
is the pointer in the cpu_base which holds a pointer to the currently
running timer. This pointer is evaluated in the code which potentially
switches the timer base and makes sure it's kept on the CPU on which the
callback is running.
Reported-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-3-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The hrtimer_cpu_base::migration_enable and ::nohz_active fields
were originally introduced to avoid accessing global variables
for these decisions.
Still that results in a (cache hot) load and conditional branch,
which can be avoided by using static keys.
Implement it with static keys and optimize for the most critical
case of high performance networking which tends to disable the
timer migration functionality.
No change in functionality.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1801142327490.2371@nanos
Link: https://lkml.kernel.org/r/20171221104205.7269-2-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This adds a register identifier for use with the one_reg interface
to allow the decrementer expiry time to be read and written by
userspace. The decrementer expiry time is in guest timebase units
and is equal to the sum of the decrementer and the guest timebase.
(The expiry time is used rather than the decrementer value itself
because the expiry time is not constantly changing, though the
decrementer value is, while the guest vcpu is not running.)
Without this, a guest vcpu migrated to a new host will see its
decrementer set to some random value. On POWER8 and earlier, the
decrementer is 32 bits wide and counts down at 512MHz, so the
guest vcpu will potentially see no decrementer interrupts for up
to about 4 seconds, which will lead to a stall. With POWER9, the
decrementer is now 56 bits side, so the stall can be much longer
(up to 2.23 years) and more noticeable.
To help work around the problem in cases where userspace has not been
updated to migrate the decrementer expiry time, we now set the
default decrementer expiry at vcpu creation time to the current time
rather than the maximum possible value. This should mean an
immediate decrementer interrupt when a migrated vcpu starts
running. In cases where the decrementer is 32 bits wide and more
than 4 seconds elapse between the creation of the vcpu and when it
first runs, the decrementer would have wrapped around to positive
values and there may still be a stall - but this is no worse than
the current situation. In the large-decrementer case, we are sure
to get an immediate decrementer interrupt (assuming the time from
vcpu creation to first run is less than 2.23 years) and we thus
avoid a very long stall.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
We cannot access the skb->_nfct field when CONFIG_NF_CONNTRACK is
disabled:
net/ipv4/netfilter/nf_defrag_ipv4.c: In function 'ipv4_conntrack_defrag':
net/ipv4/netfilter/nf_defrag_ipv4.c:83:9: error: 'struct sk_buff' has no member named '_nfct'
net/ipv6/netfilter/nf_defrag_ipv6_hooks.c: In function 'ipv6_defrag':
net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68:9: error: 'struct sk_buff' has no member named '_nfct'
Both functions already have an #ifdef for this, so let's move the
check in there.
Fixes: 902d6a4c2a ("netfilter: nf_defrag: Skip defrag if NOTRACK is set")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
As a side-effect of adding the module option, we now get a section
mismatch warning:
WARNING: net/ipv4/netfilter/iptable_raw.o(.data+0x1c): Section mismatch in reference from the variable packet_raw to the function .init.text:iptable_raw_table_init()
The variable packet_raw references
the function __init iptable_raw_table_init()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the variable:
*_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console
Apparently it's ok to link to a __net_init function from .rodata but not
from .data. We can address this by rearranging the logic so that the
structure is read-only again. Instead of writing to the .priority field
later, we have an extra copies of the structure with that flag. An added
advantage is that that we don't have writable function pointers with this
approach.
Fixes: 902d6a4c2a ("netfilter: nf_defrag: Skip defrag if NOTRACK is set")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
ipv6_defrag pulls network headers before fragment header. In case of
an error, the netfilter layer is currently dropping these packets.
This results in failure of some IPv6 standards tests which passed on
older kernels due to the netfilter framework using cloning.
The test case run here is a check for ICMPv6 error message replies
when some invalid IPv6 fragments are sent. This specific test case is
listed in https://www.ipv6ready.org/docs/Core_Conformance_Latest.pdf
in the Extension Header Processing Order section.
A packet with unrecognized option Type 11 is sent and the test expects
an ICMP error in line with RFC2460 section 4.2 -
11 - discard the packet and, only if the packet's Destination
Address was not a multicast address, send an ICMP Parameter
Problem, Code 2, message to the packet's Source Address,
pointing to the unrecognized Option Type.
Since netfilter layer now drops all invalid IPv6 frag packets, we no
longer see the ICMP error message and fail the test case.
To fix this, save the transport header. If defrag is unable to process
the packet due to RFC2460, restore the transport header and allow packet
to be processed by stack. There is no change for other packet
processing paths.
Tested by confirming that stack sends an ICMP error when it receives
these packets. Also tested that fragmented ICMP pings succeed.
v1->v2: Instead of cloning always, save the transport_header and
restore it in case of this specific error. Update the title and
commit message accordingly.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
request_module may return a positive error result from modprobe,
if we cast this to ERR_PTR this returns a garbage result (it passes
IS_ERR checks).
Fix it by ignoring modprobe return values entirely, just retry the
table lookup instead.
Reported-by: syzbot+980925dbfbc7f93bc2ef@syzkaller.appspotmail.com
Fixes: 03d13b6868 ("netfilter: xtables: add and use xt_request_find_table_lock")
Fixes: 20651cefd2 ("netfilter: x_tables: unbreak module auto loading")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Currently the BSP microcode update code examines the initrd very early
in the boot process. If SME is active, the initrd is treated as being
encrypted but it has not been encrypted (in place) yet. Update the
early boot code that encrypts the kernel to also encrypt the initrd so
that early BSP microcode updates work.
Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180110192634.6026.10452.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In preparation for encrypting more than just the kernel, the encryption
support in sme_encrypt_kernel() needs to support 4KB page aligned
encryption instead of just 2MB large page aligned encryption.
Update the routines that populate the PGD to support non-2MB aligned
addresses. This is done by creating PTE page tables for the start
and end portion of the address range that fall outside of the 2MB
alignment. This results in, at most, two extra pages to hold the
PTE entries for each mapping of a range.
Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180110192626.6026.75387.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In preparation for encrypting more than just the kernel during early
boot processing, centralize the use of the PMD flag settings based
on the type of mapping desired. When 4KB aligned encryption is added,
this will allow either PTE flags or large page PMD flags to be used
without requiring the caller to adjust.
Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180110192615.6026.14767.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In preparation for follow-on patches, combine the PGD mapping parameters
into a struct to reduce the number of function arguments and allow for
direct updating of the next pagetable mapping area pointer.
Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180110192605.6026.96206.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Clean up the use of PUSH and POP and when registers are saved in the
__enc_copy() assembly function in order to improve the readability of the code.
Move parameter register saving into general purpose registers earlier
in the code and move all the pushes to the beginning of the function
with corresponding pops at the end.
We do this to prepare fixes.
Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180110192556.6026.74187.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
During code inspection, I found an use-after-free possibility during unloading
ipmi_si in the polling mode.
If start_new_msg() is called after kthread_stop(), the function will try to
wake up non-existing kthread using the dangling pointer.
Possible scenario is when a new internal message is generated after
ipmi_unregister_smi()[*1] and remains after stop_timer_and_thread()
in clenaup_one_si() [*2].
Use-after-free could occur as follows depending on BMC replies.
cleanup_one_si
=> ipmi_unregister_smi
[*1]
=> stop_timer_and_thread
=> kthread_stop(smi_info->thread)
[*2]
=> poll
=> smi_event_handler
=> start_new_msg
=> if (smi_info->thread)
wake_up_process(smi_info->thread) <== use-after-free!!
Although currently it seems no such message is generated in the polling mode,
some changes might introduce that in thefuture. For example in the interrupt
mode, disable_si_irq() does that at [*2].
So let's prevent such a critical issue possibility now.
Signed-off-by: Yamazaki Masamitsu <m-yamazaki@ah.jp.nec.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
If a nonexistent file is supplied to objtool, it complains with a
non-helpful error:
open: No such file or directory
Improve it to:
objtool: Can't open 'foo': No such file or directory
Reported-by: Markus <M4rkusXXL@web.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/406a3d00a21225eee2819844048e17f68523ccf6.1516025651.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Objtool segfaults when the gold linker is used with
CONFIG_MODVERSIONS=y and CONFIG_UNWINDER_ORC=y.
With CONFIG_MODVERSIONS=y, the .o file gets passed to the linker before
being passed to objtool. The gold linker seems to strip unused ELF
symbols by default, which confuses objtool and causes the seg fault when
it's trying to generate ORC metadata.
Objtool should really be running immediately after GCC anyway, without a
linker call in between. Change the makefile ordering so that objtool is
called before the linker.
Reported-and-tested-by: Markus <M4rkusXXL@web.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: ee9f8fce99 ("x86/unwind: Add the ORC unwinder")
Link: http://lkml.kernel.org/r/355f04da33581f4a3bf82e5b512973624a1e23a2.1516025651.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Without CONFIG_NF_FLOW_TABLE, the new nft_flow_offload module produces
a link error:
net/netfilter/nft_flow_offload.o: In function `nft_flow_offload_iterate_cleanup':
nft_flow_offload.c:(.text+0xb0): undefined reference to `nf_flow_table_iterate'
net/netfilter/nft_flow_offload.o: In function `flow_offload_iterate_cleanup':
nft_flow_offload.c:(.text+0x160): undefined reference to `flow_offload_dead'
net/netfilter/nft_flow_offload.o: In function `nft_flow_offload_eval':
nft_flow_offload.c:(.text+0xc4c): undefined reference to `flow_offload_alloc'
nft_flow_offload.c:(.text+0xc64): undefined reference to `flow_offload_add'
nft_flow_offload.c:(.text+0xc94): undefined reference to `flow_offload_free'
This adds a Kconfig dependency for it.
Fixes: a3c90f7a23 ("netfilter: nf_tables: flow offload expression")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The function copy_siginfo_from_user32 is used for two things, in ptrace
since the dawn of siginfo for arbirarily modifying a signal that
user space sees, and in sigqueueinfo to send a signal with arbirary
siginfo data.
Create a single copy of copy_siginfo_from_user32 that all architectures
share, and teach it to handle all of the cases in the siginfo union.
In the generic version of copy_siginfo_from_user32 ensure that all
of the fields in siginfo are initialized so that the siginfo structure
can be safely copied to userspace if necessary.
When copying the embedded sigval union copy the si_int member. That
ensures the 32bit values passes through the kernel unchanged.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Nothing tests this define so just remove it.
I suspect the intention was to make the uid field in siginfo 16bit
however I can't find any code that ever tested this defined, and
even if it did it the layout has been this way for 8 years so
changing it now would break the ABI with userspace.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Having si_codes in many different files simply encourages duplicate definitions
that can cause problems later. To avoid that merge the blackfin specific si_codes
into uapi/asm-generic/siginfo.h
Update copy_siginfo_to_user to copy with the absence of BUS_MCEERR_AR that blackfin
defines to be something else.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Having si_codes in many different files simply encourages duplicate definitions
that can cause problems later. To avoid that merge the tile specific si_codes
into uapi/asm-generic/siginfo.h
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Having si_codes in many different files simply encourages duplicate definitions
that can cause problems later. To avoid that merce the frv specific si_codes
into uapi/asm-generic/siginfo.h
This allows the removal of arch/frv/uapi/include/asm/siginfo.h as the last
last meaningful definition it held was FPE_MDAOVF.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Having si_codes in many different files simply encourages duplicate
definitions that can cause problems later. To avoid that merge the
ia64 specific si_codes into uapi/asm-generic/siginfo.h
Update the sanity checks in arch/x86/kernel/signal_compat.c to expect
the now lager NSIGILL and NSIGFPE. As nothing excpe the larger count
is exposed on x86 no additional code needs to be updated.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
NSIGTRAP is 4 in the generic siginfo and powerpc just undefines
NSGTRAP and redefine it as 4. That accomplishes nothing so remove
the duplication.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The addr_lsb fields is only valid and available when the
signal is SIGBUS and the si_code is BUS_MCEERR_AR or BUS_MCEERR_AO.
Document this with a comment and place the field in the _sigfault union
to make this clear.
All of the fields stay in the same physical location so both the old
and new definitions of struct siginfo will continue to work.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
--EWB Added #ifdef CONFIG_X86_X32_ABI to arch/x86/kernel/signal_compat.c
Changed #ifdef CONFIG_X86_X32 to #ifdef CONFIG_X86_X32_ABI in
linux/compat.h
CONFIG_X86_X32 is set when the user requests X32 support.
CONFIG_X86_X32_ABI is set when the user requests X32 support
and the tool-chain has X32 allowing X32 support to be built.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
We can force levels of SCL and SDA, so we can use that for bus recovery.
Note that we cannot read SDA back, because we will only get the internal
state of the bus free detection.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
If we managed to get a client release SDA again, send a STOP afterwards
to make sure we have a consistent state on the bus again.
Tested-by: Phil Reid <preid@electromag.com.au>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
If we have a function to control SDA, we should ensure that SDA is not
held down by us. So, release the GPIO in this case.
Tested-by: Phil Reid <preid@electromag.com.au>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
This will be needed when we want to create STOP conditions, too, later.
Create the needed fields and populate them for the GPIO case if the GPIO
is set to output.
Tested-by: Phil Reid <preid@electromag.com.au>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
No reason to have them undefined, so let's add them.
Tested-by: Phil Reid <preid@electromag.com.au>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
"Used internally" is vague. What it actually means is that those fields
are populated by the core if valid GPIOs are provided. Change the
comments to reflect that.
Tested-by: Phil Reid <preid@electromag.com.au>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>