Commit graph

737516 commits

Author SHA1 Message Date
Anna-Maria Gleixner
6de6250c75 hrtimer: Fix hrtimer_start[_range_ns]() function descriptions
The hrtimer_start[_range_ns]() functions start a timer reliably on this CPU only
when HRTIMER_MODE_PINNED is set.

Furthermore the HRTIMER_MODE_PINNED mode is not considered when a hrtimer is initialized.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-6-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16 02:35:45 +01:00
Anna-Maria Gleixner
907777136f hrtimer: Clean up the 'int clock' parameter of schedule_hrtimeout_range_clock()
schedule_hrtimeout_range_clock() uses an 'int clock' parameter for the
clock ID, instead of the customary predefined "clockid_t" type.

In hrtimer coding style the canonical variable name for the clock ID is
'clock_id', therefore change the name of the parameter here as well
to make it all consistent.

While at it, clean up the description for the 'clock_id' and 'mode'
function parameters. The clock modes and the clock IDs are not
restricted as the comment suggests.

Fix the mode description as well for the callers of schedule_hrtimeout_range_clock().

No functional changes intended.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-5-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16 02:35:44 +01:00
Anna-Maria Gleixner
1fbc78b3c9 hrtimer: Fix kerneldoc syntax for 'struct hrtimer_cpu_base'
The '/**' sequence marks the start of a structure description. Add the
missing second asterisk. While at it adapt the ordering of the struct
members to the struct definition and document the purpose of
expires_next more precisely.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-4-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16 02:35:44 +01:00
Thomas Gleixner
d05ca13b8d hrtimer: Correct blatantly incorrect comment
The protection of a hrtimer which runs its callback against migration to a
different CPU has nothing to do with hard interrupt context.

The protection against migration of a hrtimer running the expiry callback
is the pointer in the cpu_base which holds a pointer to the currently
running timer. This pointer is evaluated in the code which potentially
switches the timer base and makes sure it's kept on the CPU on which the
callback is running.

Reported-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-3-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16 02:35:44 +01:00
Thomas Gleixner
ae67badaa1 hrtimer: Optimize the hrtimer code by using static keys for migration_enable/nohz_active
The hrtimer_cpu_base::migration_enable and ::nohz_active fields
were originally introduced to avoid accessing global variables
for these decisions.

Still that results in a (cache hot) load and conditional branch,
which can be avoided by using static keys.

Implement it with static keys and optimize for the most critical
case of high performance networking which tends to disable the
timer migration functionality.

No change in functionality.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1801142327490.2371@nanos
Link: https://lkml.kernel.org/r/20171221104205.7269-2-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16 02:35:44 +01:00
Ingo Molnar
57957fb519 Merge branch 'timers/urgent' into timers/core, to pick up dependent fix
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16 02:33:42 +01:00
Paul Mackerras
5855564c8a KVM: PPC: Book3S HV: Enable migration of decrementer register
This adds a register identifier for use with the one_reg interface
to allow the decrementer expiry time to be read and written by
userspace.  The decrementer expiry time is in guest timebase units
and is equal to the sum of the decrementer and the guest timebase.
(The expiry time is used rather than the decrementer value itself
because the expiry time is not constantly changing, though the
decrementer value is, while the guest vcpu is not running.)

Without this, a guest vcpu migrated to a new host will see its
decrementer set to some random value.  On POWER8 and earlier, the
decrementer is 32 bits wide and counts down at 512MHz, so the
guest vcpu will potentially see no decrementer interrupts for up
to about 4 seconds, which will lead to a stall.  With POWER9, the
decrementer is now 56 bits side, so the stall can be much longer
(up to 2.23 years) and more noticeable.

To help work around the problem in cases where userspace has not been
updated to migrate the decrementer expiry time, we now set the
default decrementer expiry at vcpu creation time to the current time
rather than the maximum possible value.  This should mean an
immediate decrementer interrupt when a migrated vcpu starts
running.  In cases where the decrementer is 32 bits wide and more
than 4 seconds elapse between the creation of the vcpu and when it
first runs, the decrementer would have wrapped around to positive
values and there may still be a stall - but this is no worse than
the current situation.  In the large-decrementer case, we are sure
to get an immediate decrementer interrupt (assuming the time from
vcpu creation to first run is less than 2.23 years) and we thus
avoid a very long stall.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-01-16 11:54:45 +11:00
Arnd Bergmann
41e4b39115 netfilter: nf_defrag: move NF_CONNTRACK bits into #ifdef
We cannot access the skb->_nfct field when CONFIG_NF_CONNTRACK is
disabled:

net/ipv4/netfilter/nf_defrag_ipv4.c: In function 'ipv4_conntrack_defrag':
net/ipv4/netfilter/nf_defrag_ipv4.c:83:9: error: 'struct sk_buff' has no member named '_nfct'
net/ipv6/netfilter/nf_defrag_ipv6_hooks.c: In function 'ipv6_defrag':
net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68:9: error: 'struct sk_buff' has no member named '_nfct'

Both functions already have an #ifdef for this, so let's move the
check in there.

Fixes: 902d6a4c2a ("netfilter: nf_defrag: Skip defrag if NOTRACK is set")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-16 01:52:08 +01:00
Arnd Bergmann
b069b37adb netfilter: nf_defrag: mark xt_table structures 'const' again
As a side-effect of adding the module option, we now get a section
mismatch warning:

WARNING: net/ipv4/netfilter/iptable_raw.o(.data+0x1c): Section mismatch in reference from the variable packet_raw to the function .init.text:iptable_raw_table_init()
The variable packet_raw references
the function __init iptable_raw_table_init()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the variable:
*_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console

Apparently it's ok to link to a __net_init function from .rodata but not
from .data. We can address this by rearranging the logic so that the
structure is read-only again. Instead of writing to the .priority field
later, we have an extra copies of the structure with that flag. An added
advantage is that that we don't have writable function pointers with this
approach.

Fixes: 902d6a4c2a ("netfilter: nf_defrag: Skip defrag if NOTRACK is set")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-16 01:52:07 +01:00
Subash Abhinov Kasiviswanathan
83f1999cae netfilter: ipv6: nf_defrag: Pass on packets to stack per RFC2460
ipv6_defrag pulls network headers before fragment header. In case of
an error, the netfilter layer is currently dropping these packets.
This results in failure of some IPv6 standards tests which passed on
older kernels due to the netfilter framework using cloning.

The test case run here is a check for ICMPv6 error message replies
when some invalid IPv6 fragments are sent. This specific test case is
listed in https://www.ipv6ready.org/docs/Core_Conformance_Latest.pdf
in the Extension Header Processing Order section.

A packet with unrecognized option Type 11 is sent and the test expects
an ICMP error in line with RFC2460 section 4.2 -

11 - discard the packet and, only if the packet's Destination
     Address was not a multicast address, send an ICMP Parameter
     Problem, Code 2, message to the packet's Source Address,
     pointing to the unrecognized Option Type.

Since netfilter layer now drops all invalid IPv6 frag packets, we no
longer see the ICMP error message and fail the test case.

To fix this, save the transport header. If defrag is unable to process
the packet due to RFC2460, restore the transport header and allow packet
to be processed by stack. There is no change for other packet
processing paths.

Tested by confirming that stack sends an ICMP error when it receives
these packets. Also tested that fragmented ICMP pings succeed.

v1->v2: Instead of cloning always, save the transport_header and
restore it in case of this specific error. Update the title and
commit message accordingly.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-16 01:52:06 +01:00
Florian Westphal
e3eeacbac4 netfilter: x_tables: don't return garbage pointer on modprobe failure
request_module may return a positive error result from modprobe,
if we cast this to ERR_PTR this returns a garbage result (it passes
IS_ERR checks).

Fix it by ignoring modprobe return values entirely, just retry the
table lookup instead.

Reported-by: syzbot+980925dbfbc7f93bc2ef@syzkaller.appspotmail.com
Fixes: 03d13b6868 ("netfilter: xtables: add and use xt_request_find_table_lock")
Fixes: 20651cefd2 ("netfilter: x_tables: unbreak module auto loading")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-16 01:51:59 +01:00
Tom Lendacky
107cd25321 x86/mm: Encrypt the initrd earlier for BSP microcode update
Currently the BSP microcode update code examines the initrd very early
in the boot process.  If SME is active, the initrd is treated as being
encrypted but it has not been encrypted (in place) yet.  Update the
early boot code that encrypts the kernel to also encrypt the initrd so
that early BSP microcode updates work.

Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180110192634.6026.10452.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16 01:50:59 +01:00
Tom Lendacky
cc5f01e28d x86/mm: Prepare sme_encrypt_kernel() for PAGE aligned encryption
In preparation for encrypting more than just the kernel, the encryption
support in sme_encrypt_kernel() needs to support 4KB page aligned
encryption instead of just 2MB large page aligned encryption.

Update the routines that populate the PGD to support non-2MB aligned
addresses.  This is done by creating PTE page tables for the start
and end portion of the address range that fall outside of the 2MB
alignment.  This results in, at most, two extra pages to hold the
PTE entries for each mapping of a range.

Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180110192626.6026.75387.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16 01:50:58 +01:00
Tom Lendacky
2b5d00b6c2 x86/mm: Centralize PMD flags in sme_encrypt_kernel()
In preparation for encrypting more than just the kernel during early
boot processing, centralize the use of the PMD flag settings based
on the type of mapping desired.  When 4KB aligned encryption is added,
this will allow either PTE flags or large page PMD flags to be used
without requiring the caller to adjust.

Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180110192615.6026.14767.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16 01:50:58 +01:00
Tom Lendacky
bacf6b499e x86/mm: Use a struct to reduce parameters for SME PGD mapping
In preparation for follow-on patches, combine the PGD mapping parameters
into a struct to reduce the number of function arguments and allow for
direct updating of the next pagetable mapping area pointer.

Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180110192605.6026.96206.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16 01:50:58 +01:00
Tom Lendacky
1303880179 x86/mm: Clean up register saving in the __enc_copy() assembly code
Clean up the use of PUSH and POP and when registers are saved in the
__enc_copy() assembly function in order to improve the readability of the code.

Move parameter register saving into general purpose registers earlier
in the code and move all the pushes to the beginning of the function
with corresponding pops at the end.

We do this to prepare fixes.

Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180110192556.6026.74187.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16 01:50:58 +01:00
Masamitsu Yamazaki
bd1c06a4f5 ipmi: Clear smi_info->thread to prevent use-after-free during module unload
During code inspection, I found an use-after-free possibility during unloading
ipmi_si in the polling mode.

If start_new_msg() is called after kthread_stop(), the function will try to
wake up non-existing kthread using the dangling pointer.

Possible scenario is when a new internal message is generated after
ipmi_unregister_smi()[*1] and remains after stop_timer_and_thread()
in clenaup_one_si() [*2].
Use-after-free could occur as follows depending on BMC replies.

  cleanup_one_si
    => ipmi_unregister_smi
       [*1]
    => stop_timer_and_thread
       => kthread_stop(smi_info->thread)
       [*2]
    => poll
       => smi_event_handler
          => start_new_msg
             => if (smi_info->thread)
                    wake_up_process(smi_info->thread) <== use-after-free!!

Although currently it seems no such message is generated in the polling mode,
some changes might introduce that in thefuture. For example in the interrupt
mode, disable_si_irq() does that at [*2].

So let's prevent such a critical issue possibility now.

Signed-off-by: Yamazaki Masamitsu <m-yamazaki@ah.jp.nec.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
2018-01-15 18:34:34 -06:00
Josh Poimboeuf
385d11b152 objtool: Improve error message for bad file argument
If a nonexistent file is supplied to objtool, it complains with a
non-helpful error:

  open: No such file or directory

Improve it to:

  objtool: Can't open 'foo': No such file or directory

Reported-by: Markus <M4rkusXXL@web.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/406a3d00a21225eee2819844048e17f68523ccf6.1516025651.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16 01:27:27 +01:00
Josh Poimboeuf
2a0098d706 objtool: Fix seg fault with gold linker
Objtool segfaults when the gold linker is used with
CONFIG_MODVERSIONS=y and CONFIG_UNWINDER_ORC=y.

With CONFIG_MODVERSIONS=y, the .o file gets passed to the linker before
being passed to objtool.  The gold linker seems to strip unused ELF
symbols by default, which confuses objtool and causes the seg fault when
it's trying to generate ORC metadata.

Objtool should really be running immediately after GCC anyway, without a
linker call in between.  Change the makefile ordering so that objtool is
called before the linker.

Reported-and-tested-by: Markus <M4rkusXXL@web.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: ee9f8fce99 ("x86/unwind: Add the ORC unwinder")
Link: http://lkml.kernel.org/r/355f04da33581f4a3bf82e5b512973624a1e23a2.1516025651.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16 01:27:27 +01:00
Arnd Bergmann
9be9d04b28 netfilter: nf_tables: flow_offload depends on flow_table
Without CONFIG_NF_FLOW_TABLE, the new nft_flow_offload module produces
a link error:

net/netfilter/nft_flow_offload.o: In function `nft_flow_offload_iterate_cleanup':
nft_flow_offload.c:(.text+0xb0): undefined reference to `nf_flow_table_iterate'
net/netfilter/nft_flow_offload.o: In function `flow_offload_iterate_cleanup':
nft_flow_offload.c:(.text+0x160): undefined reference to `flow_offload_dead'
net/netfilter/nft_flow_offload.o: In function `nft_flow_offload_eval':
nft_flow_offload.c:(.text+0xc4c): undefined reference to `flow_offload_alloc'
nft_flow_offload.c:(.text+0xc64): undefined reference to `flow_offload_add'
nft_flow_offload.c:(.text+0xc94): undefined reference to `flow_offload_free'

This adds a Kconfig dependency for it.

Fixes: a3c90f7a23 ("netfilter: nf_tables: flow offload expression")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-16 01:19:16 +01:00
Eric W. Biederman
eb5346c379 signal: Remove the code to clear siginfo before calling copy_siginfo_from_user32
The new unified copy_siginfo_from_user32 takes care of this.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-15 18:01:19 -06:00
Eric W. Biederman
212a36a17e signal: Unify and correct copy_siginfo_from_user32
The function copy_siginfo_from_user32 is used for two things, in ptrace
since the dawn of siginfo for arbirarily modifying a signal that
user space sees, and in sigqueueinfo to send a signal with arbirary
siginfo data.

Create a single copy of copy_siginfo_from_user32 that all architectures
share, and teach it to handle all of the cases in the siginfo union.

In the generic version of copy_siginfo_from_user32 ensure that all
of the fields in siginfo are initialized so that the siginfo structure
can be safely copied to userspace if necessary.

When copying the embedded sigval union copy the si_int member.  That
ensures the 32bit values passes through the kernel unchanged.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-15 17:55:59 -06:00
Eric W. Biederman
56b81456f4 signal/blackfin: Remove pointless UID16_SIGINFO_COMPAT_NEEDED
Nothing tests this define so just remove it.

I suspect the intention was to make the uid field in siginfo 16bit
however I can't find any code that ever tested this defined, and
even if it did it the layout has been this way for 8 years so
changing it now would break the ABI with userspace.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-15 17:42:36 -06:00
Eric W. Biederman
71ee78d538 signal/blackfin: Move the blackfin specific si_codes to asm-generic/siginfo.h
Having si_codes in many different files simply encourages duplicate definitions
that can cause problems later.  To avoid that merge the blackfin specific si_codes
into uapi/asm-generic/siginfo.h

Update copy_siginfo_to_user to copy with the absence of BUS_MCEERR_AR that blackfin
defines to be something else.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-15 17:42:35 -06:00
Eric W. Biederman
753e5a8543 signal/tile: Move the tile specific si_codes to asm-generic/siginfo.h
Having si_codes in many different files simply encourages duplicate definitions
that can cause problems later.  To avoid that merge the tile specific si_codes
into uapi/asm-generic/siginfo.h

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-15 17:42:35 -06:00
Eric W. Biederman
8bc9e33848 signal/frv: Move the frv specific si_codes to asm-generic/siginfo.h
Having si_codes in many different files simply encourages duplicate definitions
that can cause problems later.  To avoid that merce the frv specific si_codes
into uapi/asm-generic/siginfo.h

This allows the removal of arch/frv/uapi/include/asm/siginfo.h as the last
last meaningful definition it held was FPE_MDAOVF.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-15 17:42:34 -06:00
Eric W. Biederman
ac54058d77 signal/ia64: Move the ia64 specific si_codes to asm-generic/siginfo.h
Having si_codes in many different files simply encourages duplicate
definitions that can cause problems later.  To avoid that merge the
ia64 specific si_codes into uapi/asm-generic/siginfo.h

Update the sanity checks in arch/x86/kernel/signal_compat.c to expect
the now lager NSIGILL and NSIGFPE.  As nothing excpe the larger count
is exposed on x86 no additional code needs to be updated.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-15 17:42:33 -06:00
Eric W. Biederman
ad2b1ab57d signal/powerpc: Remove redefinition of NSIGTRAP on powerpc
NSIGTRAP is 4 in the generic siginfo and powerpc just undefines
NSGTRAP and redefine it as 4.  That accomplishes nothing so remove
the duplication.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-15 17:42:33 -06:00
Eric W. Biederman
b68a68d3dc signal: Move addr_lsb into the _sigfault union for clarity
The addr_lsb fields is only valid and available when the
signal is SIGBUS and the si_code is BUS_MCEERR_AR or BUS_MCEERR_AO.
Document this with a comment and place the field in the _sigfault union
to make this clear.

All of the fields stay in the same physical location so both the old
and new definitions of struct siginfo will continue to work.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-15 17:42:32 -06:00
Al Viro
b713da69e4 signal: unify compat_siginfo_t
--EWB Added #ifdef CONFIG_X86_X32_ABI to arch/x86/kernel/signal_compat.c
      Changed #ifdef CONFIG_X86_X32 to #ifdef CONFIG_X86_X32_ABI in
      linux/compat.h

      CONFIG_X86_X32 is set when the user requests X32 support.

      CONFIG_X86_X32_ABI is set when the user requests X32 support
      and the tool-chain has X32 allowing X32 support to be built.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2018-01-15 17:40:31 -06:00
Wolfram Sang
7d2c17f021 i2c: rcar: implement bus recovery
We can force levels of SCL and SDA, so we can use that for bus recovery.
Note that we cannot read SDA back, because we will only get the internal
state of the bus free detection.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-01-16 00:04:48 +01:00
Wolfram Sang
2806e6ad77 i2c: send STOP after successful bus recovery
If we managed to get a client release SDA again, send a STOP afterwards
to make sure we have a consistent state on the bus again.

Tested-by: Phil Reid <preid@electromag.com.au>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-01-16 00:04:42 +01:00
Wolfram Sang
72b08fcc15 i2c: ensure SDA is released in recovery if SDA is controllable
If we have a function to control SDA, we should ensure that SDA is not
held down by us. So, release the GPIO in this case.

Tested-by: Phil Reid <preid@electromag.com.au>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-01-16 00:04:30 +01:00
Wolfram Sang
8092178ffe i2c: add 'set_sda' to bus_recovery_info
This will be needed when we want to create STOP conditions, too, later.
Create the needed fields and populate them for the GPIO case if the GPIO
is set to output.

Tested-by: Phil Reid <preid@electromag.com.au>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-01-16 00:04:19 +01:00
Wolfram Sang
6c92204e44 i2c: add identifier in declarations for i2c_bus_recovery
No reason to have them undefined, so let's add them.

Tested-by: Phil Reid <preid@electromag.com.au>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-01-16 00:04:03 +01:00
Wolfram Sang
766a4f27f3 i2c: make kerneldoc about bus recovery more precise
"Used internally" is vague. What it actually means is that those fields
are populated by the core if valid GPIOs are provided. Change the
comments to reflect that.

Tested-by: Phil Reid <preid@electromag.com.au>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-01-16 00:04:02 +01:00
Xiongfeng Wang
979a459c83 IB/cma: use strlcpy() instead of strncpy()
gcc-8 reports

drivers/infiniband/core/cma_configfs.c: In function 'make_cma_dev':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' specified
bound 64 equals destination size [-Wstringop-truncation]

We need to use strlcpy() to make sure the string is nul-terminated.

Signed-off-by: Xiongfeng Wang <xiongfeng.wang@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 15:33:21 -07:00
Parav Pandit
a6532e7139 RDMA/core: Clarify rdma_ah_find_type
iWARP does not use rdma_ah_attr_type, and for this reason we do not have a
RDMA_AH_ATTR_TYPE_IWARP. rdma_ah_find_type should not even be called on iwarp
ports and for clarity it shouldn't have a special test for iWarp.

This changes the result from RDMA_AH_ATTR_TYPE_ROCE to RDMA_AH_ATTR_TYPE_IB
when wrongly called on an iWarp port.

Fixes: 44c58487d5 ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 15:33:21 -07:00
Bodong Wang
cd2a6e7d38 IB/core: Fix ib_wc structure size to remain in 64 bytes boundary
The change of slid from u16 to u32 results in sizeof(struct ib_wc)
cross 64B boundary, which causes more cache misses. This patch
rearranges the fields and remain the size to 64B.

Pahole output before this change:

struct ib_wc {
        union {
                u64                wr_id;                /*           8 */
                struct ib_cqe *    wr_cqe;               /*           8 */
        };                                               /*     0     8 */
        enum ib_wc_status          status;               /*     8     4 */
        enum ib_wc_opcode          opcode;               /*    12     4 */
        u32                        vendor_err;           /*    16     4 */
        u32                        byte_len;             /*    20     4 */
        struct ib_qp *             qp;                   /*    24     8 */
        union {
                __be32             imm_data;             /*           4 */
                u32                invalidate_rkey;      /*           4 */
        } ex;                                            /*    32     4 */
        u32                        src_qp;               /*    36     4 */
        int                        wc_flags;             /*    40     4 */
        u16                        pkey_index;           /*    44     2 */

        /* XXX 2 bytes hole, try to pack */

        u32                        slid;                 /*    48     4 */
        u8                         sl;                   /*    52     1 */
        u8                         dlid_path_bits;       /*    53     1 */
        u8                         port_num;             /*    54     1 */
        u8                         smac[6];              /*    55     6 */

        /* XXX 1 byte hole, try to pack */

        u16                        vlan_id;              /*    62     2 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        u8                         network_hdr_type;     /*    64     1 */

        /* size: 72, cachelines: 2, members: 17 */
        /* sum members: 62, holes: 2, sum holes: 3 */
        /* padding: 7 */
        /* last cacheline: 8 bytes */
};

Pahole output after this change:

struct ib_wc {
        union {
                u64                wr_id;                /*           8 */
                struct ib_cqe *    wr_cqe;               /*           8 */
        };                                               /*     0     8 */
        enum ib_wc_status          status;               /*     8     4 */
        enum ib_wc_opcode          opcode;               /*    12     4 */
        u32                        vendor_err;           /*    16     4 */
        u32                        byte_len;             /*    20     4 */
        struct ib_qp *             qp;                   /*    24     8 */
        union {
                __be32             imm_data;             /*           4 */
                u32                invalidate_rkey;      /*           4 */
        } ex;                                            /*    32     4 */
        u32                        src_qp;               /*    36     4 */
        u32                        slid;                 /*    40     4 */
        int                        wc_flags;             /*    44     4 */
        u16                        pkey_index;           /*    48     2 */
        u8                         sl;                   /*    50     1 */
        u8                         dlid_path_bits;       /*    51     1 */
        u8                         port_num;             /*    52     1 */
        u8                         smac[6];              /*    53     6 */

        /* XXX 1 byte hole, try to pack */

        u16                        vlan_id;              /*    60     2 */
        u8                         network_hdr_type;     /*    62     1 */

        /* size: 64, cachelines: 1, members: 17 */
        /* sum members: 62, holes: 1, sum holes: 1 */
        /* padding: 1 */
};

Cc: <stable@vger.kernel.org> # v4.13
Fixes: 7db20ecd1d ("IB/core: Change wc.slid from 16 to 32 bits")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 15:33:21 -07:00
Jack Morgenstein
852f692759 IB/mlx4: Fix incorrectly releasing steerable UD QPs when have only ETH ports
Allocating steerable UD QPs depends on having at least one IB port,
while releasing those QPs does not.

As a result, when there are only ETH ports, the IB (RoCE) driver
requests releasing a qp range whose base qp is zero, with
qp count zero.

When SR-IOV is enabled, and the VF driver is running on a VM over
a hypervisor which treats such qp release calls as errors
(rather than NOPs), we see lines in the VM message log like:

 mlx4_core 0002:00:02.0: Failed to release qp range base:0 cnt:0

Fix this by adding a check for a zero count in mlx4_release_qp_range()
(which thus treats releasing 0 qps as a nop), and eliminating the
check for device managed flow steering when releasing steerable UD QPs.
(Freeing ib_uc_qpns_bitmap unconditionally is also OK, since it
remains NULL when steerable UD QPs are not allocated).

Cc: <stable@vger.kernel.org>
Fixes: 4196670be7 ("IB/mlx4: Don't allocate range of steerable UD QPs for Ethernet-only device")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 15:33:21 -07:00
Jason Gunthorpe
7bed7ebcb7 RDMA/qedr: Fix endian problems around imm_data
The double swap matches what user space rdma-core does to imm_data.

wc->imm_data is not used in the kernel so this change has no practical
impact.

Acked-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 15:33:21 -07:00
Jason Gunthorpe
ccb8a29e7d RDMA/hns: Fix endian problems around imm_data and rkey
This matches the changes made recently to the userspace hns
driver when it was made sparse clean.

See rdma-core commit bffd380cfe56 ("libhns: Make the provider sparse
clean")

wc->imm_data is not used in the kernel so this change has no practical
impact.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 15:33:21 -07:00
Jason Gunthorpe
c966ea12c0 RDMA: Mark imm_data as be32 in the verbs uapi header
This matches what the userspace copy of this header has been doing
for a while. imm_data is an opaque 4 byte array carried over the network,
and invalidate_rkey is in CPU byte order.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 15:33:21 -07:00
Parav Pandit
a6753c4d62 IB/core: Limit DMAC resolution to RoCE Connected QPs
Resolving DMAC for RoCE is applicable to only Connected mode QPs.
So resolve DMAC for only for Connected mode QPs.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 15:33:21 -07:00
Parav Pandit
f2290d6d52 IB/core: Attempt DMAC resolution for only RoCE
Instead of returning 0 (success) for RoCE scenarios where DMAC should
not be resolved, avoid such attempt and make code consistent with
ib_create_user_ah().

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 15:33:21 -07:00
Parav Pandit
b96ac05a87 IB/core: Limit DMAC resolution to userspace QPs
Currently ah_attr is initialized by the ib_cm layer for rdma_cm
based applications. For RoCE transport ah_attr.roce.dmac is already
initialized by ib_cm, rdma_cm either from wc, path record, route
resolve, explicit path record setting depending on active or passive
side QP. Therefore avoid resolving DMAC for QP of kernel consumers.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 15:33:21 -07:00
Parav Pandit
b2bedfb395 IB/core: Perform modify QP on real one
Currently qp->port stores the port number whenever IB_QP_PORT
QP attribute mask is set (during QP state transition to INIT state).
This port number should be stored for the real QP when XRC target QP
is used.

Follow the ib_modify_qp() implementation and hide the access to ->real_qp.

Fixes: a512c2fbef ("IB/core: Introduce modify QP operation with udata")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 15:33:21 -07:00
Tomasz Majchrzak
1532d9e87e raid5-ppl: PPL support for disks with write-back cache enabled
In order to provide data consistency with PPL for disks with write-back
cache enabled all data has to be flushed to disks before next PPL
entry. The disks to be flushed are marked in the bitmap. It's modified
under a mutex and it's only read after PPL io unit is submitted.

A limitation of 64 disks in the array has been introduced to keep data
structures and implementation simple. RAID5 arrays with so many disks are
not likely due to high risk of multiple disks failure. Such restriction
should not be a real life limitation.

With write-back cache disabled next PPL entry is submitted when data write
for current one completes. Data flush defers next log submission so trigger
it when there are no stripes for handling found.

As PPL assures all data is flushed to disk at request completion, just
acknowledge flush request when PPL is enabled.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Shaohua Li <sh.li@alibaba-inc.com>
2018-01-15 14:29:42 -08:00
Leon Romanovsky
ae59c3f0b6 RDMA/mlx5: Fix out-of-bound access while querying AH
The rdma_ah_find_type() accesses the port array based on an index
controlled by userspace. The existing bounds check is after the first use
of the index, so userspace can generate an out of bounds access, as shown
by the KASN report below.

==================================================================
BUG: KASAN: slab-out-of-bounds in to_rdma_ah_attr+0xa8/0x3b0
Read of size 4 at addr ffff880019ae2268 by task ibv_rc_pingpong/409

CPU: 0 PID: 409 Comm: ibv_rc_pingpong Not tainted 4.15.0-rc2-00031-gb60a3faf5b83-dirty #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
 dump_stack+0xe9/0x18f
 print_address_description+0xa2/0x350
 kasan_report+0x3a5/0x400
 to_rdma_ah_attr+0xa8/0x3b0
 mlx5_ib_query_qp+0xd35/0x1330
 ib_query_qp+0x8a/0xb0
 ib_uverbs_query_qp+0x237/0x7f0
 ib_uverbs_write+0x617/0xd80
 __vfs_write+0xf7/0x500
 vfs_write+0x149/0x310
 SyS_write+0xca/0x190
 entry_SYSCALL_64_fastpath+0x18/0x85
RIP: 0033:0x7fe9c7a275a0
RSP: 002b:00007ffee5498738 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007fe9c7ce4b00 RCX: 00007fe9c7a275a0
RDX: 0000000000000018 RSI: 00007ffee5498800 RDI: 0000000000000003
RBP: 000055d0c8d3f010 R08: 00007ffee5498800 R09: 0000000000000018
R10: 00000000000000ba R11: 0000000000000246 R12: 0000000000008000
R13: 0000000000004fb0 R14: 000055d0c8d3f050 R15: 00007ffee5498560

Allocated by task 1:
 __kmalloc+0x3f9/0x430
 alloc_mad_private+0x25/0x50
 ib_mad_post_receive_mads+0x204/0xa60
 ib_mad_init_device+0xa59/0x1020
 ib_register_device+0x83a/0xbc0
 mlx5_ib_add+0x50e/0x5c0
 mlx5_add_device+0x142/0x410
 mlx5_register_interface+0x18f/0x210
 mlx5_ib_init+0x56/0x63
 do_one_initcall+0x15b/0x270
 kernel_init_freeable+0x2d8/0x3d0
 kernel_init+0x14/0x190
 ret_from_fork+0x24/0x30

Freed by task 0:
(stack is not available)

The buggy address belongs to the object at ffff880019ae2000
 which belongs to the cache kmalloc-512 of size 512
The buggy address is located 104 bytes to the right of
 512-byte region [ffff880019ae2000, ffff880019ae2200)
The buggy address belongs to the page:
page:000000005d674e18 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
flags: 0x4000000000008100(slab|head)
raw: 4000000000008100 0000000000000000 0000000000000000 00000001000c000c
raw: dead000000000100 dead000000000200 ffff88001a402000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff880019ae2100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff880019ae2180: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
>ffff880019ae2200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                                                          ^
 ffff880019ae2280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff880019ae2300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Disabling lock debugging due to kernel taint

Cc: <stable@vger.kernel.org>
Fixes: 44c58487d5 ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 14:19:55 -07:00
David S. Miller
79d891c1bb linux-can-next-for-4.16-20180105
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEE4bay/IylYqM/npjQHv7KIOw4HPYFAlpPT5ATHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRAe/sog7Dgc9tyZB/wNk7hfmWT7qMSq4nB1/l4DvlCVtQR+
 7t7jLltd2ld1bqFr62S1/NExWbgm9GXS25wHgLQQn8I0jwCyuFb8K+VIe/+t9vSu
 PXOihUlIXCqpJwI9FtvGb/jmIbHV1JbnGv1b/J1q34FzhThsXN3DPX5BI5+T+Hy4
 9hnHuYtcveyGlU08RsePyc6WfCzBJafR1YpJYSSsIxmtT6Db0SyRSZjY4MFzv9eA
 mV+wvSpvepiw7tDN9XhSdNQJR9HAh/AXkYRgU448BysqhR5tK5oq8QAjsJK2Usy7
 X1RY/M32fn1QdcwfWEWw5xB9ZblKMnxRzB3vmGLkyvIuPnP/JGQoq5sW
 =BrhI
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-next-for-4.16-20180105' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2017-12-01,Re: pull-request: can-next

this is a pull request of 7 patches for net-next/master.

All patches are by me. Patch 6 is for the "can_raw" protocol and add
error checking to the bind() function. All other patches clean up the
coding style and remove unused parameters in various CAN drivers and
infrastructure.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15 16:13:34 -05:00