Commit graph

58726 commits

Author SHA1 Message Date
Alexei Starovoitov
cc8b0b92a1 bpf: introduce function calls (function boundaries)
Allow arbitrary function calls from bpf function to another bpf function.

Since the beginning of bpf all bpf programs were represented as a single function
and program authors were forced to use always_inline for all functions
in their C code. That was causing llvm to unnecessary inflate the code size
and forcing developers to move code to header files with little code reuse.

With a bit of additional complexity teach verifier to recognize
arbitrary function calls from one bpf function to another as long as
all of functions are presented to the verifier as a single bpf program.
New program layout:
r6 = r1    // some code
..
r1 = ..    // arg1
r2 = ..    // arg2
call pc+1  // function call pc-relative
exit
.. = r1    // access arg1
.. = r2    // access arg2
..
call pc+20 // second level of function call
...

It allows for better optimized code and finally allows to introduce
the core bpf libraries that can be reused in different projects,
since programs are no longer limited by single elf file.
With function calls bpf can be compiled into multiple .o files.

This patch is the first step. It detects programs that contain
multiple functions and checks that calls between them are valid.
It splits the sequence of bpf instructions (one program) into a set
of bpf functions that call each other. Calls to only known
functions are allowed. In the future the verifier may allow
calls to unresolved functions and will do dynamic linking.
This logic supports statically linked bpf functions only.

Such function boundary detection could have been done as part of
control flow graph building in check_cfg(), but it's cleaner to
separate function boundary detection vs control flow checks within
a subprogram (function) into logically indepedent steps.
Follow up patches may split check_cfg() further, but not check_subprogs().

Only allow bpf-to-bpf calls for root only and for non-hw-offloaded programs.
These restrictions can be relaxed in the future.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-17 20:34:35 +01:00
Yonghong Song
f4e2298e63 bpf/tracing: fix kernel/events/core.c compilation error
Commit f371b304f1 ("bpf/tracing: allow user space to
query prog array on the same tp") introduced a perf
ioctl command to query prog array attached to the
same perf tracepoint. The commit introduced a
compilation error under certain config conditions, e.g.,
  (1). CONFIG_BPF_SYSCALL is not defined, or
  (2). CONFIG_TRACING is defined but neither CONFIG_UPROBE_EVENTS
       nor CONFIG_KPROBE_EVENTS is defined.

Error message:
  kernel/events/core.o: In function `perf_ioctl':
  core.c:(.text+0x98c4): undefined reference to `bpf_event_query_prog_array'

This patch fixed this error by guarding the real definition under
CONFIG_BPF_EVENTS and provided static inline dummy function
if CONFIG_BPF_EVENTS was not defined.
It renamed the function from bpf_event_query_prog_array to
perf_event_query_prog_array and moved the definition from linux/bpf.h
to linux/trace_events.h so the definition is in proximity to
other prog_array related functions.

Fixes: f371b304f1 ("bpf/tracing: allow user space to query prog array on the same tp")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-13 22:44:10 +01:00
Josef Bacik
9802d86585 bpf: add a bpf_override_function helper
Error injection is sloppy and very ad-hoc.  BPF could fill this niche
perfectly with it's kprobe functionality.  We could make sure errors are
only triggered in specific call chains that we care about with very
specific situations.  Accomplish this with the bpf_override_funciton
helper.  This will modify the probe'd callers return value to the
specified value and set the PC to an override function that simply
returns, bypassing the originally probed function.  This gives us a nice
clean way to implement systematic error injection for all of our code
paths.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2017-12-12 09:02:34 -08:00
Josef Bacik
92ace9991d add infrastructure for tagging functions as error injectable
Using BPF we can override kprob'ed functions and return arbitrary
values.  Obviously this can be a bit unsafe, so make this feature opt-in
for functions.  Simply tag a function with KPROBE_ERROR_INJECT_SYMBOL in
order to give BPF access to that function for error injection purposes.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2017-12-12 08:56:26 -08:00
Yonghong Song
f371b304f1 bpf/tracing: allow user space to query prog array on the same tp
Commit e87c6bc385 ("bpf: permit multiple bpf attachments
for a single perf event") added support to attach multiple
bpf programs to a single perf event.
Although this provides flexibility, users may want to know
what other bpf programs attached to the same tp interface.
Besides getting visibility for the underlying bpf system,
such information may also help consolidate multiple bpf programs,
understand potential performance issues due to a large array,
and debug (e.g., one bpf program which overwrites return code
may impact subsequent program results).

Commit 2541517c32 ("tracing, perf: Implement BPF programs
attached to kprobes") utilized the existing perf ioctl
interface and added the command PERF_EVENT_IOC_SET_BPF
to attach a bpf program to a tracepoint. This patch adds a new
ioctl command, given a perf event fd, to query the bpf program
array attached to the same perf tracepoint event.

The new uapi ioctl command:
  PERF_EVENT_IOC_QUERY_BPF

The new uapi/linux/perf_event.h structure:
  struct perf_event_query_bpf {
       __u32	ids_len;
       __u32	prog_cnt;
       __u32	ids[0];
  };

User space provides buffer "ids" for kernel to copy to.
When returning from the kernel, the number of available
programs in the array is set in "prog_cnt".

The usage:
  struct perf_event_query_bpf *query =
    malloc(sizeof(*query) + sizeof(u32) * ids_len);
  query.ids_len = ids_len;
  err = ioctl(pmu_efd, PERF_EVENT_IOC_QUERY_BPF, query);
  if (err == 0) {
    /* query.prog_cnt is the number of available progs,
     * number of progs in ids: (ids_len == 0) ? 0 : query.prog_cnt
     */
  } else if (errno == ENOSPC) {
    /* query.ids_len number of progs copied,
     * query.prog_cnt is the number of available progs
     */
  } else {
      /* other errors */
  }

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2017-12-12 08:46:40 -08:00
David S. Miller
62cd277039 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2017-12-07

The following pull-request contains BPF updates for your net-next tree.

The main changes are:

1) Detailed documentation of BPF development process from Daniel.

2) Addition of is_fullsock, snd_cwnd and srtt_us fields to bpf_sock_ops
   from Lawrence.

3) Minor follow up for bpf_skb_set_tunnel_key() from William.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08 10:48:25 -05:00
Sergei Shtylyov
bafbdd527d phylib: Add device reset GPIO support
The PHY devices sometimes do have their reset signal (maybe even power
supply?) tied to some GPIO and sometimes it also does happen that a boot
loader does not leave it deasserted. So far this issue has been attacked
from (as I believe) a wrong angle: by teaching the MAC driver to manipulate
the GPIO in question; that solution, when applied to the device trees, led
to adding the PHY reset GPIO properties to the MAC device node, with one
exception: Cadence MACB driver which could handle the "reset-gpios" prop
in a PHY device subnode. I believe that the correct approach is to teach
the 'phylib' to get the MDIO device reset GPIO from the device tree node
corresponding to this device -- which this patch is doing...

Note that I had to modify the AT803x PHY driver as it would stop working
otherwise -- it made use of the reset GPIO for its own purposes...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Rob Herring <robh@kernel.org>
[geert: Propagate actual errors from fwnode_get_named_gpiod()]
[geert: Avoid destroying initial setup]
[geert: Consolidate GPIO descriptor acquiring code]
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Richard Leitner <richard.leitner@skidata.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 12:51:19 -05:00
Simon Horman
62b32379fd flow_dissector: dissect tunnel info outside __skb_flow_dissect()
Move dissection of tunnel info to outside of the main flow dissection
function, __skb_flow_dissect(). The sole user of this feature, the flower
classifier, is updated to call tunnel info dissection directly, using
skb_flow_dissect_tunnel_info().

This results in a slightly less complex implementation of
__skb_flow_dissect(), in particular removing logic from that call path
which is not used by the majority of users. The expense of this is borne by
the flower classifier which now has to make an extra call for tunnel info
dissection.

This patch should not result in any behavioural change.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 12:09:18 -05:00
Russell King
8fa7b9b6af phylink: convert to fwnode
Convert phylink to fwnode, switching phylink_create() from taking a
device_node to taking a fwnode_handle. This will allow other firmware
systems to take advantage of sfp/phylink support.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:16:19 -05:00
Russell King
c19bb00070 sfp: convert to fwnode
Convert sfp-bus to use fwnode rather than device_node internally, so
we can support more than just device tree firmware.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:16:19 -05:00
Russell King
0a6fcd3fc1 sfp: add documentation for kernel APIs
Add kernel-doc documentation for sfp kernel APIs, and link it into the
networking kapi documentation under "Network device support".

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:16:19 -05:00
Russell King
8796c8923d phylink: add documentation for kernel APIs
Add kernel-doc documentation for phylink kernel APIs, and link it into
the networking kapi documentation under "Network device support".

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:16:19 -05:00
Russell King
939eae25d9 phylink: remove phylink_init_eee()
phylink_init_eee() serves no purpose, remove it.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:16:18 -05:00
Russell King
86a362c49f phylink: get rid of separate Cisco SGMII and 802.3z modes
Since the handling of SGMII and 802.3z is now the same, combine the
MLO_AN_xxx constants into a single MLO_AN_INBAND, and use the PHY
interface mode to distinguish between Cisco SGMII and 802.3z.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:16:18 -05:00
Russell King
365c1e64ad phy: add phy_interface_mode_is_8023z() helper
Add and use phy_interface_mode_is_8023z() helper to identify the
interface modes that use 802.3z negotiation.  Use it in phylink's
phylink_mac_an_restart().

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 11:16:17 -05:00
David S. Miller
7cda4cee13 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Small overlapping change conflict ('net' changed a line,
'net-next' added a line right afterwards) in flexcan.c

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 10:44:19 -05:00
Lawrence Brakmo
f19397a5c6 bpf: Add access to snd_cwnd and others in sock_ops
Adds read access to snd_cwnd and srtt_us fields of tcp_sock. Since these
fields are only valid if the socket associated with the sock_ops program
call is a full socket, the field is_fullsock is also added to the
bpf_sock_ops struct. If the socket is not a full socket, reading these
fields returns 0.

Note that in most cases it will not be necessary to check is_fullsock to
know if there is a full socket. The context of the call, as specified by
the 'op' field, can sometimes determine whether there is a full socket.

The struct bpf_sock_ops has the following fields added:

  __u32 is_fullsock;      /* Some TCP fields are only valid if
                           * there is a full socket. If not, the
                           * fields read as zero.
			   */
  __u32 snd_cwnd;
  __u32 srtt_us;          /* Averaged RTT << 3 in usecs */

There is a new macro, SOCK_OPS_GET_TCP32(NAME), to make it easier to add
read access to more 32 bit tcp_sock fields.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-05 14:55:32 +01:00
Linus Torvalds
236fa078c6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Various TCP control block fixes, including one that crashes with
    SELinux, from David Ahern and Eric Dumazet.

 2) Fix ACK generation in rxrpc, from David Howells.

 3) ipvlan doesn't set the mark properly in the ipv4 route lookup key,
    from Gao Feng.

 4) SIT configuration doesn't take on the frag_off ipv4 field
    configuration properly, fix from Hangbin Liu.

 5) TSO can fail after device down/up on stmmac, fix from Lars Persson.

 6) Various bpftool fixes (mostly in JSON handling) from Quentin Monnet.

 7) Various SKB leak fixes in vhost/tun/tap (mostly observed as
    performance problems). From Wei Xu.

 8) mvpps's TX descriptors were not zero initialized, from Yan Markman.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits)
  tcp: use IPCB instead of TCP_SKB_CB in inet_exact_dif_match()
  tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()
  rxrpc: Fix the MAINTAINERS record
  rxrpc: Use correct netns source in rxrpc_release_sock()
  liquidio: fix incorrect indentation of assignment statement
  stmmac: reset last TSO segment size after device open
  ipvlan: Add the skb->mark as flow4's member to lookup route
  s390/qeth: build max size GSO skbs on L2 devices
  s390/qeth: fix GSO throughput regression
  s390/qeth: fix thinko in IPv4 multicast address tracking
  tap: free skb if flags error
  tun: free skb in early errors
  vhost: fix skb leak in handle_rx()
  bnxt_en: Fix a variable scoping in bnxt_hwrm_do_send_msg()
  bnxt_en: fix dst/src fid for vxlan encap/decap actions
  bnxt_en: wildcard smac while creating tunnel decap filter
  bnxt_en: Need to unconditionally shut down RoCE in bnxt_shutdown
  phylink: ensure we take the link down when phylink_stop() is called
  sfp: warn about modules requiring address change sequence
  sfp: improve RX_LOS handling
  ...
2017-12-04 11:14:46 -08:00
David S. Miller
d671965b54 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2017-12-03

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Addition of a software model for BPF offloads in order to ease
   testing code changes in that area and make semantics more clear.
   This is implemented in a new driver called netdevsim, which can
   later also be extended for other offloads. SR-IOV support is added
   as well to netdevsim. BPF kernel selftests for offloading are
   added so we can track basic functionality as well as exercising
   all corner cases around BPF offloading, from Jakub.

2) Today drivers have to drop the reference on BPF progs they hold
   due to XDP on device teardown themselves. Change this in order
   to make XDP handling inside the drivers less error prone, and
   move disabling XDP to the core instead, also from Jakub.

3) Misc set of BPF verifier improvements and cleanups as preparatory
   work for upcoming BPF-to-BPF calls. Among others, this set also
   improves liveness marking such that pruning can be slightly more
   effective. Register and stack liveness information is now included
   in the verifier log as well, from Alexei.

4) nfp JIT improvements in order to identify load/store sequences in
   the BPF prog e.g. coming from memcpy lowering and optimizing them
   through the NPU's command push pull (CPP) instruction, from Jiong.

5) Cleanups to test_cgrp2_attach2.c BPF sample code in oder to remove
   bpf_prog_attach() magic values and replacing them with actual proper
   attach flag instead, from David.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04 12:07:10 -05:00
Heiner Kallweit
a6d1642dab net: phy: core: remove now uneeded disabling of interrupts
After commits c974bdbc3e "net: phy: Use threaded IRQ, to allow IRQ from
sleeping devices" and 664fcf123a "net: phy: Threaded interrupts allow
some simplification" all relevant code pieces run in process context
anyway and I don't think we need the disabling of interrupts any longer.

Interestingly enough, latter commit already removed the comment
explaining why interrupts need to be temporarily disabled.

On my system phy interrupt mode works fine with this patch.
However I may miss something, especially in the context of shared phy
interrupts, therefore I'd appreciate if more people could test this.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04 10:23:41 -05:00
Stephen Hemminger
0487426fed vmbus: make hv_get_ringbuffer_availbytes local
The last use of hv_get_ringbuffer_availbytes in drivers is now
gone. Only used by the debug info routine so make it static. Also, add
READ_ONCE() to avoid any possible issues with potentially volatile
index values.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03 10:10:02 -05:00
Jakub Kicinski
92f0292b35 net: xdp: report flags program was installed with on query
Some drivers enforce that flags on program replacement and
removal must match the flags passed on install.  This leaves
the possibility open to enable simultaneous loading
of XDP programs both to HW and DRV.

Allow such drivers to report the flags back to the stack.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-03 00:27:57 +01:00
Jakub Kicinski
118b4aa25d net: xdp: avoid output parameters when querying XDP prog
The output parameters will get unwieldy if we want to add more
information about the program.  Simply pass the entire
struct netdev_bpf in.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-03 00:27:57 +01:00
Linus Torvalds
e1ba1c99da RISC-V Cleanups and ABI Fixes for 4.15-rc2
This tag contains a handful of small cleanups that are a result of
 feedback that didn't make it into our original patch set, either because
 the feedback hadn't been given yet, I missed the original emails, or
 we weren't ready to submit the changes yet.
 
 I've been maintaining the various cleanup patch sets I have as their own
 branches, which I then merged together and signed.  Each merge commit
 has a short summary of the changes, and each branch is based on your
 latest tag (4.15-rc1, in this case).  If this isn't the right way to do
 this then feel free to suggest something else, but it seems sane to me.
 
 Here's a short summary of the changes, roughly in order of how
 interesting they are.
 
 * libgcc.h has been moved from include/lib, where it's the only member,
   to include/linux.  This is meant to avoid tab completion conflicts.
 * VDSO entries for clock_get/gettimeofday/getcpu have been added.  These
   are simple syscalls now, but we want to let glibc use them from the
   start so we can make them faster later.
 * A VDSO entry for instruction cache flushing has been added so
   userspace can flush the instruction cache.
 * The VDSO symbol versions for __vdso_cmpxchg{32,64} have been removed,
   as those VDSO entries don't actually exist.
 * __io_writes has been corrected to respect the given type.
 * A new READ_ONCE in arch_spin_is_locked().
 * __test_and_op_bit_ord() is now actually ordered.
 * Various small fixes throughout the tree to enable allmodconfig to
   build cleanly.
 * Removal of some dead code in our atomic support headers.
 * Improvements to various comments in our atomic support headers.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAlohyvMTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRDvTKFQLMurQWkhD/wO/F8vrwsNMOWR8zxvHdB30KD+FHmr
 X1+X9OqnH8AMd4Woj6pS0ap7g0GCKuLiI/bOTrQVVdTJpmKFaJ9rrwRCJzHq43yt
 feRjKyPAFYlvf6YaIEJ3YHU0t3LO1eK27YyFMg6F8y+bZim6oK2GdyfYF0Xiik3B
 L3NkDPSH4oplTJjUI+tzDZdMsuZKhxpXPnbNQA7YZLepz04jOPGWqFrA1C3gAaVQ
 dj1OkOGTSyQFwia7LrIm2g0J5/mqpjAF0KjdiTsvH6G9x3V0HZYU5Br3kHgauWKc
 YrNEbbDl8EakT5QocPf5F4Z8qpO9Hvxjwe2/z27usPtV9FQOPuDDPOygSPykwNNJ
 bDfv9nIE3W7lN26BaRcV2ivY3r9ZpCEcq+qXIiTm3P/uTVqjMq54NkvHnj4ON1ih
 DJZEgkM9L+rm7c9XDn627FBkmkeEndPJcQ3P/nopb5zGTYTb2HGrUt2nM+KR2vuE
 FdYtA9+ll3OzyFO3OVVjiAlxr8Qnwf2wIWXJXxWpcmmchGJ5NeTSZtiD14pAP5eC
 EDpoWwefvhqRMGdOlgq/fkx4Mrhz27euWXine3ZccprABAf7Hxkb/N5ojIJKT7qW
 mN3HL3PC9P0t/HxQEu0q0NLLsP+X/1yZ5HmDl44Y7N8aeCrIUXaB61gsTt6Oi6Ha
 PMJi5PI6VDDQbA==
 =CCe+
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-4.15-rc2_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux

Pull RISC-V cleanups and ABI fixes from Palmer Dabbelt:
 "This contains a handful of small cleanups that are a result of
  feedback that didn't make it into our original patch set, either
  because the feedback hadn't been given yet, I missed the original
  emails, or we weren't ready to submit the changes yet.

  I've been maintaining the various cleanup patch sets I have as their
  own branches, which I then merged together and signed. Each merge
  commit has a short summary of the changes, and each branch is based on
  your latest tag (4.15-rc1, in this case). If this isn't the right way
  to do this then feel free to suggest something else, but it seems sane
  to me.

  Here's a short summary of the changes, roughly in order of how
  interesting they are.

   - libgcc.h has been moved from include/lib, where it's the only
     member, to include/linux. This is meant to avoid tab completion
     conflicts.

   - VDSO entries for clock_get/gettimeofday/getcpu have been added.
     These are simple syscalls now, but we want to let glibc use them
     from the start so we can make them faster later.

   - A VDSO entry for instruction cache flushing has been added so
     userspace can flush the instruction cache.

   - The VDSO symbol versions for __vdso_cmpxchg{32,64} have been
     removed, as those VDSO entries don't actually exist.

   - __io_writes has been corrected to respect the given type.

   - A new READ_ONCE in arch_spin_is_locked().

   - __test_and_op_bit_ord() is now actually ordered.

   - Various small fixes throughout the tree to enable allmodconfig to
     build cleanly.

   - Removal of some dead code in our atomic support headers.

   - Improvements to various comments in our atomic support headers"

* tag 'riscv-for-linus-4.15-rc2_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux: (23 commits)
  RISC-V: __io_writes should respect the length argument
  move libgcc.h to include/linux
  RISC-V: Clean up an unused include
  RISC-V: Allow userspace to flush the instruction cache
  RISC-V: Flush I$ when making a dirty page executable
  RISC-V: Add missing include
  RISC-V: Use define for get_cycles like other architectures
  RISC-V: Provide stub of setup_profiling_timer()
  RISC-V: Export some expected symbols for modules
  RISC-V: move empty_zero_page definition to C and export it
  RISC-V: io.h: type fixes for warnings
  RISC-V: use RISCV_{INT,SHORT} instead of {INT,SHORT} for asm macros
  RISC-V: use generic serial.h
  RISC-V: remove spin_unlock_wait()
  RISC-V: `sfence.vma` orderes the instruction cache
  RISC-V: Add READ_ONCE in arch_spin_is_locked()
  RISC-V: __test_and_op_bit_ord should be strongly ordered
  RISC-V: Remove smb_mb__{before,after}_spinlock()
  RISC-V: Remove __smp_bp__{before,after}_atomic
  RISC-V: Comment on why {,cmp}xchg is ordered how it is
  ...
2017-12-01 19:39:12 -05:00
Christoph Hellwig
4db2b604c0 move libgcc.h to include/linux
Introducing a new include/lib directory just for this file totally
messes up tab completion for include/linux, which is highly annoying.

Move it to include/linux where we have headers for all kinds of other
lib/ code as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2017-12-01 13:09:40 -08:00
Heiner Kallweit
00fde79532 net: phy: core: use genphy version of callbacks read_status and config_aneg per default
read_status and config_aneg are the only mandatory callbacks and most
of the time the generic implementation is used by drivers.
So make the core fall back to the generic version if a driver doesn't
implement the respective callback.

Also currently the core doesn't seem to verify that drivers implement
the mandatory calls. If a driver doesn't do so we'd just get a NPE.
With this patch this potential issue doesn't exit any longer.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01 15:42:21 -05:00
Linus Torvalds
9e0600f5cf * x86 bugfixes: APIC, nested virtualization, IOAPIC
* PPC bugfix: HPT guests on a POWER9 radix host
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJaICi1AAoJEL/70l94x66DjvEIAIML/e9YX1YrJZi0rsB9cbm0
 Le3o5b3wKxPrlZdnpOZQ2mVWubUQdiHMPGX6BkpgyiJWUchnbj5ql1gUf5S0i3jk
 TOk6nae6DU94xBuboeqZJlmx2VfPY/fqzLWsX3HFHpnzRl4XvXL5o7cWguIxVcVO
 yU6bPgbAXyXSBennLWZxC3aQ2Ojikr3uxZQpUZTAPOW5hFINpCKCpqJBMxsb67wq
 rwI0cJhRl92mHpbe8qeNJhavqY5eviy9iPUaZrOW9P4yw1uqjTAjgsUc1ydiaZSV
 rOHeKBOgVfY/KBaNJKyKySfuL1MJ+DLcQqm9RlGpKNpFIeB0vvSf0gtmmqIAXIk=
 =kh2y
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:

 - x86 bugfixes: APIC, nested virtualization, IOAPIC

 - PPC bugfix: HPT guests on a POWER9 radix host

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (26 commits)
  KVM: Let KVM_SET_SIGNAL_MASK work as advertised
  KVM: VMX: Fix vmx->nested freeing when no SMI handler
  KVM: VMX: Fix rflags cache during vCPU reset
  KVM: X86: Fix softlockup when get the current kvmclock
  KVM: lapic: Fixup LDR on load in x2apic
  KVM: lapic: Split out x2apic ldr calculation
  KVM: PPC: Book3S HV: Fix migration and HPT resizing of HPT guests on radix hosts
  KVM: vmx: use X86_CR4_UMIP and X86_FEATURE_UMIP
  KVM: x86: Fix CPUID function for word 6 (80000001_ECX)
  KVM: nVMX: Fix vmx_check_nested_events() return value in case an event was reinjected to L2
  KVM: x86: ioapic: Preserve read-only values in the redirection table
  KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered
  KVM: x86: ioapic: Remove redundant check for Remote IRR in ioapic_set_irq
  KVM: x86: ioapic: Don't fire level irq when Remote IRR set
  KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race
  KVM: x86: inject exceptions produced by x86_decode_insn
  KVM: x86: Allow suppressing prints on RDMSR/WRMSR of unhandled MSRs
  KVM: x86: fix em_fxstor() sleeping while in atomic
  KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure
  KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry
  ...
2017-11-30 08:15:19 -08:00
Richard Leitner
511e3036e3 net: phy: harmonize phy_id{,_mask} data type
Previously phy_id was u32 and phy_id_mask was unsigned int. As the
phy_id_mask defines the important bits of the phy_id (and is therefore
the same size) these two variables should be the same data type.

Signed-off-by: Richard Leitner <richard.leitner@skidata.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-30 09:26:31 -05:00
Geert Uytterhoeven
f8821f96ae skbuff: Grammar s/are can/can/, s/change/changes/
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-30 09:15:53 -05:00
Linus Torvalds
a0908a1b7d Merge branch 'akpm' (patches from Andrew)
Mergr misc fixes from Andrew Morton:
 "28 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (28 commits)
  fs/hugetlbfs/inode.c: change put_page/unlock_page order in hugetlbfs_fallocate()
  mm/hugetlb: fix NULL-pointer dereference on 5-level paging machine
  autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored"
  autofs: revert "autofs: take more care to not update last_used on path walk"
  fs/fat/inode.c: fix sb_rdonly() change
  mm, memcg: fix mem_cgroup_swapout() for THPs
  mm: migrate: fix an incorrect call of prep_transhuge_page()
  kmemleak: add scheduling point to kmemleak_scan()
  scripts/bloat-o-meter: don't fail with division by 0
  fs/mbcache.c: make count_objects() more robust
  Revert "mm/page-writeback.c: print a warning if the vm dirtiness settings are illogical"
  mm/madvise.c: fix madvise() infinite loop under special circumstances
  exec: avoid RLIMIT_STACK races with prlimit()
  IB/core: disable memory registration of filesystem-dax vmas
  v4l2: disable filesystem-dax mapping support
  mm: fail get_vaddr_frames() for filesystem-dax mappings
  mm: introduce get_user_pages_longterm
  device-dax: implement ->split() to catch invalid munmap attempts
  mm, hugetlbfs: introduce ->split() to vm_operations_struct
  scripts/faddr2line: extend usage on generic arch
  ...
2017-11-29 19:12:44 -08:00
Ian Kent
5d38f049ce autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored"
Commit 42f4614821 ("autofs: fix AT_NO_AUTOMOUNT not being honored")
allowed the fstatat(2) system call to properly honor the AT_NO_AUTOMOUNT
flag but introduced a semantic change.

In order to honor AT_NO_AUTOMOUNT a semantic change was made to the
negative dentry case for stat family system calls in follow_automount().

This changed the unconditional triggering of an automount in this case
to no longer be done and an error returned instead.

This has caused more problems than I expected so reverting the change is
needed.

In a discussion with Neil Brown it was concluded that the automount(8)
daemon can implement this change without kernel modifications.  So that
will be done instead and the autofs module documentation updated with a
description of the problem and what needs to be done by module users for
this specific case.

Link: http://lkml.kernel.org/r/151174730120.6162.3848002191530283984.stgit@pluto.themaw.net
Fixes: 42f4614821 ("autofs: fix AT_NO_AUTOMOUNT not being honored")
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Neil Brown <neilb@suse.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Colin Walters <walters@redhat.com>
Cc: Ondrej Holy <oholy@redhat.com>
Cc: <stable@vger.kernel.org>	[4.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29 18:40:43 -08:00
Zi Yan
40a899ed16 mm: migrate: fix an incorrect call of prep_transhuge_page()
In https://lkml.org/lkml/2017/11/20/411, Andrea reported that during
memory hotplug/hot remove prep_transhuge_page() is called incorrectly on
non-THP pages for migration, when THP is on but THP migration is not
enabled.  This leads to a bad state of target pages for migration.

By inspecting the code, if called on a non-THP, prep_transhuge_page()
will

 1) change the value of the mapping of (page + 2), since it is used for
    THP deferred list;

 2) change the lru value of (page + 1), since it is used for THP's dtor.

Both can lead to data corruption of these two pages.

Andrea said:
 "Pragmatically and from the point of view of the memory_hotplug subsys,
  the effect is a kernel crash when pages are being migrated during a
  memory hot remove offline and migration target pages are found in a
  bad state"

This patch fixes it by only calling prep_transhuge_page() when we are
certain that the target page is THP.

Link: http://lkml.kernel.org/r/20171121021855.50525-1-zi.yan@sent.com
Fixes: 8135d8926c ("mm: memory_hotplug: memory hotremove supports thp migration")
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Reported-by: Andrea Reale <ar@linux.vnet.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: <stable@vger.kernel.org>	[4.14]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29 18:40:43 -08:00
Dan Williams
2bb6d28370 mm: introduce get_user_pages_longterm
Patch series "introduce get_user_pages_longterm()", v2.

Here is a new get_user_pages api for cases where a driver intends to
keep an elevated page count indefinitely.  This is distinct from usages
like iov_iter_get_pages where the elevated page counts are transient.
The iov_iter_get_pages cases immediately turn around and submit the
pages to a device driver which will put_page when the i/o operation
completes (under kernel control).

In the longterm case userspace is responsible for dropping the page
reference at some undefined point in the future.  This is untenable for
filesystem-dax case where the filesystem is in control of the lifetime
of the block / page and needs reasonable limits on how long it can wait
for pages in a mapping to become idle.

Fixing filesystems to actually wait for dax pages to be idle before
blocks from a truncate/hole-punch operation are repurposed is saved for
a later patch series.

Also, allowing longterm registration of dax mappings is a future patch
series that introduces a "map with lease" semantic where the kernel can
revoke a lease and force userspace to drop its page references.

I have also tagged these for -stable to purposely break cases that might
assume that longterm memory registrations for filesystem-dax mappings
were supported by the kernel.  The behavior regression this policy
change implies is one of the reasons we maintain the "dax enabled.
Warning: EXPERIMENTAL, use at your own risk" notification when mounting
a filesystem in dax mode.

It is worth noting the device-dax interface does not suffer the same
constraints since it does not support file space management operations
like hole-punch.

This patch (of 4):

Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow long standing memory registrations against
filesytem-dax vmas.  Device-dax vmas do not have this problem and are
explicitly allowed.

This is temporary until a "memory registration with layout-lease"
mechanism can be implemented for the affected sub-systems (RDMA and
V4L2).

[akpm@linux-foundation.org: use kcalloc()]
Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 3565fce3a6 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29 18:40:42 -08:00
Dan Williams
31383c6865 mm, hugetlbfs: introduce ->split() to vm_operations_struct
Patch series "device-dax: fix unaligned munmap handling"

When device-dax is operating in huge-page mode we want it to behave like
hugetlbfs and fail attempts to split vmas into unaligned ranges.  It
would be messy to teach the munmap path about device-dax alignment
constraints in the same (hstate) way that hugetlbfs communicates this
constraint.  Instead, these patches introduce a new ->split() vm
operation.

This patch (of 2):

The device-dax interface has similar constraints as hugetlbfs in that it
requires the munmap path to unmap in huge page aligned units.  Rather
than add more custom vma handling code in __split_vma() introduce a new
vm operation to perform this vma specific check.

Link: http://lkml.kernel.org/r/151130418135.4029.6783191281930729710.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: dee4107924 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29 18:40:42 -08:00
Dan Williams
1501899a89 mm: fix device-dax pud write-faults triggered by get_user_pages()
Currently only get_user_pages_fast() can safely handle the writable gup
case due to its use of pud_access_permitted() to check whether the pud
entry is writable.  In the gup slow path pud_write() is used instead of
pud_access_permitted() and to date it has been unimplemented, just calls
BUG_ON().

    kernel BUG at ./include/linux/hugetlb.h:244!
    [..]
    RIP: 0010:follow_devmap_pud+0x482/0x490
    [..]
    Call Trace:
     follow_page_mask+0x28c/0x6e0
     __get_user_pages+0xe4/0x6c0
     get_user_pages_unlocked+0x130/0x1b0
     get_user_pages_fast+0x89/0xb0
     iov_iter_get_pages_alloc+0x114/0x4a0
     nfs_direct_read_schedule_iovec+0xd2/0x350
     ? nfs_start_io_direct+0x63/0x70
     nfs_file_direct_read+0x1e0/0x250
     nfs_file_read+0x90/0xc0

For now this just implements a simple check for the _PAGE_RW bit similar
to pmd_write.  However, this implies that the gup-slow-path check is
missing the extra checks that the gup-fast-path performs with
pud_access_permitted.  Later patches will align all checks to use the
'access_permitted' helper if the architecture provides it.

Note that the generic 'access_permitted' helper fallback is the simple
_PAGE_RW check on architectures that do not define the
'access_permitted' helper(s).

[dan.j.williams@intel.com: fix powerpc compile error]
  Link: http://lkml.kernel.org/r/151129126165.37405.16031785266675461397.stgit@dwillia2-desk3.amr.corp.intel.com
Link: http://lkml.kernel.org/r/151043109938.2842.14834662818213616199.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: a00cc7d9dd ("mm, x86: add support for PUD-sized transparent hugepages")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Thomas Gleixner <tglx@linutronix.de>	[x86]
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29 18:40:42 -08:00
Linus Torvalds
b915176102 Highlights:
- Fixes from Trond for some races in the NFSv4 state code.
 	- Fix from Naofumi Honda for a typo in the blocked lock
 	  notificiation code.
 	- Fixes from Vasily Averin for some problems starting and
 	  stopping lockd especially in network namespaces.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaHxq/AAoJECebzXlCjuG+QOYP/jIa9dZnbau3owP8RJJv1+VI
 RSMYAZkIjy1vixn/BymZo55R7+23BhdLe8CDsknXWo85mIj61kpV1bwF2lVc7FWm
 +Pt93DkUsUBEjf+/3/58TLknYs5o7UhsEw2Qjg+D3BkO+z95biNa0hUBle2+Nnwi
 vBQLGqdlCIFZxuzEo7yUlGdKTyefzab4bocgRnh/5JMs+bHzPDD74W1GrGB1oEKX
 VSGzq0d7LLe23yIJwgP1eaa0tQr1/WsxlL8xD5Im6mXcN9aYa/7VZhg/oCluy8ac
 v95IBjQUkFqvw5OjDgSX5ZgKzokmRxLjnaUX2JT/sLCk1WxdJhUomw8qb1AJLvav
 e6xce1M+dR5VihTrD/cEe0xB7CXKXywPd6pXQBosAMInhS79aU8brIPCDtLvNwCw
 XvtNybbqC0Go89YMt2zuRfBkV7W3FmM1h+h4PWVl+iCl/7+AYIXD1qeX/FuIjnk6
 SMEdtTb/cqECuh55YefEljUzY1vKYgquxCNCvcbSrMtVSOZYXXufheY+fBjf5DBb
 Bnsd1FiPtVkwFwX8bTbGlOOub1Ryl9SD4Ae0Ynu2FNYSFL8BVXTHkTHm9UHl83s5
 pr0T6bKlpg+YzZrHVh2Herr9Ze89C9uM7oCU1M062vk4+Cg65paqNTnWVtflYPhG
 y9p0hsY5csyzm0SZ/1Ui
 =pR9D
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.15-1' of git://linux-nfs.org/~bfields/linux

Pull nfsd fixes from Bruce Fields:
 "I screwed up my merge window pull request; I only sent half of what I
  meant to.

  There were no new features, just bugfixes of various importance and
  some very minor cleanup, so I think it's all still appropriate for
  -rc2.

  Highlights:

   - Fixes from Trond for some races in the NFSv4 state code.

   - Fix from Naofumi Honda for a typo in the blocked lock notificiation
     code

   - Fixes from Vasily Averin for some problems starting and stopping
     lockd especially in network namespaces"

* tag 'nfsd-4.15-1' of git://linux-nfs.org/~bfields/linux: (23 commits)
  lockd: fix "list_add double add" caused by legacy signal interface
  nlm_shutdown_hosts_net() cleanup
  race of nfsd inetaddr notifiers vs nn->nfsd_serv change
  race of lockd inetaddr notifiers vs nlmsvc_rqst change
  SUNRPC: make cache_detail structures const
  NFSD: make cache_detail structures const
  sunrpc: make the function arg as const
  nfsd: check for use of the closed special stateid
  nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat
  lockd: lost rollback of set_grace_period() in lockd_down_net()
  lockd: added cleanup checks in exit_net hook
  grace: replace BUG_ON by WARN_ONCE in exit_net hook
  nfsd: fix locking validator warning on nfs4_ol_stateid->st_mutex class
  lockd: remove net pointer from messages
  nfsd: remove net pointer from debug messages
  nfsd: Fix races with check_stateid_generation()
  nfsd: Ensure we check stateid validity in the seqid operation checks
  nfsd: Fix race in lock stateid creation
  nfsd4: move find_lock_stateid
  nfsd: Ensure we don't recognise lock stateids after freeing them
  ...
2017-11-29 14:49:26 -08:00
Linus Torvalds
668533dc07 kallsyms: take advantage of the new '%px' format
The conditional kallsym hex printing used a special fixed-width '%lx'
output (KALLSYM_FMT) in preparation for the hashing of %p, but that
series ended up adding a %px specifier to help with the conversions.

Use it, and avoid the "print pointer as an unsigned long" code.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29 10:30:13 -08:00
Bhumika Goyal
d34971a65b sunrpc: make the function arg as const
Make the struct cache_detail *tmpl argument of the function
cache_create_net as const as it is only getting passed to kmemup having
the argument as const void *.
Add const to the prototype too.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:11 -05:00
Linus Torvalds
1751e8a6cb Rename superblock flags (MS_xyz -> SB_xyz)
This is a pure automated search-and-replace of the internal kernel
superblock flags.

The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.

Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.

The script to do this was:

    # places to look in; re security/*: it generally should *not* be
    # touched (that stuff parses mount(2) arguments directly), but
    # there are two places where we really deal with superblock flags.
    FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
            include/linux/fs.h include/uapi/linux/bfs_fs.h \
            security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
    # the list of MS_... constants
    SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
          DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
          POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
          I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
          ACTIVE NOUSER"

    SED_PROG=
    for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done

    # we want files that contain at least one of MS_...,
    # with fs/namespace.c and fs/pnode.c excluded.
    L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')

    for f in $L; do sed -i $f $SED_PROG; done

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-27 13:05:09 -08:00
Jan H. Schönherr
20b7035c66 KVM: Let KVM_SET_SIGNAL_MASK work as advertised
KVM API says for the signal mask you set via KVM_SET_SIGNAL_MASK, that
"any unblocked signal received [...] will cause KVM_RUN to return with
-EINTR" and that "the signal will only be delivered if not blocked by
the original signal mask".

This, however, is only true, when the calling task has a signal handler
registered for a signal. If not, signal evaluation is short-circuited for
SIG_IGN and SIG_DFL, and the signal is either ignored without KVM_RUN
returning or the whole process is terminated.

Make KVM_SET_SIGNAL_MASK behave as advertised by utilizing logic similar
to that in do_sigtimedwait() to avoid short-circuiting of signals.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-27 17:53:47 +01:00
Linus Torvalds
dec0029a59 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Glexiner:

 - unbreak the irq trigger type check for legacy platforms

 - a handful fixes for ARM GIC v3/4 interrupt controllers

 - a few trivial fixes all over the place

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/matrix: Make - vs ?: Precedence explicit
  irqchip/imgpdc: Use resource_size function on resource object
  irqchip/qcom: Fix u32 comparison with value less than zero
  irqchip/exiu: Fix return value check in exiu_init()
  irqchip/gic-v3-its: Remove artificial dependency on PCI
  irqchip/gic-v4: Add forward definition of struct irq_domain_ops
  irqchip/gic-v3: pr_err() strings should end with newlines
  irqchip/s3c24xx: pr_err() strings should end with newlines
  irqchip/gic-v3: Fix ppi-partitions lookup
  irqchip/gic-v4: Clear IRQ_DISABLE_UNLAZY again if mapping fails
  genirq: Track whether the trigger type has been set
2017-11-26 14:39:20 -08:00
Linus Torvalds
02fc87b117 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
 - topology enumeration fixes
 - KASAN fix
 - two entry fixes (not yet the big series related to KASLR)
 - remove obsolete code
 - instruction decoder fix
 - better /dev/mem sanity checks, hopefully working better this time
 - pkeys fixes
 - two ACPI fixes
 - 5-level paging related fixes
 - UMIP fixes that should make application visible faults more debuggable
 - boot fix for weird virtualization environment

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86/decoder: Add new TEST instruction pattern
  x86/PCI: Remove unused HyperTransport interrupt support
  x86/umip: Fix insn_get_code_seg_params()'s return value
  x86/boot/KASLR: Remove unused variable
  x86/entry/64: Add missing irqflags tracing to native_load_gs_index()
  x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow
  x86/entry/64: Fix entry_SYSCALL_64_after_hwframe() IRQ tracing
  x86/pkeys/selftests: Fix protection keys write() warning
  x86/pkeys/selftests: Rename 'si_pkey' to 'siginfo_pkey'
  x86/mpx/selftests: Fix up weird arrays
  x86/pkeys: Update documentation about availability
  x86/umip: Print a warning into the syslog if UMIP-protected instructions are used
  x86/smpboot: Fix __max_logical_packages estimate
  x86/topology: Avoid wasting 128k for package id array
  perf/x86/intel/uncore: Cache logical pkg id in uncore driver
  x86/acpi: Reduce code duplication in mp_override_legacy_irq()
  x86/acpi: Handle SCI interrupts above legacy space gracefully
  x86/boot: Fix boot failure when SMP MP-table is based at 0
  x86/mm: Limit mmap() of /dev/mem to valid physical addresses
  x86/selftests: Add test for mapping placement for 5-level paging
  ...
2017-11-26 14:11:54 -08:00
Linus Torvalds
6830c8db58 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Misc fixes: a documentation fix, a Sparse warning fix and a debugging
  fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/debug: Fix task state recording/printout
  sched/deadline: Don't use dubious signed bitfields
  sched/deadline: Fix the description of runtime accounting in the documentation
2017-11-26 13:43:25 -08:00
Linus Torvalds
fcbc38b1b2 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fixes from Ingo Molnar:
 "A handful of objtool fixes, most of them related to making the UAPI
  header-syncing warnings easier to read and easier to act upon"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tools/headers: Sync objtool UAPI header
  objtool: Fix cross-build
  objtool: Move kernel headers/code sync check to a script
  objtool: Move synced files to their original relative locations
  objtool: Make unreachable annotation inline asms explicitly volatile
  objtool: Add a comment for the unreachable annotation macros
2017-11-26 13:11:18 -08:00
Linus Torvalds
844056fd74 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:

 - The final conversion of timer wheel timers to timer_setup().

   A few manual conversions and a large coccinelle assisted sweep and
   the removal of the old initialization mechanisms and the related
   code.

 - Remove the now unused VSYSCALL update code

 - Fix permissions of /proc/timer_list. I still need to get rid of that
   file completely

 - Rename a misnomed clocksource function and remove a stale declaration

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  m68k/macboing: Fix missed timer callback assignment
  treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts
  timer: Remove redundant __setup_timer*() macros
  timer: Pass function down to initialization routines
  timer: Remove unused data arguments from macros
  timer: Switch callback prototype to take struct timer_list * argument
  timer: Pass timer_list pointer to callbacks unconditionally
  Coccinelle: Remove setup_timer.cocci
  timer: Remove setup_*timer() interface
  timer: Remove init_timer() interface
  treewide: setup_timer() -> timer_setup() (2 field)
  treewide: setup_timer() -> timer_setup()
  treewide: init_timer() -> setup_timer()
  treewide: Switch DEFINE_TIMER callbacks to struct timer_list *
  s390: cmm: Convert timers to use timer_setup()
  lightnvm: Convert timers to use timer_setup()
  drivers/net: cris: Convert timers to use timer_setup()
  drm/vc4: Convert timers to use timer_setup()
  block/laptop_mode: Convert timers to use timer_setup()
  net/atm/mpc: Avoid open-coded assignment of timer callback function
  ...
2017-11-25 08:37:16 -10:00
Linus Torvalds
1d3b78bbc6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix PCI IDs of 9000 series iwlwifi devices, from Luca Coelho.

 2) bpf offload bug fixes from Jakub Kicinski.

 3) Fix bpf verifier to NOP out code which is dead at run time because
    due to branch pruning the verifier will not explore such
    instructions. From Alexei Starovoitov.

 4) Fix crash when deleting secondary chains in packet scheduler
    classifier. From Roman Kapl.

 5) Fix buffer management bugs in smc, from Ursula Braun.

 6) Fix regression in anycast route handling, from David Ahern.

 7) Fix link settings regression in r8169, from Tobias Jakobi.

 8) Add back enough UFO support so that live migration still works, from
    Willem de Bruijn.

 9) Linearize enough packet data for the full extent to which the ipvlan
    code will inspect the packet headers, from Gao Feng.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
  ipvlan: Fix insufficient skb linear check for ipv6 icmp
  ipvlan: Fix insufficient skb linear check for arp
  geneve: only configure or fill UDP_ZERO_CSUM6_RX/TX info when CONFIG_IPV6
  net: dsa: bcm_sf2: Clear IDDQ_GLOBAL_PWR bit for PHY
  net: accept UFO datagrams from tuntap and packet
  net: realtek: r8169: implement set_link_ksettings()
  net: ipv6: Fixup device for anycast routes during copy
  net/smc: Fix preinitialization of buf_desc in __smc_buf_create()
  net/smc: use sk_rcvbuf as start for rmb creation
  ipv6: Do not consider linkdown nexthops during multipath
  net: sched: fix crash when deleting secondary chains
  net: phy: cortina: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
  bpf: fix branch pruning logic
  bpf: change bpf_perf_event_output arg5 type to ARG_CONST_SIZE_OR_ZERO
  bpf: change bpf_probe_read_str arg2 type to ARG_CONST_SIZE_OR_ZERO
  bpf: remove explicit handling of 0 for arg2 in bpf_probe_read
  bpf: introduce ARG_PTR_TO_MEM_OR_NULL
  i40evf: Use smp_rmb rather than read_barrier_depends
  fm10k: Use smp_rmb rather than read_barrier_depends
  igb: Use smp_rmb rather than read_barrier_depends
  ...
2017-11-23 21:18:46 -10:00
James Morris
ce44cd8dfc Keys devel
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAWhc/3fSw1s6N8H32AQLJxQ/9Gw5ns9bipLQ5VeXjgQjY6U39lHWD7z0e
 cz1jYsqOGvWXqoHZumK6fB0NorYZ3EEiWTVqNkpTv1p5ZIJe612G3oe5SOTn2gA5
 1X4qb/QMCr12TIv/R40mXuEsBUZZaUvxK5G7L7/Ty8a9iC+Pp3Tr009ThPvhfDMc
 RRP5MHxPghat+jwRcBjMz3ndQCkoTIlR9qichzQcndv5yywASQFxQsHEeG0tK48j
 NvicJawsr+0kZ2xqpRjRRJ/aQ+lMpI3SLsaUJBbIf6IYFs+i++OUkwAv0WYG0RZa
 xGjQBaaSmXPp48akIeModsp3SgNwBFpbTiXJR8hdGYjJNaaMNGD5HGQ539Ij+Wpf
 YHTIsdqw3xfFH/FoHMOesF/h/uMoA1NAMFAy/gGHxRRGNIk0wERHdTpdFUODIx9E
 NJk2fwBYpO+uRntgcmt9F3S9+YBzxACHYNmjvtbvUwjkr/hnl6jincTmlkmR9Fgl
 HYy+RcBb9A19wRYnZ5wDFOyk3sua7iq4ZBq0dbpYtSOtR9q4RtFw9wfsT8OoQMKz
 aBBn8AiV2ak+Qu00MFCyqj3jUoWa/8qzy6/57nWNsqoJTBMD0uI5UY4liTBGeq3g
 m02uVqtZgwjeBxUmMh8IxZDntVRZChGgvmifwOzb/BUV8oaOiy4aiQjONSdQBeP6
 j9SDBLRH/PY=
 =wuVV
 -----END PGP SIGNATURE-----

Merge tag 'keys-next-20171123' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into next-keys

Merge keys subsystem changes from David Howells, for v4.15.
2017-11-24 11:54:11 +11:00
Bjorn Helgaas
fd2fa6c18b x86/PCI: Remove unused HyperTransport interrupt support
There are no in-tree callers of ht_create_irq(), the driver interface for
HyperTransport interrupts, left.  Remove the unused entry point and all the
supporting code.

See 8b955b0ddd ("[PATCH] Initial generic hypertransport interrupt
support").

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-pci@vger.kernel.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Link: https://lkml.kernel.org/r/20171122221337.3877.23362.stgit@bhelgaas-glaptop.roam.corp.google.com
2017-11-23 20:18:18 +01:00
David S. Miller
e4be7baba8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2017-11-23

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Several BPF offloading fixes, from Jakub. Among others:

    - Limit offload to cls_bpf and XDP program types only.
    - Move device validation into the driver and don't make
      any assumptions about the device in the classifier due
      to shared blocks semantics.
    - Don't pass offloaded XDP program into the driver when
      it should be run in native XDP instead. Offloaded ones
      are not JITed for the host in such cases.
    - Don't destroy device offload state when moved to
      another namespace.
    - Revert dumping offload info into user space for now,
      since ifindex alone is not sufficient. This will be
      redone properly for bpf-next tree.

2) Fix test_verifier to avoid using bpf_probe_write_user()
   helper in test cases, since it's dumping a warning into
   kernel log which may confuse users when only running tests.
   Switch to use bpf_trace_printk() instead, from Yonghong.

3) Several fixes for correcting ARG_CONST_SIZE_OR_ZERO semantics
   before it becomes uabi, from Gianluca. More specifically:

    - Add a type ARG_PTR_TO_MEM_OR_NULL that is used only
      by bpf_csum_diff(), where the argument is either a
      valid pointer or NULL. The subsequent ARG_CONST_SIZE_OR_ZERO
      then enforces a valid pointer in case of non-0 size
      or a valid pointer or NULL in case of size 0. Given
      that, the semantics for ARG_PTR_TO_MEM in combination
      with ARG_CONST_SIZE_OR_ZERO are now such that in case
      of size 0, the pointer must always be valid and cannot
      be NULL. This fix in semantics allows for bpf_probe_read()
      to drop the recently added size == 0 check in the helper
      that would become part of uabi otherwise once released.
      At the same time we can then fix bpf_probe_read_str() and
      bpf_perf_event_output() to use ARG_CONST_SIZE_OR_ZERO
      instead of ARG_CONST_SIZE in order to fix recently
      reported issues by Arnaldo et al, where LLVM optimizes
      two boundary checks into a single one for unknown
      variables where the verifier looses track of the variable
      bounds and thus rejects valid programs otherwise.

4) A fix for the verifier for the case when it detects
   comparison of two constants where the branch is guaranteed
   to not be taken at runtime. Verifier will rightfully prune
   the exploration of such paths, but we still pass the program
   to JITs, where they would complain about using reserved
   fields, etc. Track such dead instructions and sanitize
   them with mov r0,r0. Rejection is not possible since LLVM
   may generate them for valid C code and doesn't do as much
   data flow analysis as verifier. For bpf-next we might
   implement removal of such dead code and adjust branches
   instead. Fix from Alexei.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-24 02:33:01 +09:00
Willem de Bruijn
0c19f846d5 net: accept UFO datagrams from tuntap and packet
Tuntap and similar devices can inject GSO packets. Accept type
VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.

Processes are expected to use feature negotiation such as TUNSETOFFLOAD
to detect supported offload types and refrain from injecting other
packets. This process breaks down with live migration: guest kernels
do not renegotiate flags, so destination hosts need to expose all
features that the source host does.

Partially revert the UFO removal from 182e0b6b5846~1..d9d30adf5677.
This patch introduces nearly(*) no new code to simplify verification.
It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
insertion and software UFO segmentation.

It does not reinstate protocol stack support, hardware offload
(NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.

To support SKB_GSO_UDP reappearing in the stack, also reinstate
logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
by squashing in commit 939912216f ("net: skb_needs_check() removes
CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee643
("net: avoid skb_warn_bad_offload false positives on UFO").

(*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
ipv6_proxy_select_ident is changed to return a __be32 and this is
assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
at the end of the enum to minimize code churn.

Tested
  Booted a v4.13 guest kernel with QEMU. On a host kernel before this
  patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
  enabled, same as on a v4.13 host kernel.

  A UFO packet sent from the guest appears on the tap device:
    host:
      nc -l -p -u 8000 &
      tcpdump -n -i tap0

    guest:
      dd if=/dev/zero of=payload.txt bs=1 count=2000
      nc -u 192.16.1.1 8000 < payload.txt

  Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
  packets arriving fragmented:

    ./with_tap_pair.sh ./tap_send_ufo tap0 tap1
    (from https://github.com/wdebruij/kerneltools/tree/master/tests)

Changes
  v1 -> v2
    - simplified set_offload change (review comment)
    - documented test procedure

Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com>
Fixes: fb652fdfe8 ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-24 01:37:35 +09:00