Commit graph

738424 commits

Author SHA1 Message Date
Chengguang Xu
affff07739 libceph: check kstrndup() return value
Should check result of kstrndup() in case of memory allocation failure.

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:12 +01:00
Zhi Zhang
e30ee58121 ceph: try to allocate enough memory for reserved caps
ceph_reserve_caps() may not reserve enough caps under high memory
pressure, but it saved the needed caps number that expected to
be reserved. When getting caps, crash would happen due to number
mismatch.

Now we will try to trim more caps when failing to allocate memory
for caps need to be reserved, then try again. If still failing to
allocate memory, return -ENOMEM.

Signed-off-by: Zhi Zhang <zhang.david2011@gmail.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:12 +01:00
Yan, Zheng
0f439c746c ceph: fix race of queuing delayed caps
When called with CHECK_CAPS_AUTHONLY flag, ceph_check_caps() only
processes auth caps. In that case, it's unsafe to remove inode
from mdsc->cap_delay_list, because there can be delayed non-auth
caps.

Besides, ceph_check_caps() may lock/unlock i_ceph_lock several
times, when multiple threads call ceph_check_caps() at the same
time. It's possible that one thread calls __cap_delay_requeue(),
another thread calls __cap_delay_cancel(). __cap_delay_cancel()
should be called at very beginning of ceph_check_caps(), so that
it does not race with __cap_delay_requeue().

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:11 +01:00
Yan, Zheng
ee612d954f ceph: delete unreachable code in ceph_check_caps()
"revoking & (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)" has already
been tested before calling try_nonblocking_invalidate()

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:11 +01:00
Yan, Zheng
d84b37f9fa ceph: limit rate of cap import/export error messages
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:10 +01:00
Yan, Zheng
7d9c9193b5 ceph: fix incorrect snaprealm when adding caps
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:09 +01:00
Yan, Zheng
314c4737a4 ceph: fix un-balanced fsc->writeback_count update
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:09 +01:00
Yan, Zheng
5d98830828 ceph: track read contexts in ceph_file_info
Previously ceph_read_iter() uses current->journal to pass context info
to ceph_readpages(), so that ceph_readpages() can distinguish read(2)
from readahead(2)/fadvise(2)/madvise(2). The problem is that page fault
can happen when copying data to userspace memory. Page fault may call
other filesystem's page_mkwrite() if the userspace memory is mapped to a
file. The later filesystem may also want to use current->journal.

The fix is define a on-stack data structure in ceph_read_iter(), add it
to context list in ceph_file_info. ceph_readpages() searches the list,
find if there is a context belongs to current thread.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:08 +01:00
Yan, Zheng
5495c2d04f ceph: avoid dereferencing invalid pointer during cached readdir
Readdir cache keeps array of dentry pointers in page cache. If any
dentry in readdir cache gets pruned, ceph_d_prune() disables readdir
cache for later readdir syscall. The problem is that ceph_d_prune()
ignores unhashed dentry. Ideally MDS should have already revoked
CEPH_CAP_FILE_SHARED (which also disables readdir cache) when dentry
gets unhashed. But if it is somehow MDS does not properly revoke
CEPH_CAP_FILE_SHARED and the unhashed dentry gets pruned later,
ceph_d_prune() will not disable readdir cache, later readdir may
reference invalid dentry pointer.

The fix is make ceph_d_prune() do extra check for unhashed dentry.
Disable readdir cache if the unhashed dentry is still referenced
by readdir cache.

Another fix in this patch is handle d_splice_alias(). If a dentry
gets spliced into new parent dentry, treat it as if it was pruned
(call ceph_d_prune() for it).

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:07 +01:00
Yan, Zheng
97aeb6bf98 ceph: use atomic_t for ceph_inode_info::i_shared_gen
It allows accessing i_shared_gen without holding i_ceph_lock. It is
preparation for later patch.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:07 +01:00
Yan, Zheng
8d8f371c83 ceph: cleanup traceless reply handling for rename
ceph_fill_trace() already calls ceph_invalidate_dir_request() for
traceless reply. No need to duplicate the code in ceph_rename().

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:06 +01:00
Yan, Zheng
87c91a965a ceph: voluntarily drop Fx cap for readdir request
MDS need to rdlock directory inode's filelock when handling readdir
request. Voluntarily dropping CEPH_CAP_AUTH_EXCL avoids a cap revoke
message.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:05 +01:00
Yan, Zheng
be70489eff ceph: properly drop caps for setattr request
For CEPH_SETATTR_ATIME, MDS needs to xlock filelock, Fsxrw caps
are not allowed for xlocked filelock.

For CEPH_SETATTR_SIZE request that truncates file to smaller size,
MDS needs to xlock filelock, Fsxrw caps are not allowed for xlocked
filelock.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:05 +01:00
Yan, Zheng
d19a0b5401 ceph: voluntarily drop Lx cap for link/rename requests
MDS need to xlock inode's linklock when handling link/rename requests.
Voluntarily dropping CEPH_CAP_AUTH_EXCL avoids a cap revoke message.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:04 +01:00
Yan, Zheng
222b7f90ba ceph: voluntarily drop Ax cap for requests that create new inode
MDS need to rdlock directory inode's authlock when handling these
requests. Voluntarily dropping CEPH_CAP_AUTH_EXCL avoids a cap revoke
message.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 18:36:04 +01:00
Ilya Dryomov
e573427a44 rbd: whitelist RBD_FEATURE_OPERATIONS feature bit
This feature bit restricts older clients from performing certain
maintenance operations against an image (e.g. clone, snap create).
krbd does not perform maintenance operations.

Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2018-01-29 18:36:03 +01:00
Jason Wang
4cd879515d vhost_net: stop device during reset owner
We don't stop device before reset owner, this means we could try to
serve any virtqueue kick before reset dev->worker. This will result a
warn since the work was pending at llist during owner resetting. Fix
this by stopping device during owner reset.

Reported-by: syzbot+eb17c6162478cc50632c@syzkaller.appspotmail.com
Fixes: 3a4d5c94e9 ("vhost_net: a kernel-level virtio server")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:26:20 -05:00
David S. Miller
e8368d9ebb Merge branch 'net-Ease-to-follow-an-interface-that-moves-to-another-netns'
Nicolas Dichtel says:

====================
net: Ease to follow an interface that moves to another netns

The goal of this series is to ease the user to follow an interface that
moves to another netns.

After this series, with a patched iproute2:

$ ip netns
bar
foo
$ ip monitor link &
$ ip link set dummy0 netns foo
Deleted 5: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether 6e:a7:82:35:96:46 brd ff:ff:ff:ff:ff:ff new-nsid 0 new-ifindex 6

=> new nsid: 0, new ifindex: 6 (was 5 in the previous netns)

$ ip link set eth1 netns bar
Deleted 3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
    link/ether 52:54:01:12:34:57 brd ff:ff:ff:ff:ff:ff new-nsid 1 new-ifindex 3

=> new nsid: 1, new ifindex: 3 (same ifindex)

$ ip netns
bar (id: 1)
foo (id: 0)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:23:53 -05:00
Nicolas Dichtel
38e01b3056 dev: advertise the new ifindex when the netns iface changes
The goal is to let the user follow an interface that moves to another
netns.

CC: Jiri Benc <jbenc@redhat.com>
CC: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:23:52 -05:00
Nicolas Dichtel
c36ac8e230 dev: always advertise the new nsid when the netns iface changes
The user should be able to follow any interface that moves to another
netns.  There is no reason to hide physical interfaces.

CC: Jiri Benc <jbenc@redhat.com>
CC: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:23:51 -05:00
Vadim Lomovtsev
6b9e65474b net: ethernet: cavium: Correct Cavium Thunderx NIC driver names accordingly to module name
It was found that ethtool provides unexisting module name while
it queries the specified network device for associated driver
information. Then user tries to unload that module by provided
module name and fails.

This happens because ethtool reads value of DRV_NAME macro,
while module name is defined at the driver's Makefile.

This patch is to correct Cavium CN88xx Thunder NIC driver names
(DRV_NAME macro) 'thunder-nicvf' to 'nicvf' and 'thunder-nic'
to 'nicpf', sync bgx and xcv driver names accordingly to their
module names.

Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:22:06 -05:00
Linus Torvalds
49f9c3552c init_task out-of-lining
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAWl80tvSw1s6N8H32AQJq8A//ViRN5fExrd678Eh2Bz1ytrJYMUfYY3Hv
 QTH5TH9zFyLFyWLB1Iwe13sdLVTTM88O0qcDb54Lx9fWUqeMZyYvBhLtWPc00lTU
 0m3EyYR87MFWaEV+VxaVWgWaWkMDkd39KubDitcS+YIBDszTuMpYodhPUsgLt7lr
 pePX7eurXKdQPTh4NUOjGA2NaZot3tga76J6D8NKruGYUstQCGxpP1ryiFfACnwf
 NLWNO8ZBMtlDwX1mHYOOMFMaBzFzXorPm7jY4HJDf3mUM84xI3ach6CuH9RTSzfq
 A+qB1U3QILPVFo2HtqOHui4bFjRwqOf6uIrI/KcnioJ37w1O+KFcMJeDnX2I211q
 f2lXehJLQA7kPmxQw8T3//HDRaLXc0Qxt7IPZRFinrlkcN4oh3DD5euMfCFBSoZG
 PTbjxlgMfzJPoZtqAcy0rV5L54a/F4h915OQPJCKLwujIsXD2nT993vNmGDyq4zh
 BzNMxSXJC8p+jYvQpNhWyyxwDBBT/YsVQo/ACwg4eJnD3blVTAioRT9ZZcAcsY0F
 0z1eWW5RiknzIaXQWvjfK0gYKpO+aMSu9+gipHfMbU3yXG+sPj/H6zAHYzqX3uQZ
 jb5Iujjnu49W/YD+RiMenuu59lNXUnLSeRnlV7dw0qxGK1FzGo24+ZzKFhJhKvzG
 tdfUsev1Mc8=
 =jhWg
 -----END PGP SIGNATURE-----

Merge tag 'init_task-20180117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull init_task initializer cleanups from David Howells:
 "It doesn't seem useful to have the init_task in a header file rather
  than in a normal source file. We could consolidate init_task handling
  instead and expand out various macros.

  Here's a series of patches that consolidate init_task handling:

   (1) Make THREAD_SIZE available to vmlinux.lds for cris, hexagon and
       openrisc.

   (2) Alter the INIT_TASK_DATA linker script macro to set
       init_thread_union and init_stack rather than defining these in C.

       Insert init_task and init_thread_into into the init_stack area in
       the linker script as appropriate to the configuration, with
       different section markers so that they end up correctly ordered.

       We can then get merge ia64's init_task.c into the main one.

       We then have a bunch of single-use INIT_*() macros that seem only
       to be macros because they used to be used per-arch. We can then
       expand these in place of the user and get rid of a few lines and
       a lot of backslashes.

   (3) Expand INIT_TASK() in place.

   (4) Expand in place various small INIT_*() macros that are defined
       conditionally. Expand them and surround them by #if[n]def/#endif
       in the .c file as it takes fewer lines.

   (5) Expand INIT_SIGNALS() and INIT_SIGHAND() in place.

   (6) Expand INIT_STRUCT_PID in place.

  These macros can then be discarded"

* tag 'init_task-20180117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  Expand INIT_STRUCT_PID and remove
  Expand the INIT_SIGNALS and INIT_SIGHAND macros and remove
  Expand various INIT_* macros and remove
  Expand INIT_TASK() in init/init_task.c and remove
  Construct init thread stack in the linker script rather than by union
  openrisc: Make THREAD_SIZE available to vmlinux.lds
  hexagon: Make THREAD_SIZE available to vmlinux.lds
  cris: Make THREAD_SIZE available to vmlinux.lds
2018-01-29 09:08:34 -08:00
David S. Miller
bfbe5bab66 Merge branch 'ptr_ring-fixes'
Michael S. Tsirkin says:

====================
ptr_ring fixes

This fixes a bunch of issues around ptr_ring use in net core.
One of these: "tap: fix use-after-free" is also needed on net,
but can't be backported cleanly.

I will post a net patch separately.

Lightly tested - Jason, could you pls confirm this
addresses the security issue you saw with ptr_ring?
Testing reports would be appreciated too.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Jason Wang <jasowang@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2018-01-29 12:02:55 -05:00
Michael S. Tsirkin
491847f3b2 tools/virtio: fix smp_mb on x86
Offset 128 overlaps the last word of the redzone.
Use 132 which is always beyond that.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:02:55 -05:00
Michael S. Tsirkin
b4eab7de66 tools/virtio: copy READ/WRITE_ONCE
This is to make ptr_ring test build again.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:02:55 -05:00
Michael S. Tsirkin
6dd4215783 tools/virtio: more stubs to fix tools build
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:02:54 -05:00
Michael S. Tsirkin
30f1d37074 tools/virtio: switch to __ptr_ring_empty
We don't rely on lockless guarantees, but it
seems cleaner than inverting __ptr_ring_peek.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:02:54 -05:00
Michael S. Tsirkin
a07d29c672 ptr_ring: prevent queue load/store tearing
In theory compiler could tear queue loads or stores in two. It does not
seem to be happening in practice but it seems easier to convert the
cases where this would be a problem to READ/WRITE_ONCE than worry about
it.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:02:54 -05:00
Michael S. Tsirkin
f417dc2818 skb_array: use __ptr_ring_empty
__skb_array_empty should use __ptr_ring_empty since that's the only
legal lockless function.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:02:54 -05:00
Michael S. Tsirkin
9fb582b670 Revert "net: ptr_ring: otherwise safe empty checks can overrun array bounds"
This reverts commit bcecb4bbf8.

If we try to allocate an extra entry as the above commit did, and when
the requested size is UINT_MAX, addition overflows causing zero size to
be passed to kmalloc().

kmalloc then returns ZERO_SIZE_PTR with a subsequent crash.

Reported-by: syzbot+87678bcf753b44c39b67@syzkaller.appspotmail.com
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:02:54 -05:00
Michael S. Tsirkin
84328342a7 ptr_ring: disallow lockless __ptr_ring_full
Similar to bcecb4bbf8 ("net: ptr_ring: otherwise safe empty checks can
overrun array bounds") a lockless use of __ptr_ring_full might
cause an out of bounds access.

We can fix this, but it's easier to just disallow lockless
__ptr_ring_full for now.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:02:54 -05:00
Michael S. Tsirkin
88fae87327 tap: fix use-after-free
Lockless access to __ptr_ring_full is only legal if ring is
never resized, otherwise it might cause use-after free errors.
Simply drop the lockless test, we'll drop the packet
a bit later when produce fails.

Fixes: 362899b8 ("macvtap: switch to use skb array")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:02:53 -05:00
Michael S. Tsirkin
a259df36d1 ptr_ring: READ/WRITE_ONCE for __ptr_ring_empty
Lockless __ptr_ring_empty requires that consumer head is read and
written at once, atomically. Annotate accordingly to make sure compiler
does it correctly.  Switch locked callers to __ptr_ring_peek which does
not support the lockless operation.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:02:53 -05:00
Michael S. Tsirkin
8619d384eb ptr_ring: clean up documentation
The only function safe to call without locks
is __ptr_ring_empty. Move documentation about
lockless use there to make sure people do not
try to use __ptr_ring_peek outside locks.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:02:53 -05:00
Michael S. Tsirkin
406de75554 ptr_ring: keep consumer_head valid at all times
The comment near __ptr_ring_peek says:

 * If ring is never resized, and if the pointer is merely
 * tested, there's no need to take the lock - see e.g.  __ptr_ring_empty.

but this was in fact never possible since consumer_head would sometimes
point outside the ring. Refactor the code so that it's always
pointing within a ring.

Fixes: c5ad119fb6 ("net: sched: pfifo_fast use skb_array")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 12:02:53 -05:00
Bob Peterson
2eb5909dee GFS2: Don't try to end a non-existent transaction in unlink
Before this patch, if function gfs2_unlink failed to get a valid
transaction (for example, not enough journal blocks) it would go
to label out_end_trans which did gfs2_trans_end. But if the
trans_begin failed, there's no transaction to end, and trying to
do so results in: kernel BUG at fs/gfs2/trans.c:117!

This patch changes the goto so that it does not try to end a
non-existent transaction.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-29 10:00:23 -07:00
Ville Syrjälä
2a8d3eac3d drm: Warn if plane/crtc/encoder/connector index exceeds our 32bit bitmasks
We use 32bit bitmasks to track planes/crtcs/encoders/connectors.
Naturally we can only do that if the index of those objects stays
below 32. Issue a warning whenever we exceed that limit, hopefully
prompting someone to fix the problem.

For connectors the issue is a bit more complicated as they can
be created/destroyed at runtime due to MST. So the problem is no
longer a purely theoretical programmer error. As the connector
indexes are allocated via ida, we can simply limit the maximum
value the ida is allowed to hand out. The error handling is already
in place.

v2: Return an error to the caller (Harry)
v3: Print a debug message so that we know what happened (Maarten)

Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180125133020.23845-1-ville.syrjala@linux.intel.com
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
2018-01-29 18:46:53 +02:00
Martin KaFai Lau
7ece54a60e ipv6: Fix SO_REUSEPORT UDP socket with implicit sk_ipv6only
If a sk_v6_rcv_saddr is !IPV6_ADDR_ANY and !IPV6_ADDR_MAPPED, it
implicitly implies it is an ipv6only socket.  However, in inet6_bind(),
this addr_type checking and setting sk->sk_ipv6only to 1 are only done
after sk->sk_prot->get_port(sk, snum) has been completed successfully.

This inconsistency between sk_v6_rcv_saddr and sk_ipv6only confuses
the 'get_port()'.

In particular, when binding SO_REUSEPORT UDP sockets,
udp_reuseport_add_sock(sk,...) is called.  udp_reuseport_add_sock()
checks "ipv6_only_sock(sk2) == ipv6_only_sock(sk)" before adding sk to
sk2->sk_reuseport_cb.  In this case, ipv6_only_sock(sk2) could be
1 while ipv6_only_sock(sk) is still 0 here.  The end result is,
reuseport_alloc(sk) is called instead of adding sk to the existing
sk2->sk_reuseport_cb.

It can be reproduced by binding two SO_REUSEPORT UDP sockets on an
IPv6 address (!ANY and !MAPPED).  Only one of the socket will
receive packet.

The fix is to set the implicit sk_ipv6only before calling get_port().
The original sk_ipv6only has to be saved such that it can be restored
in case get_port() failed.  The situation is similar to the
inet_reset_saddr(sk) after get_port() has failed.

Thanks to Calvin Owens <calvinowens@fb.com> who created an easy
reproduction which leads to a fix.

Fixes: e32ea7e747 ("soreuseport: fast reuseport UDP socket selection")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 11:37:40 -05:00
David S. Miller
2479c2c9f4 Merge branch 'rtnetlink-enable-IFLA_IF_NETNSID-for-RTM_DELLINK-RTM_SETINK'
Christian Brauner says:

====================
rtnetlink: enable IFLA_IF_NETNSID for RTM_{DEL,SET}LINK

Based on the previous discussion this enables passing a IFLA_IF_NETNSID
property along with RTM_SETLINK and RTM_DELLINK requests. The patch for
RTM_NEWLINK will be sent out in a separate patch since there are more
corner-cases to think about.

Changelog 2018-01-24:
* Preserve old behavior and report -ENODEV when either ifindex or ifname is
  provided and IFLA_GROUP is set. Spotted by Wolfgang Bumiller.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 11:31:07 -05:00
Christian Brauner
b61ad68a9f rtnetlink: enable IFLA_IF_NETNSID for RTM_DELLINK
- Backwards Compatibility:
  If userspace wants to determine whether RTM_DELLINK supports the
  IFLA_IF_NETNSID property they should first send an RTM_GETLINK request
  with IFLA_IF_NETNSID on lo. If either EACCESS is returned or the reply
  does not include IFLA_IF_NETNSID userspace should assume that
  IFLA_IF_NETNSID is not supported on this kernel.
  If the reply does contain an IFLA_IF_NETNSID property userspace
  can send an RTM_DELLINK with a IFLA_IF_NETNSID property. If they receive
  EOPNOTSUPP then the kernel does not support the IFLA_IF_NETNSID property
  with RTM_DELLINK. Userpace should then fallback to other means.

- Security:
  Callers must have CAP_NET_ADMIN in the owning user namespace of the
  target network namespace.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 11:31:06 -05:00
Christian Brauner
c310bfcb6e rtnetlink: enable IFLA_IF_NETNSID for RTM_SETLINK
- Backwards Compatibility:
  If userspace wants to determine whether RTM_SETLINK supports the
  IFLA_IF_NETNSID property they should first send an RTM_GETLINK request
  with IFLA_IF_NETNSID on lo. If either EACCESS is returned or the reply
  does not include IFLA_IF_NETNSID userspace should assume that
  IFLA_IF_NETNSID is not supported on this kernel.
  If the reply does contain an IFLA_IF_NETNSID property userspace
  can send an RTM_SETLINK with a IFLA_IF_NETNSID property. If they receive
  EOPNOTSUPP then the kernel does not support the IFLA_IF_NETNSID property
  with RTM_SETLINK. Userpace should then fallback to other means.

  To retain backwards compatibility the kernel will first check whether a
  IFLA_NET_NS_PID or IFLA_NET_NS_FD property has been passed. If either
  one is found it will be used to identify the target network namespace.
  This implies that users who do not care whether their running kernel
  supports IFLA_IF_NETNSID with RTM_SETLINK can pass both
  IFLA_NET_NS_{FD,PID} and IFLA_IF_NETNSID referring to the same network
  namespace.

- Security:
  Callers must have CAP_NET_ADMIN in the owning user namespace of the
  target network namespace.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 11:31:06 -05:00
Christian Brauner
7c4f63ba82 rtnetlink: enable IFLA_IF_NETNSID in do_setlink()
RTM_{NEW,SET}LINK already allow operations on other network namespaces
by identifying the target network namespace through IFLA_NET_NS_{FD,PID}
properties. This is done by looking for the corresponding properties in
do_setlink(). Extend do_setlink() to also look for the IFLA_IF_NETNSID
property. This introduces no functional changes since all callers of
do_setlink() currently block IFLA_IF_NETNSID by reporting an error before
they reach do_setlink().

This introduces the helpers:

static struct net *rtnl_link_get_net_by_nlattr(struct net *src_net, struct
                                               nlattr *tb[])

static struct net *rtnl_link_get_net_capable(const struct sk_buff *skb,
                                             struct net *src_net,
					     struct nlattr *tb[], int cap)

to simplify permission checks and target network namespace retrieval for
RTM_* requests that already support IFLA_NET_NS_{FD,PID} but get extended
to IFLA_IF_NETNSID. To perserve backwards compatibility the helpers look
for IFLA_NET_NS_{FD,PID} properties first before checking for
IFLA_IF_NETNSID.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 11:31:06 -05:00
Chris Wilson
c7cc144d8f drm/i915: Assert that we do not try to unsubmit a completed request
Assert that we do not try to unsubmit a completed request, as should we
try to resubmit it later, the ring is already past the request's
breadcrumb and the breadcrumb will not be updated.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180129094912.14428-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2018-01-29 15:38:56 +00:00
Chris Wilson
7fb9ee5db2 drm/i915: Simplify guard logic for setup_scratch_page()
Older gcc is complaining it can't follow the guards and thinks that
addr may be used uninitialised

In the process, we can simplify down to one loop,
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-131 (-131)
Function                                     old     new   delta
setup_scratch_page                           545     414    -131

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180129102840.19901-1-chris@chris-wilson.co.uk
2018-01-29 15:37:53 +00:00
Meghana Madhyastha
2e4ef3347b video: backlight: Add devres versions of of_find_backlight
Add devm_of_find_backlight and the corresponding release
function because some drivers use devres versions of functions
for acquiring device resources.

Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Reviewed-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Sean Paul <seanpaul@chromium.org>
Signed-off-by: Meghana Madhyastha <meghana.madhyastha@gmail.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/021f8fecfa3f374dc5dcb70fb07a6f6b019bea7b.1516810725.git.meghana.madhyastha@gmail.com
2018-01-29 10:34:53 -05:00
Meghana Madhyastha
c2adda27d2 video: backlight: Add of_find_backlight helper in backlight.c
Add of_find_backlight, a helper function which is a generic version
of tinydrm_of_find_backlight that can be used by other drivers to avoid
repetition of code and simplify things.

Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Reviewed-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Sean Paul <seanpaul@chromium.org>
Reviewed-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Meghana Madhyastha <meghana.madhyastha@gmail.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/116d160ba78be2e6dcbdcb6855622bce67da9472.1516810725.git.meghana.madhyastha@gmail.com
2018-01-29 10:34:53 -05:00
Meghana Madhyastha
5b698be049 video: backlight: Add helpers to enable and disable backlight
Add helper functions backlight_enable and backlight_disable to
enable/disable a backlight device. These helper functions can
then be used by different drm and tinydrm drivers to avoid
repetition of code and also to enforce a uniform and consistent
way to enable/disable a backlight device.

Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Reviewed-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Sean Paul <seanpaul@chromium.org>
Signed-off-by: Meghana Madhyastha <meghana.madhyastha@gmail.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/39b5bf0a02008a8072d910bdf8231c431e9ef504.1516810725.git.meghana.madhyastha@gmail.com
2018-01-29 10:34:53 -05:00
Christoph Hellwig
1e369b0e19 xfs: remove experimental tag for reflinks
But reject reflink + DAX file systems for now until the code to
support reflinks on DAX is actually implemented.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: port to 4.16]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-29 07:27:24 -08:00
Darrick J. Wong
6d8a45ce29 xfs: don't screw up direct writes when freesp is fragmented
xfs_bmap_btalloc is given a range of file offset blocks that must be
allocated to some data/attr/cow fork.  If the fork has an extent size
hint associated with it, the request will be enlarged on both ends to
try to satisfy the alignment hint.  If free space is fragmentated,
sometimes we can allocate some blocks but not enough to fulfill any of
the requested range.  Since bmapi_allocate always trims the new extent
mapping to match the originally requested range, this results in
bmapi_write returning zero and no mapping.

The consequences of this vary -- buffered writes will simply re-call
bmapi_write until it can satisfy at least one block from the original
request.  Direct IO overwrites notice nmaps == 0 and return -ENOSPC
through the dio mechanism out to userspace with the weird result that
writes fail even when we have enough space because the ENOSPC return
overrides any partial write status.  For direct CoW writes the situation
was disastrous because nobody notices us returning an invalid zero-length
wrong-offset mapping to iomap and the write goes off into space.

Therefore, if free space is so fragmented that we managed to allocate
some space but not enough to map into even a single block of the
original allocation request range, we should break the alignment hint in
order to guarantee at least some forward progress for the direct write.
If we return a short allocation to iomap_apply it'll call back about the
remaining blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29 07:27:24 -08:00
Darrick J. Wong
9f37bd11b4 xfs: check reflink allocation mappings
There's a really bad bug in xfs_reflink_allocate_cow -- if bmapi_write
can return a zero error code but no mappings.  This happens if there's
an extent size hint (which causes allocation requests to be rounded to
extsz granularity internally), but there wasn't a big enough chunk of
free space to start filling at the extsz granularity and fill even one
block of the range that we actually requested.

In any case, if we got no mappings we can't possibly do anything useful
with the contents of imap, so we must bail out with ENOSPC here.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29 07:27:24 -08:00