Commit graph

57824 commits

Author SHA1 Message Date
Chao Yu
704956ecf5 f2fs: support inode checksum
This patch adds to support inode checksum in f2fs.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix verification flow]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-03 19:09:26 -07:00
Gary R Hook
e652399edb crypto: ccp - Fix XTS-AES-128 support on v5 CCPs
Version 5 CCPs have some new requirements for XTS-AES: the type field
must be specified, and the key requires 512 bits, with each part
occupying 256 bits and padded with zeroes.

cc: <stable@vger.kernel.org> # 4.9.x+

Signed-off-by: Gary R Hook <ghook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-08-04 09:27:41 +08:00
Linus Torvalds
869c058fbe VFIO fixes for v4.13-rc4
- SPAPR/EEH config build fix (Murilo Opsfelder Araujo)
 
  - Fix possible device lock deadlock (Alex Williamson)
 
  - Correctly size integrated endpoint PCIe capabilities (Alex Williamson)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJZg5dIAAoJECObm247sIsijNgP/3bcd0y1JQD5n6aCgZGdUxI1
 RY/n1guN3RiiQFKpG3sGKGNOg1GL6HdoP9ifSH44ZCpTdVogGNok5ZPlyl8AkbHA
 r5hAhxKAaSC/PCNusVxAqTFHH2wQoAKe3AVpoWsUJ2sdWmIYBZURdgxKkdHwV+UN
 66ndlUodmmVylVpMdhpEh/T07D4rIDcAIf3fQeXT2ajNr7qL1ObTEyBnR0CL2kAL
 w6DeUOVXp2sKFTA4yVApjb2hIGCass9tlW2fWcieJd06+PGz2ExMrMGZpCqRsQJt
 UonMDzXRav/IFxkK+x2kwE9C4537bmg2DPvKhXd91pt3La2senSCzYGteYIL4/Uc
 madQgUPpVDx5NcqeKW3f1isCK+3qulfw6BtEckQiRqvJI/ftdZf/T93moLEUtQd7
 axbExt2lLh1RS0xiqm2wmpO34oW/UbHhBX435/YPbh8JrmqiHvVtoJAzIq17buPR
 f4z+We3JvyOJ9kX5TgKT+lfobz7ZNYwK9WtmVh7mTNv7WYvfiZtJgQqqsOVXL8ql
 KVecEfbTpfG07ofBxUCQkWHCyH4tglyzyaCz2uh37G2mtLvaW6CWDk8A5SMpFAd+
 ActM4icBiFLwUXIAAtnA3cDqJlYdXVGWynSs4MJAUJZtra5nNSTuGvd4Eil2SaFf
 dRVpner34sWYWGBRLlTo
 =dF3c
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v4.13-rc4' of git://github.com/awilliam/linux-vfio

Pull VFIO fixes from Alex Williamson:

 - SPAPR/EEH config build fix (Murilo Opsfelder Araujo)

 - Fix possible device lock deadlock (Alex Williamson)

 - Correctly size integrated endpoint PCIe capabilities (Alex
   Williamson)

* tag 'vfio-v4.13-rc4' of git://github.com/awilliam/linux-vfio:
  vfio/pci: Fix handling of RC integrated endpoint PCIe capability size
  vfio/pci: Use pci_try_reset_function() on initial open
  include/linux/vfio.h: Guard powerpc-specific functions with CONFIG_VFIO_SPAPR_EEH
2017-08-03 15:25:14 -07:00
Linus Torvalds
995d03ae26 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "15 fixes"

[ This does not merge the "fortify: use WARN instead of BUG for now"
  patch, which needs a bit of extra work to build cleanly with all
  configurations. Arnd is on it.   - Linus ]

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  ocfs2: don't clear SGID when inheriting ACLs
  mm: allow page_cache_get_speculative in interrupt context
  userfaultfd: non-cooperative: flush event_wqh at release time
  ipc: add missing container_of()s for randstruct
  cpuset: fix a deadlock due to incomplete patching of cpusets_enabled()
  userfaultfd_zeropage: return -ENOSPC in case mm has gone
  mm: take memory hotplug lock within numa_zonelist_order_handler()
  mm/page_io.c: fix oops during block io poll in swapin path
  zram: do not free pool->size_class
  kthread: fix documentation build warning
  kasan: avoid -Wmaybe-uninitialized warning
  userfaultfd: non-cooperative: notify about unmap of destination during mremap
  mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries
  pid: kill pidhash_size in pidhash_init()
  mm/hugetlb.c: __get_user_pages ignores certain follow_hugetlb_page errors
2017-08-03 14:58:13 -07:00
Matthew Minter
004cb78401 PCI: Remove unused pci_fixup_irqs() function
Now we have removed all callers of pci_fixup_irqs() and migrated everything
to pci_assign_irq(), delete the pci_fixup_irqs() function completely.

Signed-off-by: Matthew Minter <matt@masarand.com>
[lorenzo.pieralisi@arm.com: updated commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-08-03 16:31:28 -05:00
Lukas Wunner
630b3aff8a treewide: Consolidate Apple DMI checks
We're about to amend ACPI bus scan with DMI checks whether we're running
on a Mac to support Apple device properties in AML.  The DMI checks are
performed for every single device, adding overhead for everything x86
that isn't Apple, which is the majority.  Rafael and Andy therefore
request to perform the DMI match only once and cache the result.

Outside of ACPI various other Apple DMI checks exist and it seems
reasonable to use the cached value there as well.  Rafael, Andy and
Darren suggest performing the DMI check in arch code and making it
available with a header in include/linux/platform_data/x86/.

To this end, add early_platform_quirks() to arch/x86/kernel/quirks.c
to perform the DMI check and invoke it from setup_arch().  Switch over
all existing Apple DMI checks, thereby fixing two deficiencies:

* They are now #defined to false on non-x86 arches and can thus be
  optimized away if they're located in cross-arch code.

* Some of them only match "Apple Inc." but not "Apple Computer, Inc.",
  which is used by BIOSes released between January 2006 (when the first
  x86 Macs started shipping) and January 2007 (when the company name
  changed upon introduction of the iPhone).

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Suggested-by: Darren Hart <dvhart@infradead.org>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-03 23:26:22 +02:00
Xin Long
bb96dec745 sctp: remove the typedef sctp_auth_chunk_t
This patch is to remove the typedef sctp_auth_chunk_t, and
replace with struct sctp_auth_chunk in the places where it's
using this typedef.

It is also to use sizeof(variable) instead of sizeof(type).

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:47 -07:00
Xin Long
96f7ef4d58 sctp: remove the typedef sctp_authhdr_t
This patch is to remove the typedef sctp_authhdr_t, and
replace with struct sctp_authhdr in the places where it's
using this typedef.

It is also to use sizeof(variable) instead of sizeof(type).

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:47 -07:00
Xin Long
68d7546946 sctp: remove the typedef sctp_addip_chunk_t
This patch is to remove the typedef sctp_addip_chunk_t, and
replace with struct sctp_addip_chunk in the places where it's
using this typedef.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:47 -07:00
Xin Long
65205cc465 sctp: remove the typedef sctp_addiphdr_t
This patch is to remove the typedef sctp_addiphdr_t, and
replace with struct sctp_addiphdr in the places where it's
using this typedef.

It is also to use sizeof(variable) instead of sizeof(type).

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:47 -07:00
Xin Long
8b32f2348a sctp: remove the typedef sctp_addip_param_t
This patch is to remove the typedef sctp_addip_param_t, and
replace with struct sctp_addip_param in the places where it's
using this typedef.

It is to use sizeof(variable) instead of sizeof(type), and
also fix some indent problems.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:47 -07:00
Xin Long
05b25d0ba6 sctp: remove the typedef sctp_cwr_chunk_t
Remove this typedef including the struct, there is even no places
using it.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:47 -07:00
Xin Long
65f7710543 sctp: remove the typedef sctp_cwrhdr_t
This patch is to remove the typedef sctp_cwrhdr_t, and
replace with struct sctp_cwrhdr in the places where it's
using this typedef.

It is also to use sizeof(variable) instead of sizeof(type).

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:47 -07:00
Xin Long
b515fd2759 sctp: remove the typedef sctp_ecne_chunk_t
This patch is to remove the typedef sctp_ecne_chunk_t, and
replace with struct sctp_ecne_chunk in the places where it's
using this typedef.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:47 -07:00
Xin Long
1fb6d83bd3 sctp: remove the typedef sctp_ecnehdr_t
This patch is to remove the typedef sctp_ecnehdr_t, and
replace with struct sctp_ecnehdr in the places where it's
using this typedef.

It is also to use sizeof(variable) instead of sizeof(type).

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:47 -07:00
Xin Long
2a49321677 sctp: remove the typedef sctp_error_t
This patch is to remove the typedef sctp_error_t, and replace
with enum sctp_error in the places where it's using this typedef.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:47 -07:00
Xin Long
87caeba791 sctp: remove the typedef sctp_operr_chunk_t
This patch is to remove the typedef sctp_operr_chunk_t, and
replace with struct sctp_operr_chunk in the places where it's
using this typedef.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:46 -07:00
Xin Long
d8238d9dab sctp: remove the typedef sctp_errhdr_t
This patch is to remove the typedef sctp_errhdr_t, and replace
with struct sctp_errhdr in the places where it's using this
typedef.

It is also to use sizeof(variable) instead of sizeof(type).

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:46 -07:00
Xin Long
ac23e68133 sctp: fix the name of struct sctp_shutdown_chunk_t
This patch is to fix the name of struct sctp_shutdown_chunk_t
, replace with struct sctp_initack_chunk in the places where
it's using it.

It is also to fix some indent problem.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:46 -07:00
Xin Long
e61e4055b1 sctp: remove the typedef sctp_shutdownhdr_t
This patch is to remove the typedef sctp_shutdownhdr_t, and
replace with struct sctp_shutdownhdr in the places where it's
using this typedef.

It is also to use sizeof(variable) instead of sizeof(type).

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:46 -07:00
Boris Brezillon
d4cb37e716 mtd: nand: Remove support for block locking/unlocking
Commit 7d70f334ad ("mtd: nand: add lock/unlock routines") introduced
support for the Micron LOCK/UNLOCK commands but no one ever used the
nand_lock/unlock() functions.

Remove support for these vendor-specific operations from the core. If
one ever wants to add them back they should be put in nand_micron.c and
mtd->_lock/_unlock should be directly assigned from there instead of
exporting the functions.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2017-08-03 17:30:40 +02:00
João Paulo Rechi Vita
957b8dffa4 HID: multitouch: Support Asus T304UA media keys
The Asus T304UA convertible sports a magnetic detachable keyboard with
touchpad, which is connected over USB. Most of the keyboard hotkeys are
exposed through the same USB interface as the touchpad, defined in the
report descriptor as follows:

0x06, 0x31, 0xFF,  // Usage Page (Vendor Defined 0xFF31)
0x09, 0x76,        // Usage (0x76)
0xA1, 0x01,        // Collection (Application)
0x05, 0xFF,        //   Usage Page (Reserved 0xFF)
0x85, 0x5A,        //   Report ID (90)
0x19, 0x00,        //   Usage Minimum (0x00)
0x2A, 0xFF, 0x00,  //   Usage Maximum (0xFF)
0x15, 0x00,        //   Logical Minimum (0)
0x26, 0xFF, 0x00,  //   Logical Maximum (255)
0x75, 0x08,        //   Report Size (8)
0x95, 0x0F,        //   Report Count (15)
0xB1, 0x02,        //   Feature (Data,Var,Abs,No Wrap,Linear,Preferred State,No Null Position,Non-volatile)
0x05, 0xFF,        //   Usage Page (Reserved 0xFF)
0x85, 0x5A,        //   Report ID (90)
0x19, 0x00,        //   Usage Minimum (0x00)
0x2A, 0xFF, 0x00,  //   Usage Maximum (0xFF)
0x15, 0x00,        //   Logical Minimum (0)
0x26, 0xFF, 0x00,  //   Logical Maximum (255)
0x75, 0x08,        //   Report Size (8)
0x95, 0x02,        //   Report Count (2)
0x81, 0x02,        //   Input (Data,Var,Abs,No Wrap,Linear,Preferred State,No Null Position)
0xC0,              // End Collection

This UsagePage is declared as a variable, but we need to treat it as an
array to be able to map each Usage we care about to its corresponding
input key.

Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-08-03 11:20:17 +02:00
Linus Torvalds
19ec50a438 Two more NFS client bugfixes for 4.13
Stable fix:
 - Fix EXCHANGE_ID corrupt verifier issue
 
 Other fix:
 - Fix double frees in nfs4_test_session_trunk()
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlmCNJAACgkQ18tUv7Cl
 QOvlKhAAjjKMtdwYDV7B3SJvDfa5KNH9nNaMjKoD/OzWavDL7jRVRUWayeJiy4sf
 wJAMsG280ztag1Mr/OnClESZThWNrHTnKjq/P8kuefJz+pPvWP+AaQ/8QSM/rulM
 Y7qfh+FZb2t7lK80VAZTXEi4bvQzlkIV2285a34VIUm3VTpwl3HvltkBMLFztDSb
 sdaQh+N48FOYdG9RMncxpzFfRvUILzP9GFMg6WAFAcbr1wSsT9MfLqd+ANgOidE8
 X2Ch9l0jN51m1dFGmMUZ5HMmuhUh2m50SXhmUOTpS7fj+k11OWsV2IIWHKPk/LjI
 5E0aUPDU2TOE9noLSfkguAyxvqAGeAHPDIE7QM9vTGgktzXWGzh++ODeSzkIDp+5
 iZ+cYLkme1lOg1nEqj1/SrIZXHP7ZCvK8iqNgoqT8rIp6zE1tLPnNAbRDmnwC/VH
 PrfINATV8llN/3vFojWJfD1enDe3+dALPINLVQOjwjafq6d5/hzRdCGqWpCDjmlU
 esDoCkoWGUjadIr6EZGtDAzSZgafsUx6d6QnGtkIUsz1d0FIXH6holB5EKaQpOLf
 dVb9iO5R2EcVP+WvB3KjA5y6MCNvoVqebxMvLBPCYsu3fI8fQ0sgSjgSJXi0fRzg
 G7DUsKcBGVKarDMXTUReL2G6lN7h0f5EsM1WnA9KF0TV9aah/ZE=
 =Ddq9
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.13-4' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:
 "Two fixes from Trond this time, now that he's back from his vacation.
  The first is a stable fix for the EXCHANGE_ID issue on the mailing
  list, and the other fixes a double-free situation that he found at the
  same time.

  Stable fix:
   - Fix EXCHANGE_ID corrupt verifier issue

  Other fix:
   - Fix double frees in nfs4_test_session_trunk()"

* tag 'nfs-for-4.13-4' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFSv4: Fix double frees in nfs4_test_session_trunk()
  NFSv4: Fix EXCHANGE_ID corrupt verifier issue
2017-08-02 20:56:44 -07:00
Kan Liang
1ee1c3f5b5 mm: allow page_cache_get_speculative in interrupt context
Kernel panic when calling the IRQ-safe __get_user_pages_fast in NMI
handler.

The bug was introduced by commit 2947ba054a ("x86/mm/gup: Switch GUP
to the generic get_user_page_fast() implementation").

The original x86 __get_user_page_fast used plain get_page() or
page_ref_add().  However, the generic __get_user_page_fast uses
page_cache_get_speculative(), which has VM_BUG_ON(in_interrupt()).

There is no reason to prevent page_cache_get_speculative from using in
interrupt context.  According to the author, putting a BUG_ON there is
just because the code is not verifying correctness of interrupt races.
I did some tests in interrupt context.  There is no issue found.

Removing VM_BUG_ON(in_interrupt()) for page_cache_get_speculative().

Link: http://lkml.kernel.org/r/1501609146-59730-1-git-send-email-kan.liang@intel.com
Fixes: 2947ba054a ("x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation")
Signed-off-by: Kan Liang <kan.liang@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ying Huang <ying.huang@intel.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 17:16:13 -07:00
Dima Zavin
89affbf5d9 cpuset: fix a deadlock due to incomplete patching of cpusets_enabled()
In codepaths that use the begin/retry interface for reading
mems_allowed_seq with irqs disabled, there exists a race condition that
stalls the patch process after only modifying a subset of the
static_branch call sites.

This problem manifested itself as a deadlock in the slub allocator,
inside get_any_partial.  The loop reads mems_allowed_seq value (via
read_mems_allowed_begin), performs the defrag operation, and then
verifies the consistency of mem_allowed via the read_mems_allowed_retry
and the cookie returned by xxx_begin.

The issue here is that both begin and retry first check if cpusets are
enabled via cpusets_enabled() static branch.  This branch can be
rewritted dynamically (via cpuset_inc) if a new cpuset is created.  The
x86 jump label code fully synchronizes across all CPUs for every entry
it rewrites.  If it rewrites only one of the callsites (specifically the
one in read_mems_allowed_retry) and then waits for the
smp_call_function(do_sync_core) to complete while a CPU is inside the
begin/retry section with IRQs off and the mems_allowed value is changed,
we can hang.

This is because begin() will always return 0 (since it wasn't patched
yet) while retry() will test the 0 against the actual value of the seq
counter.

The fix is to use two different static keys: one for begin
(pre_enable_key) and one for retry (enable_key).  In cpuset_inc(), we
first bump the pre_enable key to ensure that cpuset_mems_allowed_begin()
always return a valid seqcount if are enabling cpusets.  Similarly, when
disabling cpusets via cpuset_dec(), we first ensure that callers of
cpuset_mems_allowed_retry() will start ignoring the seqcount value
before we let cpuset_mems_allowed_begin() return 0.

The relevant stack traces of the two stuck threads:

  CPU: 1 PID: 1415 Comm: mkdir Tainted: G L  4.9.36-00104-g540c51286237 #4
  Hardware name: Default string Default string/Hardware, BIOS 4.29.1-20170526215256 05/26/2017
  task: ffff8817f9c28000 task.stack: ffffc9000ffa4000
  RIP: smp_call_function_many+0x1f9/0x260
  Call Trace:
    smp_call_function+0x3b/0x70
    on_each_cpu+0x2f/0x90
    text_poke_bp+0x87/0xd0
    arch_jump_label_transform+0x93/0x100
    __jump_label_update+0x77/0x90
    jump_label_update+0xaa/0xc0
    static_key_slow_inc+0x9e/0xb0
    cpuset_css_online+0x70/0x2e0
    online_css+0x2c/0xa0
    cgroup_apply_control_enable+0x27f/0x3d0
    cgroup_mkdir+0x2b7/0x420
    kernfs_iop_mkdir+0x5a/0x80
    vfs_mkdir+0xf6/0x1a0
    SyS_mkdir+0xb7/0xe0
    entry_SYSCALL_64_fastpath+0x18/0xad

  ...

  CPU: 2 PID: 1 Comm: init Tainted: G L  4.9.36-00104-g540c51286237 #4
  Hardware name: Default string Default string/Hardware, BIOS 4.29.1-20170526215256 05/26/2017
  task: ffff8818087c0000 task.stack: ffffc90000030000
  RIP: int3+0x39/0x70
  Call Trace:
    <#DB> ? ___slab_alloc+0x28b/0x5a0
    <EOE> ? copy_process.part.40+0xf7/0x1de0
    __slab_alloc.isra.80+0x54/0x90
    copy_process.part.40+0xf7/0x1de0
    copy_process.part.40+0xf7/0x1de0
    kmem_cache_alloc_node+0x8a/0x280
    copy_process.part.40+0xf7/0x1de0
    _do_fork+0xe7/0x6c0
    _raw_spin_unlock_irq+0x2d/0x60
    trace_hardirqs_on_caller+0x136/0x1d0
    entry_SYSCALL_64_fastpath+0x5/0xad
    do_syscall_64+0x27/0x350
    SyS_clone+0x19/0x20
    do_syscall_64+0x60/0x350
    entry_SYSCALL64_slow_path+0x25/0x25

Link: http://lkml.kernel.org/r/20170731040113.14197-1-dmitriyz@waymo.com
Fixes: 46e700abc4 ("mm, page_alloc: remove unnecessary taking of a seqlock when cpusets are disabled")
Signed-off-by: Dima Zavin <dmitriyz@waymo.com>
Reported-by: Cliff Spradlin <cspradlin@waymo.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 17:16:12 -07:00
Jonathan Corbet
d16977f3a6 kthread: fix documentation build warning
The kerneldoc comment for kthread_create() had an incorrect argument
name, leading to a warning in the docs build.

Correct it, and make one more small step toward a warning-free build.

Link: http://lkml.kernel.org/r/20170724135916.7f486c6f@lwn.net
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 16:34:47 -07:00
Mel Gorman
3ea277194d mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries
Nadav Amit identified a theoritical race between page reclaim and
mprotect due to TLB flushes being batched outside of the PTL being held.

He described the race as follows:

        CPU0                            CPU1
        ----                            ----
                                        user accesses memory using RW PTE
                                        [PTE now cached in TLB]
        try_to_unmap_one()
        ==> ptep_get_and_clear()
        ==> set_tlb_ubc_flush_pending()
                                        mprotect(addr, PROT_READ)
                                        ==> change_pte_range()
                                        ==> [ PTE non-present - no flush ]

                                        user writes using cached RW PTE
        ...

        try_to_unmap_flush()

The same type of race exists for reads when protecting for PROT_NONE and
also exists for operations that can leave an old TLB entry behind such
as munmap, mremap and madvise.

For some operations like mprotect, it's not necessarily a data integrity
issue but it is a correctness issue as there is a window where an
mprotect that limits access still allows access.  For munmap, it's
potentially a data integrity issue although the race is massive as an
munmap, mmap and return to userspace must all complete between the
window when reclaim drops the PTL and flushes the TLB.  However, it's
theoritically possible so handle this issue by flushing the mm if
reclaim is potentially currently batching TLB flushes.

Other instances where a flush is required for a present pte should be ok
as either the page lock is held preventing parallel reclaim or a page
reference count is elevated preventing a parallel free leading to
corruption.  In the case of page_mkclean there isn't an obvious path
that userspace could take advantage of without using the operations that
are guarded by this patch.  Other users such as gup as a race with
reclaim looks just at PTEs.  huge page variants should be ok as they
don't race with reclaim.  mincore only looks at PTEs.  userfault also
should be ok as if a parallel reclaim takes place, it will either fault
the page back in or read some of the data before the flush occurs
triggering a fault.

Note that a variant of this patch was acked by Andy Lutomirski but this
was for the x86 parts on top of his PCID work which didn't make the 4.13
merge window as expected.  His ack is dropped from this version and
there will be a follow-on patch on top of PCID that will include his
ack.

[akpm@linux-foundation.org: tweak comments]
[akpm@linux-foundation.org: fix spello]
Link: http://lkml.kernel.org/r/20170717155523.emckq2esjro6hf3z@suse.de
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: <stable@vger.kernel.org>	[v4.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 16:34:46 -07:00
Paolo Bonzini
3898da947b KVM: avoid using rcu_dereference_protected
During teardown, accesses to memslots and buses are using
rcu_dereference_protected with an always-true condition because
these accesses are done outside the usual mutexes.  This
is because the last reference is gone and there cannot be any
concurrent modifications, but rcu_dereference_protected is
ugly and unobvious.

Instead, check the refcount in kvm_get_bus and __kvm_memslots.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-02 22:41:02 +02:00
Bjorn Andersson
1e140df049 remoteproc: qcom: Add support for SSR notifications
This adds the remoteproc part of subsystem restart, which is responsible
for emitting notifications to other processors in the system about a
dying remoteproc instance.

These notifications are propagated to the various communication systems
in the various remote processors to shut down communication links that
was left in a dangling state as the remoteproc was stopped (or crashed).

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2017-08-02 12:43:20 -07:00
Roman Gushchin
1a926e0bba cgroup: implement hierarchy limits
Creating cgroup hierearchies of unreasonable size can affect
overall system performance. A user might want to limit the
size of cgroup hierarchy. This is especially important if a user
is delegating some cgroup sub-tree.

To address this issue, introduce an ability to control
the size of cgroup hierarchy.

The cgroup.max.descendants control file allows to set the maximum
allowed number of descendant cgroups.
The cgroup.max.depth file controls the maximum depth of the cgroup
tree. Both are single value r/w files, with "max" default value.

The control files exist on each hierarchy level (including root).
When a new cgroup is created, we check the total descendants
and depth limits on each level, and if none of them are exceeded,
a new cgroup is created.

Only alive cgroups are counted, removed (dying) cgroups are
ignored.

Signed-off-by: Roman Gushchin <guro@fb.com>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: kernel-team@fb.com
Cc: cgroups@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
2017-08-02 12:05:20 -07:00
Roman Gushchin
0679dee03c cgroup: keep track of number of descent cgroups
Keep track of the number of online and dying descent cgroups.

This data will be used later to add an ability to control cgroup
hierarchy (limit the depth and the number of descent cgroups)
and display hierarchy stats.

Signed-off-by: Roman Gushchin <guro@fb.com>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: kernel-team@fb.com
Cc: cgroups@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
2017-08-02 12:05:19 -07:00
WANG Cong
b2f9d432de flow_dissector: remove unused functions
They are introduced by commit f70ea018da
("net: Add functions to get skb->hash based on flow structures")
but never gets used in tree.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-02 10:50:03 -07:00
Inbar Karmy
c994f778bb net/mlx4_en: Fix wrong indication of Wake-on-LAN (WoL) support
Currently when WoL is supported but disabled, ethtool reports:
"Supports Wake-on: d".
Fix the indication of Wol support, so that the indication
remains "g" all the time if the NIC supports WoL.

Tested:
As accepted, when NIC supports WoL- ethtool reports:
	Supports Wake-on: g
	Wake-on: d
when NIC doesn't support WoL- ethtool reports:
        Supports Wake-on: d
        Wake-on: d

Fixes: 14c07b1358 ("mlx4: Wake on LAN support")
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-02 10:44:09 -07:00
Dmitry Torokhov
581c448476 HID: input: map digitizer battery usage
We already mapped battery strength reports from the generic device
control page, but we did not update capacity from input reports, nor we
mapped the battery strength report from the digitizer page, so let's
implement this now.

Batteries driven by the input reports will now start in "unknown" state,
and will get updated once we receive first report containing battery
strength from the device.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-08-02 10:51:46 +02:00
Boris Brezillon
6d29231000 mtd: nand: Declare tBERS, tR and tPROG as u64 to avoid integer overflow
All timings in nand_sdr_timings are expressed in picoseconds but some
of them may not fit in an u32.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Fixes: 204e7ecd47 ("mtd: nand: Add a few more timings to nand_sdr_timings")
Reported-by: Alexander Dahl <ada@thorsis.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Alexander Dahl <ada@thorsis.com>
Tested-by: Alexander Dahl <ada@thorsis.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2017-08-02 10:26:42 +02:00
Marc Zyngier
a477b9cd37 PCI: Add pci_reset_function_locked()
The implementation of PCI workarounds may require that the device is reset
from its probe function.  This implies that the PCI device lock is already
held, and makes calling pci_reset_function() impossible (since it will
itself try to take that lock).

Add pci_reset_function_locked(), which is the equivalent of
pci_reset_function(), except that it requires the PCI device lock to be
already held by the caller.

Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
[bhelgaas: folded in fix for conflict with 52354b9d1f ("PCI: Remove
__pci_dev_reset() and pci_dev_reset()")]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org	# 4.11: 52354b9d1f: PCI: Remove __pci_dev_reset() and pci_dev_reset()
Cc: stable@vger.kernel.org	# 4.11
2017-08-01 20:11:02 -05:00
Willem de Bruijn
c613c209c3 net: add skb_frag_foreach_page and use with kmap_atomic
Skb frags may contain compound pages. Various operations map frags
temporarily using kmap_atomic, but this function works on single
pages, not whole compound pages. The distinction is only relevant
for high mem pages that require temporary mappings.

Introduce a looping mechanism that for compound highmem pages maps
one page at a time, does not change behavior on other pages.
Use the loop in the kmap_atomic callers in net/core/skbuff.c.

Verified by triggering skb_copy_bits with

    tcpdump -n -c 100 -i ${DEV} -w /dev/null &
    netperf -t TCP_STREAM -H ${HOST}

  and by triggering __skb_checksum with

    ethtool -K ${DEV} tx off

  repeated the tests with looping on a non-highmem platform
  (x86_64) by making skb_frag_must_loop always return true.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 16:07:10 -07:00
Tom Herbert
20bf50de30 skbuff: Function to send an skbuf on a socket
Add skb_send_sock to send an skbuff on a socket within the kernel.
Arguments include an offset so that an skbuf might be sent in mulitple
calls (e.g. send buffer limit is hit).

Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 15:26:18 -07:00
Tom Herbert
306b13eb3c proto_ops: Add locked held versions of sendmsg and sendpage
Add new proto_ops sendmsg_locked and sendpage_locked that can be
called when the socket lock is already held. Correspondingly, add
kernel_sendmsg_locked and kernel_sendpage_locked as front end
functions.

These functions will be used in zero proxy so that we can take
the socket lock in a ULP sendmsg/sendpage and then directly call the
backend transport proto_ops functions.

Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 15:26:18 -07:00
Grygorii Strashko
d9535cb7b7 ptp: introduce ptp auxiliary worker
Many PTP drivers required to perform some asynchronous or periodic work,
like periodically handling PHC counter overflow or handle delayed timestamp
for RX/TX network packets. In most of the cases, such work is implemented
using workqueues. Unfortunately, Kernel workqueues might introduce
significant delay in work scheduling under high system load and on -RT,
which could cause misbehavior of PTP drivers due to internal counter
overflow, for example, and there is no way to tune its execution policy and
priority manuallly.

Hence, The kthread_worker can be used insted of workqueues, as it create
separte named kthread for each worker and its its execution policy and
priority can be configured using chrt tool.

This prblem was reported for two drivers TI CPSW CPTS and dp83640, so
instead of modifying each of these driver it was proposed to add PTP
auxiliary worker to the PHC subsystem.

The patch adds PTP auxiliary worker in PHC subsystem using kthread_worker
and kthread_delayed_work and introduces two new PHC subsystem APIs:

- long (*do_aux_work)(struct ptp_clock_info *ptp) callback in
ptp_clock_info structure, which driver should assign if it require to
perform asynchronous or periodic work. Driver should return the delay of
the PTP next auxiliary work scheduling time (>=0) or negative value in case
further scheduling is not required.

- int ptp_schedule_worker(struct ptp_clock *ptp, unsigned long delay) which
allows schedule PTP auxiliary work.

The name of kthread_worker thread corresponds PTP PHC device name "ptp%d".

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 15:22:55 -07:00
Vikas Shivappa
d6aaba615a x86/intel_rdt/cqm: Add tasks file support
The root directory, ctrl_mon and monitor groups are populated
with a read/write file named "tasks". When read, it shows all the task
IDs assigned to the resource group.

Tasks can be added to groups by writing the PID to the file. A task can
be present in one "ctrl_mon" group "and" one "monitor" group. IOW a
PID_x can be seen in a ctrl_mon group and a monitor group at the same
time. When a task is added to a ctrl_mon group, it is automatically
removed from the previous ctrl_mon group where it belonged. Similarly if
a task is moved to a monitor group it is removed from the previous
monitor group . Also since the monitor groups can only have subset of
tasks of parent ctrl_mon group, a task can be moved to a monitor group
only if its already present in the parent ctrl_mon group.

Task membership is indicated by a new field in the task_struct "u32
rmid" which holds the RMID for the task. RMID=0 is reserved for the
default root group where the tasks belong to at mount.

[tony: zero the rmid if rdtgroup was deleted when task was being moved]

Signed-off-by: Tony Luck <tony.luck@linux.intel.com>
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: davidcc@google.com
Cc: reinette.chatre@intel.com
Link: http://lkml.kernel.org/r/1501017287-28083-16-git-send-email-vikas.shivappa@linux.intel.com
2017-08-01 22:41:24 +02:00
Vikas Shivappa
0734ded1ab x86/intel_rdt: Change closid type from int to u32
OS associates a CLOSid(Class of service id) to a task by writing the
high 32 bits of per CPU IA32_PQR_ASSOC MSR when a task is scheduled in.
CPUID.(EAX=10H, ECX=1):EDX[15:0] enumerates the max CLOSID supported and
it is zero indexed. Hence change the type to u32 from int.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: davidcc@google.com
Cc: reinette.chatre@intel.com
Link: http://lkml.kernel.org/r/1501017287-28083-15-git-send-email-vikas.shivappa@linux.intel.com
2017-08-01 22:41:24 +02:00
Vikas Shivappa
f01d7d51f5 x86/intel_rdt: Introduce a common compile option for RDT
We currently have a CONFIG_RDT_A which is for RDT(Resource directory
technology) allocation based resctrl filesystem interface. As a
preparation to add support for RDT monitoring as well into the same
resctrl filesystem, change the config option to be CONFIG_RDT which
would include both RDT allocation and monitoring code.

No functional change.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: davidcc@google.com
Cc: reinette.chatre@intel.com
Link: http://lkml.kernel.org/r/1501017287-28083-4-git-send-email-vikas.shivappa@linux.intel.com
2017-08-01 22:41:19 +02:00
Vikas Shivappa
c39a0e2c88 x86/perf/cqm: Wipe out perf based cqm
'perf cqm' never worked due to the incompatibility between perf
infrastructure and cqm hardware support.  The hardware uses RMIDs to
track the llc occupancy of tasks and these RMIDs are per package. This
makes monitoring a hierarchy like cgroup along with monitoring of tasks
separately difficult and several patches sent to lkml to fix them were
NACKed. Further more, the following issues in the current perf cqm make
it almost unusable:

    1. No support to monitor the same group of tasks for which we do
    allocation using resctrl.

    2. It gives random and inaccurate data (mostly 0s) once we run out
    of RMIDs due to issues in Recycling.

    3. Recycling results in inaccuracy of data because we cannot
    guarantee that the RMID was stolen from a task when it was not
    pulling data into cache or even when it pulled the least data. Also
    for monitoring llc_occupancy, if we stop using an RMID_x and then
    start using an RMID_y after we reclaim an RMID from an other event,
    we miss accounting all the occupancy that was tagged to RMID_x at a
    later perf_count.

    2. Recycling code makes the monitoring code complex including
    scheduling because the event can lose RMID any time. Since MBM
    counters count bandwidth for a period of time by taking snap shot of
    total bytes at two different times, recycling complicates the way we
    count MBM in a hierarchy. Also we need a spin lock while we do the
    processing to account for MBM counter overflow. We also currently
    use a spin lock in scheduling to prevent the RMID from being taken
    away.

    4. Lack of support when we run different kind of event like task,
    system-wide and cgroup events together. Data mostly prints 0s. This
    is also because we can have only one RMID tied to a cpu as defined
    by the cqm hardware but a perf can at the same time tie multiple
    events during one sched_in.

    5. No support of monitoring a group of tasks. There is partial support
    for cgroup but it does not work once there is a hierarchy of cgroups
    or if we want to monitor a task in a cgroup and the cgroup itself.

    6. No support for monitoring tasks for the lifetime without perf
    overhead.

    7. It reported the aggregate cache occupancy or memory bandwidth over
    all sockets. But most cloud and VMM based use cases want to know the
    individual per-socket usage.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: davidcc@google.com
Cc: reinette.chatre@intel.com
Link: http://lkml.kernel.org/r/1501017287-28083-2-git-send-email-vikas.shivappa@linux.intel.com
2017-08-01 22:41:18 +02:00
Trond Myklebust
fd40559c86 NFSv4: Fix EXCHANGE_ID corrupt verifier issue
The verifier is allocated on the stack, but the EXCHANGE_ID RPC call was
changed to be asynchronous by commit 8d89bd70bc. If we interrrupt
the call to rpc_wait_for_completion_task(), we can therefore end up
transmitting random stack contents in lieu of the verifier.

Fixes: 8d89bd70bc ("NFS setup async exchange_id")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-01 16:28:55 -04:00
Chuck Lever
d31ae25481 sunrpc: Const-ify all instances of struct rpc_xprt_ops
After transport instance creation, these function pointers never
change. Mark them as constant to prevent their use as an attack
vector for code injections.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-01 16:10:35 -04:00
Alexander Sverdlin
c4b3eacc1d mtd: spi-nor: Recover from Spansion/Cypress errors
S25FL{128|256|512}S datasheets say:
"When P_ERR or E_ERR bits are set to one, the WIP bit will remain set to
one indicating the device remains busy and unable to receive new operation
commands. A Clear Status Register (CLSR) command must be received to return
the device to standby mode."

Current spi-nor code works until first error occurs, but write/erase errors
are not just rare hardware failures, they also occur if user tries to flash
write-protected areas. After such attempt no SPI command can be executed
any more and even read fails. This patch adds support for P_ERR and E_ERR
bits in Status Register 1 (so that operation fails immediately and not
after a long timeout) and proper recovery from the error condition.

Tested on Spansion S25FS128S, which is supported by S25FL129P entry.

Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@wedev4u.fr>
2017-08-01 21:15:33 +02:00
Kees Cook
2af6228026 LSM: drop bprm_secureexec hook
This removes the bprm_secureexec hook since the logic has been folded into
the bprm_set_creds hook for all LSMs now.

Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
2017-08-01 12:03:10 -07:00
Kees Cook
ee67ae7ef6 commoncap: Move cap_elevated calculation into bprm_set_creds
Instead of a separate function, open-code the cap_elevated test, which
lets us entirely remove bprm->cap_effective (to use the local "effective"
variable instead), and more accurately examine euid/egid changes via the
existing local "is_setid".

The following LTP tests were run to validate the changes:

	# ./runltp -f syscalls -s cap
	# ./runltp -f securebits
	# ./runltp -f cap_bounds
	# ./runltp -f filecaps

All kernel selftests for capabilities and exec continue to pass as well.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
2017-08-01 12:03:09 -07:00
Kees Cook
46d98eb4e1 commoncap: Refactor to remove bprm_secureexec hook
The commoncap implementation of the bprm_secureexec hook is the only LSM
that depends on the final call to its bprm_set_creds hook (since it may
be called for multiple files, it ignores bprm->called_set_creds). As a
result, it cannot safely _clear_ bprm->secureexec since other LSMs may
have set it.  Instead, remove the bprm_secureexec hook by introducing a
new flag to bprm specific to commoncap: cap_elevated. This is similar to
cap_effective, but that is used for a specific subset of elevated
privileges, and exists solely to track state from bprm_set_creds to
bprm_secureexec. As such, it will be removed in the next patch.

Here, set the new bprm->cap_elevated flag when setuid/setgid has happened
from bprm_fill_uid() or fscapabilities have been prepared. This temporarily
moves the bprm_secureexec hook to a static inline. The helper will be
removed in the next patch; this makes the step easier to review and bisect,
since this does not introduce any changes to inputs nor outputs to the
"elevated privileges" calculation.

The new flag is merged with the bprm->secureexec flag in setup_new_exec()
since this marks the end of any further prepare_binprm() calls.

Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
2017-08-01 12:03:08 -07:00