Commit graph

65589 commits

Author SHA1 Message Date
Shaohua Li
69fd5c3917 blktrace: add an option to allow displaying cgroup path
By default we output cgroup id in blktrace. This adds an option to
display cgroup path. Since get cgroup path is a relativly heavy
operation, we don't enable it by default.

with the option enabled, blktrace will output something like this:
dd-1353  [007] d..2   293.015252:   8,0   /test/level  D   R 24 + 8 [dd]

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29 09:00:03 -06:00
Shaohua Li
007cc56b7e block: always attach cgroup info into bio
blkcg_bio_issue_check() already gets blkcg for a BIO.
bio_associate_blkcg() uses a percpu refcounter, so it's a very cheap
operation. There is no point we don't attach the cgroup info into bio at
blkcg_bio_issue_check. This also makes blktrace outputs correct cgroup
info.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29 09:00:03 -06:00
Shaohua Li
121508df44 cgroup: export fhandle info for a cgroup
Add an API to export cgroup fhandle info. We don't export a full 'struct
file_handle', there are unrequired info. Sepcifically, cgroup is always
a directory, so we don't need a 'FILEID_INO32_GEN_PARENT' type fhandle,
we only need export the inode number and generation number just like
what generic_fh_to_dentry does. And we can avoid the overhead of getting
an inode too, since kernfs_node_id (ino and generation) has all the info
required.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29 09:00:03 -06:00
Shaohua Li
aa81882534 kernfs: add exportfs operations
Now we have the facilities to implement exportfs operations. The idea is
cgroup can export the fhandle info to userspace, then userspace uses
fhandle to find the cgroup name. Another example is userspace can get
fhandle for a cgroup and BPF uses the fhandle to filter info for the
cgroup.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29 09:00:03 -06:00
Shaohua Li
c53cd490b1 kernfs: introduce kernfs_node_id
inode number and generation can identify a kernfs node. We are going to
export the identification by exportfs operations, so put ino and
generation into a separate structure. It's convenient when later patches
use the identification.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29 09:00:03 -06:00
Shaohua Li
4a3ef68aca kernfs: implement i_generation
Set i_generation for kernfs inode. This is required to implement
exportfs operations. The generation is 32-bit, so it's possible the
generation wraps up and we find stale files. To reduce the posssibility,
we don't reuse inode numer immediately. When the inode number allocation
wraps, we increase generation number. In this way generation/inode
number consist of a 64-bit number which is unlikely duplicated. This
does make the idr tree more sparse and waste some memory. Since idr
manages 32-bit keys, idr uses a 6-level radix tree, each level covers 6
bits of the key. In a 100k inode kernfs, the worst case will have around
300k radix tree node. Each node is 576bytes, so the tree will use about
~150M memory. Sounds not too bad, if this really is a problem, we should
find better data structure.

Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29 09:00:03 -06:00
Shaohua Li
7d35079f82 kernfs: use idr instead of ida to manage inode number
kernfs uses ida to manage inode number. The problem is we can't get
kernfs_node from inode number with ida. Switching to use idr, next patch
will add an API to get kernfs_node from inode number.

Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29 09:00:03 -06:00
Chris Wilson
db1fc97ca0 dma-buf/sync_file: Allow multiple sync_files to wrap a single dma-fence
Up until recently sync_file were create to export a single dma-fence to
userspace, and so we could canabalise a bit insie dma-fence to mark
whether or not we had enable polling for the sync_file itself. However,
with the advent of syncobj, we do allow userspace to create multiple
sync_files for a single dma-fence. (Similarly, that the sw-sync
validation framework also started returning multiple sync-files wrapping
a single dma-fence for a syncpt also triggering the problem.)

This patch reverts my suggestion in commit e241655373
("dma-buf/sync_file: only enable fence signalling on poll()") to use a
single bit in the shared dma-fence and restores the sync_file->flags for
tracking the bits individually.

Reported-by: Gustavo Padovan <gustavo.padovan@collabora.com>
Fixes: f1e8c67123 ("dma-buf/sw-sync: Use an rbtree to sort fences in the timeline")
Fixes: e9083420bb ("drm: introduce sync objects (v4)")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: dri-devel@lists.freedesktop.org
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.13-rc1+
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170728212951.7818-1-chris@chris-wilson.co.uk
2017-07-29 11:10:52 -03:00
Jeff Layton
80aafd50b6 Documentation: add some docs for errseq_t
...and fix up a few comments in the code.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
2017-07-29 09:01:02 -04:00
Linus Torvalds
8155469341 s390: SRCU fix.
PPC: host crash fixes.
 x86: bugfixes, including making nested posted interrupts really work.
 Generic: tweaks to kvm_stat and to uevents
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJZe2EYAAoJEL/70l94x66D4JYH/AnvioKWTsplhUKt4Y4JlpJX
 EXYjQd/CIZ+MHNNUH+U+XEj6tKQymKrz4TeZSs1o0nyxCeyparR3gK27OYVpPspN
 GkPSit3hyRgW9r5uXp6pZCJuFCAMpMZ6z4sKbT1FxDhnWnpWayV9w8KA+yQT/UUX
 dNQ9JJPUxApcM4NCaj2OCQ8K1koNIDCc52+jATf0iK/Heiaf6UGqCcHXUIy5I5wM
 OWk05Qm32VBAYb6P6FfoyGdLMNAAkJtr1fyOJDkxX730CYgwpjIP0zifnJ1bt8V2
 YRnjvPO5QciDHbZ8VynwAkKi0ZAd8psjwXh0KbyahPL/2/sA2xCztMH25qweriI=
 =fsfr
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "s390:
   - SRCU fix

  PPC:
   - host crash fixes

  x86:
   - bugfixes, including making nested posted interrupts really work

  Generic:
   - tweaks to kvm_stat and to uevents"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: LAPIC: Fix reentrancy issues with preempt notifiers
  tools/kvm_stat: add '-f help' to get the available event list
  tools/kvm_stat: use variables instead of hard paths in help output
  KVM: nVMX: Fix loss of L2's NMI blocking state
  KVM: nVMX: Fix posted intr delivery when vcpu is in guest mode
  x86: irq: Define a global vector for nested posted interrupts
  KVM: x86: do mask out upper bits of PAE CR3
  KVM: make pid available for uevents without debugfs
  KVM: s390: take srcu lock when getting/setting storage keys
  KVM: VMX: remove unused field
  KVM: PPC: Book3S HV: Fix host crash on changing HPT size
  KVM: PPC: Book3S HV: Enable TM before accessing TM registers
2017-07-28 13:36:56 -07:00
Linus Torvalds
3d9d7405c0 arm64 fixes:
- Ensure we have a guard page after the kernel image in vmalloc
 
 - Fix incorrect prefetch stride in copy_page
 
 - Ensure irqs are disabled in die()
 
 - Fix for event group validation in QCOM L2 PMU driver
 
 - Fix requesting of PMU IRQs on AMD Seattle
 
 - Minor cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCgAGBQJZey1iAAoJELescNyEwWM0w/0H/1RaHFUSoFUIoL+qFD0eGXcp
 hORI0sIHrUlHRONTFYMTyNko7kxELz5aDm6pc87dzBUNoUq3gxhqeEa0zsmwOPsQ
 m4iDa7r9xXT+nBITe2auAg6miEMX7Ym448dDrIyKNcRK+2SyZoFqS0vr8UVqs1P/
 NwdFGgpKHbV4r1Jeoosom+n7VnuyE0vYBKo8TlRks6NvQJoh2duiPkL+AsBgCfBq
 fznck7jIPL4z4kf4Fp/Yz1QsmMhkDSidPmGD/m97Bj4wvEbMwf0u8Dnv1tySK5wx
 NwKeN0Dn7JphtL5c5j+OGiri7gTcswjxHJ9f6d0Ez+2TwnjWFM6JNQ+xdVqFcxc=
 =EpS9
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "I'd been collecting these whilst we debugged a CPU hotplug failure,
  but we ended up diagnosing that one to tglx, who has taken a fix via
  the -tip tree separately.

  We're seeing some NFS issues that we haven't gotten to the bottom of
  yet, and we've uncovered some issues with our backtracing too so there
  might be another fixes pull before we're done.

  Summary:

   - Ensure we have a guard page after the kernel image in vmalloc

   - Fix incorrect prefetch stride in copy_page

   - Ensure irqs are disabled in die()

   - Fix for event group validation in QCOM L2 PMU driver

   - Fix requesting of PMU IRQs on AMD Seattle

   - Minor cleanups and fixes"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: mmu: Place guard page after mapping of kernel image
  drivers/perf: arm_pmu: Request PMU SPIs with IRQF_PER_CPU
  arm64: sysreg: Fix unprotected macro argmuent in write_sysreg
  perf: qcom_l2: fix column exclusion check
  arm64/lib: copy_page: use consistent prefetch stride
  arm64/numa: Drop duplicate message
  perf: Convert to using %pOF instead of full_name
  arm64: Convert to using %pOF instead of full_name
  arm64: traps: disable irq in die()
  arm64: atomics: Remove '&' from '+&' asm constraint in lse atomics
  arm64: uaccess: Remove redundant __force from addr cast in __range_ok
2017-07-28 13:29:36 -07:00
Linus Torvalds
1731a47444 - A few DM integrity fixes that improve performance. One that address
inefficiencies in the on-disk journal device layout.  Another that
   makes use of the block layer's on-stack plugging when writing the
   journal.
 
 - A dm-bufio fix for the blk_status_t conversion that went in during the
   merge window.
 
 - A few DM raid fixes that address correctness when suspending the
   device and a validation fix for validation that occurs during device
   activation.
 
 - A couple DM zoned target fixes.  Important one being the fix to not
   use GFP_KERNEL in the IO path due to concerns about deadlock in
   low-memory conditions (e.g. swap over a DM zoned device, etc).
 
 - A DM DAX device fix to make sure dm_dax_flush() is called if the
   underlying DAX device is operating as a write cache.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJZe13OAAoJEMUj8QotnQNav/gIAMXMUbXlYHVikVNq+6rNkXRk
 FlsltNcJEDeZCit0nJd/2nOWGpssXdz+7cJTUU28Kp+3IscIolSHS51bzfSFI05V
 7LbYqEX1EdXkTwEeYfHlAoOexvj4oarpAWWQF/ACU8rHCruaqfqIa57mstxLoyDY
 XcxsIY/fds6GZViLB0MD/jBAKaLWX90aFZ9MQcF7AmdpMr56kCO2PUhiqHcrN47t
 BjH7E5QSKGl2pMND1bR6pleWFw8HB7h82Qjaasd5bQuVWseQ4u9Illxny6bhhk2E
 BiEWjzFvZB+JL1zl7JIXnBjhdmbwgAVvoW6EqHuVzHuR0X8gylBF2gDLnSzUZu4=
 =3MxS
 -----END PGP SIGNATURE-----

Merge tag 'for-4.13/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - a few DM integrity fixes that improve performance. One that address
   inefficiencies in the on-disk journal device layout. Another that
   makes use of the block layer's on-stack plugging when writing the
   journal.

 - a dm-bufio fix for the blk_status_t conversion that went in during
   the merge window.

 - a few DM raid fixes that address correctness when suspending the
   device and a validation fix for validation that occurs during device
   activation.

 - a couple DM zoned target fixes. Important one being the fix to not
   use GFP_KERNEL in the IO path due to concerns about deadlock in
   low-memory conditions (e.g. swap over a DM zoned device, etc).

 - a DM DAX device fix to make sure dm_dax_flush() is called if the
   underlying DAX device is operating as a write cache.

* tag 'for-4.13/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm, dax: Make sure dm_dax_flush() is called if device supports it
  dm verity fec: fix GFP flags used with mempool_alloc()
  dm zoned: use GFP_NOIO in I/O path
  dm zoned: remove test for impossible REQ_OP_FLUSH conditions
  dm raid: bump target version
  dm raid: avoid mddev->suspended access
  dm raid: fix activation check in validate_raid_redundancy()
  dm raid: remove WARN_ON() in raid10_md_layout_to_format()
  dm bufio: fix error code in dm_bufio_write_dirty_buffers()
  dm integrity: test for corrupted disk format during table load
  dm integrity: WARN_ON if variables representing journal usage get out of sync
  dm integrity: use plugging when writing the journal
  dm integrity: fix inefficient allocation of journal space
2017-07-28 12:17:17 -07:00
Linus Torvalds
0fa8dc423c Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A small collection of fixes that should go into this series. This
  contains:

   - NVMe pull request from Christoph, with various fixes for nvme
     proper and nvme-fc.

   - disable runtime PM for blk-mq for now.

     With scsi now defaulting to using blk-mq, this reared its head as
     an issue. Longer term we'll fix up runtime PM for blk-mq, for now
     just disable it to prevent a hang on laptop resume for some folks.

   - blk-mq CPU <-> hw queue map fix from Christoph.

   - xen/blkfront pull request from Konrad, with two small fixes for the
     blkfront driver.

   - a few fixups for nbd from Joseph.

   - a stable fix for pblk from Javier"

* 'for-linus' of git://git.kernel.dk/linux-block:
  lightnvm: pblk: advance bio according to lba index
  nvme: validate admin queue before unquiesce
  nbd: clear disconnected on reconnect
  nvme-pci: fix HMB size calculation
  nvme-fc: revise TRADDR parsing
  nvme-fc: address target disconnect race conditions in fcp io submit
  nvme: fabrics commands should use the fctype field for data direction
  nvme: also provide a UUID in the WWID sysfs attribute
  xen/blkfront: always allocate grants first from per-queue persistent grants
  xen-blkfront: fix mq start/stop race
  blk-mq: map queues to all present CPUs
  block: disable runtime-pm for blk-mq
  xen-blkfront: Fix handling of non-supported operations
  nbd: only set sndtimeo if we have a timeout set
  nbd: take tx_lock before disconnecting
  nbd: allow multiple disconnects to be sent
2017-07-28 12:13:34 -07:00
Linus Torvalds
a2d48756ca MMC host:
- sunxi: Correct time phase settings
  - omap_hsmmc: Clean up some dead code
  - dw_mmc: Fix message printed for deprecated num-slots DT binding
  - dw_mmc: Fix DT documentation
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZeu0JAAoJEP4mhCVzWIwpECAP/0kf5o3lEneUOqSGE9d1EszW
 GpHgX0hA1+FaQpBUhGAfFCjmNigsH8rwz8dOv17PX8iFyQmal0ivgC7DXJPG2Rq1
 ofC4MwfWzZE0AKRSLButvRyNHqmwZsSM8nlFPwAtsINktJCx/WhSr6OS5pNEdz/j
 1tEGDgLzBiq9Yd3FHf07KPPkMhxut0eI1gXke8pRgFkLQIwU4/8zFb6450w0RIxQ
 BtmqEEK0p3cyZLN/FxpyMG6ZVmypTUiMFX9G0xkcKdsTxGqnYpWvCFuEbEtx6vbU
 5IjjKc2oINMs3z53tRiN/vQaSuZMn1O4dKHydADxP68Pm/ff09+pgvnpTV4D83RV
 /gw9olO1Y//ONCT+p/k2fHhOlLUa4YY2+SUCN7VZAqP5gYEjtH9/doOoWND//WPA
 BhFcZsWBoDva+M3OC2wNnZb5aCERVLuHPl3NhdiOpxyGoEEG1c0MvhegaUI4Rm0K
 hoVyuXqWsbu+3A3H+biELb0VEIlgELkCIRh7mjKSq8oicPN7PtymhAgfGyxcCkn+
 qcvlN0UxcJjYxYTXzKjEaTiCHeel0UB3toCWdfq+L5znHgRMjZoxYM/tUhF8+rts
 wTcS/NkLz4DGJGQavfJvdawddafCbFnoL19KnWd2Wl3RxNOrQ2lnvfBODDuPrVoz
 IJqzwddWO/w6gTDT5Xt6
 =xWHL
 -----END PGP SIGNATURE-----

Merge tag 'mmc-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
 "Here are a couple of mmc fixes intended for v4.13-rc1.

  I have also included a couple of cleanup patches in this pull request
  for OMAP2+, related to the omap_hsmmc driver. The reason is because of
  the changes are also depending on OMAP SoC specific code, so this
  simplifies how to deal with this.

  Summary:

  MMC host:
   - sunxi: Correct time phase settings
   - omap_hsmmc: Clean up some dead code
   - dw_mmc: Fix message printed for deprecated num-slots DT binding
   - dw_mmc: Fix DT documentation"

* tag 'mmc-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  Documentation: dw-mshc: deprecate num-slots
  mmc: dw_mmc: fix the wrong condition check of getting num-slots from DT
  mmc: host: omap_hsmmc: remove unused platform callbacks
  ARM: OMAP2+: hsmmc.c: Remove dead code
  mmc: sunxi: Keep default timing phase settings for new timing mode
2017-07-28 12:04:36 -07:00
Josh Poimboeuf
649ea4d5a6 objtool: Assume unannotated UD2 instructions are dead ends
Arnd reported some false positive warnings with GCC 7:

  drivers/hid/wacom_wac.o: warning: objtool: wacom_bpt3_touch()+0x2a5: stack state mismatch: cfa1=7+8 cfa2=6+16
  drivers/iio/adc/vf610_adc.o: warning: objtool: vf610_adc_calculate_rates() falls through to next function vf610_adc_sample_set()
  drivers/pwm/pwm-hibvt.o: warning: objtool: hibvt_pwm_get_state() falls through to next function hibvt_pwm_remove()
  drivers/pwm/pwm-mediatek.o: warning: objtool: mtk_pwm_config() falls through to next function mtk_pwm_enable()
  drivers/spi/spi-bcm2835.o: warning: objtool: .text: unexpected end of section
  drivers/spi/spi-bcm2835aux.o: warning: objtool: .text: unexpected end of section
  drivers/watchdog/digicolor_wdt.o: warning: objtool: dc_wdt_get_timeleft() falls through to next function dc_wdt_restart()

When GCC 7 detects a potential divide-by-zero condition, it sometimes
inserts a UD2 instruction for the case where the divisor is zero,
instead of letting the hardware trap on the divide instruction.

Objtool doesn't consider UD2 to be fatal unless it's annotated with
unreachable().  So it considers the GCC-generated UD2 to be non-fatal,
and it tries to follow the control flow past the UD2 and gets
confused.

Previously, objtool *did* assume UD2 was always a dead end.  That
changed with the following commit:

  d1091c7fa3 ("objtool: Improve detection of BUG() and other dead ends")

The motivation behind that change was that Peter was planning on using
UD2 for __WARN(), which is *not* a dead end.  However, it turns out
that some emulators rely on UD2 being fatal, so he ended up using
'ud0' instead:

  9a93848fe7 ("x86/debug: Implement __WARN() using UD0")

For GCC 4.5+, it should be safe to go back to the previous assumption
that UD2 is fatal, even when it's not annotated with unreachable().

But for pre-4.5 versions of GCC, the unreachable() macro isn't
supported, so such cases of UD2 need to be explicitly annotated as
reachable.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: d1091c7fa3 ("objtool: Improve detection of BUG() and other dead ends")
Link: http://lkml.kernel.org/r/e57fa9dfede25f79487da8126ee9cdf7b856db65.1501188854.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-28 08:33:32 +02:00
Greg Kroah-Hartman
98c2f10d23 First round of IIO new device support, features and cleanups for the 4.14 cycle.
4 completely new drivers in this set and plenty of other stuff.
 
 One ABI change due to a silly mistake a long time back. Hopefully no
 one will notice.  It effects the numerical order of consumer device
 channels which was the reverse of the obvious.  It's going the slow
 way to allow us some margin to spot if we have broken userspace or
 not (seems unlikely)
 
 New Device Support
 * ccs811
   - new driver for the Volatile Organic Compounds (VOC) sensor.
 * dln2 adc
   - new driver for the ADC on this flexible usb board.
 * EP93xx
   - new driver for this Cirrus logic SoC ADC.
 * ltc2471
   - new ADC driver support the ltc2471 and ltc2473
 * st_accel
   - add trivial table entries to support H3LIS331DL, LIS331DL, LIS3LV02DL.
 * st_gyro
   - add L3GD20H support (again) having fixed the various things that were
     broken in the first try.  Includes devicetree binding.
 * stm32 dac
   - add support for the DACs in the STM32F4 series
 
 Features
 * Documentation
   - add missing power attribute documentation to the ABI docs.
 * at91-sama5d2
   - add hardware trigger and buffered capture support with bindings.
   - suspend and resume functionality.
 * bmc150
   - support for the BOSC0200 ACPI device id seen on some tablets.
 * hdc100x
   - devicetree bindings
   - document supported devices
   - match table and device ids.
 * hts221
   - support active low interrupts (with bindings)
   - open drain mode with bindings.
 * htu21
   - OF match table and bindings.
 * lsm6dsx
   - open drain mode with bindings
 * ltc2497
   - add support for board file based consumer mapping.
 * ms5367
   - OF match table and bindings.
 * mt7622
   - binding document and OF match table.
   - suspend and resume support.
 * rpr0521
   - triggered buffer support.
 * tsys01
   - OF match table and bindings.
 
 Cleanups and minor fixes
 * core
   - fix ordering of IIO channels to entry numbers when using
     iio_map_array_register rather than reversing them.
   - use the new %pOF format specifier rather than full name for the
     device tree nodes.
 * ad7280a
   - fix potential issue with macro argument reuse.
 * ad7766
   - drop a pointless NULL value check as it's done in the gpiod code.
 * adis16400
   - unsigned -> unsigned int.
 * at91 adc
   - make some init data static to reduce code size.
 * at91-sama5d2 ADC
   - make some init data static to reduce code size.
 * da311
   - make some init data static to reduce code size.
 * hid-sensor-rotation
   - drop an unnecessary static.
 * hts221
   - refactor the write_with_mask code.
   - move the BDU configuration to probe time as there is no reason for it
     to change.
   - avoid overwriting reserved data during power-down.  This is a fix, but
     the infrastructure need was too invasive to send it to mainline except
     in a merge window.  It's not a regression as it was always wrong.
   - avoid reconfigure the sampling frequency multiple times by just
     doing it in the write_raw function directly.
   - refactor the power_on/off calls into a set_enable.
   - move the dry-enable logic into trig_set_state as that is the only
     place it was used.
 * ina219
   - fix polling of ina226 conversion ready flag.
 * imx7d
   - add vendor name in kconfig for consistency with similar parts.
 * mcp3422
   - Change initial channel to 0 as it feels more logical.
   - Check for some errors in probe.
 * meson-saradc
   - add a check of of_match_device return value.
 * mpu3050
   - allow open drain for any interrupt type.
 * rockchip adc
   - add check on of_match_device return value.
 * sca3000
   - drop a trailing whitespace.
 * stm32 adc
   - make array stm32h7_adc_ckmodes_spec static.
 * stm32 dac
   - fix an error message.
 * stm32 timers
   - fix clock name in docs to match reality after changes.
 * st_accel
   - explicit OF table (spi).
   - add missing entries to OF table (i2c).
   - rename of_device_id table to drop the part name.
   - adding missing lis3l02dq entry to bindings.
   - rename H3LIS331DL_DRIVER_NAME to line up with similar entries in driver.
 * st_gyro
   - explicit OF table (spi).
 * st_magn
   - explicit OF table (spi).
   - enable multiread for lis3mdl.
 * st_pressure
   - explicit OF table (spi).
 * st_sensors common.
   - move st_sensors_of_i2c_probe and rename to make it available for spi
   drivers.
 * tsc3472
   - don't write an extra byte when writing the ATIME register.
   - add a link to the datasheet.
 * tsl2x7x - continued staging cleanups
   - add of_match_table.
   - drop redundant power_state sysfs attribute.
   - drop wrapper tsl2x7x_i2c_read.
   - clean up i2c calls made in tsl2x7x_als_calibrate.
   - refactor the read and write _event_value callbacks to handle additional
     elements.
   - use usleep_range instead of mdelay.
   - check return value from tsl2x7x_invoke_change.
 * zpa2326
   - add some newline to the end of logging macros.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEbilms4eEBlKRJoGxVIU0mcT0FogFAll3oWIRHGppYzIzQGtl
 cm5lbC5vcmcACgkQVIU0mcT0FoimeBAAs2WBPr8kQZeDfdjNS5isk+GoBJ4btHDL
 +BpBODylDku/WIJb7hpBymH2Xs/7qUuYLYwz0P1XVrSo/1kK+krJQR0DUwcracfy
 gqMS4KEmgWbhmBV2ksPfCGcoT5rimxLUqpka5+WWAszNtwi7YOt5FadTb9yK4WdG
 Di5AzaVLKAUBpQrrHdFPXewxenVs1P/X0ES7fSNU1SIL2bRAaPDj9duu3URivt9l
 XRh7qqNpuNMIQ3MmeEeLDJkyeeeWHYdnps/XDfW0i5VElxwZImTPD+AFAoc2E51J
 pujvXlu1a6FgMH+hp7hbOxNuf3eKlIq9mrfGre4K6DkTB0gro3oU2bCa5BEq/be1
 PrKQZsfkK0KmrLCh0UqwTTcWdorOfussWpZ6Ib4/l4JQEeII/odwyZJ3vHNlm0Gy
 0n/TVfNVQEY2zLswWdUOaQ2bvLGaXoeIBH0sMtCPOKks/4u692qZg3LAC1g5mEKF
 4ykKG8oJ/UvWcqb00Z1H1qkT+B0ZmOEZzf6M7rFlPKr48DHrbu1YZcoaPYoBxeWP
 nL9S0zdygs06HI3/5Rl4Vv7HMTCbKabOp7eamW3IDqRpcW6dLzsXrG/e3YFbvMxq
 sGfd3g8jXaf1rlA9qP5FOHdQM2ySoi/WliwShH46LTY17sHSwQd46FYDKn4ZMT4q
 6oMBnYOeadg=
 =imPz
 -----END PGP SIGNATURE-----

Merge tag 'iio-for-4.14a' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-next

Jonathan writes:

First round of IIO new device support, features and cleanups for the 4.14 cycle.

4 completely new drivers in this set and plenty of other stuff.

One ABI change due to a silly mistake a long time back. Hopefully no
one will notice.  It effects the numerical order of consumer device
channels which was the reverse of the obvious.  It's going the slow
way to allow us some margin to spot if we have broken userspace or
not (seems unlikely)

New Device Support
* ccs811
  - new driver for the Volatile Organic Compounds (VOC) sensor.
* dln2 adc
  - new driver for the ADC on this flexible usb board.
* EP93xx
  - new driver for this Cirrus logic SoC ADC.
* ltc2471
  - new ADC driver support the ltc2471 and ltc2473
* st_accel
  - add trivial table entries to support H3LIS331DL, LIS331DL, LIS3LV02DL.
* st_gyro
  - add L3GD20H support (again) having fixed the various things that were
    broken in the first try.  Includes devicetree binding.
* stm32 dac
  - add support for the DACs in the STM32F4 series

Features
* Documentation
  - add missing power attribute documentation to the ABI docs.
* at91-sama5d2
  - add hardware trigger and buffered capture support with bindings.
  - suspend and resume functionality.
* bmc150
  - support for the BOSC0200 ACPI device id seen on some tablets.
* hdc100x
  - devicetree bindings
  - document supported devices
  - match table and device ids.
* hts221
  - support active low interrupts (with bindings)
  - open drain mode with bindings.
* htu21
  - OF match table and bindings.
* lsm6dsx
  - open drain mode with bindings
* ltc2497
  - add support for board file based consumer mapping.
* ms5367
  - OF match table and bindings.
* mt7622
  - binding document and OF match table.
  - suspend and resume support.
* rpr0521
  - triggered buffer support.
* tsys01
  - OF match table and bindings.

Cleanups and minor fixes
* core
  - fix ordering of IIO channels to entry numbers when using
    iio_map_array_register rather than reversing them.
  - use the new %pOF format specifier rather than full name for the
    device tree nodes.
* ad7280a
  - fix potential issue with macro argument reuse.
* ad7766
  - drop a pointless NULL value check as it's done in the gpiod code.
* adis16400
  - unsigned -> unsigned int.
* at91 adc
  - make some init data static to reduce code size.
* at91-sama5d2 ADC
  - make some init data static to reduce code size.
* da311
  - make some init data static to reduce code size.
* hid-sensor-rotation
  - drop an unnecessary static.
* hts221
  - refactor the write_with_mask code.
  - move the BDU configuration to probe time as there is no reason for it
    to change.
  - avoid overwriting reserved data during power-down.  This is a fix, but
    the infrastructure need was too invasive to send it to mainline except
    in a merge window.  It's not a regression as it was always wrong.
  - avoid reconfigure the sampling frequency multiple times by just
    doing it in the write_raw function directly.
  - refactor the power_on/off calls into a set_enable.
  - move the dry-enable logic into trig_set_state as that is the only
    place it was used.
* ina219
  - fix polling of ina226 conversion ready flag.
* imx7d
  - add vendor name in kconfig for consistency with similar parts.
* mcp3422
  - Change initial channel to 0 as it feels more logical.
  - Check for some errors in probe.
* meson-saradc
  - add a check of of_match_device return value.
* mpu3050
  - allow open drain for any interrupt type.
* rockchip adc
  - add check on of_match_device return value.
* sca3000
  - drop a trailing whitespace.
* stm32 adc
  - make array stm32h7_adc_ckmodes_spec static.
* stm32 dac
  - fix an error message.
* stm32 timers
  - fix clock name in docs to match reality after changes.
* st_accel
  - explicit OF table (spi).
  - add missing entries to OF table (i2c).
  - rename of_device_id table to drop the part name.
  - adding missing lis3l02dq entry to bindings.
  - rename H3LIS331DL_DRIVER_NAME to line up with similar entries in driver.
* st_gyro
  - explicit OF table (spi).
* st_magn
  - explicit OF table (spi).
  - enable multiread for lis3mdl.
* st_pressure
  - explicit OF table (spi).
* st_sensors common.
  - move st_sensors_of_i2c_probe and rename to make it available for spi
  drivers.
* tsc3472
  - don't write an extra byte when writing the ATIME register.
  - add a link to the datasheet.
* tsl2x7x - continued staging cleanups
  - add of_match_table.
  - drop redundant power_state sysfs attribute.
  - drop wrapper tsl2x7x_i2c_read.
  - clean up i2c calls made in tsl2x7x_als_calibrate.
  - refactor the read and write _event_value callbacks to handle additional
    elements.
  - use usleep_range instead of mdelay.
  - check return value from tsl2x7x_invoke_change.
* zpa2326
  - add some newline to the end of logging macros.
2017-07-27 21:29:49 -07:00
Eugenia Emantayev
fa3676885e net/mlx5e: Add field select to MTPPS register
In order to mark relevant fields while setting the MTPPS register
add field select. Otherwise it can cause a misconfiguration in
firmware.

Fixes: ee7f12205a ('net/mlx5e: Implement 1PPS support')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-07-27 16:40:17 +03:00
Eugenia Emantayev
0b794ffae7 net/mlx5: Fix mlx5_ifc_mtpps_reg_bits structure size
Fix miscalculation in reserved_at_1a0 field.

Fixes: ee7f12205a ('net/mlx5e: Implement 1PPS support')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-07-27 16:40:17 +03:00
Thomas Gleixner
8397913303 genirq/cpuhotplug: Revert "Set force affinity flag on hotplug migration"
That commit was part of the changes moving x86 to the generic CPU hotplug
interrupt migration code. The force flag was required on x86 before the
hierarchical irqdomain rework, but invoking set_affinity() with force=true
stayed and had no side effects.

At some point in the past, the force flag got repurposed to support the
exynos timer interrupt affinity setting to a not yet online CPU, so the
interrupt controller callback does not verify the supplied affinity mask
against cpu_online_mask.

Setting the flag in the CPU hotplug code causes the cpu online masking to
be blocked on these irq controllers and results in potentially affining an
interrupt to the CPU which is unplugged, i.e. instead of moving it away,
it's just reassigned to it.

As the force flags is not longer needed on x86, it's safe to revert that
patch so the ARM irqchips which use the force flag work again.

Add comments to that effect, so this won't happen again.

Note: The online mask handling should be done in the generic code and the
force flag and the masking in the irq chips removed all together, but
that's not a change possible for 4.13. 

Fixes: 77f85e66aa ("genirq/cpuhotplug: Set force affinity flag on hotplug migration")
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: LAK <linux-arm-kernel@lists.infradead.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1707271217590.3109@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-07-27 15:40:02 +02:00
Jason Gerecke
fc2237a724 HID: introduce hid_is_using_ll_driver
Although HID itself is transport-agnostic, occasionally a driver may
want to interact with the low-level transport that a device is connected
through. To do this, we need to know what kind of bus is in use. The
first guess may be to look at the 'bus' field of the 'struct hid_device',
but this field may be emulated in some cases (e.g. uhid).

More ideally, we can check which ll_driver a device is using. This
function introduces a 'hid_is_using_ll_driver' function and makes the
'struct hid_ll_driver' of the four most common transports accessible
through hid.h.

Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Acked-By: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-07-27 15:14:28 +02:00
Will Deacon
a3287c41ff drivers/perf: arm_pmu: Request PMU SPIs with IRQF_PER_CPU
Since the PMU register interface is banked per CPU, CPU PMU interrrupts
cannot be handled by a CPU other than the one with the PMU asserting the
interrupt. This means that migrating PMU SPIs, as we do during a CPU
hotplug operation doesn't make any sense and can lead to the IRQ being
disabled entirely if we route a spurious IRQ to the new affinity target.

This has been observed in practice on AMD Seattle, where CPUs on the
non-boot cluster appear to take a spurious PMU IRQ when coming online,
which is routed to CPU0 where it cannot be handled.

This patch passes IRQF_PERCPU for PMU SPIs and forcefully sets their
affinity prior to requesting them, ensuring that they cannot
be migrated during hotplug events. This interacts badly with the DB8500
erratum workaround that ping-pongs the interrupt affinity from the handler,
so we avoid passing IRQF_PERCPU in that case by allowing the IRQ flags
to be overridden in the platdata.

Fixes: 3cf7ee98b8 ("drivers/perf: arm_pmu: move irq request/free into probe")
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-07-27 13:43:22 +01:00
Rahul Verma
41822878b2 qed: enhanced per queue max coalesce value.
Maximum coalesce per Rx/Tx queue is extended from
255 to 511.

Signed-off-by: Rahul Verma <rahul.verma@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-27 00:05:22 -07:00
Rahul Verma
bf5a94bfe2 qed: Read per queue coalesce from hardware
Retrieve the actual coalesce value from hardware for every Rx/Tx
queue, instead of Rx/Tx coalesce value cached during set coalesce.

Signed-off-by: Rahul Verma <Rahul.Verma@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-27 00:05:22 -07:00
Rahul Verma
477f2d1460 qed: Add support for vf coalesce configuration.
This patch add the ethtool support to set RX/Tx coalesce
value to the VF associated Rx/Tx queues.

Signed-off-by: Rahul Verma <Rahul.Verma@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-27 00:05:22 -07:00
Sudarsana Reddy Kalluru
645874e580 qed: Add support for Energy efficient ethernet.
The patch adds required driver support for reading/configuring the
Energy Efficient Ethernet (EEE) parameters.

Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-27 00:05:22 -07:00
Thomas Gleixner
f9f22a8691 scsi: bnx2i: Simplify cpu hotplug code
The CPU hotplug related code of this driver can be simplified by:

1) Consolidating the callbacks into a single state. The CPU thread can be
   torn down on the CPU which goes offline. There is no point in delaying
   that to the CPU dead state

2) Let the core code invoke the online/offline callbacks and remove the
   extra for_each_online_cpu() loops.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-07-26 21:51:25 -04:00
Thomas Gleixner
1937f8a29f scsi: bnx2fc: Simplify CPU hotplug code
The CPU hotplug related code of this driver can be simplified by:

1) Consolidating the callbacks into a single state. The CPU thread can be
   torn down on the CPU which goes offline. There is no point in delaying
   that to the CPU dead state

2) Let the core code invoke the online/offline callbacks and remove the
   extra for_each_online_cpu() loops.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-07-26 21:51:25 -04:00
Dave Airlie
0eb2c0ae57 Linux 4.13-rc2
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZdS4PAAoJEHm+PkMAQRiGbEYH/2mukTPOUAfNoWaVjO2YHxuL
 5yI3n1838tKIJm967IUmGdckN/RYGPjJxvZ+muXN2/rv23+9j3LVq9vQcsYqRQop
 vrWP+hvGGJvOGJ2NYBDB+4AUrPPdeX9stolwyAcYvyCZ8AilPIovm4s2poA+fuQX
 D78c8JSfpse32oc93dy4bUz3mRFKTeufstrWEuzqXI691mthF2G9EpA0R3hlbqv+
 GiUnNcZVOnOuCt/47GnpWVKsyv91l3CkGq3bV1GSUi8a/1PnyFxHQxQI/qgbkLXs
 NuswRupSeLDQKRgiDLgWF/BpdHEp4dpFFWXm00KWlgxeGSQnKat9bpW/d5OgnhA=
 =mv3H
 -----END PGP SIGNATURE-----

Backmerge tag 'v4.13-rc2' into drm-next

Linux 4.13-rc2

This is required for drm-misc fixing.
2017-07-27 08:15:43 +10:00
Dennis Zhou (Facebook)
b185cd0dc6 percpu: update free path to take advantage of contig hints
The bitmap allocator must keep metadata consistent. The easiest way is
to scan after every allocation for each affected block and the entire
chunk. This is rather expensive.

The free path can take advantage of current contig hints to prevent
scanning within the start and end block.  If a scan is needed, it can
be done by scanning backwards from the start and forwards from the end
to identify the entire free area this can be combined with. The blocks
can then be updated by some basic checks rather than complete block
scans.

A chunk scan happens when the freed area makes a page free, a block
free, or spans across blocks. This is necessary as the contig hint at
this point could span across blocks. The check uses the minimum of page
size and the block size to allow for variable sized blocks. There is a
tradeoff here with not updating after every free. It is possible a
contig hint in one block can be merged with the contig hint in the next
block. This means the contig hint can be off by up to a page. However,
if the chunk's contig hint is contained in one block, the contig hint
will be accurate.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-07-26 17:41:06 -04:00
Dennis Zhou (Facebook)
ca460b3c96 percpu: introduce bitmap metadata blocks
This patch introduces the bitmap metadata blocks and adds the skeleton
of the code that will be used to maintain these blocks.  Each chunk's
bitmap is made up of full metadata blocks. These blocks maintain basic
metadata to help prevent scanning unnecssarily to update hints. Full
scanning methods are used for the skeleton and will be replaced in the
coming patches. A number of helper functions are added as well to do
conversion of pages to blocks and manage offsets. Comments will be
updated as the final version of each function is added.

There exists a relationship between PAGE_SIZE, PCPU_BITMAP_BLOCK_SIZE,
the region size, and unit_size. Every chunk's region (including offsets)
is page aligned at the beginning to preserve alignment. The end is
aligned to LCM(PAGE_SIZE, PCPU_BITMAP_BLOCK_SIZE) to ensure that the end
can fit with the populated page map which is by page and every metadata
block is fully accounted for. The unit_size is already page aligned, but
must also be aligned with PCPU_BITMAP_BLOCK_SIZE to ensure full metadata
blocks.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-07-26 17:41:05 -04:00
Dennis Zhou (Facebook)
40064aeca3 percpu: replace area map allocator with bitmap
The percpu memory allocator is experiencing scalability issues when
allocating and freeing large numbers of counters as in BPF.
Additionally, there is a corner case where iteration is triggered over
all chunks if the contig_hint is the right size, but wrong alignment.

This patch replaces the area map allocator with a basic bitmap allocator
implementation. Each subsequent patch will introduce new features and
replace full scanning functions with faster non-scanning options when
possible.

Implementation:
This patchset removes the area map allocator in favor of a bitmap
allocator backed by metadata blocks. The primary goal is to provide
consistency in performance and memory footprint with a focus on small
allocations (< 64 bytes). The bitmap removes the heavy memmove from the
freeing critical path and provides a consistent memory footprint. The
metadata blocks provide a bound on the amount of scanning required by
maintaining a set of hints.

In an effort to make freeing fast, the metadata is updated on the free
path if the new free area makes a page free, a block free, or spans
across blocks. This causes the chunk's contig hint to potentially be
smaller than what it could allocate by up to the smaller of a page or a
block. If the chunk's contig hint is contained within a block, a check
occurs and the hint is kept accurate. Metadata is always kept accurate
on allocation, so there will not be a situation where a chunk has a
later contig hint than available.

Evaluation:
I have primarily done testing against a simple workload of allocation of
1 million objects (2^20) of varying size. Deallocation was done by in
order, alternating, and in reverse. These numbers were collected after
rebasing ontop of a80099a152. I present the worst-case numbers here:

  Area Map Allocator:

        Object Size | Alloc Time (ms) | Free Time (ms)
        ----------------------------------------------
              4B    |        310      |     4770
             16B    |        557      |     1325
             64B    |        436      |      273
            256B    |        776      |      131
           1024B    |       3280      |      122

  Bitmap Allocator:

        Object Size | Alloc Time (ms) | Free Time (ms)
        ----------------------------------------------
              4B    |        490      |       70
             16B    |        515      |       75
             64B    |        610      |       80
            256B    |        950      |      100
           1024B    |       3520      |      200

This data demonstrates the inability for the area map allocator to
handle less than ideal situations. In the best case of reverse
deallocation, the area map allocator was able to perform within range
of the bitmap allocator. In the worst case situation, freeing took
nearly 5 seconds for 1 million 4-byte objects. The bitmap allocator
dramatically improves the consistency of the free path. The small
allocations performed nearly identical regardless of the freeing
pattern.

While it does add to the allocation latency, the allocation scenario
here is optimal for the area map allocator. The area map allocator runs
into trouble when it is allocating in chunks where the latter half is
full. It is difficult to replicate this, so I present a variant where
the pages are second half filled. Freeing was done sequentially. Below
are the numbers for this scenario:

  Area Map Allocator:

        Object Size | Alloc Time (ms) | Free Time (ms)
        ----------------------------------------------
              4B    |       4118      |     4892
             16B    |       1651      |     1163
             64B    |        598      |      285
            256B    |        771      |      158
           1024B    |       3034      |      160

  Bitmap Allocator:

        Object Size | Alloc Time (ms) | Free Time (ms)
        ----------------------------------------------
              4B    |        481      |       67
             16B    |        506      |       69
             64B    |        636      |       75
            256B    |        892      |       90
           1024B    |       3262      |      147

The data shows a parabolic curve of performance for the area map
allocator. This is due to the memmove operation being the dominant cost
with the lower object sizes as more objects are packed in a chunk and at
higher object sizes, the traversal of the chunk slots is the dominating
cost. The bitmap allocator suffers this problem as well. The above data
shows the inability to scale for the allocation path with the area map
allocator and that the bitmap allocator demonstrates consistent
performance in general.

The second problem of additional scanning can result in the area map
allocator completing in 52 minutes when trying to allocate 1 million
4-byte objects with 8-byte alignment. The same workload takes
approximately 16 seconds to complete for the bitmap allocator.

V2:
Fixed a bug in pcpu_alloc_first_chunk end_offset was setting the bitmap
using bytes instead of bits.

Added a comment to pcpu_cnt_pop_pages to explain bitmap_weight.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-07-26 17:41:05 -04:00
Vivek Goyal
273752c9ff dm, dax: Make sure dm_dax_flush() is called if device supports it
Currently dm_dax_flush() is not being called, even if underlying dax
device supports write cache, because DAXDEV_WRITE_CACHE is not being
propagated up to the DM dax device.

If the underlying dax device supports write cache, set
DAXDEV_WRITE_CACHE on the DM dax device.  This will cause dm_dax_flush()
to be called.

Fixes: abebfbe2f7 ("dm: add ->flush() dax operation support")
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-07-26 15:55:44 -04:00
Murilo Opsfelder Araujo
bb67b496c3 include/linux/vfio.h: Guard powerpc-specific functions with CONFIG_VFIO_SPAPR_EEH
When CONFIG_EEH=y and CONFIG_VFIO_SPAPR_EEH=n, build fails with the
following:

    drivers/vfio/pci/vfio_pci.o: In function `.vfio_pci_release':
    vfio_pci.c:(.text+0xa98): undefined reference to `.vfio_spapr_pci_eeh_release'
    drivers/vfio/pci/vfio_pci.o: In function `.vfio_pci_open':
    vfio_pci.c:(.text+0x1420): undefined reference to `.vfio_spapr_pci_eeh_open'

In this case, vfio_pci.c should use the empty definitions of
vfio_spapr_pci_eeh_open and vfio_spapr_pci_eeh_release functions.

This patch fixes it by guarding these function definitions with
CONFIG_VFIO_SPAPR_EEH, the symbol that controls whether vfio_spapr_eeh.c is
built, which is where the non-empty versions of these functions are. We need to
make use of IS_ENABLED() macro because CONFIG_VFIO_SPAPR_EEH is a tristate
option.

This issue was found during a randconfig build. Logs are here:

    http://kisskb.ellerman.id.au/kisskb/buildresult/12982362/

Signed-off-by: Murilo Opsfelder Araujo <mopsfelder@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-07-26 13:30:23 -06:00
Claudio Imbrenda
fdeaf7e3eb KVM: make pid available for uevents without debugfs
Simplify and improve the code so that the PID is always available in
the uevent even when debugfs is not available.

This adds a userspace_pid field to struct kvm, as per Radim's
suggestion, so that the PID can be retrieved on destruction too.

Acked-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Fixes: 286de8f6ac ("KVM: trigger uevents when creating or destroying a VM")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-26 18:57:44 +02:00
Jeff Layton
3acdfd280f errseq: rename __errseq_set to errseq_set
Nothing calls this wrapper anymore, so just remove it and rename the
old function to get rid of the double underscore prefix.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
2017-07-26 12:24:36 -04:00
Dennis Zhou (Facebook)
d2f3c38494 percpu: increase minimum percpu allocation size and align first regions
This patch increases the minimum allocation size of percpu memory to
4-bytes. This change will help minimize the metadata overhead
associated with the bitmap allocator. The assumption is that most
allocations will be of objects or structs greater than 2 bytes with
integers or longs being used rather than shorts.

The first chunk regions are now aligned with the minimum allocation
size. The reserved region is expected to be set as a multiple of the
minimum allocation size. The static region is aligned up and the delta
is removed from the dynamic size. This works because the dynamic size is
increased to be page aligned. If the static size is not minimum
allocation size aligned, then there must be a gap that is added to the
dynamic size. The dynamic size will never be smaller than the set value.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-07-26 10:23:53 -04:00
Daniel Vetter
6ce31263c9 dma-fence: Don't BUG_ON when not absolutely needed
It makes debugging a massive pain.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170720125107.26693-1-daniel.vetter@ffwll.ch
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-26 13:45:07 +02:00
Daniel Vetter
af05559854 Merge airlied/drm-next into drm-misc-next
I need this to be able to apply the deferred fbdev setup patches, I
need the relevant prep work that landed through the drm-intel tree.

Also squash in conflict fixup from Laurent Pinchart.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-26 13:43:33 +02:00
Dmitry Vyukov
f06e8c584f kasan: Allow kasan_check_read/write() to accept pointers to volatiles
Currently kasan_check_read/write() accept 'const void*', make them
accept 'const volatile void*'. This is required for instrumentation
of atomic operations and there is just no reason to not allow that.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kasan-dev@googlegroups.com
Cc: linux-mm@kvack.org
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/33e5ec275c1ee89299245b2ebbccd63709c6021f.1498140838.git.dvyukov@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-26 13:08:54 +02:00
Dmitry Osipenko
ebae3e830a iommu: Correct iommu_map / iommu_unmap prototypes
Commit 7d3002cc8c ("iommu/core: split mapping to page sizes as supported
by the hardware") replaced 'int gfp_order' with a 'size_t size' of
iommu_map / iommu_unmap function arguments, but missed the function
prototypes for the disabled CONFIG_IOMMU_API case, let's correct them
for consistency.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-07-26 11:27:16 +02:00
Linus Torvalds
5d4eeb8a61 uuid fixups:
- add a missing "!" in the uuid tests
  - remove the last remaining user of the uuid_be type, and then
    the type and its helpers
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAll3mDcLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYPIvw//XeCc0g2xMHJAX7Z6T1KoEubDVxpFaVKMIqZgE8ia
 NRy5RBq3cTKxhVRRj1KDSP7Zf1eNppYycY/fTZ0tRx7ssjFDjtxBMHyMv/wBvR/Z
 Hg0YAyHtlk/S9hzZhB9xj9jarvXYXTvCLOgIsDlaPcgqdlDeSC0thhscJOqvliIo
 +rdVp/fyQcUbtKXyCMtiaf0AJfncNa31VdD/VmFQEM9dltohyaWOzx+ZOcI2OhnD
 YYjt2fMBFOH87q8A+OZMzA1j/LEhMyDxIiPB8N9+qYkuKhyfdZi9lhKwN3YZL0y0
 IZ+AgKWEzAz0t08BTn5AURCytm84i5UtidE9s5WCnOIqtMT5D1hKcrmkgZKywQ2R
 GFpXnw8J+LI4ZPhrC5dMmdVESvGSXeWZoztoPZBSRPrrYA4co2MemiwMP6SzBocu
 S04Hgh5rMXJN/iJxasuNIIyJfA4eOyZVhszlKlkFT8YyGmaV3o9znvSkFd33HxR8
 IpneM1ymMJHZvqKX9OmFPZWWpwyu4eToT+NgPbONzeKRNf3qTMRztCHaERNnFk8u
 Zdhh2mVKAwWcAglJzJ8q72qywec8VIsC+b14BVpWmjtBva5XhC4TBQw3fz+BMpMb
 Bjpj4d9KaynTV1d3ululkkYjSRLUO9/F0pOUJUFEuGJezmF06qkyJQAW/iHyhqze
 ANE=
 =FeA7
 -----END PGP SIGNATURE-----

Merge tag 'uuid-for-4.13-2' of git://git.infradead.org/users/hch/uuid

Pull uuid fixes from Christoph Hellwig:

 - add a missing "!" in the uuid tests

 - remove the last remaining user of the uuid_be type, and then the type
   and its helpers

* tag 'uuid-for-4.13-2' of git://git.infradead.org/users/hch/uuid:
  uuid: remove uuid_be
  thunderbolt: use uuid_t instead of uuid_be
  uuid: fix incorrect uuid_equal conversion in test_uuid_test
2017-07-25 19:46:05 -07:00
Linus Torvalds
cef55b518c dma mapping fixes for 4.13-rc2:
- split the global dma coherent pool from the per-device pool.
    This fixes a regression in the earlier 4.13 pull requests where the
    global pool would override a per-device CMA pool. (Vladimir Murzin).
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAll3l1sLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYN8BhAAqFxy2CrpEBk7gD2byOi9M4kTeXDYCESEoEAwuvTG
 Fesbw5zumliBR2cjt/qk/uIDZ93fP4BuHn89NtIfcGOD1LqYOyIPwUTpmb9AgicD
 y4eO1Gy/3DrG2haZcWYmDvq8yfSuR01H3ecY1KNsX1Y2kXxeBQfVKaUDR6fuix4+
 uCf98LzIWs3TYmj7h48LVB/oNnigvs0oljrB2dWrWVJHbgGYEpmdPjBEe6r95e5U
 5cHtPno5JA1lbBFt/nvsZl/NmzSd745SL3QwJsaVmSTf7oYnAuwyPI+5gqaoeQT6
 24947e8hJjuLhBpO7RiqnJY9QdPxT0XKclkCcjnRb5j3dB9KL09f9Dz60exyJzSe
 18V8+8+1m1BgvPsAOS/pLKYxKr9Kgzl9LFrFQaBkA5+7SPlywfV7HqaCkN/mKB4F
 XJoQyRDLlZiDStDKbrhGEAHG6oYaZXnkpQ5xDitSXcSkh9/2a/elsG3caUBRI5qP
 vKC0qvfBPjnHa/3lYNNoLgADB4tZCE3rRrVP6tqdHQbjuNUNK1wLNT7PiMfeoUVj
 Oqql4le0AKlsxO4vRjavOrtaW1bVT+eAYLEtdQfXWQDvhffriEW6r6I8PGqIOiCO
 OzxemCG2M6fcD9ho/VDpjo3Ei6tZylrxdTbrsm7ogQmo/U3ID9cfs452vIOYtCcB
 9so=
 =fJWP
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-4.13-2' of git://git.infradead.org/users/hch/dma-mapping

Pull dma mapping fixes from Christoph Hellwig:
 "split the global dma coherent pool from the per-device pool.

  This fixes a regression in the earlier 4.13 pull requests where the
  global pool would override a per-device CMA pool (Vladimir Murzin)"

* tag 'dma-mapping-4.13-2' of git://git.infradead.org/users/hch/dma-mapping:
  ARM: NOMMU: Wire-up default DMA interface
  dma-coherent: introduce interface for default DMA pool
2017-07-25 17:17:18 -07:00
Viresh Kumar
fe829ed8ef cpufreq: Add CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING cpufreq driver flag
The policy->transition_latency field is used for multiple purposes
today and its not straight forward at all. This is how it is used:

A. Set the correct transition_latency value.

B. Set it to CPUFREQ_ETERNAL because:
   1. We don't want automatic dynamic switching (with
      ondemand/conservative) to happen at all.
   2. We don't know the transition latency.

This patch handles the B.1. case in a more readable way. A new flag for
the cpufreq drivers is added to disallow use of cpufreq governors which
have dynamic_switching flag set.

All the current cpufreq drivers which are setting transition_latency
unconditionally to CPUFREQ_ETERNAL are updated to use it. They don't
need to set transition_latency anymore.

There shouldn't be any functional change after this patch.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-26 00:15:46 +02:00
Viresh Kumar
ed4676e254 cpufreq: Replace "max_transition_latency" with "dynamic_switching"
There is no limitation in the ondemand or conservative governors which
disallow the transition_latency to be greater than 10 ms.

The max_transition_latency field is rather used to disallow automatic
dynamic frequency switching for platforms which didn't wanted these
governors to run.

Replace max_transition_latency with a boolean (dynamic_switching) and
check for transition_latency == CPUFREQ_ETERNAL along with that. This
makes it pretty straight forward to read/understand now.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-26 00:15:45 +02:00
Paul E. McKenney
a58163d8ca rcu: Migrate callbacks earlier in the CPU-offline timeline
RCU callbacks must be migrated away from an outgoing CPU, and this is
done near the end of the CPU-hotplug operation, after the outgoing CPU is
long gone.  Unfortunately, this means that other CPU-hotplug callbacks
can execute while the outgoing CPU's callbacks are still immobilized
on the long-gone CPU's callback lists.  If any of these CPU-hotplug
callbacks must wait, either directly or indirectly, for the invocation
of any of the immobilized RCU callbacks, the system will hang.

This commit avoids such hangs by migrating the callbacks away from the
outgoing CPU immediately upon its departure, shortly after the return
from __cpu_die() in takedown_cpu().  Thus, RCU is able to advance these
callbacks and invoke them, which allows all the after-the-fact CPU-hotplug
callbacks to wait on these RCU callbacks without risk of a hang.

While in the neighborhood, this commit also moves rcu_send_cbs_to_orphanage()
and rcu_adopt_orphan_cbs() under a pre-existing #ifdef to avoid including
dead code on the one hand and to avoid define-without-use warnings on the
other hand.

Reported-by: Jeffrey Hugo <jhugo@codeaurora.org>
Link: http://lkml.kernel.org/r/db9c91f6-1b17-6136-84f0-03c3c2581ab4@codeaurora.org
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Richard Weinberger <richard@nod.at>
2017-07-25 13:03:43 -07:00
Marc Gonzalez
2eaa38d9fc net: phy: Remove trailing semicolon in macro definition
Commit e5a03bfd87 ("phy: Add an mdio_device structure")
introduced a spurious trailing semicolon. Remove it.

Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-25 12:33:43 -07:00
Tejun Heo
0a94efb5ac workqueue: implicit ordered attribute should be overridable
5c0338c687 ("workqueue: restore WQ_UNBOUND/max_active==1 to be
ordered") automatically enabled ordered attribute for unbound
workqueues w/ max_active == 1.  Because ordered workqueues reject
max_active and some attribute changes, this implicit ordered mode
broke cases where the user creates an unbound workqueue w/ max_active
== 1 and later explicitly changes the related attributes.

This patch distinguishes explicit and implicit ordered setting and
overrides from attribute changes if implict.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 5c0338c687 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered")
2017-07-25 13:28:56 -04:00
Paul E. McKenney
931ab4a5ce atomics: Revert addition of comment header to spin_unlock_wait()
There is still considerable confusion as to the semantics of
spin_unlock_wait(), but there seems to be universal agreement that
it is not that of a lock/unlock pair.  This commit therefore removes
the comment added by 6016ffc387 ("atomics: Add header comment so
spin_unlock_wait()") in order to prevent at least that flavor of
confusion.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 09:59:08 -07:00
James Smart
9c5358e15c nvme-fc: revise TRADDR parsing
The FC-NVME spec hasn't locked down on the format string for TRADDR.
Currently the spec is lobbying for "nn-<16hexdigits>:pn-<16hexdigits>"
where the wwn's are hex values but not prefixed by 0x.

Most implementations so far expect a string format of
"nn-0x<16hexdigits>:pn-0x<16hexdigits>" to be used. The transport
uses the match_u64 parser which requires a leading 0x prefix to set
the base properly. If it's not there, a match will either fail or return
a base 10 value.

The resolution in T11 is pushing out. Therefore, to fix things now and
to cover any eventuality and any implementations already in the field,
this patch adds support for both formats.

The change consists of replacing the token matching routine with a
routine that validates the fixed string format, and then builds
a local copy of the hex name with a 0x prefix before calling
the system parser.

Note: the same parser routine exists in both the initiator and target
transports. Given this is about the only "shared" item, we chose to
replicate rather than create an interdendency on some shared code.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-07-25 18:05:25 +02:00
Jon Derrick
2fd4167fad nvme: fabrics commands should use the fctype field for data direction
Fabrics commands with opcode 0x7F use the fctype field to indicate data
direction.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Sagi Grimberg <sai@grmberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fixes: eb793e2c ("nvme.h: add NVMe over Fabrics definitions")
2017-07-25 17:58:32 +02:00