Commit graph

57824 commits

Author SHA1 Message Date
Linus Torvalds
e3181f2c0c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix IGMP handling wrt VRF, from David Ahern.

 2) Fix timer access to freed object in dccp, from Eric Dumazet.

 3) Use kmalloc_array() in ptr_ring to avoid overflow cases which are
    triggerable by userspace. Also from Eric Dumazet.

 4) Fix infinite loop in unmapping cleanup of nfp driver, from Colin Ian
    King.

 5) Correct datagram peek handling of empty SKBs, from Matthew Dawson.

 6) Fix use after free in TIPC, from Eric Dumazet.

 7) When replacing a route in ipv6 we need to reset the round robin
    pointer, from Wei Wang.

 8) Fix bug in pci_find_pcie_root_port() which was unearthed by the
    relaxed ordering changes, from Thierry Redding. I made sure to get
    an explicit ACK from Bjorn this time around :-)

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
  ipv6: repair fib6 tree in failure case
  net_sched: fix order of queue length updates in qdisc_replace()
  tools lib bpf: improve warning
  switchdev: documentation: minor typo fixes
  bpf, doc: also add s390x as arch to sysctl description
  net: sched: fix NULL pointer dereference when action calls some targets
  rxrpc: Fix oops when discarding a preallocated service call
  irda: do not leak initialized list.dev to userspace
  net/mlx4_core: Enable 4K UAR if SRIOV module parameter is not enabled
  PCI: Allow PCI express root ports to find themselves
  tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP
  net: check and errout if res->fi is NULL when RTM_F_FIB_MATCH is set
  ipv6: reset fn->rr_ptr when replacing route
  sctp: fully initialize the IPv6 address in sctp_v6_to_addr()
  tipc: fix use-after-free
  tun: handle register_netdevice() failures properly
  datagram: When peeking datagrams with offset < 0 don't skip empty skbs
  bpf, doc: improve sysctl knob description
  netxen: fix incorrect loop counter decrement
  nfp: fix infinite loop on umapping cleanup
  ...
2017-08-21 13:16:27 -07:00
Oleg Nesterov
dd1c1f2f20 pids: make task_tgid_nr_ns() safe
This was reported many times, and this was even mentioned in commit
52ee2dfdd4 ("pids: refactor vnr/nr_ns helpers to make them safe") but
somehow nobody bothered to fix the obvious problem: task_tgid_nr_ns() is
not safe because task->group_leader points to nowhere after the exiting
task passes exit_notify(), rcu_read_lock() can not help.

We really need to change __unhash_process() to nullify group_leader,
parent, and real_parent, but this needs some cleanups.  Until then we
can turn task_tgid_nr_ns() into another user of __task_pid_nr_ns() and
fix the problem.

Reported-by: Troy Kensinger <tkensinger@google.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-21 12:47:31 -07:00
Sudeep Holla
7467c9d959 of: return of_get_cpu_node from of_cpu_device_node_get if CPUs are not registered
Instead of the callsites choosing between of_cpu_device_node_get if the
CPUs are registered as of_node is populated by then and of_get_cpu_node
when the CPUs are not yet registered as CPU of_nodes are not yet stashed
thereby needing to parse the device tree, we can call of_get_cpu_node
in case the CPUs are not yet registered.

This will allow to use of_cpu_device_node_get anywhere hiding the
details from the caller.

Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
2017-08-21 12:19:57 -05:00
Jan Kara
6c83fd5142 quota: Add lock annotations to struct members
Add annotation which lock protects which struct members to struct dquot
and struct mem_dqinfo.

Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-21 14:06:46 +02:00
Elaine Zhang
a5247ca670 mfd: rk808: Add rk805 regs addr and ID
Signed-off-by: Elaine Zhang <zhangqing@rock-chips.com>
Signed-off-by: Joseph Chen <chenjh@rock-chips.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2017-08-21 08:54:05 +01:00
Elaine Zhang
9d6105e19f mfd: rk808: Fix up the chip id get failed
the rk8xx chip id is:
((MSB << 8) | LSB) & 0xfff0

Signed-off-by: Elaine Zhang <zhangqing@rock-chips.com>
Signed-off-by: Joseph Chen <chenjh@rock-chips.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2017-08-21 08:53:57 +01:00
Ingo Molnar
94edf6f3c2 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:

 - Removal of spin_unlock_wait()
 - SRCU updates
 - Torture-test updates
 - Documentation updates
 - Miscellaneous fixes
 - CPU-hotplug fixes
 - Miscellaneous non-RCU fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-21 09:45:19 +02:00
Linus Walleij
4342ec5fba Merge branch 'irq/for-gpio' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into devel 2017-08-21 00:10:54 +02:00
Greg Kroah-Hartman
2c68888f1d Second set of IIO fixes for the 4.13 cycle.
Given the late stage of this series, some more involved fixes have been
 held back for the upcoming merge window.
 
 The hid-sensor issue has been causing problems for a long time so it
 is great to have that one finally fixed!  No more bug reports for the
 userspace guys (well about that anyway).
 
 * documentation
   - some warning fixes due to missing colons in kernel-doc.
 * adis16480
   - fix accel scale factor.
 * bmp280
   - properly initialize the device for humidity readings - without this
   the humidity readings may be skipped and a magic value of 0x8000 returned.
 * hid-sensor-strigger
   - fix a race with user space when powering up the sensor.
 * ina291
   - Avoid an underflow for the sleeping time as a result of supporting the
   fastest rates.
 * st-magnetometer
   - Fix the status register address for hte LSM303AGR,
   - Remove the ihl property for LSM303AGR as the sensor doesn't support
   active low for the dataready line.
 * stm32-adc
   - Fix use of a common clock rate.
 * stm32-timer
   - fix the quadrature mode get routine to account for the magic 0 value.
   set on boot.
   - fix the return value of write_raw,
   - fix the get/set down count direction as the enum value was not being
   converted to the relevant bit field,
   - add an enable attribute to actually turn it on when in encoder mode,
   - missing mask when reading the trigger mode.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEbilms4eEBlKRJoGxVIU0mcT0FogFAlmZnXoRHGppYzIzQGtl
 cm5lbC5vcmcACgkQVIU0mcT0Foj5nA/+J/CB6E6lOiWXplGVgDJWNOgTAE5zS27Z
 mwbUpI6kgGOS42+BJTtfQM9JvdZ8nLl4cR7oJ4v2LWXUV0p5WhKVx3EvKFcMazKj
 FZvEZLx0fBvCg7N7u5+v3dWEHuh3DkLBl36J13oxDM0v0oxHVbg+RFfous9t1kCK
 faV+nQeGBy51TBw7qasAaKV3FGqFmftRhYzNoG40yH0gzcu/YWRL47rotwwQryMj
 XLJg9AZ4w2XQmUgNXGjdTtphDT99sRWxi6zQCtlGQitm2E+4Oc2l7xg4tDpudqzY
 oPo64BVF2UcZO7KL0KBAhvhmxFOXtmayPDLknfqzScLEYGV7AZUkwMiNBY2x3H+d
 pkyLx1wKZd2HX2yNDM9K9We1vuAt/845Unx8iX2cUMzx89n99ODlGvE0R/7IpnJd
 YezXwoDqwUZYzysyZb6weFXdhNCSyYRFnGXw9axvA/3c2vgxVBZd4Q9BnAZ0J2xy
 gtbrosvMD6U713N+CA6V8ccz46UNHr4pZK65qiUfCMam+zs1JUkmrUJtnuPwPfJA
 A1SK55VzrWqq1LVA6qBvbRpx19Ylh1oFJtwsFCPIE85kZCpmop/LOPia4PXeXePr
 HOrNJNT+c6bv6aHyEL8CePqyWcoLL+TLoDET6XzEN0/eQI327jtW1EjTX/Utbw1J
 8iyLXx7CgDo=
 =1ota
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-4.13b' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus

Jonathan writes:

Second set of IIO fixes for the 4.13 cycle.

Given the late stage of this series, some more involved fixes have been
held back for the upcoming merge window.

The hid-sensor issue has been causing problems for a long time so it
is great to have that one finally fixed!  No more bug reports for the
userspace guys (well about that anyway).

* documentation
  - some warning fixes due to missing colons in kernel-doc.
* adis16480
  - fix accel scale factor.
* bmp280
  - properly initialize the device for humidity readings - without this
  the humidity readings may be skipped and a magic value of 0x8000 returned.
* hid-sensor-strigger
  - fix a race with user space when powering up the sensor.
* ina291
  - Avoid an underflow for the sleeping time as a result of supporting the
  fastest rates.
* st-magnetometer
  - Fix the status register address for hte LSM303AGR,
  - Remove the ihl property for LSM303AGR as the sensor doesn't support
  active low for the dataready line.
* stm32-adc
  - Fix use of a common clock rate.
* stm32-timer
  - fix the quadrature mode get routine to account for the magic 0 value.
  set on boot.
  - fix the return value of write_raw,
  - fix the get/set down count direction as the enum value was not being
  converted to the relevant bit field,
  - add an enable attribute to actually turn it on when in encoder mode,
  - missing mask when reading the trigger mode.
2017-08-20 10:45:54 -07:00
Greg Kroah-Hartman
5e47adb906 Second set of IIO new device support, features and cleanup for the 4.14 cycle.
New device support:
 * ak8974
   - support the AMI306.
 * st_magnetometer
   - add support for the LIS2MDL with bindings.
 * rockchip-saradc
   - add binding for rv1108 SoC (no driver change).
 * srf08
   - add srf02 (i2c only) and srf10 support.
 * stm32-timer
   - support for the STM32H7 to existing driver.
 
 Features:
 * tools
   - move over to the tools buildsystem rather than hand rolling.
   - add an install section to the build.
 * ak8974
   - use serial number to add device randomness.
   - add AMI306 calibration data output.
 * ccs811
   - triggered buffer support.
 * srf08
   - add a device tree table as the old style i2c probing is going away,
   - add triggered buffer support
 * st32-adc
   - add optional st,min-sample-time-nsecs binding to allow control of
     sampling against analog circuitry.
 * stm32-timer
   - add output compare triggers.
 * ti-ads1015
   - add threshold event support.
 * ti-ads7950
   - Allow use on ACPI platforms including providing a default reference
     voltage as there is no way to obtain this on ACPI currently.
 
 Cleanup and fixes:
 * ad7606
   - fix an error return code in probe.
 * ads1015
   - fix incorrect data rate setting update when capture in progress,
   - fix wrong scale information for the ADS1115,
   - make conversions work when CONFIG_PM is not set,
   - make sure we don't get a stale result after a runtime resume by
     ensuring we wait long enough,
   - avoid returning a false error form the buffer setup callbacks,
   - add enough wait time to get the correct conversion,
   - remove an unnecessary config register update,
   - add a helper to set conversion mode reducing repeated boilerplate,
   - use devm_iio_triggered_buffer_setup to simplify error and remove
     paths,
   - use iio_device_claim_direct_mode instead of opencoding the same.
 * ak8974
   - mark the INT_CLEAR register as precious to prevent debugfs access.
 * apds9300
   - constify the i2c_device_id.
 * at91-sama5 adc
   - add missing Kconfig dependency.
 * bma180 accel
   - constify the i2c_device_id.
 * rockchip_saradc
   - explicitly request exclusive reset control as part of the reset rework
     on going throughout the kernel.
 * st_accel
   - fix drdy configuration for a load of accelerometers that only have
     the int1 line.  Fix is unimportant as presumably no deviec tree actually
     used the non existent hardware line.
 * st_pressure
   - fix drdy configuration for LPS22HB and LPS25H by dropping int2 support
     as they don't have this. Fix is unimportant as presumably no device tree
     actually used the non existent hardware line.
 * stm32-dac
   - explicitly request exclusive reset control (part of reset being reworked).
 * tsl2583
   - constify the i2c_device_id.
 * xadc
   - coding style fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEbilms4eEBlKRJoGxVIU0mcT0FogFAlmZpCcRHGppYzIzQGtl
 cm5lbC5vcmcACgkQVIU0mcT0FohlmQ/6A9FAI0wPGeEQCXDiLtIAWI1/DByYsHDk
 P3U6sFS17y56KAQcLS34HVR5lLreHya3hsUmPl4gCsjXZsGWAinyXy74BNtBbPmx
 LtzVsyLdjBP3Nxl/OGgNWG+kiq7Op+uw0/OZohtEqzurG0142S6CVvxteFlmhQNH
 pLlmPGnwk6Z4GsasHma/f2FkDD8oRVgvQSP7dJ9HIwq49qQ76cT/+20X1xODHLGw
 qpXfQiLUFW8E1JBTDDcXZD3M23TWG4DZcVlNnWf8fja/bk4WaLBKqVrI9gGZpZsQ
 xXfrSDRwc216w6tzVWjsNV/M0ZuSdm/VCBeyQa17XQVNelkO4dVrCqFMLCh5i/t5
 p4qhhV9mrbweIgDj++6c+4qMzWSAznWybAKMMlcucmwxHKefcrlgUniE03OzyPpG
 gpMS94enlVW8WpE/iYkxb/d6EqTM9CH2CJH6W4Ve/Xr4aQHkF5/P23k+nsW113of
 T1q1SCKegV9p7hIzqDqDmOQC7iNTcBcu+/7RYtkDn9jmmhiQAxwoPJZunkR1cxD8
 hA04x5W9HuPFdbNRlH1MozClbyRUtk0L/XLTKwA9T0VyRUKV1P6Szcp9wYRw+6G8
 QiA5NNrNPf17L6slLZ06N1auu6vO7BhTmYNKnd6VBO+vi7kF/FM2UdZGHof4WT/4
 zPE/BO8IwTk=
 =b6g8
 -----END PGP SIGNATURE-----

Merge tag 'iio-for-4.14b' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-next

Jonathan writes:

Second set of IIO new device support, features and cleanup for the 4.14 cycle.

New device support:
* ak8974
  - support the AMI306.
* st_magnetometer
  - add support for the LIS2MDL with bindings.
* rockchip-saradc
  - add binding for rv1108 SoC (no driver change).
* srf08
  - add srf02 (i2c only) and srf10 support.
* stm32-timer
  - support for the STM32H7 to existing driver.

Features:
* tools
  - move over to the tools buildsystem rather than hand rolling.
  - add an install section to the build.
* ak8974
  - use serial number to add device randomness.
  - add AMI306 calibration data output.
* ccs811
  - triggered buffer support.
* srf08
  - add a device tree table as the old style i2c probing is going away,
  - add triggered buffer support
* st32-adc
  - add optional st,min-sample-time-nsecs binding to allow control of
    sampling against analog circuitry.
* stm32-timer
  - add output compare triggers.
* ti-ads1015
  - add threshold event support.
* ti-ads7950
  - Allow use on ACPI platforms including providing a default reference
    voltage as there is no way to obtain this on ACPI currently.

Cleanup and fixes:
* ad7606
  - fix an error return code in probe.
* ads1015
  - fix incorrect data rate setting update when capture in progress,
  - fix wrong scale information for the ADS1115,
  - make conversions work when CONFIG_PM is not set,
  - make sure we don't get a stale result after a runtime resume by
    ensuring we wait long enough,
  - avoid returning a false error form the buffer setup callbacks,
  - add enough wait time to get the correct conversion,
  - remove an unnecessary config register update,
  - add a helper to set conversion mode reducing repeated boilerplate,
  - use devm_iio_triggered_buffer_setup to simplify error and remove
    paths,
  - use iio_device_claim_direct_mode instead of opencoding the same.
* ak8974
  - mark the INT_CLEAR register as precious to prevent debugfs access.
* apds9300
  - constify the i2c_device_id.
* at91-sama5 adc
  - add missing Kconfig dependency.
* bma180 accel
  - constify the i2c_device_id.
* rockchip_saradc
  - explicitly request exclusive reset control as part of the reset rework
    on going throughout the kernel.
* st_accel
  - fix drdy configuration for a load of accelerometers that only have
    the int1 line.  Fix is unimportant as presumably no deviec tree actually
    used the non existent hardware line.
* st_pressure
  - fix drdy configuration for LPS22HB and LPS25H by dropping int2 support
    as they don't have this. Fix is unimportant as presumably no device tree
    actually used the non existent hardware line.
* stm32-dac
  - explicitly request exclusive reset control (part of reset being reworked).
* tsl2583
  - constify the i2c_device_id.
* xadc
  - coding style fixes.
2017-08-20 10:42:42 -07:00
Trond Myklebust
7af7a5963c Merge branch 'bugfixes' 2017-08-20 13:04:12 -04:00
Linus Torvalds
e46db8d2ef Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "Two fixes for the perf subsystem:

   - Fix an inconsistency of RDPMC mm struct tagging across exec() which
     causes RDPMC to fault.

   - Correct the timestamp mechanics across IOC_DISABLE/ENABLE which
     causes incorrect timestamps and total time calculations"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix time on IOC_ENABLE
  perf/x86: Fix RDPMC vs. mm_struct tracking
2017-08-20 09:20:57 -07:00
Linus Torvalds
e18a5ebc2d Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull watchdog fix from Thomas Gleixner:
 "A fix for the hardlockup watchdog to prevent false positives with
  extreme Turbo-Modes which make the perf/NMI watchdog fire faster than
  the hrtimer which is used to verify.

  Slightly larger than the minimal fix, which just would increase the
  hrtimer frequency, but comes with extra overhead of more watchdog
  timer interrupts and thread wakeups for all users.

  With this change we restrict the overhead to the extreme Turbo-Mode
  systems"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kernel/watchdog: Prevent false positives with turbo modes
2017-08-20 08:54:30 -07:00
Trond Myklebust
3bde7afdab NFS: Remove unused parameter gfp_flags from nfs_pageio_init()
Now that the mirror allocation has been moved, the parameter can go.
Also remove the redundant symbol export.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-08-20 11:35:33 -04:00
Jonathan Corbet
f00fd7ae4f PATCH] iio: Fix some documentation warnings
The kerneldoc description for the trig_readonly field of struct iio_dev
lacked a colon, leading to this doc build warning:

  ./include/linux/iio/iio.h:603: warning: No description found for parameter 'trig_readonly'

A similar issue for iio_trigger_set_immutable() in trigger.h yielded:

  ./include/linux/iio/trigger.h:151: warning: No description found for parameter 'indio_dev'
  ./include/linux/iio/trigger.h:151: warning: No description found for parameter 'trig'

Fix the formatting and silence the warnings.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2017-08-20 15:21:46 +01:00
Eran Ben Elisha
efae7f78c4 net/mlx5e: Add outbound PCI buffer overflow counter
Add outbound_pci_buffer_overflow to ethtool output for monitoring the
number of packets that were dropped due to lack of PCIe buffers on
receive path from NIC port toward the host(s).

This counter is valid only in case that tx_overflow_buffer_pkt is
supported in MCAM enhanced features.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-20 12:57:19 +03:00
Gal Pressman
2dba07975a net/mlx5: Add RX buffer fullness counters infrastructure
Add capability bit in PCAM register and counters to PPCNT register.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-20 12:57:19 +03:00
Gal Pressman
5405fa26c2 net/mlx5: Add PCIe outbound stalls counters infrastructure
Add capability bit in MCAM register and counters to MPCNT register.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-20 12:57:19 +03:00
David S. Miller
d6e1e46f69 bpf: linux/bpf.h needs linux/numa.h
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-19 23:34:03 -07:00
Martin KaFai Lau
96eabe7a40 bpf: Allow selecting numa node during map creation
The current map creation API does not allow to provide the numa-node
preference.  The memory usually comes from where the map-creation-process
is running.  The performance is not ideal if the bpf_prog is known to
always run in a numa node different from the map-creation-process.

One of the use case is sharding on CPU to different LRU maps (i.e.
an array of LRU maps).  Here is the test result of map_perf_test on
the INNER_LRU_HASH_PREALLOC test if we force the lru map used by
CPU0 to be allocated from a remote numa node:

[ The machine has 20 cores. CPU0-9 at node 0. CPU10-19 at node 1 ]

># taskset -c 10 ./map_perf_test 512 8 1260000 8000000
5:inner_lru_hash_map_perf pre-alloc 1628380 events per sec
4:inner_lru_hash_map_perf pre-alloc 1626396 events per sec
3:inner_lru_hash_map_perf pre-alloc 1626144 events per sec
6:inner_lru_hash_map_perf pre-alloc 1621657 events per sec
2:inner_lru_hash_map_perf pre-alloc 1621534 events per sec
1:inner_lru_hash_map_perf pre-alloc 1620292 events per sec
7:inner_lru_hash_map_perf pre-alloc 1613305 events per sec
0:inner_lru_hash_map_perf pre-alloc 1239150 events per sec  #<<<

After specifying numa node:
># taskset -c 10 ./map_perf_test 512 8 1260000 8000000
5:inner_lru_hash_map_perf pre-alloc 1629627 events per sec
3:inner_lru_hash_map_perf pre-alloc 1628057 events per sec
1:inner_lru_hash_map_perf pre-alloc 1623054 events per sec
6:inner_lru_hash_map_perf pre-alloc 1616033 events per sec
2:inner_lru_hash_map_perf pre-alloc 1614630 events per sec
4:inner_lru_hash_map_perf pre-alloc 1612651 events per sec
7:inner_lru_hash_map_perf pre-alloc 1609337 events per sec
0:inner_lru_hash_map_perf pre-alloc 1619340 events per sec #<<<

This patch adds one field, numa_node, to the bpf_attr.  Since numa node 0
is a valid node, a new flag BPF_F_NUMA_NODE is also added.  The numa_node
field is honored if and only if the BPF_F_NUMA_NODE flag is set.

Numa node selection is not supported for percpu map.

This patch does not change all the kmalloc.  F.e.
'htab = kzalloc()' is not changed since the object
is small enough to stay in the cache.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-19 21:35:43 -07:00
stephen hemminger
718ad681ef net: drop unused attribute argument from sysfs queue funcs
The show and store functions don't need/use the attribute.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-18 22:38:31 -07:00
stephen hemminger
737aec57c6 net: constify net_ns_type_operations
This can be const.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-18 22:37:27 -07:00
stephen hemminger
b793dc5c6e net: constify netdev_class_file
These functions are wrapper arount class_create_file which can take a
const attribute.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-18 22:37:12 -07:00
Lendacky, Thomas
606c07f308 net: ethtool: Add macro to clear a link mode setting
There are currently macros to set and test an ETHTOOL_LINK_MODE_ setting,
but not to clear one. Add a macro to clear an ETHTOOL_LINK_MODE_ setting.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-18 16:30:17 -07:00
Michal Hocko
6b31d5955c mm, oom: fix potential data corruption when oom_reaper races with writer
Wenwei Tao has noticed that our current assumption that the oom victim
is dying and never doing any visible changes after it dies, and so the
oom_reaper can tear it down, is not entirely true.

__task_will_free_mem consider a task dying when SIGNAL_GROUP_EXIT is set
but do_group_exit sends SIGKILL to all threads _after_ the flag is set.
So there is a race window when some threads won't have
fatal_signal_pending while the oom_reaper could start unmapping the
address space.  Moreover some paths might not check for fatal signals
before each PF/g-u-p/copy_from_user.

We already have a protection for oom_reaper vs.  PF races by checking
MMF_UNSTABLE.  This has been, however, checked only for kernel threads
(use_mm users) which can outlive the oom victim.  A simple fix would be
to extend the current check in handle_mm_fault for all tasks but that
wouldn't be sufficient because the current check assumes that a kernel
thread would bail out after EFAULT from get_user*/copy_from_user and
never re-read the same address which would succeed because the PF path
has established page tables already.  This seems to be the case for the
only existing use_mm user currently (virtio driver) but it is rather
fragile in general.

This is even more fragile in general for more complex paths such as
generic_perform_write which can re-read the same address more times
(e.g.  iov_iter_copy_from_user_atomic to fail and then
iov_iter_fault_in_readable on retry).

Therefore we have to implement MMF_UNSTABLE protection in a robust way
and never make a potentially corrupted content visible.  That requires
to hook deeper into the PF path and check for the flag _every time_
before a pte for anonymous memory is established (that means all
!VM_SHARED mappings).

The corruption can be triggered artificially
(http://lkml.kernel.org/r/201708040646.v746kkhC024636@www262.sakura.ne.jp)
but there doesn't seem to be any real life bug report.  The race window
should be quite tight to trigger most of the time.

Link: http://lkml.kernel.org/r/20170807113839.16695-3-mhocko@kernel.org
Fixes: aac4536355 ("mm, oom: introduce oom reaper")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Wenwei Tao <wenwei.tww@alibaba-inc.com>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-18 15:32:01 -07:00
Pavel Tatashin
3010f87650 mm: discard memblock data later
There is existing use after free bug when deferred struct pages are
enabled:

The memblock_add() allocates memory for the memory array if more than
128 entries are needed.  See comment in e820__memblock_setup():

  * The bootstrap memblock region count maximum is 128 entries
  * (INIT_MEMBLOCK_REGIONS), but EFI might pass us more E820 entries
  * than that - so allow memblock resizing.

This memblock memory is freed here:
        free_low_memory_core_early()

We access the freed memblock.memory later in boot when deferred pages
are initialized in this path:

        deferred_init_memmap()
                for_each_mem_pfn_range()
                  __next_mem_pfn_range()
                    type = &memblock.memory;

One possible explanation for why this use-after-free hasn't been hit
before is that the limit of INIT_MEMBLOCK_REGIONS has never been
exceeded at least on systems where deferred struct pages were enabled.

Tested by reducing INIT_MEMBLOCK_REGIONS down to 4 from the current 128,
and verifying in qemu that this code is getting excuted and that the
freed pages are sane.

Link: http://lkml.kernel.org/r/1502485554-318703-2-git-send-email-pasha.tatashin@oracle.com
Fixes: 7e18adb4f8 ("mm: meminit: initialise remaining struct pages in parallel with kswapd")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-18 15:32:01 -07:00
Luis R. Rodriguez
8ada92799e wait: add wait_event_killable_timeout()
These are the few pending fixes I have queued up for v4.13-final.  One
is a a generic regression fix for recursive loops on kmod and the other
one is a trivial print out correction.

During the v4.13 development we assumed that recursive kmod loops were
no longer possible.  Clearly that is not true.  The regression fix makes
use of a new killable wait.  We use a killable wait to be paranoid in
how signals might be sent to modprobe and only accept a proper SIGKILL.
The signal will only be available to userspace to issue *iff* a thread
has already entered a wait state, and that happens only if we've already
throttled after 50 kmod threads have been hit.

Note that although it may seem excessive to trigger a failure afer 5
seconds if all kmod thread remain busy, prior to the series of changes
that went into v4.13 we would actually *always* fatally fail any request
which came in if the limit was already reached.  The new waiting
implemented in v4.13 actually gives us *more* breathing room -- the wait
for 5 seconds is a wait for *any* kmod thread to finish.  We give up and
fail *iff* no kmod thread has finished and they're *all* running
straight for 5 consecutive seconds.  If 50 kmod threads are running
consecutively for 5 seconds something else must be really bad.

Recursive loops with kmod are bad but they're also hard to implement
properly as a selftest without currently fooling current userspace tools
like kmod [1].  For instance kmod will complain when you run depmod if
it finds a recursive loop with symbol dependency between modules as such
this type of recursive loop cannot go upstream as the modules_install
target will fail after running depmod.

These tests already exist on userspace kmod upstream though (refer to
the testsuite/module-playground/mod-loop-*.c files).  The same is not
true if request_module() is used though, or worst if aliases are used.

Likewise the issue with 64-bit kernels booting 32-bit userspace without
a binfmt handler built-in is also currently not detected and proactively
avoided by userspace kmod tools, or kconfig for all architectures.
Although we could complain in the kernel when some of these individual
recursive issues creep up, proactively avoiding these situations in
userspace at build time is what we should keep striving for.

Lastly, since recursive loops could happen with kmod it may mean
recursive loops may also be possible with other kernel usermode helpers,
this should be investigated and long term if we can come up with a more
sensible generic solution even better!

[0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20170809-kmod-for-v4.13-final
[1] https://git.kernel.org/pub/scm/utils/kernel/kmod/kmod.git

This patch (of 3):

This wait is similar to wait_event_interruptible_timeout() but only
accepts SIGKILL interrupt signal.  Other signals are ignored.

Link: http://lkml.kernel.org/r/20170809234635.13443-2-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Matt Redfearn <matt.redfearn@imgetc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-18 15:32:01 -07:00
Johannes Weiner
739f79fc9d mm: memcontrol: fix NULL pointer crash in test_clear_page_writeback()
Jaegeuk and Brad report a NULL pointer crash when writeback ending tries
to update the memcg stats:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000003b0
    IP: test_clear_page_writeback+0x12e/0x2c0
    [...]
    RIP: 0010:test_clear_page_writeback+0x12e/0x2c0
    Call Trace:
     <IRQ>
     end_page_writeback+0x47/0x70
     f2fs_write_end_io+0x76/0x180 [f2fs]
     bio_endio+0x9f/0x120
     blk_update_request+0xa8/0x2f0
     scsi_end_request+0x39/0x1d0
     scsi_io_completion+0x211/0x690
     scsi_finish_command+0xd9/0x120
     scsi_softirq_done+0x127/0x150
     __blk_mq_complete_request_remote+0x13/0x20
     flush_smp_call_function_queue+0x56/0x110
     generic_smp_call_function_single_interrupt+0x13/0x30
     smp_call_function_single_interrupt+0x27/0x40
     call_function_single_interrupt+0x89/0x90
    RIP: 0010:native_safe_halt+0x6/0x10

    (gdb) l *(test_clear_page_writeback+0x12e)
    0xffffffff811bae3e is in test_clear_page_writeback (./include/linux/memcontrol.h:619).
    614		mod_node_page_state(page_pgdat(page), idx, val);
    615		if (mem_cgroup_disabled() || !page->mem_cgroup)
    616			return;
    617		mod_memcg_state(page->mem_cgroup, idx, val);
    618		pn = page->mem_cgroup->nodeinfo[page_to_nid(page)];
    619		this_cpu_add(pn->lruvec_stat->count[idx], val);
    620	}
    621
    622	unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
    623							gfp_t gfp_mask,

The issue is that writeback doesn't hold a page reference and the page
might get freed after PG_writeback is cleared (and the mapping is
unlocked) in test_clear_page_writeback().  The stat functions looking up
the page's node or zone are safe, as those attributes are static across
allocation and free cycles.  But page->mem_cgroup is not, and it will
get cleared if we race with truncation or migration.

It appears this race window has been around for a while, but less likely
to trigger when the memcg stats were updated first thing after
PG_writeback is cleared.  Recent changes reshuffled this code to update
the global node stats before the memcg ones, though, stretching the race
window out to an extent where people can reproduce the problem.

Update test_clear_page_writeback() to look up and pin page->mem_cgroup
before clearing PG_writeback, then not use that pointer afterward.  It
is a partial revert of 62cccb8c8e ("mm: simplify lock_page_memcg()")
but leaves the pageref-holding callsites that aren't affected alone.

Link: http://lkml.kernel.org/r/20170809183825.GA26387@cmpxchg.org
Fixes: 62cccb8c8e ("mm: simplify lock_page_memcg()")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Jaegeuk Kim <jaegeuk@kernel.org>
Tested-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reported-by: Bradley Bolen <bradleybolen@gmail.com>
Tested-by: Brad Bolen <bradleybolen@gmail.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: <stable@vger.kernel.org>	[4.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-18 15:32:01 -07:00
Arnd Bergmann
e517030e26 Reset controller changes for v4.14
- constify zx2967 reset_ops
 - add a convenience API to manage an array of resets
 - let deassert report success and let assert report success for shared resets
   if the reset controller driver does not implement (de)assert.
 - add HSDKv1 reset driver
 - remove Gemini reset controller, the driver is made obsolete
   by a combined clock/reset driver in drivers/clk
 - fix the total number of reset lines in the sunxi driver
 - various uniphier updates and fixes:
   - remove sLD3 SoC support
   - simplify system reset register and bit definitions
   - add audio systems, video input subsystem, and analog amplifiers reset
     controls
 -----BEGIN PGP SIGNATURE-----
 
 iQJMBAABCAA2FiEEBsBxhV1FaKwXuCOBUMKIHHCeYOsFAlmWg5MYHHBoaWxpcHAu
 emFiZWxAZ21haWwuY29tAAoJEFDCiBxwnmDrEqoQAKSQDhwA5ZyBd216PoUSaLqf
 tTGrJc/atfs+cDiqSyQDQ5vN0dTMnPV+YjGmwITg55ZGQTJRTq5nVhwM6y0CaASN
 PE7U/9Ma/8GWOjtC+6hun6tAHeuX3fMq2cYK8BSL+D7MQr+zkIgEQx3YsbR/0Np7
 qmawpZSKAvIxrpVkQiVrsxmVvxi23GdncGUPDHtKhAW4N1T3ywogpW/7t6AcDZZl
 XIa09fIRxfB9VECIXPfe/CDkTHPcKgrxP80/AQJksRmWKBtFU3dRktYVscqUplz7
 gS3EelAJ3NYscqWQevCXSvGsSpiNJfePKNLRrDSTKX+eKbAignHvcRhJ6YuP4QrY
 t7ZGm7VZf+HSY/XYXbcXEO8zcaf3LFT1lO3KAQ3/nmFie/CuVCC+Dpl74OPsBvcD
 +3uhQHqvm8AwnxBmTRtMNH3AtaDFPxT9hbebhsoJKujSb5+5aC4ygDAnY0E6QWfJ
 wYG/qKz6+dtrFVIOZG3TFjz2/kgjXRd8zegYtxn20KH1Qk/i+JqaIofbwPvPWAKS
 A5vi6sLt6AruRUh213E9+Pi+EWQs1Qhexqjk1rtna9RUkMmu2Qbx50hKkgdgE4Q7
 EWcQSB43bpGRo5xt4zhy7fnBPj3zxjJK9gAzeGNrbN16z5gulHarpLWb9X5ujli4
 9zv7wk00SQIaSqlVFKBM
 =YW4N
 -----END PGP SIGNATURE-----

Merge tag 'reset-for-4.14' of git://git.pengutronix.de/git/pza/linux into next/drivers

Pull "Reset controller changes for v4.14" from Philipp Zabel:

- constify zx2967 reset_ops
- add a convenience API to manage an array of resets
- let deassert report success and let assert report success for shared resets
  if the reset controller driver does not implement (de)assert.
- add HSDKv1 reset driver
- remove Gemini reset controller, the driver is made obsolete
  by a combined clock/reset driver in drivers/clk
- fix the total number of reset lines in the sunxi driver
- various uniphier updates and fixes:
  - remove sLD3 SoC support
  - simplify system reset register and bit definitions
  - add audio systems, video input subsystem, and analog amplifiers reset
    controls

* tag 'reset-for-4.14' of git://git.pengutronix.de/git/pza/linux:
  reset: uniphier: add analog amplifiers reset control
  reset: uniphier: add video input subsystem reset control
  reset: uniphier: add audio systems reset control
  reset: sunxi: fix number of reset lines
  reset: uniphier: do not use per-SoC macro for system reset block
  reset: uniphier: remove sLD3 SoC support
  Revert "reset: Add a Gemini reset controller"
  ARC: reset: introduce HSDKv1 reset driver
  reset: make (de)assert report success for self-deasserting reset drivers
  reset: Add APIs to manage array of resets
  reset: zx2967: constify zx2967_reset_ops.
2017-08-19 00:04:27 +02:00
Arnd Bergmann
9da95d8f5b - add mt7623a smp support
- scpsys: reduce code duplication
 - scpsys: add mt7622 support
 - pmic wrapper: make of_device_ids constant
 -----BEGIN PGP SIGNATURE-----
 
 iQJLBAABCAA1FiEEiUuSfQSYnG8EMsBltDliWyzx00MFAlmVfMEXHG1hdHRoaWFz
 LmJnZ0BnbWFpbC5jb20ACgkQtDliWyzx00MGyw//b1Ax9tAl+eX1phkU5r/sdyd6
 KS72i4pMo4yyyuoPCLKhpwH2rL1UrgY/T6f6bEyVf6KNhzgUvx1P4Y5PSpo2R+Jv
 AJ9lIypXCxUvmrQL5FOuQ+8SY694SSm1IBsoJ70pNA+/m7//Yfk0eiueUa7epNI0
 dhrHOvNaZPQbnCUjFa+qdkK8x04MoAKUQcq9sp1vpcvYTL7V6OrW2V6kc0iqwaN1
 G91QW2r+XAgO1KA1PIMqC8E9v06dhYWuaPI3SMCIqp/NFnNZA8j58KySzW7+nB/B
 7IlDARYDyzokw/JTUuWSlOOPh425xVSExidkTbCgb7m0Lt8i3CvW89yXsqKSL/+U
 Ezek9w/rJNGb5OoLhT1xQfy6XrXFCW78gl9HKZhbrM1eBiEeJctLlmQula57nyJH
 qIlCMjuueKYHUKTGT5LXidNbGesfx4vLt6v803XouF4D3NCIqPahdFqtBnEfsOEO
 l1xwbTSq44VSFtUKSdRrgt6e8Uxa25ftOkbKlS8vIPcsYJ7gnK7r2/ERWNJF8F+f
 x4toGOKhOVRTRwAZIx9ncKnQp08g6jegiRHW4JrJNmPnUwwC00I06+Jiltg5KJIP
 ZqgfjXesdjD660pWl70YGMRtdUNLuNgZRnyXg1CM94CfOimUBZi2PXhoK1Wm/7sj
 Db0QrPV4DQHti15Q0ig=
 =IW3o
 -----END PGP SIGNATURE-----

Merge tag 'v4.13-next-soc' of https://github.com/mbgg/linux-mediatek into next/drivers

Pull "arm: mediatek: soc updates for v4.14" from Matthias Brugger:

- add mt7623a smp support
- scpsys: reduce code duplication
- scpsys: add mt7622 support
- pmic wrapper: make of_device_ids constant

* tag 'v4.13-next-soc' of https://github.com/mbgg/linux-mediatek:
  soc: mediatek: add SCPSYS power domain driver for MediaTek MT7622 SoC
  soc: mediatek: add header files required for MT7622 SCPSYS dt-binding
  soc: mediatek: reduce code duplication of scpsys_probe across all SoCs
  dt-bindings: soc: update the binding document for SCPSYS on MediaTek MT7622 SoC
  soc: mtk-pmic-wrap: make of_device_ids const.
  ARM: mediatek: add MT7623a smp bringup code
2017-08-18 23:27:22 +02:00
Trond Myklebust
ce7c252a8c SUNRPC: Add a separate spinlock to protect the RPC request receive list
This further reduces contention with the transport_lock, and allows us
to convert to using a non-bh-safe spinlock, since the list is now never
accessed from a bh context.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-08-18 14:45:04 -04:00
Doug Ledford
d3cf4d9915 Merge branch 'misc' into k.o/for-next
Conflicts:
	drivers/infiniband/core/iwcm.c - The rdma_netlink patches in
	HEAD and the iwarp cm workqueue fix (don't use WQ_MEM_RECLAIM,
	we aren't safe for that context) touched the same code.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18 14:10:23 -04:00
Greg Kroah-Hartman
92d50fc160 PCI/IB: add support for pci driver attribute groups
Some drivers (specifically the nes IB driver), want to create a lot of
sysfs driver attributes.  Instead of open-coding the creation and
removal of these files (and getting it wrong btw), it's a better idea to
let the driver core handle all of this logic for us.

So add a new field to the pci driver structure, **groups, that allows
pci drivers to specify an attribute group list it wishes to have created
when it is registered with the driver core.

Big bonus is now the driver doesn't race with userspace when the sysfs
files are created vs. when the kobject is announced, so any script/tool
that actually wanted to use these files will not have to poll waiting
for them to show up.

Cc: Faisal Latif <faisal.latif@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18 14:00:53 -04:00
Kishon Vijay Abraham I
f01f969e25 PCI: endpoint: Add an API to get matching "pci_epf_device_id"
Add an API to get "pci_epf_device_id" matching the EPF name. This can be
used by the EPF driver to get the driver data corresponding to the EPF
device name.

Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
[bhelgaas: folded in "while" loop termination fix from Colin Ian King
<colin.king@canonical.com>]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-08-18 10:42:45 -05:00
Waiman Long
e1cba4b85d cgroup: Add mount flag to enable cpuset to use v2 behavior in v1 cgroup
A new mount option "cpuset_v2_mode" is added to the v1 cgroupfs
filesystem to enable cpuset controller to use v2 behavior in a v1
cgroup. This mount option applies only to cpuset controller and have
no effect on other controllers.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-08-18 08:24:21 -07:00
Bart Van Assche
d352ae205d blk-mq: Make blk_mq_reinit_tagset() calls easier to read
Since blk_mq_ops.reinit_request is only called from inside
blk_mq_reinit_tagset(), make this function pointer an argument of
blk_mq_reinit_tagset() instead of a member of struct blk_mq_ops.
This patch does not change any functionality but makes
blk_mq_reinit_tagset() calls easier to read and to analyze.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: James Smart <james.smart@broadcom.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-18 08:36:58 -06:00
Thomas Gleixner
7edaeb6841 kernel/watchdog: Prevent false positives with turbo modes
The hardlockup detector on x86 uses a performance counter based on unhalted
CPU cycles and a periodic hrtimer. The hrtimer period is about 2/5 of the
performance counter period, so the hrtimer should fire 2-3 times before the
performance counter NMI fires. The NMI code checks whether the hrtimer
fired since the last invocation. If not, it assumess a hard lockup.

The calculation of those periods is based on the nominal CPU
frequency. Turbo modes increase the CPU clock frequency and therefore
shorten the period of the perf/NMI watchdog. With extreme Turbo-modes (3x
nominal frequency) the perf/NMI period is shorter than the hrtimer period
which leads to false positives.

A simple fix would be to shorten the hrtimer period, but that comes with
the side effect of more frequent hrtimer and softlockup thread wakeups,
which is not desired.

Implement a low pass filter, which checks the perf/NMI period against
kernel time. If the perf/NMI fires before 4/5 of the watchdog period has
elapsed then the event is ignored and postponed to the next perf/NMI.

That solves the problem and avoids the overhead of shorter hrtimer periods
and more frequent softlockup thread wakeups.

Fixes: 58687acba5 ("lockup_detector: Combine nmi_watchdog and softlockup detector")
Reported-and-tested-by: Kan Liang <Kan.liang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: dzickus@redhat.com
Cc: prarit@redhat.com
Cc: ak@linux.intel.com
Cc: babu.moger@oracle.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: acme@redhat.com
Cc: stable@vger.kernel.org
Cc: atomlin@redhat.com
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708150931310.1886@nanos
2017-08-18 12:35:02 +02:00
Thomas Gleixner
6629695465 Merge branch 'irq/for-gpio' into irq/core
Merge the flow handlers and irq domain extensions which are in a separate
branch so they can be consumed by the gpio folks.
2017-08-18 11:22:27 +02:00
David Daney
495c38d300 irqdomain: Add irq_domain_{push,pop}_irq() functions
For an already existing irqdomain hierarchy, as might be obtained via
a call to pci_enable_msix_range(), a PCI driver wishing to add an
additional irqdomain to the hierarchy needs to be able to insert the
irqdomain to that already initialized hierarchy.  Calling
irq_domain_create_hierarchy() allows the new irqdomain to be created,
but no existing code allows for initializing the associated irq_data.

Add a couple of helper functions (irq_domain_push_irq() and
irq_domain_pop_irq()) to initialize the irq_data for the new
irqdomain added to an existing hierarchy.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexandre Courbot <gnurou@gmail.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: linux-gpio@vger.kernel.org
Link: http://lkml.kernel.org/r/1503017616-3252-6-git-send-email-david.daney@cavium.com
2017-08-18 11:21:42 +02:00
David Daney
7703b08cc9 genirq: Add handle_fasteoi_{level,edge}_irq flow handlers
Follow-on patch for gpio-thunderx uses a irqdomain hierarchy which
requires slightly different flow handlers, add them to chip.c which
contains most of the other flow handlers.  Make these conditionally
compiled based on CONFIG_IRQ_FASTEOI_HIERARCHY_HANDLERS.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexandre Courbot <gnurou@gmail.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: linux-gpio@vger.kernel.org
Link: http://lkml.kernel.org/r/1503017616-3252-3-git-send-email-david.daney@cavium.com
2017-08-18 11:21:41 +02:00
Marc Zyngier
74def747bc genirq: Restrict effective affinity to interrupts actually using it
Just because CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK is selected
doesn't mean that all the interrupts are using the effective
affinity mask. For a number of them, this mask is likely to
be empty.

In order to deal with this, let's restrict the use of the
effective affinity mask to these interrupts that have a non empty
effective affinity.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Kevin Cernekee <cernekee@gmail.com>
Cc: Wei Xu <xuwei5@hisilicon.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Gregory Clement <gregory.clement@free-electrons.com>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Link: http://lkml.kernel.org/r/20170818083925.10108-2-marc.zyngier@arm.com
2017-08-18 10:54:39 +02:00
Ingo Molnar
0c23647913 Merge branch 'x86/asm' into locking/core
We need the ASM_UNREACHABLE() macro for a dependent patch.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-18 10:29:54 +02:00
Srinivas Pandruvada
726fb6b4f2 ACPI / PM: Check low power idle constraints for debug only
For SoC to achieve its lowest power platform idle state a set of hardware
preconditions must be met. These preconditions or constraints can be
obtained by issuing a device specific method (_DSM) with function "1".
Refer to the document provided in the link below.

Here during initialization (from attach() callback of LPS0 device), invoke
function 1 to get the device constraints. Each enabled constraint is
stored in a table.

The devices in this table are used to check whether they were in required
minimum state, while entering suspend. This check is done from platform
freeze wake() callback, only when /sys/power/pm_debug_messages attribute
is non zero.

If any constraint is not met and device is ACPI power managed then it
prints the device information to kernel logs.

Also if debug is enabled in acpi/sleep.c, the constraint table and state
of each device on wake is dumped in kernel logs.

Since pm_debug_messages_on setting is used as condition to check
constraints outside kernel/power/main.c, pm_debug_messages_on is changed
to a global variable.

Link: http://www.uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-18 01:54:22 +02:00
Kees Cook
c71b02e4d2 Revert "pstore: Honor dmesg_restrict sysctl on dmesg dumps"
This reverts commit 68c4a4f8ab, with
various conflict clean-ups.

The capability check required too much privilege compared to simple DAC
controls. A system builder was forced to have crash handler processes
run with CAP_SYSLOG which would give it the ability to read (and wipe)
the _current_ dmesg, which is much more access than being given access
only to the historical log stored in pstorefs.

With the prior commit to make the root directory 0750, the files are
protected by default but a system builder can now opt to give access
to a specific group (via chgrp on the pstorefs root directory) without
being forced to also give away CAP_SYSLOG.

Suggested-by: Nick Kralevich <nnk@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
2017-08-17 16:29:19 -07:00
Jan Kara
7b9ca4c61b quota: Reduce contention on dq_data_lock
dq_data_lock is currently used to protect all modifications of quota
accounting information, consistency of quota accounting on the inode,
and dquot pointers from inode. As a result contention on the lock can be
pretty heavy.

Reduce the contention on the lock by protecting quota accounting
information by a new dquot->dq_dqb_lock and consistency of quota
accounting with inode usage by inode->i_lock.

This change reduces time to create 500000 files on ext4 on ramdisk by 50
different processes in separate directories by 6% when user quota is
turned on. When those 50 processes belong to 50 different users, the
improvement is about 9%.

Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 22:07:59 +02:00
Jan Kara
f4a8116a4c fs: Provide __inode_get_bytes()
Provide helper __inode_get_bytes() which assumes i_lock is already
acquired. Quota code will need this to be able to use i_lock to protect
consistency of quota accounting information and inode usage.

Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 22:06:03 +02:00
Jan Kara
0ed60de34a quota: Inline functions into their callsites
inode_add_rsv_space() and inode_sub_rsv_space() had only one callsite.
Inline them there directly. inode_claim_rsv_space() and
inode_reclaim_rsv_space() had two callsites so inline them there as
well. This will simplify further locking changes.

Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 22:00:59 +02:00
Jan Kara
834057bf84 quota: Allow disabling tracking of dirty dquots in a list
Filesystems that are journalling quotas generally don't need tracking of
dirty dquots in a list since forcing a transaction commit flushes all
quotas anyway. Allow filesystem to say it doesn't want dquots to be
tracked as it reduces contention on the dq_list_lock.

Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 22:00:45 +02:00
Jan Kara
503330f382 quota: Remove dq_wait_unused from dquot
Currently every dquot carries a wait_queue_head_t used only when we are
turning quotas off to wait for last users to drop dquot references.
Since such rare case is not performance sensitive in any means, just use
a global waitqueue for this and save space in struct dquot. Also convert
the logic to use wait_event() instead of open-coding it.

Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 22:00:40 +02:00
Dave Airlie
8824c751eb omapdrm changes for v4.14
* HDMI hot plug IRQ support (instead of polling)
 * Big driver cleanup from Laurent (no functional changes)
 * OMAP5 DSI support (only the pinmuxing was missing)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZlZ+SAAoJEPo9qoy8lh71x/8P/is/jIhTMCRi4kwH7pDN/eoG
 Ie60ES9eovlN0Ld3HlTQWIT2s3y5zTJGdp1S2C+gbAubrOftnc51wv92p/3HxWZL
 cnbnIPUvJIToF60l1+khy1V0AZdgIpaBgqLlqXx0DRhrAh9e6/fZYot5jjFDI1LW
 ARMPkFP9S4nwOYx1uENTb6C61FaE+Q/D+tdYqSsbKP18h235UiTeqP5DAMUy+vE4
 NR0fKUppQZugp/Nz0teVqHgP10WJJ8Tp878Zurxtb7ecrfdx5+Rm6r8e+yQ6pNty
 ZkKQYxt+HEk8f1N6zWBrQFpREDNKRdFgeZpXmnfDG7qdt/DpE/SI/0Be7k/4hhdy
 0LdjQHfvFeXHg3ZKiFqaGmSUc7GNv1ZTvTjHbpTyHn40Xfd8G5O4RHemEs9g92Mh
 Fw+sUxW0nD1FutZai0VHTZKXyRHvvdQYRdgeluPeVUXdXcyxMq5tr/MUbRYlpYhY
 FYTfyjEnarjZ/BHlokr/cT+odoCSdCorcNmjn56rRufMgavI/yh86fgZDzGMgSgp
 vPaAS77FSU/vldEw/nJCZcXqsNi1xHqkwR1kqH8embvxhooW7njxTszfuXiGJhMJ
 TkWUV36s1cWI83ajBqdj7uuLSfKvBZJUHsiqhd5UShk/JgiahrqN85K1fB5Fz8M3
 AAXArAka4RktMeYyZVIs
 =7wKU
 -----END PGP SIGNATURE-----

Merge tag 'omapdrm-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux into drm-next

omapdrm changes for v4.14

* HDMI hot plug IRQ support (instead of polling)
* Big driver cleanup from Laurent (no functional changes)
* OMAP5 DSI support (only the pinmuxing was missing)

* tag 'omapdrm-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux: (60 commits)
  drm/omap: Potential NULL deref in omap_crtc_duplicate_state()
  drm/omap: remove no-op cleanup code
  drm/omap: rename omapdrm device back
  drm: omapdrm: Remove omapdrm platform data
  ARM: OMAP2+: Don't register omapdss device for omapdrm
  ARM: OMAP2+: Remove unused omapdrm platform device
  drm: omapdrm: Remove the omapdss driver
  drm: omapdrm: Register omapdrm platform device in omapdss driver
  drm: omapdrm: hdmi: Don't allocate PHY features dynamically
  drm: omapdrm: hdmi: Configure the PHY from the HDMI core version
  drm: omapdrm: hdmi: Configure the PLL from the HDMI core version
  drm: omapdrm: hdmi: Pass HDMI core version as integer to HDMI audio
  drm: omapdrm: hdmi: Replace OMAP SoC model check with HDMI xmit version
  drm: omapdrm: hdmi: Rename functions and structures to use hdmi_ prefix
  drm/omap: add OMAP5 DSIPHY lane-enable support
  drm/omap: use regmap_update_bit() when muxing DSI pads
  drm: omapdrm: Remove dss_features.h
  drm: omapdrm: Move supported outputs feature to dss driver
  drm: omapdrm: Move DSS_FCK feature to dss driver
  drm: omapdrm: Move PCD, LINEWIDTH and DOWNSCALE features to dispc driver
  ...
2017-08-18 05:41:32 +10:00