Commit graph

62682 commits

Author SHA1 Message Date
Takashi Iwai
c429bda21f Merge branch 'for-next' into for-linus
Pull 4.15 updates to take over the previous urgent fixes.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-11-13 15:43:13 +01:00
Thomas Gleixner
f4c09f87ad cpu/hotplug: Get rid of CPU hotplug notifier leftovers
The CPU hotplug notifiers are history. Remove the last reminders.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-11-13 10:03:53 +01:00
Ingo Molnar
fcdfafcb73 kprobes: Don't spam the build log with deprecation warnings
The jprobes APIs are deprecated - but are still in occasional use for code that
few people seem to care about, so stop generating deprecation warnings.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-13 08:00:52 +01:00
Linus Torvalds
516fb7f2e7 /proc/module: use the same logic as /proc/kallsyms for address exposure
The (alleged) users of the module addresses are the same: kernel
profiling.

So just expose the same helper and format macros, and unify the logic.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-12 19:01:23 -08:00
Rafael J. Wysocki
990a848d53 Merge branches 'pm-devfreq' and 'pm-tools'
* pm-devfreq:
  PM / devfreq: Define the constant governor name
  PM / devfreq: Remove unneeded conditional statement
  PM / devfreq: Show the all available frequencies
  PM / devfreq: Change return type of devfreq_set_freq_table()
  PM / devfreq: Use the available min/max frequency
  Revert "PM / devfreq: Add show_one macro to delete the duplicate code"
  PM / devfreq: Set min/max_freq when adding the devfreq device

* pm-tools:
  tools/power/cpupower: add libcpupower.so.0.0.1 to .gitignore
  tools/power/cpupower: Add 64 bit library detection
  MAINTAINERS: add maintainer for tools/power/cpupower
  cpupower: Fix no-rounding MHz frequency output
2017-11-13 01:41:39 +01:00
Rafael J. Wysocki
1efef68262 Merge branch 'pm-core'
* pm-core:
  ACPI / PM: Take SMART_SUSPEND driver flag into account
  PCI / PM: Take SMART_SUSPEND driver flag into account
  PCI / PM: Drop unnecessary invocations of pcibios_pm_ops callbacks
  PM / core: Add SMART_SUSPEND driver flag
  PCI / PM: Use the NEVER_SKIP driver flag
  PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
  PM / core: Convert timers to use timer_setup()
  PM / core: Fix kerneldoc comments of four functions
  PM / core: Drop legacy class suspend/resume operations
2017-11-13 01:41:26 +01:00
Rafael J. Wysocki
05d658b5b5 Merge branch 'pm-sleep'
* pm-sleep:
  freezer: Fix typo in freezable_schedule_timeout() comment
  PM / s2idle: Clear the events_check_enabled flag
  PM / sleep: Remove pm_complete_with_resume_check()
  PM: ARM: locomo: Drop suspend and resume bus type callbacks
  PM: Use a more common logging style
  PM: Document rules on using pm_runtime_resume() in system suspend callbacks
2017-11-13 01:41:20 +01:00
Rafael J. Wysocki
794c33555f Merge branch 'acpi-pm'
* acpi-pm:
  ACPI / PM: Fix acpi_pm_notifier_lock vs flush_workqueue() deadlock
  ACPI / LPSS: Consolidate runtime PM and system sleep handling
  ACPI / PM: Combine device suspend routines
  ACPI / LPIT: Add Low Power Idle Table (LPIT) support
  ACPI / PM: Split code validating need for runtime resume in ->prepare()
  ACPI / PM: Restore acpi_subsys_complete()
  ACPI / PM: Combine two identical device resume routines
  ACPI / PM: Remove stale function header
2017-11-13 01:41:11 +01:00
Rafael J. Wysocki
28da43956b Merge branches 'pm-cpufreq-sched' and 'pm-opp'
* pm-cpufreq-sched:
  cpufreq: schedutil: Reset cached_raw_freq when not in sync with next_freq

* pm-opp:
  PM / OPP: Add dev_pm_opp_{un}register_get_pstate_helper()
  PM / OPP: Support updating performance state of device's power domain
  PM / OPP: add missing of_node_put() for of_get_cpu_node()
  PM / OPP: Rename dev_pm_opp_register_put_opp_helper()
  PM / OPP: Add missing of_node_put(np)
  PM / OPP: Move error message to debug level
  PM / OPP: Use snprintf() to avoid kasprintf() and kfree()
  PM / OPP: Move the OPP directory out of power/
2017-11-13 01:40:52 +01:00
Rafael J. Wysocki
60af981c78 Merge branch 'pm-cpufreq'
* pm-cpufreq: (22 commits)
  cpufreq: stats: Handle the case when trans_table goes beyond PAGE_SIZE
  cpufreq: arm_big_little: make cpufreq_arm_bL_ops structures const
  cpufreq: arm_big_little: make function arguments and structure pointer const
  cpufreq: pxa: convert to clock API
  cpufreq: speedstep-lib: mark expected switch fall-through
  cpufreq: ti-cpufreq: add missing of_node_put()
  cpufreq: dt: Remove support for Exynos4212 SoCs
  cpufreq: imx6q: Move speed grading check to cpufreq driver
  cpufreq: ti-cpufreq: kfree opp_data when failure
  cpufreq: SPEAr: pr_err() strings should end with newlines
  cpufreq: powernow-k8: pr_err() strings should end with newlines
  cpufreq: dt-platdev: drop socionext,uniphier-ld6b from whitelist
  arm64: wire cpu-invariant accounting support up to the task scheduler
  arm64: wire frequency-invariant accounting support up to the task scheduler
  arm: wire cpu-invariant accounting support up to the task scheduler
  arm: wire frequency-invariant accounting support up to the task scheduler
  drivers base/arch_topology: allow inlining cpu-invariant accounting support
  drivers base/arch_topology: provide frequency-invariant accounting support
  cpufreq: dt: invoke frequency-invariance setter function
  cpufreq: arm_big_little: invoke frequency-invariance setter function
  ...
2017-11-13 01:34:49 +01:00
Rafael J. Wysocki
4762573b93 Merge branch 'pm-qos'
* pm-qos:
  PM / QoS: Fix device resume latency framework
  PM / QoS: Drop PM_QOS_FLAG_REMOTE_WAKEUP
2017-11-13 01:33:48 +01:00
Rafael J. Wysocki
29aaf90875 Merge branch 'pm-domains'
* pm-domains:
  PM / Domains: Fix genpd to deal with drivers returning 1 from ->prepare()
  PM / domains: Rework governor code to be more consistent
  PM / Domains: Remove gpd_dev_ops.active_wakeup() callback
  soc: rockchip: power-domain: Use GENPD_FLAG_ACTIVE_WAKEUP
  soc: mediatek: Use GENPD_FLAG_ACTIVE_WAKEUP
  ARM: shmobile: pm-rmobile: Use GENPD_FLAG_ACTIVE_WAKEUP
  PM / Domains: Allow genpd users to specify default active wakeup behavior
  PM / Domains: Add support to select performance-state of domains
  PM / Domains: Rename genpd internals from pm_genpd_* to genpd_*
2017-11-13 01:33:35 +01:00
David Howells
b24591e2fc timers: Add a function to start/reduce a timer
Add a function, similar to mod_timer(), that will start a timer if it isn't
running and will modify it if it is running and has an expiry time longer
than the new time.  If the timer is running with an expiry time that's the
same or sooner, no change is made.

The function looks like:

	int timer_reduce(struct timer_list *timer, unsigned long expires);

This can be used by code such as networking code to make it easier to share
a timer for multiple timeouts.  For instance, in upcoming AF_RXRPC code,
the rxrpc_call struct will maintain a number of timeouts:

	unsigned long	ack_at;
	unsigned long	resend_at;
	unsigned long	ping_at;
	unsigned long	expect_rx_by;
	unsigned long	expect_req_by;
	unsigned long	expect_term_by;

each of which is set independently of the others.  With timer reduction
available, when the code needs to set one of the timeouts, it only needs to
look at that timeout and then call timer_reduce() to modify the timer,
starting it or bringing it forward if necessary.  There is no need to refer
to the other timeouts to see which is earliest and no need to take any lock
other than, potentially, the timer lock inside timer_reduce().

Note, that this does not protect against concurrent invocations of any of
the timer functions.

As an example, the expect_rx_by timeout above, which terminates a call if
we don't get a packet from the server within a certain time window, would
be set something like this:

	unsigned long now = jiffies;
	unsigned long expect_rx_by = now + packet_receive_timeout;
	WRITE_ONCE(call->expect_rx_by, expect_rx_by);
	timer_reduce(&call->timer, expect_rx_by);

The timer service code (which might, say, be in a work function) would then
check all the timeouts to see which, if any, had triggered, deal with
those:

	t = READ_ONCE(call->ack_at);
	if (time_after_eq(now, t)) {
		cmpxchg(&call->ack_at, t, now + MAX_JIFFY_OFFSET);
		set_bit(RXRPC_CALL_EV_ACK, &call->events);
	}

and then restart the timer if necessary by finding the soonest timeout that
hasn't yet passed and then calling timer_reduce().

The disadvantage of doing things this way rather than comparing the timers
each time and calling mod_timer() is that you *will* take timer events
unless you can finish what you're doing and delete the timer in time.

The advantage of doing things this way is that you don't need to use a lock
to work out when the next timer should be set, other than the timer's own
lock - which you might not have to take.

[ tglx: Fixed weird formatting and adopted it to pending changes ]

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: keyrings@vger.kernel.org
Cc: linux-afs@lists.infradead.org
Link: https://lkml.kernel.org/r/151023090769.23050.1801643667223880753.stgit@warthog.procyon.org.uk
2017-11-12 15:10:27 +01:00
David S. Miller
fdae5f37a8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-11-12 09:17:05 +09:00
Mat Martineau
39b1752110 net: Remove unused skb_shared_info member
ip6_frag_id was only used by UFO, which has been removed.
ipv6_proxy_select_ident() only existed to set ip6_frag_id and has no
in-tree callers.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 22:09:40 +09:00
Yuchung Cheng
737ff31456 tcp: use sequence distance to detect reordering
Replace the reordering distance measurement in packet unit with
sequence based approach. Previously it trackes the number of "packets"
toward the forward ACK (i.e.  highest sacked sequence)in a state
variable "fackets_out".

Precisely measuring reordering degree on packet distance has not much
benefit, as the degree constantly changes by factors like path, load,
and congestion window. It is also complicated and prone to arcane bugs.
This patch replaces with sequence-based approach that's much simpler.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 18:53:16 +09:00
Yuchung Cheng
713bafea92 tcp: retire FACK loss detection
FACK loss detection has been disabled by default and the
successor RACK subsumed FACK and can handle reordering better.
This patch removes FACK to simplify TCP loss recovery.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 18:53:16 +09:00
David S. Miller
f3edacbd69 bpf: Revert bpf_overrid_function() helper changes.
NACK'd by x86 maintainer.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 18:24:55 +09:00
Maciej Żenczykowski
2210d6b2f2 net: ipv6: sysctl to specify IPv6 ND traffic class
Add a per-device sysctl to specify the default traffic class to use for
kernel originated IPv6 Neighbour Discovery packets.

Currently this includes:

  - Router Solicitation (ICMPv6 type 133)
    ndisc_send_rs() -> ndisc_send_skb() -> ip6_nd_hdr()

  - Neighbour Solicitation (ICMPv6 type 135)
    ndisc_send_ns() -> ndisc_send_skb() -> ip6_nd_hdr()

  - Neighbour Advertisement (ICMPv6 type 136)
    ndisc_send_na() -> ndisc_send_skb() -> ip6_nd_hdr()

  - Redirect (ICMPv6 type 137)
    ndisc_send_redirect() -> ndisc_send_skb() -> ip6_nd_hdr()

and if the kernel ever gets around to generating RA's,
it would presumably also include:

  - Router Advertisement (ICMPv6 type 134)
    (radvd daemon could pick up on the kernel setting and use it)

Interface drivers may examine the Traffic Class value and translate
the DiffServ Code Point into a link-layer appropriate traffic
prioritization scheme.  An example of mapping IETF DSCP values to
IEEE 802.11 User Priority values can be found here:

    https://tools.ietf.org/html/draft-ietf-tsvwg-ieee-802-11

The expected primary use case is to properly prioritize ND over wifi.

Testing:
  jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  0
  jzem22:~# echo -1 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  -bash: echo: write error: Invalid argument
  jzem22:~# echo 256 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  -bash: echo: write error: Invalid argument
  jzem22:~# echo 0 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  jzem22:~# echo 255 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  255
  jzem22:~# echo 34 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  34

  jzem22:~# echo $[0xDC] > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  jzem22:~# tcpdump -v -i eth0 icmp6 and src host jzem22.pgc and dst host fe80::1
  tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
  IP6 (class 0xdc, hlim 255, next-header ICMPv6 (58) payload length: 24)
  jzem22.pgc > fe80::1: [icmp6 sum ok] ICMP6, neighbor advertisement,
  length 24, tgt is jzem22.pgc, Flags [solicited]

(based on original change written by Erik Kline, with minor changes)

v2: fix 'suspicious rcu_dereference_check() usage'
    by explicitly grabbing the rcu_read_lock.

Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Erik Kline <ek@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:13:02 +09:00
Josef Bacik
dd0bb688ea bpf: add a bpf_override_function helper
Error injection is sloppy and very ad-hoc.  BPF could fill this niche
perfectly with it's kprobe functionality.  We could make sure errors are
only triggered in specific call chains that we care about with very
specific situations.  Accomplish this with the bpf_override_funciton
helper.  This will modify the probe'd callers return value to the
specified value and set the PC to an override function that simply
returns, bypassing the originally probed function.  This gives us a nice
clean way to implement systematic error injection for all of our code
paths.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 12:18:05 +09:00
Jens Axboe
79f720a751 blk-mq: only run the hardware queue if IO is pending
Currently we are inconsistent in when we decide to run the queue. Using
blk_mq_run_hw_queues() we check if the hctx has pending IO before
running it, but we don't do that from the individual queue run function,
blk_mq_run_hw_queue(). This results in a lot of extra and pointless
queue runs, potentially, on flush requests and (much worse) on tag
starvation situations. This is observable just looking at top output,
with lots of kworkers active. For the !async runs, it just adds to the
CPU overhead of blk-mq.

Move the has-pending check into the run function instead of having
callers do it.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-10 19:55:57 -07:00
Greg Edwards
67f2519fe2 fs: guard_bio_eod() needs to consider partitions
guard_bio_eod() needs to look at the partition capacity, not just the
capacity of the whole device, when determining if truncation is
necessary.

[   60.268688] attempt to access beyond end of device
[   60.268690] unknown-block(9,1): rw=0, want=67103509, limit=67103506
[   60.268693] buffer_io_error: 2 callbacks suppressed
[   60.268696] Buffer I/O error on dev md1p7, logical block 4524305, async page read

Fixes: 74d46992e0 ("block: replace bi_bdev with a gendisk pointer and partitions index")
Cc: stable@vger.kernel.org # v4.13
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Edwards <gedwards@ddn.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-10 19:55:57 -07:00
Bart Van Assche
9a95e4ef70 block, nvme: Introduce blk_mq_req_flags_t
Several block layer and NVMe core functions accept a combination
of BLK_MQ_REQ_* flags through the 'flags' argument but there is
no verification at compile time whether the right type of block
layer flags is passed. Make it possible for sparse to verify this.
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: linux-nvme@lists.infradead.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-10 19:53:25 -07:00
Bart Van Assche
3a0a529971 block, scsi: Make SCSI quiesce and resume work reliably
The contexts from which a SCSI device can be quiesced or resumed are:
* Writing into /sys/class/scsi_device/*/device/state.
* SCSI parallel (SPI) domain validation.
* The SCSI device power management methods. See also scsi_bus_pm_ops.

It is essential during suspend and resume that neither the filesystem
state nor the filesystem metadata in RAM changes. This is why while
the hibernation image is being written or restored that SCSI devices
are quiesced. The SCSI core quiesces devices through scsi_device_quiesce()
and scsi_device_resume(). In the SDEV_QUIESCE state execution of
non-preempt requests is deferred. This is realized by returning
BLKPREP_DEFER from inside scsi_prep_state_check() for quiesced SCSI
devices. Avoid that a full queue prevents power management requests
to be submitted by deferring allocation of non-preempt requests for
devices in the quiesced state. This patch has been tested by running
the following commands and by verifying that after each resume the
fio job was still running:

for ((i=0; i<10; i++)); do
  (
    cd /sys/block/md0/md &&
    while true; do
      [ "$(<sync_action)" = "idle" ] && echo check > sync_action
      sleep 1
    done
  ) &
  pids=($!)
  for d in /sys/class/block/sd*[a-z]; do
    bdev=${d#/sys/class/block/}
    hcil=$(readlink "$d/device")
    hcil=${hcil#../../../}
    echo 4 > "$d/queue/nr_requests"
    echo 1 > "/sys/class/scsi_device/$hcil/device/queue_depth"
    fio --name="$bdev" --filename="/dev/$bdev" --buffered=0 --bs=512 \
      --rw=randread --ioengine=libaio --numjobs=4 --iodepth=16       \
      --iodepth_batch=1 --thread --loops=$((2**31)) &
    pids+=($!)
  done
  sleep 1
  echo "$(date) Hibernating ..." >>hibernate-test-log.txt
  systemctl hibernate
  sleep 10
  kill "${pids[@]}"
  echo idle > /sys/block/md0/md/sync_action
  wait
  echo "$(date) Done." >>hibernate-test-log.txt
done

Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
References: "I/O hangs after resuming from suspend-to-ram" (https://marc.info/?l=linux-block&m=150340235201348).
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-10 19:53:25 -07:00
Bart Van Assche
c9254f2ddb block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
This flag will be used in the next patch to let the block layer
core know whether or not a SCSI request queue has been quiesced.
A quiesced SCSI queue namely only processes RQF_PREEMPT requests.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-10 19:53:25 -07:00
Bart Van Assche
1b6d65a0bf block: Introduce BLK_MQ_REQ_PREEMPT
Set RQF_PREEMPT if BLK_MQ_REQ_PREEMPT is passed to
blk_get_request_flags().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-10 19:53:25 -07:00
Bart Van Assche
6a15674d1e block: Introduce blk_get_request_flags()
A side effect of this patch is that the GFP mask that is passed to
several allocation functions in the legacy block layer is changed
from GFP_KERNEL into __GFP_DIRECT_RECLAIM.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-10 19:53:25 -07:00
Jens Axboe
eb619fdb2d blk-mq: fix issue with shared tag queue re-running
This patch attempts to make the case of hctx re-running on driver tag
failure more robust. Without this patch, it's pretty easy to trigger a
stall condition with shared tags. An example is using null_blk like
this:

modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 submit_queues=1 hw_queue_depth=1

which sets up 4 devices, sharing the same tag set with a depth of 1.
Running a fio job ala:

[global]
bs=4k
rw=randread
norandommap
direct=1
ioengine=libaio
iodepth=4

[nullb0]
filename=/dev/nullb0
[nullb1]
filename=/dev/nullb1
[nullb2]
filename=/dev/nullb2
[nullb3]
filename=/dev/nullb3

will inevitably end with one or more threads being stuck waiting for a
scheduler tag. That IO is then stuck forever, until someone else
triggers a run of the queue.

Ensure that we always re-run the hardware queue, if the driver tag we
were waiting for got freed before we added our leftover request entries
back on the dispatch list.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Tested-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-10 19:53:25 -07:00
Keith Busch
e3d7874dcf nvme: send uevent for some asynchronous events
This will give udev a chance to observe and handle asynchronous event
notifications and clear the log to unmask future events of the same type.
The driver will create a change uevent of the asyncronuos event result
before submitting the next AEN request to the device if a completed AEN
event is of type error, smart, command set or vendor specific,

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Guan Junxiong <guanjunxiong@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-10 19:53:25 -07:00
Keith Busch
38dabe210f nvme: centralize AEN defines
All the transports were unnecessarilly duplicating the AEN request
accounting. This patch defines everything in one place.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Guan Junxiong <guanjunxiong@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-10 19:53:25 -07:00
Christoph Hellwig
f00c4d80ff block: pass full fmode_t to blk_verify_command
Use the obvious calling convention.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-10 19:53:25 -07:00
Christoph Hellwig
d004a5e7d4 block: remove __bio_kmap_atomic
This helper doesn't buy us much over calling kmap_atomic directly.
In fact in the only caller it does a bit of useless work as the
caller already has the bvec at hand, and said caller would even
buggy for a multi-segment bio due to the use of this helper.

So just remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-10 19:53:25 -07:00
Jens Axboe
83f5f7ed72 block: kill bio_kmap/kunmap_irq()
There are no users of it anymore.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-10 19:53:25 -07:00
Keith Busch
84fef62d13 nvme: check admin passthru command effects
The NVMe standard provides a command effects log page so the host may
be aware of special requirements it may need to do for a particular
command. For example, the command may need to run with IO quiesced to
prevent timeouts or undefined behavior, or it may change the logical block
formats that determine how the host needs to construct future commands.

This patch saves the nvme command effects log page if the controller
supports it, and performs appropriate actions before and after an admin
passthrough command is completed. If the controller does not support the
command effects log page, the driver will define the effects for known
opcodes. The nvme format and santize are the only commands in this patch
with known effects.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-10 19:53:25 -07:00
Mark Brown
704c14554b
Merge remote-tracking branches 'spi/topic/armada', 'spi/topic/axi', 'spi/topic/davinci' and 'spi/topic/fsl-dspi' into spi-next 2017-11-10 21:33:44 +00:00
Mark Brown
50b7baefe3
Merge remote-tracking branches 'regulator/topic/da9211', 'regulator/topic/pfuze100' and 'regulator/topic/tps65218' into regulator-next 2017-11-10 21:33:23 +00:00
Mark Brown
28c426c7ad
Merge remote-tracking branch 'regulator/topic/axp20x' into regulator-next 2017-11-10 21:33:20 +00:00
Mark Brown
242f66c845
Merge remote-tracking branches 'asoc/topic/ac97', 'asoc/topic/ac97-mfd', 'asoc/topic/amd' and 'asoc/topic/arizona-mfd' into asoc-next 2017-11-10 21:31:02 +00:00
Casey Schaufler
f7b53637c0 Audit: remove unused audit_log_secctx function
The function audit_log_secctx() is unused in the upstream kernel.
All it does is wrap another function that doesn't need wrapping.
It claims to give you the SELinux context, but that is not true if
you are using a different security module.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-11-10 16:08:47 -05:00
Keerthy
125b1192bd
regulator: tps65218: remove unused tps_info structure
remove unused tps_info structure.

Signed-off-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2017-11-10 12:33:15 +00:00
Marc Zyngier
4f8413a3a7 genirq: Track whether the trigger type has been set
When requesting a shared interrupt, we assume that the firmware
support code (DT or ACPI) has called irqd_set_trigger_type
already, so that we can retrieve it and check that the requester
is being reasonnable.

Unfortunately, we still have non-DT, non-ACPI systems around,
and these guys won't call irqd_set_trigger_type before requesting
the interrupt. The consequence is that we fail the request that
would have worked before.

We can either chase all these use cases (boring), or address it
in core code (easier). Let's have a per-irq_desc flag that
indicates whether irqd_set_trigger_type has been called, and
let's just check it when checking for a shared interrupt.
If it hasn't been set, just take whatever the interrupt
requester asks.

Fixes: 382bd4de61 ("genirq: Use irqd_get_trigger_type to compare the trigger type for shared IRQs")
Cc: stable@vger.kernel.org
Reported-and-tested-by: Petr Cvek <petrcvekcz@gmail.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-11-10 09:49:48 +00:00
Girish Moodalbail
54985120a1 net: fix incorrect comment with regard to VLAN packet handling
The commit bcc6d47903 ("net: vlan: make non-hw-accel rx path similar
to hw-accel") unified accel and non-accel path for VLAN RX. With that
fix we do not register any packet_type handler for VLANs anymore, so fix
the incorrect comment.

Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-10 18:04:21 +09:00
Juergen Gross
f72e38e8ec x86/virt, x86/platform: Merge 'struct x86_hyper' into 'struct x86_platform' and 'struct x86_init'
Instead of x86_hyper being either NULL on bare metal or a pointer to a
struct hypervisor_x86 in case of the kernel running as a guest merge
the struct into x86_platform and x86_init.

This will remove the need for wrappers making it hard to find out what
is being called. With dummy functions added for all callbacks testing
for a NULL function pointer can be removed, too.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akataria@vmware.com
Cc: boris.ostrovsky@oracle.com
Cc: devel@linuxdriverproject.org
Cc: haiyangz@microsoft.com
Cc: kvm@vger.kernel.org
Cc: kys@microsoft.com
Cc: pbonzini@redhat.com
Cc: rkrcmar@redhat.com
Cc: rusty@rustcorp.com.au
Cc: sthemmin@microsoft.com
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20171109132739.23465-2-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-10 10:03:12 +01:00
Ingo Molnar
91a6a6cfee Merge branch 'linus' into x86/asm, to resolve conflict
Conflicts:
	arch/x86/mm/mem_encrypt.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-10 08:06:47 +01:00
David S. Miller
4fdc3023c6 Merge tag 'mlx5-updates-2017-11-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
mlx5-updates-2017-11-09

This series introduces vlan offloads related improvements for mlx5
ethernet netdev driver, from Gal Pressman.

 - Add support for 802.1ad vlan filter
 - Add support for 802.1ad vlan insertion
 - Add vlan offloads statistics to ethtool (inserted/stripped vlans)
 - CHECKSUM_COMPLETE support for vlan traffic when vlan stripping is off! (Finally)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-10 13:44:46 +09:00
Arnd Bergmann
e609a6b851 sysctl: add register_sysctl() dummy helper
register_sysctl() has been around for five years with commit
fea478d410 ("sysctl: Add register_sysctl for normal sysctl users") but
now that arm64 started using it, I ran into a compile error:

  arch/arm64/kernel/armv8_deprecated.c: In function 'register_insn_emulation_sysctl':
  arch/arm64/kernel/armv8_deprecated.c:257:2: error: implicit declaration of function 'register_sysctl'

This adds a inline function like we already have for
register_sysctl_paths() and register_sysctl_table().

Link: http://lkml.kernel.org/r/20171106133700.558647-1-arnd@arndb.de
Fixes: 38b9aeb32f ("arm64: Port deprecated instruction emulation to new sysctl interface")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Alex Benne <alex.bennee@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-09 17:58:40 -08:00
David S. Miller
4dc6758d78 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Simple cases of overlapping changes in the packet scheduler.

Must easier to resolve this time.

Which probably means that I screwed it up somehow.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-10 10:00:18 +09:00
Chris Wilson
5f72db5916 dma-buf/fence: Sparse wants __rcu on the object itself
In order to silence sparse in dma_fence_get_rcu_safe(), we need to mark
the incoming fence object as being RCU protected and not the pointer to
the object.

Cc: Dave Airlie <airlied@redhat.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: linaro-mm-sig@lists.linaro.org
Cc: linux-media@vger.kernel.org
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171102200336.23347-3-ville.syrjala@linux.intel.com
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
[vsyrjala: s/silent/silence/ in commit message]
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
2017-11-09 20:32:53 +02:00
Linus Torvalds
d1041cdc60 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix use-after-free in IPSEC input parsing, desintation address
    pointer was loaded before pskb_may_pull() which can change the SKB
    data pointers. From Florian Westphal.

 2) Stack out-of-bounds read in xfrm_state_find(), from Steffen
    Klassert.

 3) IPVS state of SKB is not properly reset when moving between
    namespaces, from Ye Yin.

 4) Fix crash in asix driver suspend and resume, from Andrey Konovalov.

 5) Don't deliver ipv6 l2tp tunnel packets to ipv4 l2tp tunnels, and
    vice versa, from Guillaume Nault.

 6) Fix DSACK undo on non-dup ACKs, from Priyaranjan Jha.

 7) Fix regression in bond_xmit_hash()'s behavior after the TCP port
    selection changes back in 4.2, from Hangbin Liu.

 8) Two divide by zero bugs in USB networking drivers when parsing
    descriptors, from Bjorn Mork.

 9) Fix bonding slaves being stuck in BOND_LINK_FAIL state, from Jay
    Vosburgh.

10) Missing skb_reset_mac_header() in qmi_wwan, from Kristian Evensen.

11) Fix the destruction of tc action object races properly, from Cong
    Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (31 commits)
  cls_u32: use tcf_exts_get_net() before call_rcu()
  cls_tcindex: use tcf_exts_get_net() before call_rcu()
  cls_rsvp: use tcf_exts_get_net() before call_rcu()
  cls_route: use tcf_exts_get_net() before call_rcu()
  cls_matchall: use tcf_exts_get_net() before call_rcu()
  cls_fw: use tcf_exts_get_net() before call_rcu()
  cls_flower: use tcf_exts_get_net() before call_rcu()
  cls_flow: use tcf_exts_get_net() before call_rcu()
  cls_cgroup: use tcf_exts_get_net() before call_rcu()
  cls_bpf: use tcf_exts_get_net() before call_rcu()
  cls_basic: use tcf_exts_get_net() before call_rcu()
  net_sched: introduce tcf_exts_get_net() and tcf_exts_put_net()
  Revert "net_sched: hold netns refcnt for each action"
  net: usb: asix: fill null-ptr-deref in asix_suspend
  Revert "net: usb: asix: fill null-ptr-deref in asix_suspend"
  qmi_wwan: Add missing skb_reset_mac_header-call
  bonding: fix slave stuck in BOND_LINK_FAIL state
  qrtr: Move to postcore_initcall
  net: qmi_wwan: fix divide by 0 on bad descriptors
  net: cdc_ether: fix divide by 0 on bad descriptors
  ...
2017-11-09 09:31:34 -08:00
Miklos Szeredi
f121aadede vfs: add path_put_init()
This one initializes the struct path to all zeroes so that multiple
path_put_init() on the path is safe.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-11-09 10:23:28 +01:00