Commit graph

54651 commits

Author SHA1 Message Date
Arianna Avanzini
e21b7a0b98 block, bfq: add full hierarchical scheduling and cgroups support
Add complete support for full hierarchical scheduling, with a cgroups
interface. Full hierarchical scheduling is implemented through the
'entity' abstraction: both bfq_queues, i.e., the internal BFQ queues
associated with processes, and groups are represented in general by
entities. Given the bfq_queues associated with the processes belonging
to a given group, the entities representing these queues are sons of
the entity representing the group. At higher levels, if a group, say
G, contains other groups, then the entity representing G is the parent
entity of the entities representing the groups in G.

Hierarchical scheduling is performed as follows: if the timestamps of
a leaf entity (i.e., of a bfq_queue) change, and such a change lets
the entity become the next-to-serve entity for its parent entity, then
the timestamps of the parent entity are recomputed as a function of
the budget of its new next-to-serve leaf entity. If the parent entity
belongs, in its turn, to a group, and its new timestamps let it become
the next-to-serve for its parent entity, then the timestamps of the
latter parent entity are recomputed as well, and so on. When a new
bfq_queue must be set in service, the reverse path is followed: the
next-to-serve highest-level entity is chosen, then its next-to-serve
child entity, and so on, until the next-to-serve leaf entity is
reached, and the bfq_queue that this entity represents is set in
service.

Writeback is accounted for on a per-group basis, i.e., for each group,
the async I/O requests of the processes of the group are enqueued in a
distinct bfq_queue, and the entity associated with this queue is a
child of the entity associated with the group.

Weights can be assigned explicitly to groups and processes through the
cgroups interface, differently from what happens, for single
processes, if the cgroups interface is not used (as explained in the
description of the previous patch). In particular, since each node has
a full scheduler, each group can be assigned its own weight.

Signed-off-by: Fabio Checconi <fchecconi@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-19 08:30:26 -06:00
Javier González
4af3f75d79 lightnvm: allow to init targets on factory mode
Target initialization has two responsibilities: creating the target
partition and instantiating the target. This patch enables to create a
factory partition (e.g., do not trigger recovery on the given target).
This is useful for target development and for being able to restore the
device state at any moment in time without requiring a full-device
erase.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-16 10:06:25 -06:00
Javier González
a7737f39c7 lightnvm: rename scrambler controller hint
According to the OCSSD 1.2 specification, the 0x200 hint enables the
media scrambler for the read/write opcode, providing that the controller
has been correctly configured by the firmware. Rename the macro to
represent this meaning.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-16 10:06:25 -06:00
Javier González
17912c49ed lightnvm: submit erases using the I/O path
Until now erases have been submitted as synchronous commands through a
dedicated erase function. In order to enable targets implementing
asynchronous erases, refactor the erase path so that it uses the normal
async I/O submission functions. If a target requires sync I/O, it can
implement it internally. Also, adapt rrpc to use the new erase path.

Signed-off-by: Javier González <javier@cnexlabs.com>
Fixed spelling error.
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>

Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-16 10:06:25 -06:00
Omar Sandoval
c05f8525f6 blk-mq-sched: make completed_request() callback more useful
Currently, this callback is called right after put_request() and has no
distinguishable purpose. Instead, let's call it before put_request() as
soon as I/O has completed on the request, before we account it in
blk-stat. With this, Kyber can enable stats when it sees a latency
outlier and make sure the outlier gets accounted.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-14 14:06:57 -06:00
Omar Sandoval
5b72727299 blk-mq: export helpers
blk_mq_finish_request() is required for schedulers that define their own
put_request(). blk_mq_run_hw_queue() is required for schedulers that
hold back requests to be run later.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-14 14:06:55 -06:00
Omar Sandoval
c05e667337 sbitmap: add sbitmap_get_shallow() operation
This operation supports the use case of limiting the number of bits that
can be allocated for a given operation. Rather than setting aside some
bits at the end of the bitmap, we can set aside bits in each word of the
bitmap. This means we can keep the allocation hints spread out and
support sbitmap_resize() nicely at the cost of lower granularity for the
allowed depth.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-14 14:06:52 -06:00
Christoph Hellwig
8425339492 remove the mg_disk driver
This drivers was added in 2008, but as far as a I can tell we never had a
single platform that actually registered resources for the platform driver.

It's also been unmaintained for a long time and apparently has a ATA mode
that can be driven using the IDE/libata subsystem.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-14 14:00:49 -06:00
Christoph Hellwig
48920ff2a5 block: remove the discard_zeroes_data flag
Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we can
kill this hack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-08 11:25:38 -06:00
Christoph Hellwig
cb365b9675 block: add a new BLKDEV_ZERO_NOFALLBACK flag
This avoids fallbacks to explicit zeroing in (__)blkdev_issue_zeroout if
the caller doesn't want them.

Also clean up the convoluted check for the return condition that this
new flag is added to.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-08 11:25:38 -06:00
Christoph Hellwig
d928be9f85 block: add a REQ_NOUNMAP flag for REQ_OP_WRITE_ZEROES
If this flag is set logical provisioning capable device should
release space for the zeroed blocks if possible, if it is not set
devices should keep the blocks anchored.

Also remove an out of sync kerneldoc comment for a static function
that would have become even more out of data with this change.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-08 11:25:38 -06:00
Christoph Hellwig
ee472d835c block: add a flags argument to (__)blkdev_issue_zeroout
Turn the existing discard flag into a new BLKDEV_ZERO_UNMAP flag with
similar semantics, but without referring to diѕcard.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-08 11:25:38 -06:00
Christoph Hellwig
ac62d6208a dm: support REQ_OP_WRITE_ZEROES
Copy & paste from the REQ_OP_WRITE_SAME code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-08 11:25:38 -06:00
Christoph Hellwig
1d62ac1363 block: renumber REQ_OP_WRITE_ZEROES
Make life easy for implementations that needs to send a data buffer
to the device (e.g. SCSI) by numbering it as a data out command.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-08 11:25:38 -06:00
Omar Sandoval
ee056f9812 blk-mq-sched: provide hooks for initializing hardware queue data
Schedulers need to be informed when a hardware queue is added or removed
at runtime so they can allocate/free per-hardware queue data. So,
replace the blk_mq_sched_init_hctx_data() helper, which only makes sense
at init time, with .init_hctx() and .exit_hctx() hooks.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-07 12:45:41 -06:00
Jens Axboe
65f619d253 Merge branch 'for-linus' into for-4.12/block
We've added a considerable amount of fixes for stalls and issues
with the blk-mq scheduling in the 4.11 series since forking
off the for-4.12/block branch. We need to do improvements on
top of that for 4.12, so pull in the previous fixes to make
our lives easier going forward.

Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-07 12:45:20 -06:00
Bart Van Assche
6d8c6c0f97 blk-mq: Restart a single queue if tag sets are shared
To improve scalability, if hardware queues are shared, restart
a single hardware queue in round-robin fashion. Rename
blk_mq_sched_restart_queues() to reflect the new semantics.
Remove blk_mq_sched_mark_restart_queue() because this function
has no callers. Remove flag QUEUE_FLAG_RESTART because this
patch removes the code that uses this flag.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-07 12:40:09 -06:00
Bart Van Assche
7587a5ae7e blk-mq: Introduce blk_mq_delay_run_hw_queue()
Introduce a function that runs a hardware queue unconditionally
after a delay. Note: there is already a function that stops and
restarts a hardware queue after a delay, namely blk_mq_delay_queue().

This function will be used in the next patch in this series.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Long Li <longli@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-07 12:27:06 -06:00
NeilBrown
fbbaf700e7 block: trace completion of all bios.
Currently only dm and md/raid5 bios trigger
trace_block_bio_complete().  Now that we have bio_chain() and
bio_inc_remaining(), it is not possible, in general, for a driver to
know when the bio is really complete.  Only bio_endio() knows that.

So move the trace_block_bio_complete() call to bio_endio().

Now trace_block_bio_complete() pairs with trace_block_bio_queue().
Any bio for which a 'queue' event is traced, will subsequently
generate a 'complete' event.

There are a few cases where completion tracing is not wanted.
1/ If blk_update_request() has already generated a completion
   trace event at the 'request' level, there is no point generating
   one at the bio level too.  In this case the bi_sector and bi_size
   will have changed, so the bio level event would be wrong

2/ If the bio hasn't actually been queued yet, but is being aborted
   early, then a trace event could be confusing.  Some filesystems
   call bio_endio() but do not want tracing.

3/ The bio_integrity code interposes itself by replacing bi_end_io,
   then restoring it and calling bio_endio() again.  This would produce
   two identical trace events if left like that.

To handle these, we introduce a flag BIO_TRACE_COMPLETION and only
produce the trace event when this is set.
We address point 1 above by clearing the flag in blk_update_request().
We address point 2 above by only setting the flag when
generic_make_request() is called.
We address point 3 above by clearing the flag after generating a
completion event.

When bio_split() is used on a bio, particularly in blk_queue_split(),
there is an extra complication.  A new bio is split off the front, and
may be handle directly without going through generic_make_request().
The old bio, which has been advanced, is passed to
generic_make_request(), so it will trigger a trace event a second
time.
Probably the best result when a split happens is to see a single
'queue' event for the whole bio, then multiple 'complete' events - one
for each component.  To achieve this was can:
- copy the BIO_TRACE_COMPLETION flag to the new bio in bio_split()
- avoid generating a 'queue' event if BIO_TRACE_COMPLETION is already set.
This way, the split-off bio won't create a queue event, the original
won't either even if it re-submitted to generic_make_request(),
but both will produce completion events, each for their own range.

So if generic_make_request() is called (which generates a QUEUED
event), then bi_endio() will create a single COMPLETE event for each
range that the bio is split into, unless the driver has explicitly
requested it not to.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-07 09:40:52 -06:00
NeilBrown
dbde775cdb block: simple improvements for bio->flags
The comment for the 'flags' field of 'bio' mentions
"command" which is no longer stored there, and doesn't
mention the bvec pool number, which is.

BIO_RESET_BITS is set in such a way that it would need to be
updated if new bits were added, which is easy to miss.

BVEC_POOL_BITS is larger than needed.  The BVEC_POOL_IDX()
ranges from 0 to 6, so 3 bits are sufficient.

This patch make improvements in each of these areas.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-07 09:39:29 -06:00
Omar Sandoval
54d5329d42 blk-mq-sched: fix crash in switch error path
In elevator_switch(), if blk_mq_init_sched() fails, we attempt to fall
back to the original scheduler. However, at this point, we've already
torn down the original scheduler's tags, so this causes a crash. Doing
the fallback like the legacy elevator path is much harder for mq, so fix
it by just falling back to none, instead.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-07 08:56:48 -06:00
Jens Axboe
1dd5198b2d block: move timeout field in struct request to pack better
After commit 64c7f1d157, we went from 1 to 2 holes in my
test setup. If we move the timeout field a bit, we remove
both of those holes and shrink struct request by 8 bytes.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-05 12:16:38 -06:00
Christoph Hellwig
64c7f1d157 block, scsi: move the retries field to struct scsi_request
Instead of bloating the generic struct request with it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-05 12:05:08 -06:00
Bart Van Assche
f2fbc9dd78 blk-mq: Remove blk_mq_queue_data.list
The block layer core sets blk_mq_queue_data.list but no block
drivers read that member. Hence remove it and also the code that
is used to set this member.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-05 09:40:15 -06:00
James Smart
62eeacb0e0 nvme_fc: Clean up host fcpio done status handling
As Dan Carpenter pointed out: mixing 16-bit nvme status with 32-bit
error status from driver. Corrected comment on fcp request struct
status field, and converted done routine to explicitly set nvme status
codes for nvme status.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-04 09:48:23 -06:00
James Smart
0f222ccce3 nvme_fc: Sync FC-NVME header with standard
Update FC-NVME definitions to match FC-NVME r1.14 (16-020vB) plus
change voted in by 2/22 FC-NVME Adhoc (see HOSTID below).

Includes the following:
- Addition of "status_code" field to ERSP IU
- Addition of FC-NVME LS RJT reason_codes and reason_explanations
- CreateAssociation payload, HostID field shortened to 16 bytes

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-04 09:48:23 -06:00
Sagi Grimberg
b1a951fe46 net/utils: generic inet_pton_with_scope helper
Several locations in the stack need to handle ipv4/ipv6
(with scope) and port strings conversion to sockaddr.
Add a helper that takes either AF_INET, AF_INET6 or
AF_UNSPEC (for wildcard) to centralize this handling.

Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-04 09:48:23 -06:00
Roland Dreier
bf17aa36c0 nvme: Correct NVMF enum values to match NVMe-oF rev 1.0
The enum values for QPTYPE, PRTYPE and CMS are off by 1 from the
values defined in figure 42 of the NVM Express over Fabrics 1.0:

    http://www.nvmexpress.org/wp-content/uploads/NVMe_over_Fabrics_1_0_Gold_20160605-1.pdf

Fix our enums to match the final spec.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2017-04-02 10:24:07 +03:00
Jens Axboe
d708f0d502 Revert "blkcg: allocate struct blkcg_gq outside request queue spinlock"
I inadvertently applied the v5 version of this patch, whereas
the agreed upon version was v5. Revert this one so we can apply
the right one.

This reverts commit 7fc6b87a9f.
2017-03-29 11:25:48 -06:00
Omar Sandoval
334335d2f7 block: warn if sharing request queue across gendisks
Now that the remaining drivers have been converted to one request queue
per gendisk, let's warn if a request queue gets registered more than
once. This will catch future drivers which might do it inadvertently or
any old drivers that I may have missed.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-29 08:09:08 -06:00
Ming Lei
1671d522cd block: rename blk_mq_freeze_queue_start()
As the .q_usage_counter is used by both legacy and
mq path, we need to block new I/O if queue becomes
dead in blk_queue_enter().

So rename it and we can use this function in both
paths.

Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-29 08:03:42 -06:00
Tahsin Erdogan
7fc6b87a9f blkcg: allocate struct blkcg_gq outside request queue spinlock
blkg_conf_prep() currently calls blkg_lookup_create() while holding
request queue spinlock. This means allocating memory for struct
blkcg_gq has to be made non-blocking. This causes occasional -ENOMEM
failures in call paths like below:

  pcpu_alloc+0x68f/0x710
  __alloc_percpu_gfp+0xd/0x10
  __percpu_counter_init+0x55/0xc0
  cfq_pd_alloc+0x3b2/0x4e0
  blkg_alloc+0x187/0x230
  blkg_create+0x489/0x670
  blkg_lookup_create+0x9a/0x230
  blkg_conf_prep+0x1fb/0x240
  __cfqg_set_weight_device.isra.105+0x5c/0x180
  cfq_set_weight_on_dfl+0x69/0xc0
  cgroup_file_write+0x39/0x1c0
  kernfs_fop_write+0x13f/0x1d0
  __vfs_write+0x23/0x120
  vfs_write+0xc2/0x1f0
  SyS_write+0x44/0xb0
  entry_SYSCALL_64_fastpath+0x18/0xad

In the code path above, percpu allocator cannot call vmalloc() due to
queue spinlock.

A failure in this call path gives grief to tools which are trying to
configure io weights. We see occasional failures happen shortly after
reboots even when system is not under any memory pressure. Machines
with a lot of cpus are more vulnerable to this condition.

Update blkg_create() function to temporarily drop the rcu and queue
locks when it is allowed by gfp mask.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-28 15:59:04 -06:00
Linus Torvalds
050fc52d83 All x86-specific, apart from some arch-independent syzkaller fixes.
v1->v2: added one more Reviewed-by
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEbBAABAgAGBQJY2lUOAAoJEL/70l94x66D8awH9joMSpLQV2xoJfq4MbAAevpe
 UvSjeffFxdEzmUcBH8p23l3Fp4jik9wklXSquxQPXf8TgQ7Lgu1Pan6+UFpB8Aaq
 sZNdYyaydYumZpnEVUUtgzIY/fpgifechCqXzizu/EmQDZBrbLCJ7Pr86WSLZX5m
 8fBfOKtymu9sP9SRbDL5Wsx/V5YHnV0oU6iBwd2wWnoOyn7LF2dLtjqW55jE8910
 ZkhnJ2r+nhvxAXe/Qr9GrLGtp2bJQFgzJ6Qx19U5a3u3DEMAJV3NMorum9YLQPTq
 J/jl+1fSERspRuJC/Lr0/+EAF7rGLfpJIa1nNNJi5uFbV0ABnMDBNL3Vsp0x2Q==
 =aR3K
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "All x86-specific, apart from some arch-independent syzkaller fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: cleanup the page tracking SRCU instance
  KVM: nVMX: fix nested EPT detection
  KVM: pci-assign: do not map smm memory slot pages in vt-d page tables
  KVM: kvm_io_bus_unregister_dev() should never fail
  KVM: VMX: Fix enable VPID conditions
  KVM: nVMX: Fix nested VPID vmx exec control
  KVM: x86: correct async page present tracepoint
  kvm: vmx: Flush TLB when the APIC-access address changes
  KVM: x86: use pic/ioapic destructor when destroy vm
  KVM: x86: check existance before destroy
  KVM: x86: clear bus pointer when destroyed
  KVM: Documentation: document MCE ioctls
  KVM: nVMX: don't reset kvm mmu twice
  PTP: fix ptr_ret.cocci warnings
  kvm: fix usage of uninit spinlock in avic_vm_destroy()
  KVM: VMX: downgrade warning on unexpected exit code
2017-03-28 11:33:34 -07:00
Shaohua Li
b9147dd1ba blk-throttle: add a mechanism to estimate IO latency
User configures latency target, but the latency threshold for each
request size isn't fixed. For a SSD, the IO latency highly depends on
request size. To calculate latency threshold, we sample some data, eg,
average latency for request size 4k, 8k, 16k, 32k .. 1M. The latency
threshold of each request size will be the sample latency (I'll call it
base latency) plus latency target. For example, the base latency for
request size 4k is 80us and user configures latency target 60us. The 4k
latency threshold will be 80 + 60 = 140us.

To sample data, we calculate the order base 2 of rounded up IO sectors.
If the IO size is bigger than 1M, it will be accounted as 1M. Since the
calculation does round up, the base latency will be slightly smaller
than actual value. Also if there isn't any IO dispatched for a specific
IO size, we will use the base latency of smaller IO size for this IO
size.

But we shouldn't sample data at any time. The base latency is supposed
to be latency where disk isn't congested, because we use latency
threshold to schedule IOs between cgroups. If disk is congested, the
latency is higher, using it for scheduling is meaningless. Hence we only
do the sampling when block throttling is in the LOW limit, with
assumption disk isn't congested in such state. If the assumption isn't
true, eg, low limit is too high, calculated latency threshold will be
higher.

Hard disk is completely different. Latency depends on spindle seek
instead of request size. Currently this feature is SSD only, we probably
can use a fixed threshold like 4ms for hard disk though.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-28 08:02:20 -06:00
Shaohua Li
88eeca495b block: track request size in blk_issue_stat
Currently there is no way to know the request size when the request is
finished. Next patch will need this info. We could add extra field to
record the size, but blk_issue_stat has enough space to record it, so
this patch just overloads blk_issue_stat. With this, we will have 49bits
to track time, which still is very long time.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-28 08:02:20 -06:00
Shaohua Li
9e234eeafb blk-throttle: add a simple idle detection
A cgroup gets assigned a low limit, but the cgroup could never dispatch
enough IO to cross the low limit. In such case, the queue state machine
will remain in LIMIT_LOW state and all other cgroups will be throttled
according to low limit. This is unfair for other cgroups. We should
treat the cgroup idle and upgrade the state machine to lower state.

We also have a downgrade logic. If the state machine upgrades because of
cgroup idle (real idle), the state machine will downgrade soon as the
cgroup is below its low limit. This isn't what we want. A more
complicated case is cgroup isn't idle when queue is in LIMIT_LOW. But
when queue gets upgraded to lower state, other cgroups could dispatch
more IO and this cgroup can't dispatch enough IO, so the cgroup is below
its low limit and looks like idle (fake idle). In this case, the queue
should downgrade soon. The key to determine if we should do downgrade is
to detect if cgroup is truely idle.

Unfortunately it's very hard to determine if a cgroup is real idle. This
patch uses the 'think time check' idea from CFQ for the purpose. Please
note, the idea doesn't work for all workloads. For example, a workload
with io depth 8 has disk utilization 100%, hence think time is 0, eg,
not idle. But the workload can run higher bandwidth with io depth 16.
Compared to io depth 16, the io depth 8 workload is idle. We use the
idea to roughly determine if a cgroup is idle.

We treat a cgroup idle if its think time is above a threshold (by
default 1ms for SSD and 100ms for HD). The idea is think time above the
threshold will start to harm performance. HD is much slower so a longer
think time is ok.

The patch (and the latter patches) uses 'unsigned long' to track time.
We convert 'ns' to 'us' with 'ns >> 10'. This is fast but loses
precision, should not a big deal.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-28 08:02:20 -06:00
Linus Torvalds
0dc82fa59b Char/Misc driver fixes for 4.11-rc4
A smattering of different small fixes for some random driver subsystems.
 Nothing all that major, just resolutions for reported issues and bugs.
 
 All have been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWNedPQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykY+wCeL0dFw/ney0sJ6s7HsmXu3uxFGyoAoIIXL7AP
 48YAht+BOOmBzagXIKbw
 =EFxm
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "A smattering of different small fixes for some random driver
  subsystems. Nothing all that major, just resolutions for reported
  issues and bugs.

  All have been in linux-next with no reported issues"

* tag 'char-misc-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits)
  extcon: int3496: Set the id pin to direction-input if necessary
  extcon: int3496: Use gpiod_get instead of gpiod_get_index
  extcon: int3496: Add dependency on X86 as it's Intel specific
  extcon: int3496: Add GPIO ACPI mapping table
  extcon: int3496: Rename GPIO pins in accordance with binding
  vmw_vmci: handle the return value from pci_alloc_irq_vectors correctly
  ppdev: fix registering same device name
  parport: fix attempt to write duplicate procfiles
  auxdisplay: img-ascii-lcd: add missing sentinel entry in img_ascii_lcd_matches
  Drivers: hv: vmbus: Don't leak memory when a channel is rescinded
  Drivers: hv: vmbus: Don't leak channel ids
  Drivers: hv: util: don't forget to init host_ts.lock
  Drivers: hv: util: move waiting for release to hv_utils_transport itself
  vmbus: remove hv_event_tasklet_disable/enable
  vmbus: use rcu for per-cpu channel list
  mei: don't wait for os version message reply
  mei: fix deadlock on mei reset
  intel_th: pci: Add Gemini Lake support
  intel_th: pci: Add Denverton SOC support
  intel_th: Don't leak module refcount on failure to activate
  ...
2017-03-26 11:15:54 -07:00
Linus Torvalds
53b4d5911d IIO fixes for 4.11-rc4
Here are some small IIO driver fixes for 4.11-rc4 that resolve a number
 of tiny reported issues.  All of these have been in linux-next for a
 while with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWNeeCg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynXYwCcC+ZqoQ2zyAXI9NRNzADGqLgVLDkAoL5emrPe
 10VBu7ocPtuAI12QdcGI
 =xfgR
 -----END PGP SIGNATURE-----

Merge tag 'staging-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull IIO driver fixes from Greg KH:
 "Here are some small IIO driver fixes for 4.11-rc4 that resolve a
  number of tiny reported issues. All of these have been in linux-next
  for a while with no reported issues"

* tag 'staging-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  iio: imu: st_lsm6dsx: fix FIFO_CTRL2 overwrite during watermark configuration
  iio: adc: ti_am335x_adc: fix fifo overrun recovery
  iio: sw-device: Fix config group initialization
  iio: magnetometer: ak8974: remove incorrect __exit markups
  iio: hid-sensor-trigger: Change get poll value function order to avoid sensor properties losing after resume from S3
2017-03-26 11:02:00 -07:00
Linus Torvalds
e431e0e427 USB/PHY fixes for 4.11-rc4
Here are a number of small USB and PHY driver fixes for 4.11-rc4.
 Nothing major here, just an bunch of small fixes, and a handfull of good
 fixes from Johan for devices with crazy descriptors.  There are a few
 new device ids in here as well.
 
 All of these have been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWNefwQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykewwCg1P1HEZV2wLJSalsPxduIKIRLcmkAnRcE0H31
 2egrp1seSGQumYcGQUJ/
 =2fH+
 -----END PGP SIGNATURE-----

Merge tag 'usb-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB/PHY fixes from Greg KH:
 "Here are a number of small USB and PHY driver fixes for 4.11-rc4.

  Nothing major here, just an bunch of small fixes, and a handfull of
  good fixes from Johan for devices with crazy descriptors. There are a
  few new device ids in here as well.

  All of these have been in linux-next with no reported issues"

* tag 'usb-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (26 commits)
  usb: gadget: f_hid: fix: Don't access hidg->req without spinlock held
  usb: gadget: udc: remove pointer dereference after free
  usb: gadget: f_uvc: Sanity check wMaxPacketSize for SuperSpeed
  usb: gadget: f_uvc: Fix SuperSpeed companion descriptor's wBytesPerInterval
  usb: gadget: acm: fix endianness in notifications
  usb: dwc3: gadget: delay unmap of bounced requests
  USB: serial: qcserial: add Dell DW5811e
  usb: hub: Fix crash after failure to read BOS descriptor
  ACM gadget: fix endianness in notifications
  USB: usbtmc: fix probe error path
  USB: usbtmc: add missing endpoint sanity check
  USB: serial: option: add Quectel UC15, UC20, EC21, and EC25 modems
  usb: musb: fix possible spinlock deadlock
  usb: musb: dsps: fix iounmap in error and exit paths
  usb: musb: cppi41: don't check early-TX-interrupt for Isoch transfer
  usb-core: Add LINEAR_FRAME_INTR_BINTERVAL USB quirk
  uwb: i1480-dfu: fix NULL-deref at probe
  uwb: hwa-rc: fix NULL-deref at probe
  USB: wusbcore: fix NULL-deref at probe
  USB: uss720: fix NULL-deref at probe
  ...
2017-03-26 10:52:52 -07:00
Linus Torvalds
a643f9054c A code cleanup and bugfix for fs/crypto.
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAljW4wYACgkQ8vlZVpUN
 gaPYugf9ExFbJhN+iYqUVbGXPvlr5VpEtDeVt7IfO3a37hqCEQ0IEPzksNIfUFul
 B8/rYXpz0B5gqCJeo66CGLkb1SVvSoSKCq9/BTQtugohxM7sGxDFTmdB+A+u0QJH
 leILfaMFuj0DhVOrdYVpGh7e1XPgSTUWy6/G42OJqf3SV2WxGRJtyBfmghZxEdiY
 XYCGqjq47yOIPvzB+ufKe1hnphKMgxlHeuPvByzPCvOs58GlxAYR3Ycuvjc/nz+8
 QVlAEPpGhf9ytEXELsxq/ZbsNj9xtXsNAzkAoMK+xZ2JCxIHRcS1ay/iAwxw+d9r
 bnlpI+8tQ79GIGCv3cusJSwq7j1iuQ==
 =wPlW
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt-for-linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt

Pull fscrypto fixes from Ted Ts'o:
 "A code cleanup and bugfix for fs/crypto"

* tag 'fscrypt-for-linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt:
  fscrypt: eliminate ->prepare_context() operation
  fscrypt: remove broken support for detecting keyring key revocation
2017-03-25 15:36:56 -07:00
Linus Torvalds
a00da40fc7 hwmon fixes for v4.11-rc4
Bug fixes in asus_atk0110, it87 and max31790 drivers.
 Added missing API definition to hwmon core.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJY1uZbAAoJEMsfJm/On5mB/vAP/1DjalKgmwB/rAHFNqmGzrmm
 nptUQWtpeeThT/8imZ+uMWkADq/CRTkCIXrNt/V5PgLUvDAgFOgw6OnfbHIbw4h3
 BG6XU959UmHoQCksSJ+r24xzMBXIZ3c+hlM1mAfFgA52r1m+S7L7a+5D3X2lfcqc
 TmaqslBL7VNvPKlKG1JzFM6ZTIia3zGFprUgO+j379akk4u4MKnTk2x8hVHjxLfK
 5t3gpCqCVTlycnl4qdnprxWVZg40nU7wU0Cr026aaFzzJGjDt6Zz5XFNJWRwnYrn
 5Anb62mFZLJui38wkIIKnq+o3pKjEy7lfN7Kx2dP31Q1OC38AsgjGoySsxgwrksR
 FektDrkNOoTK6kStXsgua0dGHuxLZHoABGzxx/FjKyrvlh0mUrMUzLxonpKDVyuS
 i3SPVJmg+b8xeam9XcA2JCKnd6g4GVOj+4EJD2LMHxUUFvGP9Ll9jMEMhYTcazhV
 gzu3lE/wyaRFByuh+kX3wuqqhkv3bAaJ0nsXDFBaP68/obpdPhBCm6R8wCuuI/Rt
 vCzjOqQLP247Awh4fAVjZ87BxobTok9vml58kiDSsbZ0fP0NshiIX2vCqLaOSeDG
 0HrbjEpH1UARa9Kzk+2rZvve/agnIXR7c2mw+ZKzfZhc7eGOxC440nPUGYeijJa0
 E8irmld7/poZuZ5LjBtb
 =XQKx
 -----END PGP SIGNATURE-----

Merge tag 'hwmon-for-linus-v4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

 - bug fixes in asus_atk0110, it87 and max31790 drivers

 - added missing API definition to hwmon core

* tag 'hwmon-for-linus-v4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (asus_atk0110) fix uninitialized data access
  hwmon: Add missing HWMON_T_ALARM
  hwmon: (it87) Avoid registering the same chip on both SIO addresses
  hwmon: (max31790) Set correct PWM value
2017-03-25 15:31:50 -07:00
Eric Biggers
869ab90f0a block: constify struct blk_integrity_profile
blk_integrity_profile's are never modified, so mark them 'const' so that
they are placed in .rodata and benefit from memory protection.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-24 20:34:39 -06:00
Linus Torvalds
2056b7c7df ARM: SoC fixes for v4.11
- A couple of OMAP 4.11 regression fixes, including a boot regression for
   SmartReflex, hypervisor mode in thumb2 mode, and reference counting of
   device nodes
 
 - A fix for cpu_idle on at91
 
 - Minor DT fixes on across several platforms:
   sunxi, bcm53xx, at91, nsp, ns2, ux500, omap
 
 - A fix to correct an API change in the reset controllers
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIVAwUAWNWBrGCrR//JCVInAQJE/xAArSDx+6i7On53k3QfxkR1mONHrEOlXhlP
 VbTeBfkYtYB08Pgq99QVJE5qYK2N5w2664UNbgs+KKYnYJTNl0G6EW1l7oj+21M+
 xg3e6ctdfeztYeU34q7D2TxzP/EBRHdbpEuT+JHuQOWFYNfGAj2vSt2cdPUael/Q
 D4s4BgeM1dIOzN3z3AvuQpqIhLedVRFAGYAaalKXiwREYUVDgnqhHCPWChVnSQMR
 gnNDYcb5ZxCaELH9gkdVqyfdlScw9juKMg5v7e7KizhBUqOGBT0bguLC9Kfh3mDO
 tIJGwkuqWvc5tAKuAcGklIVOzP8Wtcq0ObrFzLczy6Waf+6aaQl5J8Uw85UR/zg4
 44AHzk++apXTCDlrRzQZIkvaN9TmOBr65qrW1rnqVN72FRS3arDuMV1rEFRdZy2x
 riXmJmENWo+sakst4fTS0QY+/GlPDB8Md4X++Vkl3DgWFgiiBTcPbJCEazsFVip+
 QQzWfXTSB98bCVUuYr5eZgMlPYHrd1ZQIbgzlzIkdUuZ4XCe4MGw88km6TqviBwf
 dduNKnFctkrLAgM3V8rXZBQsZqJHRDmpOfSZ+9XtGYggy83g5FLbkp9h6Ws66SbI
 KCgLzV0THCs/gKLMeDqFerO1xQzxN84pd+YKetnD1RU+5bo98DMUkqgFn+cZyCFA
 dlImlZcN7Pc=
 =zKIg
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Arnd Bergmann:

 - a couple of OMAP 4.11 regression fixes, including a boot regression
   for SmartReflex, hypervisor mode in thumb2 mode, and reference
   counting of device nodes

 - a fix for cpu_idle on at91

 - minor DT fixes on across several platforms: sunxi, bcm53xx, at91,
   nsp, ns2, ux500, omap

 - a fix to correct an API change in the reset controllers

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (22 commits)
  arm64: dts: NS2: Add dma-coherent to relevant DT entries
  reset: fix optional reset_control_get stubs to return NULL
  ARM: sun8i: a23/a33: drop bl_en_pin GPIO pinmux in reference design DTSI
  ARM: dts: sun7i: lamobo-r1: Fix CPU port RGMII settings
  ARM: dts: NSP: GPIO reboot open-source
  ARM: at91: pm: cpu_idle: switch DDR to power-down mode
  ARM: dts: add the AB8500 clocks to the device tree
  ARM: dts: imx6sx-udoo-neo: Fix reboot hang
  ARM: sun8i: Fix the mali clock rate
  ARM: dts: BCM5301X: Correct GIC_PPI interrupt flags
  ARM: dts: BCM5301X: Fix memory start address
  ARM: dts: BCM5301X: Fix UARTs on bcm953012k
  Revert "ARM: at91/dt: sama5d2: Use new compatible for ohci node"
  ARM: OMAP2+: Release device node after it is no longer needed.
  ARM: OMAP2+: Fix device node reference counts
  ARM: OMAP2+: Remove legacy gpmc-nand.c
  ARM: OMAP2+: gpmc-onenand: propagate error on initialization failure
  ARM: dts: am335x-pcm953: Fix legacy wakeup source binding
  ARM: omap2plus_defconfig: Enable INPUT_MOUSEDEV as loadable modules
  ARM: dts: am57xx-idk: tpic2810 is on I2C bus, not SPI
  ...
2017-03-24 14:32:21 -07:00
Linus Torvalds
e8fe23ffc9 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This fixes regressions in the crypto ccp driver and the hwrng drivers
  for amd and geode"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  hwrng: geode - Revert managed API changes
  hwrng: amd - Revert managed API changes
  crypto: ccp - Assign DMA commands to the channel's CCP
2017-03-24 14:11:36 -07:00
Linus Torvalds
213e4eb2da IOMMU Fixes for Linux v4.11-rc3
A few fixes piled up:
 
 	* Fix a NULL-ptr dereference that happens in VT-d on some
 	  platforms
 
 	* A fix for ARM MSI region reporting, so that a sane interface
 	  makes it to a released kernel
 
 	* Fixes for leaf-checking in ARM io-page-table code
 
 	* Two fixes for IO/TLB flushing code on ARM Exynos platforms
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJY1QLfAAoJECvwRC2XARrjiSUQAMS6l5kg+Fo6c8uq5oGgJ27C
 ddxjbI9NkvRAfnGzrHAVm80A7J9C15ik0/dp9hHVo9BjURWfhslxKXRp+9aBPjaY
 TzH7UD1avU9kwXvg8E+RYBaZrHY5on51Wz5IYcuFC2JHE3LYXGJgeNYupbXll8KI
 GM/sFhUfPcBODlwQwl+331fDqvSVL3/QaqojLsugGK2426dwYmovmNVrcFxT/01w
 fUrzG3DcHKSMJr1e2743A6Are3+Z+E7b435vclStl0ebUf+TKKAsQlluA99jyJ/y
 b0ZsddVK1OhJyoJafBuAbkQkcp5IuBV3R40MZORRv+IOqCOTe/8BkLnH+4TbRAui
 JmRmpREIfq4fxZbYepVis88dpbQriVu6ONI1FnGEEqdjDquhfnsC66cSAmgoRYEH
 Q913jMSvU4ehH0yRhBZdSi/giA5T48idEmtt2WjMOBx+Zq1ZLZkd02dq6rZH5j4x
 maIvyclg5N7nAG/KO8XJdqhET8K/0M2Fdp3H6YoBLMNyF2u2D52XFahOvIwV3Rf7
 4bW3fiy2Jv3sxlgeTy9yo8mtFwIGzCqjpB/PSvSRfrrE7G2EGNPd7sauTfALBU6w
 8z7HIj4/5DncFGbYIeEK3crrr5oqN9WtlJJGlm2qPPaKZMSJ78dKmtc2TKAxdFhU
 cqfL0qAZr3rRCYtZbLiq
 =EcLT
 -----END PGP SIGNATURE-----

Merge tag 'iommu-fixes-v4.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU fixes from Joerg Roedel:
 "A few fixes piled up:

   - fix a NULL-ptr dereference that happens in VT-d on some platforms

   - a fix for ARM MSI region reporting, so that a sane interface makes
     it to a released kernel

   - fixes for leaf-checking in ARM io-page-table code

   - two fixes for IO/TLB flushing code on ARM Exynos platforms"

* tag 'iommu-fixes-v4.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu: Disambiguate MSI region types
  iommu/exynos: Workaround FLPD cache flush issues for SYSMMU v5
  iommu/exynos: Block SYSMMU while invalidating FLPD cache
  iommu/vt-d: Fix NULL pointer dereference in device_to_iommu
  iommu/io-pgtable-arm-v7s: Check for leaf entry before dereferencing it
  iommu/io-pgtable-arm: Check for leaf entry before dereferencing it
2017-03-24 13:42:17 -07:00
Greg Kroah-Hartman
5617c05d44 usb: fixes for v4.11-rc4
f_acm got an endianness fix by Oliver Neukum. This has been around for a
 long time but it's finally fixed.
 
 f_hid learned that it should never access hidg->req without first
 grabbing the spinlock.
 
 Roger Quadros fixed two bugs in the f_uvc function driver.
 
 Janusz Dziedzic fixed a very peculiar bug with EP0, one that's rather
 difficult to trigger. When we're dealing with bounced EP0 requests, we
 should delay unmap until after ->complete() is called.
 
 UDC class got a use-after-free fix.
 -----BEGIN PGP SIGNATURE-----
 
 iQJRBAABCAA7FiEElLzh7wn96CXwjh2IzL64meEamQYFAljTv0EdHGZlbGlwZS5i
 YWxiaUBsaW51eC5pbnRlbC5jb20ACgkQzL64meEamQbhehAAvauDyMvvL4S2PpCS
 zkarfhJfF4xOB8jzyewdz8oN57XogoUQIY6KOXFIIj9sWBt5Kx2zfA7uK3FBeflm
 lEaf2Mkbo3qXw9ChkrpUDEzpezYhnLPOpm7rMoXxseiVhew1Jt6AUjyzxMkOSG+I
 ng/bMRFvJY1Crp7V0kIba0PiR/owlVxZNTad5/C5Fi1wJQtpjd6Ry0cH7Pa/Eh1W
 KI5Mrdwgh0gZJUF8u8O1MZehyUXSTzkjvHeNV7lnL4TE19hoSGngQhvTYhnoU5lT
 XdphOntQ3m4p8rvNuEvqcwuS5BT2HQoW+BwyNdJF+FUtbbrNN2gUjbY5KD7ZuQtT
 cb67cLrQD04t0ig5zFo51SEZQiYei3rBfr8y0RXepY0RAqCzpqsSoXHxZJyfp1xb
 XzNnnYinbOe2bU3b/Ovs3mMQ4kfpygTHVDMT9iYsRJZmD7DGBhG9J6AR5jD+44Js
 etY82xW7pXiDazMm9OnQ4kuMii8nse3QDynhURU7H39Jw8ty4AdIgbngHgCjRvZA
 ulAeanjGtNNsDs0bsL9L3Q2gLUGiW/y5Ds+AQXKH6388FrwwoUYqCeqBIH5xI+wF
 LIErvqZyi3+Jd/SkmvKeubix3vTxgPZ72WY4xdO45vWznKWscyUWdj0OzXpdycLx
 gOK1mf6y9vRsyoyJ9l09JZ6CAsc=
 =BWuW
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-v4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-linus

Felipe writes:

usb: fixes for v4.11-rc4

f_acm got an endianness fix by Oliver Neukum. This has been around for a
long time but it's finally fixed.

f_hid learned that it should never access hidg->req without first
grabbing the spinlock.

Roger Quadros fixed two bugs in the f_uvc function driver.

Janusz Dziedzic fixed a very peculiar bug with EP0, one that's rather
difficult to trigger. When we're dealing with bounced EP0 requests, we
should delay unmap until after ->complete() is called.

UDC class got a use-after-free fix.
2017-03-23 22:05:10 +01:00
Linus Torvalds
f341d9f08a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Several netfilter fixes from Pablo and the crew:
      - Handle fragmented packets properly in netfilter conntrack, from
        Florian Westphal.
      - Fix SCTP ICMP packet handling, from Ying Xue.
      - Fix big-endian bug in nftables, from Liping Zhang.
      - Fix alignment of fake conntrack entry, from Steven Rostedt.

 2) Fix feature flags setting in fjes driver, from Taku Izumi.

 3) Openvswitch ipv6 tunnel source address not set properly, from Or
    Gerlitz.

 4) Fix jumbo MTU handling in amd-xgbe driver, from Thomas Lendacky.

 5) sk->sk_frag.page not released properly in some cases, from Eric
    Dumazet.

 6) Fix RTNL deadlocks in nl80211, from Johannes Berg.

 7) Fix erroneous RTNL lockdep splat in crypto, from Herbert Xu.

 8) Cure improper inflight handling during AF_UNIX GC, from Andrey
    Ulanov.

 9) sch_dsmark doesn't write to packet headers properly, from Eric
    Dumazet.

10) Fix SCM_TIMESTAMPING_OPT_STATS handling in TCP, from Soheil Hassas
    Yeganeh.

11) Add some IDs for Motorola qmi_wwan chips, from Tony Lindgren.

12) Fix nametbl deadlock in tipc, from Ying Xue.

13) GRO and LRO packets not counted correctly in mlx5 driver, from Gal
    Pressman.

14) Fix reset of internal PHYs in bcmgenet, from Doug Berger.

15) Fix hashmap allocation handling, from Alexei Starovoitov.

16) nl_fib_input() needs stronger netlink message length checking, from
    Eric Dumazet.

17) Fix double-free of sk->sk_filter during sock clone, from Daniel
    Borkmann.

18) Fix RX checksum offloading in aquantia driver, from Pavel Belous.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (85 commits)
  net:ethernet:aquantia: Fix for RX checksum offload.
  amd-xgbe: Fix the ECC-related bit position definitions
  sfc: cleanup a condition in efx_udp_tunnel_del()
  Bluetooth: btqcomsmd: fix compile-test dependency
  inet: frag: release spinlock before calling icmp_send()
  tcp: initialize icsk_ack.lrcvtime at session start time
  genetlink: fix counting regression on ctrl_dumpfamily()
  socket, bpf: fix sk_filter use after free in sk_clone_lock
  ipv4: provide stronger user input validation in nl_fib_input()
  bpf: fix hashmap extra_elems logic
  enic: update enic maintainers
  net: bcmgenet: remove bcmgenet_internal_phy_setup()
  ipv6: make sure to initialize sockc.tsflags before first use
  fjes: Do not load fjes driver if extended socket device is not power on.
  fjes: Do not load fjes driver if system does not have extended socket device.
  net/mlx5e: Count LRO packets correctly
  net/mlx5e: Count GSO packets correctly
  net/mlx5: Increase number of max QPs in default profile
  net/mlx5e: Avoid supporting udp tunnel port ndo for VF reps
  net/mlx5e: Use the proper UAPI values when offloading TC vlan actions
  ...
2017-03-23 11:29:49 -07:00
David Hildenbrand
90db10434b KVM: kvm_io_bus_unregister_dev() should never fail
No caller currently checks the return value of
kvm_io_bus_unregister_dev(). This is evil, as all callers silently go on
freeing their device. A stale reference will remain in the io_bus,
getting at least used again, when the iobus gets teared down on
kvm_destroy_vm() - leading to use after free errors.

There is nothing the callers could do, except retrying over and over
again.

So let's simply remove the bus altogether, print an error and make
sure no one can access this broken bus again (returning -ENOMEM on any
attempt to access it).

Fixes: e93f8a0f82 ("KVM: convert io_bus to SRCU")
Cc: stable@vger.kernel.org # 3.4+
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-23 19:02:25 +01:00
Dan Carpenter
7a88fa1919 block: make nr_iovecs unsigned in bio_alloc_bioset()
There isn't a bug here, but Smatch is not smart enough to know that
"nr_iovecs" can't be negative so it complains about underflows.
Really, it's slightly cleaner to make this parameter unsigned.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-23 08:16:11 -06:00
Christoph Hellwig
7642747d67 blk-mq: remove BLK_MQ_F_DEFER_ISSUE
This flag was never used since it was introduced.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-22 20:16:56 -06:00