Commit graph

74074 commits

Author SHA1 Message Date
Konstantin Khlebnikov
6abc49468e dm: add support for REQ_NOWAIT and enable it for linear target
Add DM target feature flag DM_TARGET_NOWAIT which advertises that
target works with REQ_NOWAIT bios.

Add dm_table_supports_nowait() and update dm_table_set_restrictions()
to set/clear QUEUE_FLAG_NOWAIT accordingly.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-25 08:20:03 -06:00
Mike Snitzer
021a24460d block: add QUEUE_FLAG_NOWAIT
Add QUEUE_FLAG_NOWAIT to allow a block device to advertise support for
REQ_NOWAIT. Bio-based devices may set QUEUE_FLAG_NOWAIT where
applicable.

Update QUEUE_FLAG_MQ_DEFAULT to include QUEUE_FLAG_NOWAIT.  Also
update submit_bio_checks() to verify it is set for REQ_NOWAIT bios.

Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-25 08:20:03 -06:00
Christoph Hellwig
fa01b1e973 block: add a bdev_is_partition helper
Add a littler helper to make the somewhat arcane bd_contains checks a
little more obvious.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-25 08:18:57 -06:00
Christoph Hellwig
f56753ac2a bdi: replace BDI_CAP_NO_{WRITEBACK,ACCT_DIRTY} with a single flag
Replace the two negative flags that are always used together with a
single positive flag that indicates the writeback capability instead
of two related non-capabilities.  Also remove the pointless wrappers
to just check the flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-24 13:43:39 -06:00
Christoph Hellwig
823423ef55 bdi: invert BDI_CAP_NO_ACCT_WB
Replace BDI_CAP_NO_ACCT_WB with a positive BDI_CAP_WRITEBACK_ACCT to
make the checks more obvious.  Also remove the pointless
bdi_cap_account_writeback wrapper that just obsfucates the check.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-24 13:43:39 -06:00
Christoph Hellwig
1cb039f3dc bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flag
The BDI_CAP_STABLE_WRITES is one of the few bits of information in the
backing_dev_info shared between the block drivers and the writeback code.
To help untangling the dependency replace it with a queue flag and a
superblock flag derived from it.  This also helps with the case of e.g.
a file system requiring stable writes due to its own checksumming, but
not forcing it on other users of the block device like the swap code.

One downside is that we an't support the stable_pages_required bdi
attribute in sysfs anymore.  It is replaced with a queue attribute which
also is writable for easier testing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-24 13:43:39 -06:00
Christoph Hellwig
a8b456d01c bdi: remove BDI_CAP_SYNCHRONOUS_IO
BDI_CAP_SYNCHRONOUS_IO is only checked in the swap code, and used to
decided if ->rw_page can be used on a block device.  Just check up for
the method instead.  The only complication is that zram needs a second
set of block_device_operations as it can switch between modes that
actually support ->rw_page and those who don't.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-24 13:43:39 -06:00
Christoph Hellwig
ed7b6b4f6e bdi: remove BDI_CAP_CGROUP_WRITEBACK
Just checking SB_I_CGROUPWB for cgroup writeback support is enough.
Either the file system allocates its own bdi (e.g. btrfs), in which case
it is known to support cgroup writeback, or the bdi comes from the block
layer, which always supports cgroup writeback.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-24 13:43:39 -06:00
Christoph Hellwig
c2e4cd57cf block: lift setting the readahead size into the block layer
Drivers shouldn't really mess with the readahead size, as that is a VM
concept.  Instead set it based on the optimal I/O size by lifting the
algorithm from the md driver when registering the disk.  Also set
bdi->io_pages there as well by applying the same scheme based on
max_sectors.  To ensure the limits work well for stacking drivers a
new helper is added to update the readahead limits from the block
limits, which is also called from disk_stack_limits.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-24 13:43:39 -06:00
Christoph Hellwig
402dd2cf46 fs: remove the unused SB_I_MULTIROOT flag
The last user of SB_I_MULTIROOT is disappeared with commit f2aedb713c
("NFS: Add fs_context support.")

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-24 13:43:38 -06:00
Christoph Hellwig
1fb1a2ad75 block: mark blkdev_get static
There are no users outside the core block code left now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-23 10:43:19 -06:00
Christoph Hellwig
21bd900572 mm: split swap_type_of
swap_type_of is used for two entirely different purposes:

 (1) check what swap type a given device/offset corresponds to
 (2) find the first available swap device that can be written to

Mixing both in a single function creates an unreadable mess.  Create two
separate functions instead, and switch both to pass a dev_t instead of
a struct block_device to further simplify the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-23 10:43:19 -06:00
Christoph Hellwig
bb3247a399 PM: rewrite is_hibernate_resume_dev to not require an inode
Just check the dev_t to help simplifying the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-23 10:43:19 -06:00
Christoph Hellwig
38430f0876 block: move the NEED_PART_SCAN flag to struct gendisk
We can only scan for partitions on the whole disk, so move the flag
from struct block_device to struct gendisk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-23 10:43:18 -06:00
Mike Snitzer
07d098e6bb block: allow 'chunk_sectors' to be non-power-of-2
It is possible, albeit more unlikely, for a block device to have a non
power-of-2 for chunk_sectors (e.g. 10+2 RAID6 with 128K chunk_sectors,
which results in a full-stripe size of 1280K. This causes the RAID6's
io_opt to be advertised as 1280K, and a stacked device _could_ then be
made to use a blocksize, aka chunk_sectors, that matches non power-of-2
io_opt of underlying RAID6 -- resulting in stacked device's
chunk_sectors being a non power-of-2).

Update blk_queue_chunk_sectors() and blk_max_size_offset() to
accommodate drivers that need a non power-of-2 chunk_sectors.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-23 10:38:14 -06:00
Song Liu
7b26410b05 block: introduce part_[begin|end]_io_acct
These functions can be used to enable iostat for partitions on devices
like md, bcache.

Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-11 16:41:30 -06:00
Christoph Hellwig
b92b53079a block: remove check_disk_change
Remove the now unused check_disk_change helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-10 09:32:31 -06:00
Christoph Hellwig
fec2cf607b ide-gd: stop using the disk events mechanism
ide-gd is only using the disk events mechanism to be able to force an
invalidation and partition scan on opening removable media.  Just open
code the logic without invoving the block layer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-10 09:32:31 -06:00
Christoph Hellwig
95f6f3a46f block: add a bdev_check_media_change helper
Like check_disk_changed, except that it does not call ->revalidate_disk
but leaves that to the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-10 09:32:30 -06:00
Jan Kara
384d87ef2c block: Do not discard buffers under a mounted filesystem
Discarding blocks and buffers under a mounted filesystem is hardly
anything admin wants to do. Usually it will confuse the filesystem and
sometimes the loss of buffer_head state (including b_private field) can
even cause crashes like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
PGD 0 P4D 0
Oops: 0002 [#1] SMP PTI
CPU: 4 PID: 203778 Comm: jbd2/dm-3-8 Kdump: loaded Tainted: G O     --------- -  - 4.18.0-147.5.0.5.h126.eulerosv2r9.x86_64 #1
Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 1.57 08/11/2015
RIP: 0010:jbd2_journal_grab_journal_head+0x1b/0x40 [jbd2]
...
Call Trace:
 __jbd2_journal_insert_checkpoint+0x23/0x70 [jbd2]
 jbd2_journal_commit_transaction+0x155f/0x1b60 [jbd2]
 kjournald2+0xbd/0x270 [jbd2]

So if we don't have block device open with O_EXCL already, claim the
block device while we truncate buffer cache. This makes sure any
exclusive block device user (such as filesystem) cannot operate on the
device while we are discarding buffer cache.

Reported-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[axboe: fix !CONFIG_BLOCK error in truncate_bdev_range()]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-07 20:10:55 -06:00
Kashyap Desai
b445547ec1 blk-mq, elevator: Count requests per hctx to improve performance
High CPU utilization on "native_queued_spin_lock_slowpath" due to lock
contention is possible for mq-deadline and bfq IO schedulers
when nr_hw_queues is more than one.

It is because kblockd work queue can submit IO from all online CPUs
(through blk_mq_run_hw_queues()) even though only one hctx has pending
commands.

The elevator callback .has_work for mq-deadline and bfq scheduler considers
pending work if there are any IOs on request queue but it does not account
hctx context.

Add a per-hctx 'elevator_queued' count to the hctx to avoid triggering
the elevator even though there are no requests queued.

[jpg: Relocated atomic_dec() in dd_dispatch_request(), update commit message per Kashyap]

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-03 15:20:47 -06:00
John Garry
f1b49fdc1c blk-mq: Record active_queues_shared_sbitmap per tag_set for when using shared sbitmap
For when using a shared sbitmap, no longer should the number of active
request queues per hctx be relied on for when judging how to share the tag
bitmap.

Instead maintain the number of active request queues per tag_set, and make
the judgement based on that.

Originally-from: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Don Brace<don.brace@microsemi.com> #SCSI resv cmds patches used
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-03 15:20:47 -06:00
John Garry
bccf5e26d9 blk-mq: Record nr_active_requests per queue for when using shared sbitmap
The per-hctx nr_active value can no longer be used to fairly assign a share
of tag depth per request queue for when using a shared sbitmap, as it does
not consider that the tags are shared tags over all hctx's.

For this case, record the nr_active_requests per request_queue, and make
the judgement based on that value.

Co-developed-with: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Don Brace<don.brace@microsemi.com> #SCSI resv cmds patches used
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-03 15:20:47 -06:00
John Garry
32bc15afed blk-mq: Facilitate a shared sbitmap per tagset
Some SCSI HBAs (such as HPSA, megaraid, mpt3sas, hisi_sas_v3 ..) support
multiple reply queues with single hostwide tags.

In addition, these drivers want to use interrupt assignment in
pci_alloc_irq_vectors(PCI_IRQ_AFFINITY). However, as discussed in [0],
CPU hotplug may cause in-flight IO completion to not be serviced when an
interrupt is shutdown. That problem is solved in commit bf0beec060
("blk-mq: drain I/O when all CPUs in a hctx are offline").

However, to take advantage of that blk-mq feature, the HBA HW queuess are
required to be mapped to that of the blk-mq hctx's; to do that, the HBA HW
queues need to be exposed to the upper layer.

In making that transition, the per-SCSI command request tags are no
longer unique per Scsi host - they are just unique per hctx. As such, the
HBA LLDD would have to generate this tag internally, which has a certain
performance overhead.

However another problem is that blk-mq assumes the host may accept
(Scsi_host.can_queue * #hw queue) commands. In commit 6eb045e092 ("scsi:
 core: avoid host-wide host_busy counter for scsi_mq"), the Scsi host busy
counter was removed, which would stop the LLDD being sent more than
.can_queue commands; however, it should still be ensured that the block
layer does not issue more than .can_queue commands to the Scsi host.

To solve this problem, introduce a shared sbitmap per blk_mq_tag_set,
which may be requested at init time.

New flag BLK_MQ_F_TAG_HCTX_SHARED should be set when requesting the
tagset to indicate whether the shared sbitmap should be used.

Even when BLK_MQ_F_TAG_HCTX_SHARED is set, a full set of tags and requests
are still allocated per hctx; the reason for this is that if tags and
requests were only allocated for a single hctx - like hctx0 - it may break
block drivers which expect a request be associated with a specific hctx,
i.e. not always hctx0. This will introduce extra memory usage.

This change is based on work originally from Ming Lei in [1] and from
Bart's suggestion in [2].

[0] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
[1] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/
[2] https://lore.kernel.org/linux-block/ff77beff-5fd9-9f05-12b6-826922bace1f@huawei.com/T/#m3db0a602f095cbcbff27e9c884d6b4ae826144be

Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Don Brace<don.brace@microsemi.com> #SCSI resv cmds patches used
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-03 15:20:47 -06:00
Ming Lei
51db1c37ee blk-mq: Rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED
BLK_MQ_F_TAG_SHARED actually means that tags is shared among request
queues, all of which should belong to LUNs attached to same HBA.

So rename it to make the point explicitly.

[jpg: rebase a few times, add rnbd-clt.c change]

Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-03 15:20:46 -06:00
Christoph Hellwig
de09077c89 block: remove revalidate_disk()
Remove the now unused helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-02 08:00:26 -06:00
Christoph Hellwig
b8086d3f5a block: use revalidate_disk_size in set_capacity_revalidate_and_notify
Only virtio_blk and xen-blkfront set the revalidate argument to true,
and both do not implement the ->revalidate_disk method.  So switch
to the helper that just updates the size instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-02 08:00:07 -06:00
Christoph Hellwig
659e56ba86 block: add a new revalidate_disk_size helper
revalidate_disk is a relative awkward helper for driver use, as it first
calls an optional driver method and then updates the block device size,
while most callers either don't need the method call at all, or want to
keep state between the caller and the called method.

Add a revalidate_disk_size helper that just performs the update of the
block device size from the gendisk one, and switch all drivers that do
not implement ->revalidate_disk to use the new helper instead of
revalidate_disk()

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-02 08:00:07 -06:00
Christoph Hellwig
f4ad06f2bb block: rename bd_invalidated
Replace bd_invalidate with a new BDEV_NEED_PART_SCAN flag in a bd_flags
variable to better describe the condition.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-02 08:00:02 -06:00
Christoph Hellwig
46d40cfad1 block: remove an outdated comment on the bd_dev field
kdev_t is long gone, so we don't need to comment a field isn't one..

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01 16:49:26 -06:00
Christoph Hellwig
7cf34d97ab block: remove the discard_alignment field from struct hd_struct
The alignment offset is only used in slow path callers, so just calculate
it on the fly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01 16:49:26 -06:00
Christoph Hellwig
7b8917f5e2 block: remove the alignment_offset field from struct hd_struct
The alignment offset is only used in slow path callers, so just calculate
it on the fly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01 16:49:26 -06:00
Baolin Wang
bdc6a287bc block: Move blk_mq_bio_list_merge() into blk-merge.c
Move the blk_mq_bio_list_merge() into blk-merge.c and
rename it as a generic name.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01 16:49:26 -06:00
Christoph Hellwig
3310eebafe block: remove the BIO_USER_MAPPED flag
Just check if there is private data, in which case the bio must have
originated from bio_copy_user_iov.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01 16:49:26 -06:00
Christoph Hellwig
f3256075ba block: remove the BIO_NULL_MAPPED flag
We can simply use a boolean flag in the bio_map_data data structure
instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01 16:49:25 -06:00
Christoph Hellwig
c2b4bb8cb3 block: fix locking for struct block_device size updates
Two different callers use two different mutexes for updating the
block device size, which obviously doesn't help to actually protect
against concurrent updates from the different callers.  In addition
one of the locks, bd_mutex is rather prone to deadlocks with other
parts of the block stack that use it for high level synchronization.

Switch to using a new spinlock protecting just the size updates, as
that is all we need, and make sure everyone does the update through
the proper helper.

This fixes a bug reported with the nvme revalidating disks during a
hot removal operation, which can currently deadlock on bd_mutex.

Reported-by: Xianting Tian <xianting_tian@126.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01 16:49:25 -06:00
Christoph Hellwig
611bee526b block: replace bd_set_size with bd_set_nr_sectors
Replace bd_set_size with a version that takes the number of sectors
instead, as that fits most of the current and future callers much better.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01 16:49:25 -06:00
Geert Uytterhoeven
db04e18dbb block: Make request_queue.rpm_status an enum
request_queue.rpm_status is assigned values of the rpm_status enum only,
so reflect that in its type.

Note that including <linux/pm.h> is (currently) a no-op, as it is
already included through <linux/genhd.h> and <linux/device.h>, but it is
better to play it safe.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01 16:49:25 -06:00
Linus Torvalds
d2283cdc18 A set of fixes for interrupt chip drivers:
- Revert the platform driver conversion of interrupt chip drivers as it
    turned out to create more problems than it solves.
 
  - Fix a trivial typo in the new module helpers which made probing reliably
    fail.
 
  - Small fixes in the STM32 and MIPS Ingenic drivers
 
  - The TI firmware rework which had badly managed dependencies and had to
    wait post rc1.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl9L4wITHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYobWiEACmMdJl1ZHC+IcKzAVngX82rJUQ/4dC
 zGBRnddkPYjobXpK+w+W28vdPHYcGZ4de89X1N0l6zocSRQh2NdmJ080ivH8LD/V
 AaT82Sf8mjuulUPolyfiF09+e4OeBUC94CsyM2LkGmicoMLeWyWv4EEh95anh3s9
 GsaZ6cW6Cvq1whK2SSCoa5V1pkCGLIMxw2gV4DDw/s2RuqwrIGeleq9bcMNA8RVS
 eBF91N6nUpXI9PMS68p72ZcRbpK0cKTARt99lS4cDEOS11b0jsRxZ525Xem736V/
 5zgXvYOdErZaBRO3WeSaxlYTwwRiLKTRNOL4ZFgI1xQEK4Pcv9S+80zaXSvDXSsn
 s6LzXYifgPCE6edIXKAMwcNcbRXcGdn4PZpSTwvtNJpbVskRQgPmtEVDUD+J5upU
 7NN/myosEkA5JUQSWdaaaa4nLe4H8qmxnQrn+B41NzzikekZuN+ai9OAOTTdsv1j
 W+hNhX78qgVbYtm4/8mOQp4BAIOIGPDTgw3WfXYW3FbecEM6EQBEQgT9E1RCivTX
 cETaV34jJY7XGm3iVQ97jLsKBhK0lcIfJwltuY1Ck+4o9fqZkJ0iZ045EArJMms6
 sG+tp6+0MWPkLf2P31qdLKTMt/9HZCHao7F1K6Fz218k4mwGHa12V/OUgPkdr8vJ
 hb57EF+zXOaFag==
 =guRc
 -----END PGP SIGNATURE-----

Merge tag 'irq-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
 "A set of fixes for interrupt chip drivers:

   - Revert the platform driver conversion of interrupt chip drivers as
     it turned out to create more problems than it solves.

   - Fix a trivial typo in the new module helpers which made probing
     reliably fail.

   - Small fixes in the STM32 and MIPS Ingenic drivers

   - The TI firmware rework which had badly managed dependencies and had
     to wait post rc1"

* tag 'irq-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/ingenic: Leave parent IRQ unmasked on suspend
  irqchip/stm32-exti: Avoid losing interrupts due to clearing pending bits by mistake
  irqchip: Revert modular support for drivers using IRQCHIP_PLATFORM_DRIVER helperse
  irqchip: Fix probing deferal when using IRQCHIP_PLATFORM_DRIVER helpers
  arm64: dts: k3-am65: Update the RM resource types
  arm64: dts: k3-am65: ti-sci-inta/intr: Update to latest bindings
  arm64: dts: k3-j721e: ti-sci-inta/intr: Update to latest bindings
  irqchip/ti-sci-inta: Add support for INTA directly connecting to GIC
  irqchip/ti-sci-inta: Do not store TISCI device id in platform device id field
  dt-bindings: irqchip: Convert ti, sci-inta bindings to yaml
  dt-bindings: irqchip: ti, sci-inta: Update docs to support different parent.
  irqchip/ti-sci-intr: Add support for INTR being a parent to INTR
  dt-bindings: irqchip: Convert ti, sci-intr bindings to yaml
  dt-bindings: irqchip: ti, sci-intr: Update bindings to drop the usage of gic as parent
  firmware: ti_sci: Add support for getting resource with subtype
  firmware: ti_sci: Drop unused structure ti_sci_rm_type_map
  firmware: ti_sci: Drop the device id to resource type translation
2020-08-30 11:56:54 -07:00
Linus Torvalds
0063a82de9 A single fix for the scheduler:
- Make is_idle_task() __always_inline to prevent the compiler from putting
    it out of line into the wrong section because it's used inside noinstr
    sections.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl9L5woTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoYoDD/4hjGLSH3A1YwxnE/tUmLzIZhQKWveD
 s25DHX2dIn+ZBSF96qWHVvvZoPNZo1zS3FZWoM6wEHT7qPkUm/61Zlme/1ZUlwP9
 jVkBXiUHhOlemH9VM6udAOYG6whVguCkFixUhI1rLN8cd3Ko7v3jSax+6nV2g762
 PrEnVM9Ps0YGNnoLBWl9G0zsfaLHfRJpWUHY6kqYTmmjN1NWJEUnmO9IxIytTG/A
 3wOokwJJSfMMSB8WG+VAAHMtYmLirUrpoXFfhschsEiw+SBqNI8I/KieGG8P+i4W
 P1am4sJe0vPapEvwyNKHcR8iQSUEHQ5ZmUiBvCWu+2HDYn6KLSioKaPVh45/+UPh
 EeSXutPFXIvX9WwbNnQctwl/xX7BSpGb0tBiegiqlwk2i8QFEWvKr8HrjznLS+kW
 7cCaQdM5a46eAoo/4JP4izm4JjqFD2LuNt32f9lizRkFim0zrMswmA01Fog5A3j6
 FSQmNhqOPSLpC24/rWhW1sk2dnOefrBxcE67yFmsbEtf+BmoFGVYkulI36P/trmY
 nTua0v2Okmjj0zIh23VUOzhBGDzD+KQJFEj/8nyFJShlBbv6yxRQiqtzWUmsuhfu
 o+B/WEAllBnSOjohbp4cBmTM9VOkNAgBfRJs26ELuBucEgMdzPByVNygKzdUcW/j
 6vH1bbgu+FXA1Q==
 =oKWA
 -----END PGP SIGNATURE-----

Merge tag 'sched-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Thomas Gleixner:
 "A single fix for the scheduler:

   - Make is_idle_task() __always_inline to prevent the compiler from
     putting it out of line into the wrong section because it's used
     inside noinstr sections"

* tag 'sched-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Use __always_inline on is_idle_task()
2020-08-30 11:50:53 -07:00
Linus Torvalds
b69bea8a65 A set of fixes for lockdep, tracing and RCU:
- Prevent recursion by using raw_cpu_* operations
 
   - Fixup the interrupt state in the cpu idle code to be consistent
 
   - Push rcu_idle_enter/exit() invocations deeper into the idle path so
     that the lock operations are inside the RCU watching sections
 
   - Move trace_cpu_idle() into generic code so it's called before RCU goes
     idle.
 
   - Handle raw_local_irq* vs. local_irq* operations correctly
 
   - Move the tracepoints out from under the lockdep recursion handling
     which turned out to be fragile and inconsistent.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl9L5qETHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoV/NEADG+h02tj2I4gP7IQ3nVodEzS1+odPI
 orabY5ggH0kn4YIhPB4UtOd5zKZjr3FJs9wEhyhQpV6ZhvFfgaIKiYqfg+Q81aMO
 /BXrfh6jBD2Hu7gaPBnVdkKeh1ehl+w0PhTeJhPBHEEvbGeLUYWwyPNlaKz//VQl
 XCWl7e7o/Uw2UyJ469SCx3z+M2DMNqwdMys/zcqvTLiBdLNCwp4TW5ACzEA0rfHh
 Pepu3eIKnMURyt82QanrOATvT2io9pOOaUh59zeKi2WM8ikwKd/Eho2kXYng6GvM
 GzX4Kn13MsNobZXf9BhqEGICdRkaJqLsXlmBNmbJdSTCn5W2lLZqu2wCEp5VZHCc
 XwMbey8ek+BRskJMqAV4oq2GA8Om9KEYWOOdixyOG0UJCiW5qDowuDYBXTLV7FWj
 XhzLGuHpUF9eKLKokJ7ideLaDcpzwYjHr58pFLQrqPwmjVKWguLeYMg5BhhTiEuV
 wNfiLIGdMNsCpYKhnce3o9paV8+hy1ZveWhNy+/4HaDLoEwI2T62i8R7xxbrcWMg
 sgdAiQG+kVLwSJ13bN+Cz79uLYTIbqGaZHtOXmeIT3jSxBjx5RlXfzocwTHSYrNk
 GuLYHd7+QaemN49Rrf4bPR16Db7ifL32QkUtLBTBLcnos9jM+fcl+BWyqYRxhgDv
 xzDS+vfK8DvRiA==
 =Hgt6
 -----END PGP SIGNATURE-----

Merge tag 'locking-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Thomas Gleixner:
 "A set of fixes for lockdep, tracing and RCU:

   - Prevent recursion by using raw_cpu_* operations

   - Fixup the interrupt state in the cpu idle code to be consistent

   - Push rcu_idle_enter/exit() invocations deeper into the idle path so
     that the lock operations are inside the RCU watching sections

   - Move trace_cpu_idle() into generic code so it's called before RCU
     goes idle.

   - Handle raw_local_irq* vs. local_irq* operations correctly

   - Move the tracepoints out from under the lockdep recursion handling
     which turned out to be fragile and inconsistent"

* tag 'locking-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lockdep,trace: Expose tracepoints
  lockdep: Only trace IRQ edges
  mips: Implement arch_irqs_disabled()
  arm64: Implement arch_irqs_disabled()
  nds32: Implement arch_irqs_disabled()
  locking/lockdep: Cleanup
  x86/entry: Remove unused THUNKs
  cpuidle: Move trace_cpu_idle() into generic code
  cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic
  sched,idle,rcu: Push rcu_idle deeper into the idle path
  cpuidle: Fixup IRQ state
  lockdep: Use raw_cpu_*() for per-cpu variables
2020-08-30 11:43:50 -07:00
Linus Torvalds
8bb5021cc2 powerpc fixes for 5.9 #4
Revert our removal of PROT_SAO, at least one user expressed an interest in using
 it on Power9. Instead don't allow it to be used in guests unless enabled
 explicitly at compile time.
 
 A fix for a crash introduced by a recent change to FP handling.
 
 Revert a change to our idle code that left Power10 with no idle support.
 
 One minor fix for the new scv system call path to set PPR.
 
 Fix a crash in our "generic" PMU if branch stack events were enabled.
 
 A fix for the IMC PMU, to correctly identify host kernel samples.
 
 The ADB_PMU powermac code was found to be incompatible with VMAP_STACK, so make
 them incompatible in Kconfig until the code can be fixed.
 
 A build fix in drivers/video/fbdev/controlfb.c, and a documentation fix.
 
 Thanks to:
   Alexey Kardashevskiy, Athira Rajeev, Christophe Leroy, Giuseppe Sacco,
   Madhavan Srinivasan, Milton Miller, Nicholas Piggin, Pratik Rajesh Sampat,
   Randy Dunlap, Shawn Anastasio, Vaidyanathan Srinivasan.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl9LlF8THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgEwJD/4nEkp9id7bZyiGruoawqxdpmc9viIp
 JFRH3+eHWbE5rfoXn7fwM1zTE9SsHxCd0q09cHk2rtAwKMXcJW83/pXNuWEjIzcy
 7Ra8Zq2jRl6qgWAx84VKoZVg+W40yNFex0M0akMQV55SjYOTN8gpGe+algi+wPaH
 44oYBYctDi3B9X8CsaUQEdov1EZdWT6TxcN9xIJiIdr53VXMER6C+ytYV8VgkGHW
 Qt+Ardyvp6eNq9+foGegRSk3OmNcmj+CJZYzhkp5+1k9ko9GQ8wg9NzxTV4ZoSJ9
 g5rgD4ztBfLGyUDu6oUypzOnSVbfzJh9JPH/h1zaSOjSv9MnJ20zqvqjD7QXFNbs
 j960PiylTfVWdnOoUUkvON0UOYZM9XiZP63i8z/mBsMJ5BFaLB1TonZ+lDwXc1vK
 MHXhjahP2qP0LnJZ/M5gT3zfLPyrKoeIlmLTOkLjrM5C9mcSxpPnagq+AHacfYpG
 sGrg2LGLfBo/9PomUNHseQhBfsc2uYwM924si9MpNWN6BT+TNgTJYeNPDOnvRCbG
 ivDQ7HFZ6aiOj+b5iTZI2RV3EOaBKZgo+VEryNDnqd7etjyDr5PNbooGaHJDgsnz
 mNFxUNusxzv0vMI3zyFtLMTe/99/NlRSYyMXPL8SL7MvlRt624ngrrxYv+2+dBRt
 aIpxSpgdqTVXSw==
 =t+yB
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Revert our removal of PROT_SAO, at least one user expressed an
   interest in using it on Power9. Instead don't allow it to be used in
   guests unless enabled explicitly at compile time.

 - A fix for a crash introduced by a recent change to FP handling.

 - Revert a change to our idle code that left Power10 with no idle
   support.

 - One minor fix for the new scv system call path to set PPR.

 - Fix a crash in our "generic" PMU if branch stack events were enabled.

 - A fix for the IMC PMU, to correctly identify host kernel samples.

 - The ADB_PMU powermac code was found to be incompatible with
   VMAP_STACK, so make them incompatible in Kconfig until the code can
   be fixed.

 - A build fix in drivers/video/fbdev/controlfb.c, and a documentation
   fix.

Thanks to Alexey Kardashevskiy, Athira Rajeev, Christophe Leroy,
Giuseppe Sacco, Madhavan Srinivasan, Milton Miller, Nicholas Piggin,
Pratik Rajesh Sampat, Randy Dunlap, Shawn Anastasio, Vaidyanathan
Srinivasan.

* tag 'powerpc-5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/32s: Disable VMAP stack which CONFIG_ADB_PMU
  Revert "powerpc/powernv/idle: Replace CPU feature check with PVR check"
  powerpc/perf: Fix reading of MSR[HV/PR] bits in trace-imc
  powerpc/perf: Fix crashes with generic_compat_pmu & BHRB
  powerpc/64s: Fix crash in load_fp_state() due to fpexc_mode
  powerpc/64s: scv entry should set PPR
  Documentation/powerpc: fix malformed table in syscall64-abi
  video: fbdev: controlfb: Fix build for COMPILE_TEST=y && PPC_PMAC=n
  selftests/powerpc: Update PROT_SAO test to skip ISA 3.1
  powerpc/64s: Disallow PROT_SAO in LPARs by default
  Revert "powerpc/64s: Remove PROT_SAO support"
2020-08-30 10:56:12 -07:00
Linus Torvalds
326e311b84 Power management fixes for 5.9-rc3
- Make the recently added Tegra194 cpufreq driver use
    read_cpuid_mpir() instead of cpu_logical_map() to avoid
    exporting logical_cpu_map (Sumit Gupta).
 
  - Drop the automatic system wakeup event reporting for devices
    with pending runtime-resume requests during system-wide suspend
    to avoid spurious aborts of the suspend flow (Rafael Wysocki).
 
  - Fix build warning in the intel_pstate driver documentation and
    improve the wording in there (Randy Dunlap).
 
  - Clean up two pieces of code in the cpufreq core (Viresh Kumar).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl9JXNASHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxCmkP/i1nsYDJirFak9t7oVqDaJ6BIgMSub5k
 n0h977NSCBoU+DC1jMXAEYoWqfKLl46cCbzcYyLjkAXDWocQB3DGhBDdJp8GA+DG
 dNea2cL6J7EWokzEmVgwE5iFbvjW4YSKr4F3+wH9Y7TGTJdfLyYBbueIodpEGqYz
 EtC18CdjCX6xiBdh9VGNTiFH0lGVUKiTmfLh8dDAfCvVnB23gjziCCspswmfa0i1
 3J8u+obVDcvHEZikQy42qWsq+X6k6fI2CKhJ+1kIecUPbH8A8K45hZYqhDyHi6QJ
 eEu+3YHfb79S2MG6fwky6tJ+11qbd2W4jnVJ4GK4FJP+xfkBqTaAW4J9rVeFOK1H
 FDAUbzJTbvG7qk/N95Iqt4soXHAKmVeYamw9K+1EDC87vsajy+XK95qZgQbFOPLy
 3AuSfR/QDnK/YboJXBQnKMwSxQ3RdQjynliaMjFCcVs/W6Bajts2an1R8d6h2zZ8
 nwPEhTmwZzigXrPf4ya4ucrQjl1jBCudpP50oyH+qxYLh1BrebMIFBurcDYt+gT+
 VUsN/VXcGzMgUXS2DQIb+5Ij0mBkeMwSDXUM3U86oSnrdSVoNr4P3NJFvNAe087p
 S2Rl5EPYyexSqbLRNN3ugwmIOUgVzsXlN+UzO8ELy4Kan4xj+FpR4d4xaZD0R84e
 RW+rROEjaqbE
 =Q4FD
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix the recently added Tegra194 cpufreq driver and the handling
  of devices using runtime PM during system-wide suspend, improve the
  intel_pstate driver documentation and clean up the cpufreq core.

  Specifics:

   - Make the recently added Tegra194 cpufreq driver use
     read_cpuid_mpir() instead of cpu_logical_map() to avoid exporting
     logical_cpu_map (Sumit Gupta).

   - Drop the automatic system wakeup event reporting for devices with
     pending runtime-resume requests during system-wide suspend to avoid
     spurious aborts of the suspend flow (Rafael Wysocki).

   - Fix build warning in the intel_pstate driver documentation and
     improve the wording in there (Randy Dunlap).

   - Clean up two pieces of code in the cpufreq core (Viresh Kumar)"

* tag 'pm-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: Use WARN_ON_ONCE() for invalid relation
  cpufreq: No need to verify cpufreq_driver in show_scaling_cur_freq()
  PM: sleep: core: Fix the handling of pending runtime resume requests
  Documentation: fix pm/intel_pstate build warning and wording
  cpufreq: replace cpu_logical_map() with read_cpuid_mpir()
2020-08-28 13:12:09 -07:00
Rafael J. Wysocki
ef7d960403 Merge branch 'pm-cpufreq'
* pm-cpufreq:
  cpufreq: Use WARN_ON_ONCE() for invalid relation
  cpufreq: No need to verify cpufreq_driver in show_scaling_cur_freq()
  Documentation: fix pm/intel_pstate build warning and wording
  cpufreq: replace cpu_logical_map() with read_cpuid_mpir()
2020-08-28 20:58:16 +02:00
Herbert Xu
ef91bb196b kernel.h: Silence sparse warning in lower_32_bits
I keep getting sparse warnings in crypto such as:

  CHECK   drivers/crypto/ccree/cc_hash.c
   drivers/crypto/ccree/cc_hash.c:49:9: warning: cast truncates bits from constant value (47b5481dbefa4fa4 becomes befa4fa4)
   drivers/crypto/ccree/cc_hash.c:49:26: warning: cast truncates bits from constant value (db0c2e0d64f98fa7 becomes 64f98fa7)
   [.. many more ..]

This patch removes the warning by adding a mask to keep sparse
happy.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-28 11:21:20 -07:00
Linus Torvalds
e309428590 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAl9JG9wACgkQnJ2qBz9k
 QNlp3ggA3B/Xopb2X3cCpf2fFw63YGJU4i0XJxi+3fC/v6m8U+D4XbqJUjaM5TZz
 +4XABQf7OHvSwDezc3n6KXXD/zbkZCeVm9aohEXvfMYLyKbs+S7QNQALHEtpfBUU
 3IY2pQ90K7JT9cD9pJls/Y/EaA1ObWP7+3F1zpw8OutGchKcE8SvVjzL3SSJaj7k
 d8OTtMosAFuTe4saFWfsf9CmZzbx4sZw3VAzXEXAArrxsmqFKIcY8dI8TQ0WaYNh
 C3wQFvW+n9wHapylyi7RhGl2QH9Tj8POfnCTahNFFJbsmJBx0Z3r42mCBAk4janG
 FW+uDdH5V780bTNNVUKz0v4C/YDiKg==
 =jQnW
 -----END PGP SIGNATURE-----

Merge tag 'writeback_for_v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull writeback fixes from Jan Kara:
 "Fixes for writeback code occasionally skipping writeback of some
  inodes or livelocking sync(2)"

* tag 'writeback_for_v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  writeback: Drop I_DIRTY_TIME_EXPIRE
  writeback: Fix sync livelock due to b_dirty_time processing
  writeback: Avoid skipping inode writeback
  writeback: Protect inode->i_io_list with inode->i_lock
2020-08-28 10:57:14 -07:00
Linus Torvalds
b0bfd5eca9 We have an inode number handling change, prompted by s390x which is
a 64-bit architecture with a 32-bit ino_t, a patch to disallow leases
 to avoid potential data integrity issues when CephFS is re-exported
 via NFS or CIFS and a fix for the bulk of W=1 compilation warnings.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAl9I9UITHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi5HcB/0XKCF9pJmHM7vyMpZvALaNn97B7jgD
 o4Lq0RBnIyum0UltnxvrROO1LGyurFH0hWofRykEWXy8Qt/9fZTddClOmY+6W+7P
 A1MVInSxCXpNHj5y1uMuwhWkLmuNWnW1aKJXBn8tXKsmrYQM3SmbCMRx8a2GxJk9
 mxl9zAtTRsih0AovRbde93i4FMpeXsyDk9EcyiJFcgnDjJpCAXgN9smu9xjPskOI
 rrFQBjsrYB04FVuv5lEB/xZI/2QLM2FqzlpIRsa6udtYRsDnOtgIbLq8p0sNJnC6
 QFxDaWFUJAXN/b1UWZz8yw2Y7nF7r47Hg6fDM4PBQ09PQEbIQKY+qn3R
 =6+bm
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-5.9-rc3' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "We have an inode number handling change, prompted by s390x which is a
  64-bit architecture with a 32-bit ino_t, a patch to disallow leases to
  avoid potential data integrity issues when CephFS is re-exported via
  NFS or CIFS and a fix for the bulk of W=1 compilation warnings"

* tag 'ceph-for-5.9-rc3' of git://github.com/ceph/ceph-client:
  ceph: don't allow setlease on cephfs
  ceph: fix inode number handling on arches with 32-bit ino_t
  libceph: add __maybe_unused to DEFINE_CEPH_FEATURE
2020-08-28 10:33:04 -07:00
Viresh Kumar
30b8e6b22f cpufreq: Use WARN_ON_ONCE() for invalid relation
The relation can't be invalid here, so if it turns out to be invalid,
just WARN_ON_ONCE() and return 0.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-08-27 12:51:25 +02:00
Nicholas Piggin
044d0d6de9 lockdep: Only trace IRQ edges
Problem:

  raw_local_irq_save(); // software state on
  local_irq_save(); // software state off
  ...
  local_irq_restore(); // software state still off, because we don't enable IRQs
  raw_local_irq_restore(); // software state still off, *whoopsie*

existing instances:

 - lock_acquire()
     raw_local_irq_save()
     __lock_acquire()
       arch_spin_lock(&graph_lock)
         pv_wait() := kvm_wait() (same or worse for Xen/HyperV)
           local_irq_save()

 - trace_clock_global()
     raw_local_irq_save()
     arch_spin_lock()
       pv_wait() := kvm_wait()
	 local_irq_save()

 - apic_retrigger_irq()
     raw_local_irq_save()
     apic->send_IPI() := default_send_IPI_single_phys()
       local_irq_save()

Possible solutions:

 A) make it work by enabling the tracing inside raw_*()
 B) make it work by keeping tracing disabled inside raw_*()
 C) call it broken and clean it up now

Now, given that the only reason to use the raw_* variant is because you don't
want tracing. Therefore A) seems like a weird option (although it can be done).
C) is tempting, but OTOH it ends up converting a _lot_ of code to raw just
because there is one raw user, this strips the validation/tracing off for all
the other users.

So we pick B) and declare any code that ends up doing:

	raw_local_irq_save()
	local_irq_save()
	lockdep_assert_irqs_disabled();

broken. AFAICT this problem has existed forever, the only reason it came
up is because commit: 859d069ee1 ("lockdep: Prepare for NMI IRQ
state tracking") changed IRQ tracing vs lockdep recursion and the
first instance is fairly common, the other cases hardly ever happen.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[rewrote changelog]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20200723105615.1268126-1-npiggin@gmail.com
2020-08-26 12:41:56 +02:00
Peter Zijlstra
00b0ed2d49 locking/lockdep: Cleanup
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20200821085348.546087214@infradead.org
2020-08-26 12:41:54 +02:00