Commit graph

996159 commits

Author SHA1 Message Date
Libin Yang
e32df14235
ASoC: Intel: adl: remove sof_fw_filename setting in ADL snd_soc_acpi_mach
ADL will use sof-adl-s.ri if it is ADL-S platform. So let's use
the default_fw_filename in pdata->desc for the ADL FW filename.

Signed-off-by: Libin Yang <libin.yang@intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Link: https://lore.kernel.org/r/20210125070500.807474-3-kai.vehmanen@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2021-01-27 17:28:28 +00:00
Chaitanya Kulkarni
59c157433f nvme-core: check bdev value for NULL
The nvme-core sets the bdev to NULL when admin comamnd is issued from
IOCTL in the following path e.g. nvme list :-

block_ioctl()
 blkdev_ioctl()
  nvme_ioctl()
   nvme_user_cmd()
    nvme_submit_user_cmd()

The commit 309dca309f ("block: store a block_device pointer in struct bio")
now uses bdev unconditionally in the macro bio_set_dev() and assumes
that bdev value is not NULL which results in the following crash in
since thats where bdev is actually accessed :-

void bio_associate_blkg_from_css(struct bio *bio,
				 struct cgroup_subsys_state *css)
{
	if (bio->bi_blkg)
		blkg_put(bio->bi_blkg);

	if (css && css->parent) {
		bio->bi_blkg = blkg_tryget_closest(bio, css);
	} else {
-------------->	blkg_get(bio->bi_bdev->bd_disk->queue->root_blkg);
		bio->bi_blkg = bio->bi_bdev->bd_disk->queue->root_blkg;
	}
}
EXPORT_SYMBOL_GPL(bio_associate_blkg_from_css);

[  345.385947] BUG: kernel NULL pointer dereference, address: 0000000000000690
[  345.387103] #PF: supervisor read access in kernel mode
[  345.387894] #PF: error_code(0x0000) - not-present page
[  345.388756] PGD 162a2b067 P4D 162a2b067 PUD 1633eb067 PMD 0
[  345.389625] Oops: 0000 [#1] SMP NOPTI
[  345.390206] CPU: 15 PID: 4100 Comm: nvme Tainted: G           OE     5.11.0-rc5blk+ #141
[  345.391377] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba52764
[  345.393074] RIP: 0010:bio_associate_blkg_from_css.cold.47+0x58/0x21f

[  345.396362] RSP: 0018:ffffc90000dbbce8 EFLAGS: 00010246
[  345.397078] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
[  345.398114] RDX: 0000000000000000 RSI: ffff888813be91f0 RDI: ffff888813be91f8
[  345.399039] RBP: ffffc90000dbbd30 R08: 0000000000000001 R09: 0000000000000001
[  345.399950] R10: 0000000064c66670 R11: 00000000ef955201 R12: ffff888812d32800
[  345.401031] R13: 0000000000000000 R14: ffff888113e51540 R15: ffff888113e51540
[  345.401976] FS:  00007f3747f1d780(0000) GS:ffff888813a00000(0000) knlGS:0000000000000000
[  345.402997] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  345.403737] CR2: 0000000000000690 CR3: 000000081a4bc000 CR4: 00000000003506e0
[  345.404685] Call Trace:
[  345.405031]  bio_associate_blkg+0x71/0x1c0
[  345.405649]  nvme_submit_user_cmd+0x1aa/0x38e [nvme_core]
[  345.406348]  nvme_user_cmd.isra.73.cold.98+0x54/0x92 [nvme_core]
[  345.407117]  nvme_ioctl+0x226/0x260 [nvme_core]
[  345.407707]  blkdev_ioctl+0x1c8/0x2b0
[  345.408183]  block_ioctl+0x3f/0x50
[  345.408627]  __x64_sys_ioctl+0x84/0xc0
[  345.409117]  do_syscall_64+0x33/0x40
[  345.409592]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  345.410233] RIP: 0033:0x7f3747632107

[  345.413125] RSP: 002b:00007ffe461b6648 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[  345.414086] RAX: ffffffffffffffda RBX: 00000000007b7fd0 RCX: 00007f3747632107
[  345.414998] RDX: 00007ffe461b6650 RSI: 00000000c0484e41 RDI: 0000000000000004
[  345.415966] RBP: 0000000000000004 R08: 00000000007b7fe8 R09: 00000000007b9080
[  345.416883] R10: 00007ffe461b62c0 R11: 0000000000000206 R12: 00000000007b7fd0
[  345.417808] R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000000

Add a NULL check before we set the bdev for bio.

This issue is found on block/for-next tree.

Fixes: 309dca309f ("block: store a block_device pointer in struct bio")
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 10:10:15 -07:00
Emil Renner Berthing
9159835a97 vt: keyboard, use new API for keyboard_tasklet
This converts the keyboard_tasklet to use the new API in
commit 12cc923f1c ("tasklet: Introduce new initialization API")

The new API changes the argument passed to the callback function, but
fortunately the argument isn't used so it is straight forward to use
DECLARE_TASKLET_DISABLED() rather than DECLARE_TASKLET_DISABLED_OLD().

Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
Link: https://lore.kernel.org/r/20210127164222.13220-1-kernel@esmil.dk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 18:08:07 +01:00
Jens Axboe
3e3126cf2a mm: only make map_swap_entry available for CONFIG_HIBERNATION
Current tree spews this on compile:

mm/swapfile.c:2290:17: warning: ‘map_swap_entry’ defined but not used [-Wunused-function]
 2290 | static sector_t map_swap_entry(swp_entry_t entry, struct block_device **bdev)
       |                 ^~~~~~~~~~~~~~

if !CONFIG_HIBERNATION, as we don't use the function unless we have that
config option set.

Fixes: 48d15436fd ("mm: remove get_swap_bio")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 10:04:49 -07:00
Christoph Hellwig
48d15436fd mm: remove get_swap_bio
Just reuse the block_device and sector from the swap_info structure,
just as used by the SWP_SYNCHRONOUS path.  Also remove the checks for
NULL returns from bio_alloc as that can't happen for sleeping
allocations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:49 -07:00
Christoph Hellwig
64820ac6c6 nilfs2: remove cruft in nilfs_alloc_seg_bio
bio_alloc never returns NULL when it can sleep.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
6808f7af96 nfs/blocklayout: remove cruft in bl_alloc_init_bio
bio_alloc never returns NULL when it can sleep.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
e82ed3a4fb md/raid6: refactor raid5_read_one_chunk
Refactor raid5_read_one_chunk so that all simple checks are done
before allocating the bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
6a59656968 md: remove md_bio_alloc_sync
md_bio_alloc_sync is never called with a NULL mddev, and ->sync_set is
initialized in md_run, so it always must be initialized as well.  Just
open code the remaining call to bio_alloc_bioset.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
32637385b8 md: simplify sync_page_io
Use an on-stack bio and biovec for the single page synchronous I/O.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
a78f18da66 md: remove bio_alloc_mddev
bio_alloc_mddev is never called with a NULL mddev, and ->bio_set is
initialized in md_run, so it always must be initialized as well.  Just
open code the remaining call to bio_alloc_bioset.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
ae7153f1a7 drbd: remove drbd_req_make_private_bio
Open code drbd_req_make_private_bio in the two callers to prepare
for further changes.  Also don't bother to initialize bi_next as the
bio code already does that that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
19304f959f drbd: remove bio_alloc_drbd
Given that drbd_md_io_bio_set is initialized during module initialization
and the module fails to load if the initialization fails there is no need
to fall back to plain bio_alloc.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
67883ade7a f2fs: remove FAULT_ALLOC_BIO
Sleeping bio allocations do not fail, which means that injecting an error
into sleeping bio allocations is a little silly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
25ac84262c f2fs: use blkdev_issue_flush in __submit_flush_wait
Use the blkdev_issue_flush helper instead of duplicating it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
a587daa064 dm-clone: use blkdev_issue_flush in commit_metadata
Use blkdev_issue_flush instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
c6bf3f0e25 block: use an on-stack bio in blkdev_issue_flush
There is no point in allocating memory for a synchronous flush.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
3175199ab0 block: split bio_kmalloc from bio_alloc_bioset
bio_kmalloc shares almost no logic with the bio_set based fast path
in bio_alloc_bioset.  Split it into an entirely separate implementation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
4eb1d68904 blk-crypto: use bio_kmalloc in blk_crypto_clone_bio
Use bio_kmalloc instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
616c6a6884 btrfs: use bio_kmalloc in __alloc_device
Use bio_kmalloc instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
f91ca2a370 zonefs: use bio_alloc in zonefs_file_dio_append
Use bio_alloc instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Heiner Kallweit
ef9e4005cb PCI: Align checking of syscall user config accessors
After 34e3207205 ("PCI: handle positive error codes"),
pci_user_read_config_*() and pci_user_write_config_*() return 0 or negative
errno values, not PCIBIOS_* values like PCIBIOS_SUCCESSFUL or
PCIBIOS_BAD_REGISTER_NUMBER.

Remove comparisons with PCIBIOS_SUCCESSFUL and check only for non-zero.  It
happens that PCIBIOS_SUCCESSFUL is zero, so this is not a functional
change, but it aligns this code with the user accessors.

[bhelgaas: commit log]
Fixes: 34e3207205 ("PCI: handle positive error codes")
Link: https://lore.kernel.org/r/f1220314-e518-1e18-bf94-8e6f8c703758@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-01-27 10:41:59 -06:00
Felix Fietkau
e2b2c390b0 mt76: mt7615: reduce VHT maximum MPDU length
With a maximum length of 7991, the radio sometimes locks up under load when
transmitting A-MSDU frames to lots of stations. Using the lower limit makes
it work reliably again

Signed-off-by: Felix Fietkau <nbd@nbd.name>
2021-01-27 17:33:44 +01:00
Felix Fietkau
2fbcdb4386 mt76: reduce q->lock hold time
Instead of holding it for the duration of an entire station schedule run,
which can block out competing tasks for a significant amount of time,
only hold it for scheduling one batch of packets for one station.
Improves responsiveness under load

Signed-off-by: Felix Fietkau <nbd@nbd.name>
2021-01-27 17:33:44 +01:00
Lorenzo Bianconi
9b0f100c19 mt76: usb: process URBs with status EPROTO properly
Similar to commit '0e40dbd56d ("mt7601u: process URBs in status EPROTO
properly")', do no schedule rx_worker for urb marked with status set
-EPROTO

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2021-01-27 17:33:43 +01:00
Felix Fietkau
2ab33b8d7d mt76: move vif_mask back from mt76_phy to mt76_dev
Since it is global for all drivers, it belongs to the main device

Signed-off-by: Felix Fietkau <nbd@nbd.name>
2021-01-27 17:33:43 +01:00
Felix Fietkau
51742a9e10 mt76: mt7915: make vif index per adapter instead of per band
The firmware treats it as global, so we need to avoid collisions here

Signed-off-by: Felix Fietkau <nbd@nbd.name>
2021-01-27 17:33:43 +01:00
Felix Fietkau
9093cfff72 mt76: mt7915: add support for using a secondary PCIe link for gen1
For performance reasons, mt7915 supports using 2 PCIE gen1 links on
platforms that don't support gen2.
Add support for using this to move traffic for a second DBDC band onto
a dedicated link

Signed-off-by: Felix Fietkau <nbd@nbd.name>
2021-01-27 17:33:43 +01:00
Felix Fietkau
94b6df08da mt76: fix crash on tearing down ext phy
Only clear dev->phy2 after the phy is gone, the driver may still need to access
it until shutdown is complete

Signed-off-by: Felix Fietkau <nbd@nbd.name>
2021-01-27 17:30:01 +01:00
Felix Fietkau
76027f40f5 mt76: mt7915: bring up the WA event rx queue for band1
This is needed for DBDC cards to work correctly on both bands simultaneously

Signed-off-by: Felix Fietkau <nbd@nbd.name>
2021-01-27 17:30:01 +01:00
Felix Fietkau
95f381c559 mt76: mt7615: unify init work
Reduce code duplication and remove unnecessary exports

Signed-off-by: Felix Fietkau <nbd@nbd.name>
2021-01-27 17:30:01 +01:00
Ryder Lee
07c0d0012f mt76: mt7915: support TxBF for DBDC
With this patch, TxBF can be run on both bands simultaneously.

Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2021-01-27 17:30:01 +01:00
Xu Wang
5d3b50b30d mt76: mt7915: Remove unneeded semicolon
fix semicolon.cocci warnings:
drivers/net/wireless/mediatek/mt76/mt7915/mac.c:1694:2-3: Unneeded semicolon

Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2021-01-27 17:30:01 +01:00
Lorenzo Bianconi
5b257371ec mt76: mt7615: set mcu country code in mt7615_mcu_set_channel_domain()
Update mcu country code running mt7615_mcu_set_channel_domain routine in
mt7615_regd_notifier().
Filter out disabled channels in mt7615_mcu_set_channel_domain().

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2021-01-27 17:30:01 +01:00
Felix Fietkau
45a8b67a35 mt76: mt7915: fix eeprom DBDC band selection
When the EEPROM band fields contain default values, assign 2.4 GHz to the
first band and 5 GHz to the second.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
2021-01-27 17:30:01 +01:00
Felix Fietkau
f7fc2bbe46 mt76: mt7915: fix eeprom parsing for DBDC
Annotate WIFI_CONF eeprom mask values with the byte number
Fix parsing per-band number of chains

Signed-off-by: Felix Fietkau <nbd@nbd.name>
2021-01-27 17:30:00 +01:00
Peter Zijlstra
3daa96d672 perf/intel: Remove Perfmon-v4 counter_freezing support
Perfmon-v4 counter freezing is fundamentally broken; remove this default
disabled code to make sure nobody uses it.

The feature is called Freeze-on-PMI in the SDM, and if it would do that,
there wouldn't actually be a problem, *however* it does something subtly
different. It globally disables the whole PMU when it raises the PMI,
not when the PMI hits.

This means there's a window between the PMI getting raised and the PMI
actually getting served where we loose events and this violates the
perf counter independence. That is, a counting event should not result
in a different event count when there is a sampling event co-scheduled.

This is known to break existing software (RR).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2021-01-27 17:26:58 +01:00
Like Xu
abd562df94 x86/perf: Use static_call for x86_pmu.guest_get_msrs
Clean up that CONFIG_RETPOLINE crud and replace the
indirect call x86_pmu.guest_get_msrs with static_call().

Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210125121458.181635-1-like.xu@linux.intel.com
2021-01-27 17:26:58 +01:00
Mel Gorman
bae4ec1364 sched/fair: Move avg_scan_cost calculations under SIS_PROP
As noted by Vincent Guittot, avg_scan_costs are calculated for SIS_PROP
even if SIS_PROP is disabled. Move the time calculations under a SIS_PROP
check and while we are at it, exclude the cost of initialising the CPU
mask from the average scan cost.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210125085909.4600-3-mgorman@techsingularity.net
2021-01-27 17:26:44 +01:00
Mel Gorman
e6e0dc2d54 sched/fair: Remove SIS_AVG_CPU
SIS_AVG_CPU was introduced as a means of avoiding a search when the
average search cost indicated that the search would likely fail. It was
a blunt instrument and disabled by commit 4c77b18cf8 ("sched/fair: Make
select_idle_cpu() more aggressive") and later replaced with a proportional
search depth by commit 1ad3aaf3fc ("sched/core: Implement new approach
to scale select_idle_cpu()").

While there are corner cases where SIS_AVG_CPU is better, it has now been
disabled for almost three years. As the intent of SIS_PROP is to reduce
the time complexity of select_idle_cpu(), lets drop SIS_AVG_CPU and focus
on SIS_PROP as a throttling mechanism.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210125085909.4600-2-mgorman@techsingularity.net
2021-01-27 17:26:43 +01:00
Peter Oskolkov
1875dc5b8f sched: Correctly sort struct predeclarations
The comment says:
  /* task_struct member predeclarations (sorted alphabetically): */

So move io_uring_task where it belongs (alphabetically).

Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210126193449.487547-1-posk@google.com
2021-01-27 17:26:43 +01:00
Yue Hu
432900f816 init/Kconfig: Correct thermal pressure help text
We're using arch_scale_thermal_pressure() to retrieve per CPU thermal
pressure.

Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210127054451.1240-1-zbestahu@gmail.com
2021-01-27 17:26:43 +01:00
Valentin Schneider
620a6dc407 sched/topology: Make sched_init_numa() use a set for the deduplicating sort
The deduplicating sort in sched_init_numa() assumes that the first line in
the distance table contains all unique values in the entire table. I've
been trying to pen what this exactly means for the topology, but it's not
straightforward. For instance, topology.c uses this example:

  node   0   1   2   3
    0:  10  20  20  30
    1:  20  10  20  20
    2:  20  20  10  20
    3:  30  20  20  10

  0 ----- 1
  |     / |
  |   /   |
  | /     |
  2 ----- 3

Which works out just fine. However, if we swap nodes 0 and 1:

  1 ----- 0
  |     / |
  |   /   |
  | /     |
  2 ----- 3

we get this distance table:

  node   0  1  2  3
    0:  10 20 20 20
    1:  20 10 20 30
    2:  20 20 10 20
    3:  20 30 20 10

Which breaks the deduplicating sort (non-representative first line). In
this case this would just be a renumbering exercise, but it so happens that
we can have a deduplicating sort that goes through the whole table in O(n²)
at the extra cost of a temporary memory allocation (i.e. any form of set).

The ACPI spec (SLIT) mentions distances are encoded on 8 bits. Following
this, implement the set as a 256-bits bitmap. Should this not be
satisfactory (i.e. we want to support 32-bit values), then we'll have to go
for some other sparse set implementation.

This has the added benefit of letting us allocate just the right amount of
memory for sched_domains_numa_distance[], rather than an arbitrary
(nr_node_ids + 1).

Note: DT binding equivalent (distance-map) decodes distances as 32-bit
values.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210122123943.1217-2-valentin.schneider@arm.com
2021-01-27 17:26:42 +01:00
Qais Yousef
0ae78eec8a sched/eas: Don't update misfit status if the task is pinned
If the task is pinned to a cpu, setting the misfit status means that
we'll unnecessarily continuously attempt to migrate the task but fail.

This continuous failure will cause the balance_interval to increase to
a high value, and eventually cause unnecessary significant delays in
balancing the system when real imbalance happens.

Caught while testing uclamp where rt-app calibration loop was pinned to
cpu 0, shortly after which we spawn another task with high util_clamp
value. The task was failing to migrate after over 40ms of runtime due to
balance_interval unnecessary expanded to a very high value from the
calibration loop.

Not done here, but it could be useful to extend the check for pinning to
verify that the affinity of the task has a cpu that fits. We could end
up in a similar situation otherwise.

Fixes: 3b1baa6496 ("sched/fair: Add 'group_misfit_task' load-balance type")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Acked-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210119120755.2425264-1-qais.yousef@arm.com
2021-01-27 17:26:42 +01:00
Barry Song
d17405d52b dma-mapping: benchmark: fix kernel crash when dma_map_single fails
if dma_map_single() fails, kernel will give the below oops since
task_struct has been destroyed and we are running into the memory
corruption due to use-after-free in kthread_stop():

[   48.095310] Unable to handle kernel paging request at virtual address 000000c473548040
[   48.095736] Mem abort info:
[   48.095864]   ESR = 0x96000004
[   48.096025]   EC = 0x25: DABT (current EL), IL = 32 bits
[   48.096268]   SET = 0, FnV = 0
[   48.096401]   EA = 0, S1PTW = 0
[   48.096538] Data abort info:
[   48.096659]   ISV = 0, ISS = 0x00000004
[   48.096820]   CM = 0, WnR = 0
[   48.097079] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000104639000
[   48.098099] [000000c473548040] pgd=0000000000000000, p4d=0000000000000000
[   48.098832] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[   48.099232] Modules linked in:
[   48.099387] CPU: 0 PID: 2 Comm: kthreadd Tainted: G        W
[   48.099887] Hardware name: linux,dummy-virt (DT)
[   48.100078] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[   48.100516] pc : __kmalloc_node+0x214/0x368
[   48.100944] lr : __kmalloc_node+0x1f4/0x368
[   48.101458] sp : ffff800011f0bb80
[   48.101843] x29: ffff800011f0bb80 x28: ffff0000c0098ec0
[   48.102330] x27: 0000000000000000 x26: 00000000001d4600
[   48.102648] x25: ffff0000c0098ec0 x24: ffff800011b6a000
[   48.102988] x23: 00000000ffffffff x22: ffff0000c0098ec0
[   48.103333] x21: ffff8000101d7a54 x20: 0000000000000dc0
[   48.103657] x19: ffff0000c0001e00 x18: 0000000000000000
[   48.104069] x17: 0000000000000000 x16: 0000000000000000
[   48.105449] x15: 000001aa0304e7b9 x14: 00000000000003b1
[   48.106401] x13: ffff8000122d5000 x12: ffff80001228d000
[   48.107296] x11: ffff0000c0154340 x10: 0000000000000000
[   48.107862] x9 : ffff80000fffffff x8 : ffff0000c473527f
[   48.108326] x7 : ffff800011e62f58 x6 : ffff0000c01c8ed8
[   48.108778] x5 : ffff0000c0098ec0 x4 : 0000000000000000
[   48.109223] x3 : 00000000001d4600 x2 : 0000000000000040
[   48.109656] x1 : 0000000000000001 x0 : ff0000c473548000
[   48.110104] Call trace:
[   48.110287]  __kmalloc_node+0x214/0x368
[   48.110493]  __vmalloc_node_range+0xc4/0x298
[   48.110805]  copy_process+0x2c8/0x15c8
[   48.111133]  kernel_clone+0x5c/0x3c0
[   48.111373]  kernel_thread+0x64/0x90
[   48.111604]  kthreadd+0x158/0x368
[   48.111810]  ret_from_fork+0x10/0x30
[   48.112336] Code: 17ffffe9 b9402a62 b94008a1 11000421 (f8626802)
[   48.112884] ---[ end trace d4890e21e75419d5 ]---

Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-01-27 17:18:38 +01:00
Hao Xu
6195ba0982 io_uring: fix flush cqring overflow list while TASK_INTERRUPTIBLE
Abaci reported the follow warning:

[   27.073425] do not call blocking ops when !TASK_RUNNING; state=1 set at [] prepare_to_wait_exclusive+0x3a/0xc0
[   27.075805] WARNING: CPU: 0 PID: 951 at kernel/sched/core.c:7853 __might_sleep+0x80/0xa0
[   27.077604] Modules linked in:
[   27.078379] CPU: 0 PID: 951 Comm: a.out Not tainted 5.11.0-rc3+ #1
[   27.079637] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[   27.080852] RIP: 0010:__might_sleep+0x80/0xa0
[   27.081835] Code: 65 48 8b 04 25 80 71 01 00 48 8b 90 c0 15 00 00 48 8b 70 18 48 c7 c7 08 39 95 82 c6 05 f9 5f de 08 01 48 89 d1 e8 00 c6 fa ff  0b eb bf 41 0f b6 f5 48 c7 c7 40 23 c9 82 e8 f3 48 ec 00 eb a7
[   27.084521] RSP: 0018:ffffc90000fe3ce8 EFLAGS: 00010286
[   27.085350] RAX: 0000000000000000 RBX: ffffffff82956083 RCX: 0000000000000000
[   27.086348] RDX: ffff8881057a0000 RSI: ffffffff8118cc9e RDI: ffff88813bc28570
[   27.087598] RBP: 00000000000003a7 R08: 0000000000000001 R09: 0000000000000001
[   27.088819] R10: ffffc90000fe3e00 R11: 00000000fffef9f0 R12: 0000000000000000
[   27.089819] R13: 0000000000000000 R14: ffff88810576eb80 R15: ffff88810576e800
[   27.091058] FS:  00007f7b144cf740(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
[   27.092775] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   27.093796] CR2: 00000000022da7b8 CR3: 000000010b928002 CR4: 00000000003706f0
[   27.094778] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   27.095780] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   27.097011] Call Trace:
[   27.097685]  __mutex_lock+0x5d/0xa30
[   27.098565]  ? prepare_to_wait_exclusive+0x71/0xc0
[   27.099412]  ? io_cqring_overflow_flush.part.101+0x6d/0x70
[   27.100441]  ? lockdep_hardirqs_on_prepare+0xe9/0x1c0
[   27.101537]  ? _raw_spin_unlock_irqrestore+0x2d/0x40
[   27.102656]  ? trace_hardirqs_on+0x46/0x110
[   27.103459]  ? io_cqring_overflow_flush.part.101+0x6d/0x70
[   27.104317]  io_cqring_overflow_flush.part.101+0x6d/0x70
[   27.105113]  io_cqring_wait+0x36e/0x4d0
[   27.105770]  ? find_held_lock+0x28/0xb0
[   27.106370]  ? io_uring_remove_task_files+0xa0/0xa0
[   27.107076]  __x64_sys_io_uring_enter+0x4fb/0x640
[   27.107801]  ? rcu_read_lock_sched_held+0x59/0xa0
[   27.108562]  ? lockdep_hardirqs_on_prepare+0xe9/0x1c0
[   27.109684]  ? syscall_enter_from_user_mode+0x26/0x70
[   27.110731]  do_syscall_64+0x2d/0x40
[   27.111296]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   27.112056] RIP: 0033:0x7f7b13dc8239
[   27.112663] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05  3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 ec 2c 00 f7 d8 64 89 01 48
[   27.115113] RSP: 002b:00007ffd6d7f5c88 EFLAGS: 00000286 ORIG_RAX: 00000000000001aa
[   27.116562] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7b13dc8239
[   27.117961] RDX: 000000000000478e RSI: 0000000000000000 RDI: 0000000000000003
[   27.118925] RBP: 00007ffd6d7f5cb0 R08: 0000000020000040 R09: 0000000000000008
[   27.119773] R10: 0000000000000001 R11: 0000000000000286 R12: 0000000000400480
[   27.120614] R13: 00007ffd6d7f5d90 R14: 0000000000000000 R15: 0000000000000000
[   27.121490] irq event stamp: 5635
[   27.121946] hardirqs last  enabled at (5643): [] console_unlock+0x5c4/0x740
[   27.123476] hardirqs last disabled at (5652): [] console_unlock+0x4e7/0x740
[   27.125192] softirqs last  enabled at (5272): [] __do_softirq+0x3c5/0x5aa
[   27.126430] softirqs last disabled at (5267): [] asm_call_irq_on_stack+0xf/0x20
[   27.127634] ---[ end trace 289d7e28fa60f928 ]---

This is caused by calling io_cqring_overflow_flush() which may sleep
after calling prepare_to_wait_exclusive() which set task state to
TASK_INTERRUPTIBLE

Reported-by: Abaci <abaci@linux.alibaba.com>
Fixes: 6c503150ae ("io_uring: patch up IOPOLL overflow_flush sync")
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:18:28 -07:00
Jan Kara
7684fbde45 bfq: Use only idle IO periods for think time calculations
Currently whenever bfq queue has a request queued we add now -
last_completion_time to the think time statistics. This is however
misleading in case the process is able to submit several requests in
parallel because e.g. if the queue has request completed at time T0 and
then queues new requests at times T1, T2, then we will add T1-T0 and
T2-T0 to think time statistics which just doesn't make any sence (the
queue's think time is penalized by the queue being able to submit more
IO). So add to think time statistics only time intervals when the queue
had no IO pending.

Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
[axboe: fix whitespace on empty line]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:16:00 -07:00
Jan Kara
28c6def009 bfq: Use 'ttime' local variable
Use local variable 'ttime' instead of dereferencing bfqq.

Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:15:38 -07:00
Jan Kara
41e76c8566 bfq: Avoid false bfq queue merging
bfq_setup_cooperator() uses bfqd->in_serv_last_pos so detect whether it
makes sense to merge current bfq queue with the in-service queue.
However if the in-service queue is freshly scheduled and didn't dispatch
any requests yet, bfqd->in_serv_last_pos is stale and contains value
from the previously scheduled bfq queue which can thus result in a bogus
decision that the two queues should be merged. This bug can be observed
for example with the following fio jobfile:

[global]
direct=0
ioengine=sync
invalidate=1
size=1g
rw=read

[reader]
numjobs=4
directory=/mnt

where the 4 processes will end up in the one shared bfq queue although
they do IO to physically very distant files (for some reason I was able to
observe this only with slice_idle=1ms setting).

Fix the problem by invalidating bfqd->in_serv_last_pos when switching
in-service queue.

Fixes: 058fdecc6d ("block, bfq: fix in-service-queue check for queue merging")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:15:38 -07:00
Maxim Mikityanskiy
8dc932d3e8 Revert "block: simplify set_init_blocksize" to regain lost performance
The cited commit introduced a serious regression with SATA write speed,
as found by bisecting. This patch reverts this commit, which restores
write speed back to the values observed before this commit.

The performance tests were done on a Helios4 NAS (2nd batch) with 4 HDDs
(WD8003FFBX) using dd (bs=1M count=2000). "Direct" is a test with a
single HDD, the rest are different RAID levels built over the first
partitions of 4 HDDs. Test results are in MB/s, R is read, W is write.

                | Direct | RAID0 | RAID10 f2 | RAID10 n2 | RAID6
----------------+--------+-------+-----------+-----------+--------
9011495c94    | R:256  | R:313 | R:276     | R:313     | R:323
(before faulty) | W:254  | W:253 | W:195     | W:204     | W:117
----------------+--------+-------+-----------+-----------+--------
5ff9f19231    | R:257  | R:398 | R:312     | R:344     | R:391
(faulty commit) | W:154  | W:122 | W:67.7    | W:66.6    | W:67.2
----------------+--------+-------+-----------+-----------+--------
5.10.10         | R:256  | R:401 | R:312     | R:356     | R:375
unpatched       | W:149  | W:123 | W:64      | W:64.1    | W:61.5
----------------+--------+-------+-----------+-----------+--------
5.10.10         | R:255  | R:396 | R:312     | R:340     | R:393
patched         | W:247  | W:274 | W:220     | W:225     | W:121

Applying this patch doesn't hurt read performance, while improves the
write speed by 1.5x - 3.5x (more impact on RAID tests). The write speed
is restored back to the state before the faulty commit, and even a bit
higher in RAID tests (which aren't HDD-bound on this device) - that is
likely related to other optimizations done between the faulty commit and
5.10.10 which also improved the read speed.

Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Fixes: 5ff9f19231 ("block: simplify set_init_blocksize")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:14:12 -07:00