linux-xiaomi-chiron

Author	SHA1	Message	Date
Geert Uytterhoeven	45c9a603a4	dmaengine: rcar-dmac: Disable interrupts while stopping channels During system reboot or halt, with lockdep enabled: ================================ WARNING: inconsistent lock state 4.18.0-rc1-salvator-x-00002-g9203dbec90a68103 #41 Tainted: G W -------------------------------- inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. reboot/2779 [HC0[0]:SC0[0]:HE1:SE1] takes: 0000000098ae4ad3 (&(&rchan->lock)->rlock){?.-.}, at: rcar_dmac_shutdown+0x58/0x6c {IN-HARDIRQ-W} state was registered at: lock_acquire+0x208/0x238 _raw_spin_lock+0x40/0x54 rcar_dmac_isr_channel+0x28/0x200 __handle_irq_event_percpu+0x1c0/0x3c8 handle_irq_event_percpu+0x34/0x88 handle_irq_event+0x48/0x78 handle_fasteoi_irq+0xc4/0x12c generic_handle_irq+0x18/0x2c __handle_domain_irq+0xa8/0xac gic_handle_irq+0x78/0xbc el1_irq+0xec/0x1c0 arch_cpu_idle+0xe8/0x1bc default_idle_call+0x2c/0x30 do_idle+0x144/0x234 cpu_startup_entry+0x20/0x24 rest_init+0x27c/0x290 start_kernel+0x430/0x45c irq event stamp: 12177 hardirqs last enabled at (12177): [<ffffff800881d804>] _raw_spin_unlock_irq+0x2c/0x4c hardirqs last disabled at (12176): [<ffffff800881d638>] _raw_spin_lock_irq+0x1c/0x60 softirqs last enabled at (11948): [<ffffff8008081da8>] __do_softirq+0x160/0x4ec softirqs last disabled at (11935): [<ffffff80080ec948>] irq_exit+0xa0/0xfc other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&rchan->lock)->rlock); <Interrupt> lock(&(&rchan->lock)->rlock); * DEADLOCK * 3 locks held by reboot/2779: #0: 00000000bfabfa74 (reboot_mutex){+.+.}, at: sys_reboot+0xdc/0x208 #1: 00000000c75d8c3a (&dev->mutex){....}, at: device_shutdown+0xc8/0x1c4 #2: 00000000ebec58ec (&dev->mutex){....}, at: device_shutdown+0xd8/0x1c4 stack backtrace: CPU: 6 PID: 2779 Comm: reboot Tainted: G W 4.18.0-rc1-salvator-x-00002-g9203dbec90a68103 #41 Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT) Call trace: dump_backtrace+0x0/0x148 show_stack+0x14/0x1c dump_stack+0xb0/0xf0 print_usage_bug.part.26+0x1c4/0x27c mark_lock+0x38c/0x610 __lock_acquire+0x3fc/0x14d4 lock_acquire+0x208/0x238 _raw_spin_lock+0x40/0x54 rcar_dmac_shutdown+0x58/0x6c platform_drv_shutdown+0x20/0x2c device_shutdown+0x160/0x1c4 kernel_restart_prepare+0x34/0x3c kernel_restart+0x14/0x5c sys_reboot+0x160/0x208 el0_svc_naked+0x30/0x34 rcar_dmac_stop_all_chan() takes the channel lock while stopping a channel, but does not disable interrupts, leading to a deadlock when a DMAC interrupt comes in. Before, the same code block was called from an interrupt handler, hence taking the spinlock was sufficient. Fix this by disabling local interrupts while taking the spinlock. Fixes: `9203dbec90` ("dmaengine: rcar-dmac: don't use DMAC error interrupt") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Vinod Koul <vkoul@kernel.org>	2018-07-09 22:20:53 +05:30
Krzysztof Kozlowski	8ab11f8068	ARM: tegra: Work safely with 256 MB Colibri-T20 modules Colibri-T20 can come in 256 MB RAM (with 512 MB NAND) or 512 MB RAM (with 1024 MB NAND) flavors. Both of them will use the same DTSI expecting the bootloader to do the fixup of /memory node. However in case it does not happen, let's stay on safe side by limiting the memory to 256 MB for both versions of Colibri-T20. Rename to remove the unnecessary memory size from the device tree file name. While at it, also follow the typical Toradex SoC, module, carrier board hierarchy. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Stefan Agner <stefan@agner.ch> Tested-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Thierry Reding <treding@nvidia.com>	2018-07-09 18:50:53 +02:00
Krzysztof Kozlowski	35a21229f8	ARM: tegra: Fix unit_address_vs_reg and avoid_unnecessary_addr_size DTC warnings Remove unneeded address/size cells properties and unit addresses to fix DTC warnings like: arch/arm/boot/dts/tegra30-apalis-eval.dtb: Warning (unit_address_vs_reg): /i2c@7000d000/stmpe811@41/stmpe_touchscreen@0: node has a unit name, but no reg property arch/arm/boot/dts/tegra30-apalis-eval.dtb: Warning (avoid_unnecessary_addr_size): /i2c@7000d000/stmpe811@41: unnecessary #address-cells/#size-cells without "ranges" or child "reg" property Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Thierry Reding <treding@nvidia.com>	2018-07-09 18:50:33 +02:00
Krzysztof Kozlowski	482997699e	ARM: tegra: Fix unit_address_vs_reg DTC warnings for /memory Add a generic /memory node in each Tegra DTSI (with empty reg property, to be overidden by each DTS) and set proper unit address for /memory nodes to fix the DTC warnings: arch/arm/boot/dts/tegra20-harmony.dtb: Warning (unit_address_vs_reg): /memory: node has a reg or ranges property, but no unit name The DTB after the change is the same as before except adding unit-address to /memory node. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Thierry Reding <treding@nvidia.com>	2018-07-09 18:50:10 +02:00
Krzysztof Kozlowski	f48ba1ae6a	ARM: tegra: Remove usage of deprecated skeleton.dtsi Remove the usage of skeleton.dtsi because it was deprecated since commit `9c0da3cc61` ("ARM: dts: explicitly mark skeleton.dtsi as deprecated"). It also allows later to fix DTC warnings for missing unit name in /memory nodes. Compiled DTBs are the same as before this commit. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Thierry Reding <treding@nvidia.com>	2018-07-09 18:49:44 +02:00
Stephen Boyd	166f3a8ad6	First round of updates for meson clocks targeted at v4.19 * Remove legacy register access (finish moving to syscon) * Clean up configuration flags * Add axg PCIe clocks * Add GEN CLK on gxbb, gxl and axg * Remove clk_audio_divider driver * Add axg audio clock controller -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE9OFZrhjz9W1fG7cb5vwPHDfy2oUFAltDV88ACgkQ5vwPHDfy 2oWk1w//bKOHNvjBEqkA5kZcFd6hQ81KP0poG3QoZqd9Mkvj7sT1GOUVm8QS9oll neVXUT6JIavwOobwnD6f7WCNSHJ6buWTlnxJc37JJU81PBhCV2t2owhLkItSf/ni GDK/BhGf/vGCIAvjlfU2Bh2crxG7hXGVMeg8OFG99my9WryXPL1HaGJfO6ge1T26 N7Gx1bFAdf2wAK3B/EpCPNBYsnS+ohJz0Cq1S8BbJXOpMLjgAsR7UB25F+DCCjp2 pPSHq7gNKUVcgZKveiy2xRqj8F/j3eQyF9tr6K3mpx0obFzXXxFT8meMR54a6y61 swJ6E0PI9+u9kOOo7KOAEuLBIpamgIP19BJNLmXxb+ibAjKjp/mUc7i5qBoqvCOW 8Rab8Yc9/PP03VPY+Pm66OrryrsLciVX9vcSCA5TPXTzp5chyM9Yi9ZsP0QQEC2f NY7A55aL7WfHJsG28NxGEsbAf4F9ozvIFgg4FFPBq/kFlf/wZeWGtz3JPyhK84I6 yxsQbCFDqyFCNJvwPnMVKmGuui26E7w2Ctf8pdWt+/Amnd2kHekJRDi5dx/d5RK6 UhRrrckBBqV3C8UweQZZPPz7aBEA4SYnxwUc9w1jU9l2osbn0WAA/wuOuVEUVRAg 5675tDnmZwVpVmxnMH8aZH0ykaNPX7jk/XLHTdDHqk8NRDUKHRk= =w4lD -----END PGP SIGNATURE----- Merge tag 'meson-clk-4.19-1' of https://github.com/BayLibre/clk-meson into clk-meson Pull first round of updates for meson clocks from Jerome Brunet: - Remove legacy register access (finish moving to syscon) - Clean up configuration flags - Add axg PCIe clocks - Add GEN CLK on gxbb, gxl and axg - Remove clk_audio_divider driver - Add axg audio clock controller * tag 'meson-clk-4.19-1' of https://github.com/BayLibre/clk-meson: clk: meson: add gen_clk clk: meson: gxbb: remove HHI_GEN_CLK_CTNL duplicate definition clk: meson-axg: add clocks required by pcie driver clk: meson: remove unused clk-audio-divider driver clk: meson: stop rate propagation for audio clocks clk: meson: axg: add the audio clock controller driver clk: meson: add axg audio sclk divider driver clk: meson: add triple phase clock driver clk: meson: add clk-phase clock driver clk: meson: clean-up meson clock configuration clk: meson: remove obsolete register access clk: meson: expose GEN_CLK clkid clk: meson-axg: add pcie and mipi clock bindings dt-bindings: clock: add meson axg audio clock controller bindings clk: meson: audio-divider is one based clk: add duty cycle support clk: meson-gxbb: set fclk_div2 as CLK_IS_CRITICAL	2018-07-09 09:47:55 -07:00
Arnd Bergmann	5ba57babcb	drm: vkms: select DRM_KMS_HELPER Without this, we get link errors during randconfig build: drivers/gpu/drm/vkms/vkms_drv.o:(.rodata+0xa0): undefined reference to `drm_atomic_helper_check' drivers/gpu/drm/vkms/vkms_drv.o:(.rodata+0xa8): undefined reference to `drm_atomic_helper_commit' drivers/gpu/drm/vkms/vkms_plane.o:(.rodata+0x0): undefined reference to `drm_atomic_helper_update_plane' drivers/gpu/drm/vkms/vkms_plane.o:(.rodata+0x8): undefined reference to `drm_atomic_helper_disable_plane' drivers/gpu/drm/vkms/vkms_plane.o:(.rodata+0x18): undefined reference to `drm_atomic_helper_plane_reset' drivers/gpu/drm/vkms/vkms_plane.o:(.rodata+0x28): undefined reference to `drm_atomic_helper_plane_duplicate_state' drivers/gpu/drm/vkms/vkms_plane.o:(.rodata+0x30): undefined reference to `drm_atomic_helper_plane_destroy_state' drivers/gpu/drm/vkms/vkms_output.o:(.rodata+0x1c0): undefined reference to `drm_helper_probe_single_connector_modes' drivers/gpu/drm/vkms/vkms_crtc.o:(.rodata+0x40): undefined reference to `drm_atomic_helper_crtc_reset' drivers/gpu/drm/vkms/vkms_crtc.o:(.rodata+0x70): undefined reference to `drm_atomic_helper_set_config' drivers/gpu/drm/vkms/vkms_crtc.o:(.rodata+0x78): undefined reference to `drm_atomic_helper_page_flip' drivers/gpu/drm/vkms/vkms_crtc.o:(.rodata+0x90): undefined reference to `drm_atomic_helper_crtc_duplicate_state' drivers/gpu/drm/vkms/vkms_crtc.o:(.rodata+0x98): undefined reference to `drm_atomic_helper_crtc_destroy_state' Fixes: `854502fa0a` ("drm/vkms: Add basic CRTC initialization") Fixes: `1c7c5fd916` ("drm/vkms: Introduce basic VKMS driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20180709154901.1989316-1-arnd@arndb.de	2018-07-09 18:46:05 +02:00
Gregory CLEMENT	66c7bb7c41	clk: mvebu: armada-37xx-periph: switch to SPDX license identifier Adopt the SPDX license identifier headers to ease license compliance management. Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>	2018-07-09 09:44:49 -07:00
Gregory CLEMENT	61c40f35f5	clk: mvebu: armada-37xx-periph: Fix switching CPU rate from 300Mhz to 1.2GHz Switching the CPU from the L2 or L3 frequencies (300 and 200 Mhz respectively) to L0 frequency (1.2 Ghz) requires a significant amount of time to let VDD stabilize to the appropriate voltage. This amount of time is large enough that it cannot be covered by the hardware countdown register. Due to this, the CPU might start operating at L0 before the voltage is stabilized, leading to CPU stalls. To work around this problem, we prevent switching directly from the L2/L3 frequencies to the L0 frequency, and instead switch to the L1 frequency in-between. The sequence therefore becomes: 1. First switch from L2/L3(200/300MHz) to L1(600MHZ) 2. Sleep 20ms for stabling VDD voltage 3. Then switch from L1(600MHZ) to L0(1200Mhz). It is based on the work done by Ken Ma <make@marvell.com> Cc: stable@vger.kernel.org Fixes: `2089dc33ea` ("clk: mvebu: armada-37xx-periph: add DVFS support for cpu clocks") Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>	2018-07-09 09:44:06 -07:00
Florian Westphal	84379c9afe	netfilter: ipv6: nf_defrag: drop skb dst before queueing Eric Dumazet reports: Here is a reproducer of an annoying bug detected by syzkaller on our production kernel [..] ./b78305423 enable_conntrack Then : sleep 60 dmesg \| tail -10 [ 171.599093] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 181.631024] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 191.687076] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 201.703037] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 211.711072] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 221.959070] unregister_netdevice: waiting for lo to become free. Usage count = 2 Reproducer sends ipv6 fragment that hits nfct defrag via LOCAL_OUT hook. skb gets queued until frag timer expiry -- 1 minute. Normally nf_conntrack_reasm gets called during prerouting, so skb has no dst yet which might explain why this wasn't spotted earlier. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: John Sperbeck <jsperbeck@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Tested-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2018-07-09 18:04:12 +02:00
Andrey Ryabinin	2045cdfa1b	netfilter: nf_conntrack: Fix possible possible crash on module loading. Loading the nf_conntrack module with doubled hashsize parameter, i.e. modprobe nf_conntrack hashsize=12345 hashsize=12345 causes NULL-ptr deref. If 'hashsize' specified twice, the nf_conntrack_set_hashsize() function will be called also twice. The first nf_conntrack_set_hashsize() call will set the 'nf_conntrack_htable_size' variable: nf_conntrack_set_hashsize() ... /* On boot, we can set this without any fancy locking. */ if (!nf_conntrack_htable_size) return param_set_uint(val, kp); But on the second invocation, the nf_conntrack_htable_size is already set, so the nf_conntrack_set_hashsize() will take a different path and call the nf_conntrack_hash_resize() function. Which will crash on the attempt to dereference 'nf_conntrack_hash' pointer: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: 0010:nf_conntrack_hash_resize+0x255/0x490 [nf_conntrack] Call Trace: nf_conntrack_set_hashsize+0xcd/0x100 [nf_conntrack] parse_args+0x1f9/0x5a0 load_module+0x1281/0x1a50 __se_sys_finit_module+0xbe/0xf0 do_syscall_64+0x7c/0x390 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fix this, by checking !nf_conntrack_hash instead of !nf_conntrack_htable_size. nf_conntrack_hash will be initialized only after the module loaded, so the second invocation of the nf_conntrack_set_hashsize() won't crash, it will just reinitialize nf_conntrack_htable_size again. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2018-07-09 18:04:11 +02:00
Florian Fainelli	3be77fe8c3	Merge tag 'bcm2835-dt-next-2018-07-03' into devicetree/next This pull request brings in a board DT for the Raspberry Pi Compute Module and its I/O board, the Pi3's PMU node, and the display's transposer block. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>	2018-07-09 08:12:15 -07:00
Vivek Unune	2bebdfcdcd	ARM: dts: BCM5301X: Add support for Linksys EA9500 Hardware Info ------------- Processor - Broadcom BCM4709C0KFEBG dual-core @ 1.4 GHz Switch - BCM53012 in BCM4709C0KFEBG & external BCM53125 DDR3 RAM - 256 MB Flash - 128 MB (Toshiba TC58BVG0S3HTA00) 2.4GHz - BCM4366 4×4 2.4/5G single chip 802.11ac SoC Power Amp - Skyworks SE2623L 2.4 GHz power amp (x4) 5GHz x 2 - BCM4366 4×4 2.4/5G single chip 802.11ac SoC Power Amp - PLX Technology PEX8603 3-lane, 3-port PCIe switch Ports - 8 Ports, 1 WAN Ports Antennas - 8 Antennas Serial Port - @J6 [GND,TX,RX] (VCC NC) 115200 8n1 Tested with OpenWrt built with DSA driver and Kernel v4.14 Signed-off-by: Vivek Unune <npcomplete13@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>	2018-07-09 08:12:13 -07:00
Rafał Miłecki	a21e754843	ARM: dts: BCM53573: Add architected timer It's a standard ARM architected timer that was simply missed when initially adding this .dtsi file. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>	2018-07-09 08:12:12 -07:00
Vivek Unune	37f6130ec3	ARM: dts: BCM5301X: Make USB 3.0 PHY use MDIO PHY driver Currently, the USB 3.0 PHY in bcm5301x.dtsi uses platform driver which requires register range "ccb-mii" <0x18003000 0x1000>. This range overlaps with MDIO cmd and param registers (<0x18003000 0x8>). Essentially, the platform driver partly acts like a MDIO bus driver, hence to use of this register range. In some Northstar devices like Linksys EA9500, secondary switch is connected via external MDIO. The only way to access and configure the external switch is via MDIO bus. When we enable the MDIO bus in it's current state, the MDIO bus and any child buses fail to register because of the register range overlap. On Northstar, the USB 3.0 PHY is connected at address 0x10 on the internal MDIO bus. This change moves the usb3_phy node and makes it a child node of internal MDIO bus. Thanks to Rafał Miłecki's commit `af850e14a7` ("phy: bcm-ns-usb3: add MDIO driver using proper bus layer") the same USB 3.0 platform driver can now act as USB 3.0 PHY MDIO driver. Tested on Linksys Panamera (EA9500) Signed-off-by: Vivek Unune <npcomplete13@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>	2018-07-09 08:12:11 -07:00
Mohamed Ismail Abdul Packir Mohamed	a08e950de6	ARM: dts: cygnus: enable iproc-hwrng Enable the HW rng driver "iproc-rng200" for all cygnus platforms. Signed-off-by: Mohamed Ismail Abdul Packir Mohamed <mohamed-ismail.abdul@broadcom.com> Reviewed-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Scott Branden <scott.branden@broadcom.com> Tested-by: Clément Péron <peron.clem@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>	2018-07-09 08:12:10 -07:00
Clément Péron	00d1ae3840	ARM: dts: cygnus: add ethernet0 alias In order to avoid Linux generating a random mac address on every boot, add an ethernet0 alias that will allow u-boot to patch the dtb with the MAC address. Signed-off-by: Clément Péron <peron.clem@gmail.com> Acked-by: Scott Branden <scott.branden@broadcom.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>	2018-07-09 08:12:01 -07:00
Boris Brezillon	b7dd29b401	ARM: dts: bcm283x: Add Transposer block The transposer block is allowing one to write the result of the VC4 composition back to memory instead of displaying it on a screen. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Eric Anholt <eric@anholt.net>	2018-07-09 08:10:20 -07:00
Eric Anholt	b8ccf02a50	ARM: dts: bcm283x: Add the PMU to the devicetree. This only probes on arm64 so far, but hopefully that driver will be generalized soon. Signed-off-by: Eric Anholt <eric@anholt.net> Acked-by: Stefan Wahren <stefan.wahren@i2se.com>	2018-07-09 08:10:08 -07:00
Randy Dunlap	3993e501bf	block/DAC960.c: fix defined but not used build warnings Fix build warnings in DAC960.c when CONFIG_PROC_FS is not enabled by marking the unused functions as __maybe_unused. ../drivers/block/DAC960.c:6429:12: warning: 'dac960_proc_show' defined but not used [-Wunused-function] ../drivers/block/DAC960.c:6449:12: warning: 'dac960_initial_status_proc_show' defined but not used [-Wunused-function] ../drivers/block/DAC960.c:6456:12: warning: 'dac960_current_status_proc_show' defined but not used [-Wunused-function] Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:55 -06:00
Matias Bjørling	ca4b2a0119	null_blk: add zone support Adds support for exposing a null_blk device through the zone device interface. The interface is managed with the parameters zoned and zone_size. If zoned is set, the null_blk instance registers as a zoned block device. The zone_size parameter defines how big each zone will be. Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:55 -06:00
Matias Bjørling	6dad38d38f	null_blk: move shared definitions to header file Split the null_blk device driver, such that it can prepare for zoned block interface support. Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Geert Uytterhoeven	e9a8385330	block: Add default switch case to blk_pm_allow_request() to kill warning With gcc 4.9.0 and 7.3.0: block/blk-core.c: In function 'blk_pm_allow_request': block/blk-core.c:2747:2: warning: enumeration value 'RPM_ACTIVE' not handled in switch [-Wswitch] switch (rq->q->rpm_status) { ^ Convert the return statement below the switch() block into a default case to fix this. Fixes: `e4f36b249b` ("block: fix peeking requests during PM") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Mikulas Patocka	b88aef36b8	block: fix infinite loop if the device loses discard capability If __blkdev_issue_discard is in progress and a device mapper device is reloaded with a table that doesn't support discard, q->limits.max_discard_sectors is set to zero. This results in infinite loop in __blkdev_issue_discard. This patch checks if max_discard_sectors is zero and aborts with -EOPNOTSUPP. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: Zdenek Kabelac <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Shakeel Butt	c137969bd4	block, mm: remove unnecessary __GFP_HIGH flag The flag GFP_ATOMIC already contains __GFP_HIGH. There is no need to explicitly or __GFP_HIGH again. So, just remove unnecessary __GFP_HIGH. Signed-off-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Liu Bo	00a8cdb84f	null_blk: remove NULLB_DEV_FL_CONFIGURED on turning off nullb device Currently mbps knob could only be set once before switching power knob to on, after power knob has been set at least once, there is no way to set mbps knob again due to -EBUSY. As nullb is mainly used for testing, in order to make it flexible, this removes the flag NULLB_DEV_FL_CONFIGURED so that mbps knob can be reset when power knob is off, e.g. echo 0 > /config/nullb/a/power echo 40 > /config/nullb/a/mbps echo 1 > /config/nullb/a/power So does other knobs under /config/nullb/a. Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Josef Bacik	ca47e8c72a	mm: skip readahead if the cgroup is congested We noticed in testing we'd get pretty bad latency stalls under heavy pressure because read ahead would try to do its thing while the cgroup was under severe pressure. If we're under this much pressure we want to do as little IO as possible so we can still make progress on real work if we're a throttled cgroup, so just skip readahead if our group is under pressure. Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Josef Bacik	b351f0c76c	Documentation: add a doc for blk-iolatency A basic documentation to describe the interface, statistics, and behavior of io.latency. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Josef Bacik	d706751215	block: introduce blk-iolatency io controller Current IO controllers for the block layer are less than ideal for our use case. The io.max controller is great at hard limiting, but it is not work conserving. This patch introduces io.latency. You provide a latency target for your group and we monitor the io in short windows to make sure we are not exceeding those latency targets. This makes use of the rq-qos infrastructure and works much like the wbt stuff. There are a few differences from wbt - It's bio based, so the latency covers the whole block layer in addition to the actual io. - We will throttle all IO types that comes in here if we need to. - We use the mean latency over the 100ms window. This is because writes can be particularly fast, which could give us a false sense of the impact of other workloads on our protected workload. - By default there's no throttling, we set the queue_depth to INT_MAX so that we can have as many outstanding bio's as we're allowed to. Only at throttle time do we pay attention to the actual queue depth. - We backcharge cgroups for root cg issued IO and induce artificial delays in order to deal with cases like metadata only or swap heavy workloads. In testing this has worked out relatively well. Protected workloads will throttle noisy workloads down to 1 io at time if they are doing normal IO on their own, or induce up to a 1 second delay per syscall if they are doing a lot of root issued IO (metadata/swap IO). Our testing has revolved mostly around our production web servers where we have hhvm (the web server application) in a protected group and everything else in another group. We see slightly higher requests per second (RPS) on the test tier vs the control tier, and much more stable RPS across all machines in the test tier vs the control tier. Another test we run is a slow memory allocator in the unprotected group. Before this would eventually push us into swap and cause the whole box to die and not recover at all. With these patches we see slight RPS drops (usually 10-15%) before the memory consumer is properly killed and things recover within seconds. Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Josef Bacik	67b42d0bf7	rq-qos: introduce dio_bio callback wbt cares only about request completion time, but controllers may need information that is on the bio itself, so add a done_bio callback for rq-qos so things like blk-iolatency can use it to have the bio when it completes. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Josef Bacik	c1c80384c8	block: remove external dependency on wbt_flags We don't really need to save this stuff in the core block code, we can just pass the bio back into the helpers later on to derive the same flags and update the rq->wbt_flags appropriately. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Josef Bacik	a79050434b	blk-rq-qos: refactor out common elements of blk-wbt blkcg-qos is going to do essentially what wbt does, only on a cgroup basis. Break out the common code that will be shared between blkcg-qos and wbt into blk-rq-qos.* so they can both utilize the same infrastructure. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Josef Bacik	2ecbf45635	blk-stat: export helpers for modifying blk_rq_stat We need to use blk_rq_stat in the blkcg qos stuff, so export some of these helpers so they can be used by other things. Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Tejun Heo	2cf855837b	memcontrol: schedule throttling if we are congested Memory allocations can induce swapping via kswapd or direct reclaim. If we are having IO done for us by kswapd and don't actually go into direct reclaim we may never get scheduled for throttling. So instead check to see if our cgroup is congested, and if so schedule the throttling. Before we return to user space the throttling stuff will only throttle if we actually required it. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Josef Bacik	d09d8df3a2	blkcg: add generic throttling mechanism Since IO can be issued from literally anywhere it's almost impossible to do throttling without having some sort of adverse effect somewhere else in the system because of locking or other dependencies. The best way to solve this is to do the throttling when we know we aren't holding any other kernel resources. Do this by tracking throttling in a per-blkg basis, and if we require throttling flag the task that it needs to check before it returns to user space and possibly sleep there. This is to address the case where a process is doing work that is generating IO that can't be throttled, whether that is directly with a lot of REQ_META IO, or indirectly by allocating so much memory that it is swamping the disk with REQ_SWAP. We can't use task_add_work as we don't want to induce a memory allocation in the IO path, so simply saving the request queue in the task and flagging it to do the notify_resume thing achieves the same result without the overhead of a memory allocation. Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Tejun Heo	0d3bd88d54	swap,blkcg: issue swap io with the appropriate context For backcharging we need to know who the page belongs to when swapping it out. We don't worry about things that do ->rw_page (zram etc) at the moment, we're only worried about pages that actually go to a block device. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Josef Bacik	0d1e0c7cd5	blk: introduce REQ_SWAP Just like REQ_META, it's important to know the IO coming down is swap in order to guard against potential IO priority inversion issues with cgroups. Add REQ_SWAP and use it for all swap IO, and add it to our bio_issue_as_root_blkg helper. Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Josef Bacik	903d23f0a3	blk-cgroup: allow controllers to output their own stats blk-iolatency has a few stats that it would like to print out, and instead of adding a bunch of crap to the generic code just provide a helper so that controllers can add stuff to the stat line if they want to. Hide it behind a boot option since it changes the output of io.stat from normal, and these stats are only interesting to developers. Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Josef Bacik	c7c98fd376	block: introduce bio_issue_as_root_blkg Instead of forcing all file systems to get the right context on their bio's, simply check for REQ_META to see if we need to issue as the root blkg. We don't want to force all bio's to have the root blkg associated with them if REQ_META is set, as some controllers (blk-iolatency) need to know who the originating cgroup is so it can backcharge them for the work they are doing. This helper will make sure that the controllers do the proper thing wrt the IO priority and backcharging. Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:53 -06:00
Josef Bacik	08e18eab0c	block: add bi_blkg to the bio for cgroups Currently io.low uses a bi_cg_private to stash its private data for the blkg, however other blkcg policies may want to use this as well. Since we can get the private data out of the blkg, move this to bi_blkg in the bio and make it generic, then we can use bio_associate_blkg() to attach the blkg to the bio. Theoretically we could simply replace the bi_css with this since we can get to all the same information from the blkg, however you have to lookup the blkg, so for example wbc_init_bio() would have to lookup and possibly allocate the blkg for the css it was trying to attach to the bio. This could be problematic and result in us either not attaching the css at all to the bio, or falling back to the root blkcg if we are unable to allocate the corresponding blkg. So for now do this, and in the future if possible we could just replace the bi_css with bi_blkg and update the helpers to do the correct translation. Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:53 -06:00
Ming Lei	6e76871730	blk-mq: dequeue request one by one from sw queue if hctx is busy It won't be efficient to dequeue request one by one from sw queue, but we have to do that when queue is busy for better merge performance. This patch takes the Exponential Weighted Moving Average(EWMA) to figure out if queue is busy, then only dequeue request one by one from sw queue when queue is busy. Fixes: `b347689ffb` ("blk-mq-sched: improve dispatching from sw queue") Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Laurence Oberman <loberman@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Hannes Reinecke <hare@suse.de> Reported-by: Kashyap Desai <kashyap.desai@broadcom.com> Tested-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:53 -06:00
Gustavo A. R. Silva	d893ff8603	block/loop: mark expected switch fall-through In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:53 -06:00
Gustavo A. R. Silva	d769a99296	drbd: mark expected switch fall-throughs In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Warning level 2 was used in this case: -Wimplicit-fallthrough=2 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:53 -06:00
Ming Lei	b04f50ab8a	blk-mq: only attempt to merge bio if there is rq in sw queue Only attempt to merge bio iff the ctx->rq_list isn't empty, because: 1) for high-performance SSD, most of times dispatch may succeed, then there may be nothing left in ctx->rq_list, so don't try to merge over sw queue if it is empty, then we can save one acquiring of ctx->lock 2) we can't expect good merge performance on per-cpu sw queue, and missing one merge on sw queue won't be a big deal since tasks can be scheduled from one CPU to another. Cc: Laurence Oberman <loberman@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Tested-by: Kashyap Desai <kashyap.desai@broadcom.com> Reported-by: Kashyap Desai <kashyap.desai@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:53 -06:00
Ming Lei	3f0cedc7e9	blk-mq: use list_splice_tail_init() to insert requests list_splice_tail_init() is much more faster than inserting each request one by one, given all requets in 'list' belong to same sw queue and ctx->lock is required to insert requests. Cc: Laurence Oberman <loberman@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Tested-by: Kashyap Desai <kashyap.desai@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:53 -06:00
Minwoo Im	c018c84fdb	blk-mq: fix typo in a function comment Fix typo in a function blk_mq_alloc_tag_set() comment. if if it too large -> if it's too large. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:53 -06:00
Minwoo Im	0da73d00ca	blk-mq: code clean-up by adding an API to clear set->mq_map set->mq_map is now currently cleared if something goes wrong when establishing a queue map in blk-mq-pci.c. It's also cleared before updating a queue map in blk_mq_update_queue_map(). This patch provides an API to clear set->mq_map to make it clear. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:53 -06:00
Colin Ian King	5efac89c84	paride: remove redundant variable n Variable n is being assigned but is never used hence it is redundant and can be removed. Also put spacing between variables in declaration to clean up checkpatch warnings. Cleans up clang warning: warning: variable 'n' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:53 -06:00
Colin Ian King	e84422cdf3	partitions/ldm: remove redundant pointer dgrp Pointer dgrp is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'dgrp' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:53 -06:00
Colin Ian King	f4354a94e2	loop: remove redundant pointer inode Pointer inode is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'inode' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:53 -06:00

... 198 199 200 201 202 ...

781771 commits