linux-xiaomi-chiron

Author	SHA1	Message	Date
Brian Vazquez	2e3a94aa2b	bpf: Fix memory leaks in generic update/delete batch ops Generic update/delete batch ops functions were using __bpf_copy_key without properly freeing the memory. Handle the memory allocation and copy_from_user separately. Fixes: `aa2e93b8e5` ("bpf: Add generic support for update and delete batch ops") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200119194040.128369-1-brianvv@google.com	2020-01-20 22:27:51 +01:00
Masami Ichikawa	bf24daac8f	tracing: Do not set trace clock if tracefs lockdown is in effect When trace_clock option is not set and unstable clcok detected, tracing_set_default_clock() sets trace_clock(ThinkPad A285 is one of case). In that case, if lockdown is in effect, null pointer dereference error happens in ring_buffer_set_clock(). Link: http://lkml.kernel.org/r/20200116131236.3866925-1-masami256@gmail.com Cc: stable@vger.kernel.org Fixes: `17911ff38a` ("tracing: Add locked_down checks to the open calls of files created for tracefs") Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1788488 Signed-off-by: Masami Ichikawa <masami256@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2020-01-20 16:18:14 -05:00
Steven Rostedt (VMware)	8bcebc77e8	tracing: Fix histogram code when expression has same var as value While working on a tool to convert SQL syntex into the histogram language of the kernel, I discovered the following bug: # echo 'first u64 start_time u64 end_time pid_t pid u64 delta' >> synthetic_events # echo 'hist:keys=pid:start=common_timestamp' > events/sched/sched_waking/trigger # echo 'hist:keys=next_pid:delta=common_timestamp-$start,start2=$start:onmatch(sched.sched_waking).trace(first,$start2,common_timestamp,next_pid,$delta)' > events/sched/sched_switch/trigger Would not display any histograms in the sched_switch histogram side. But if I were to swap the location of "delta=common_timestamp-$start" with "start2=$start" Such that the last line had: # echo 'hist:keys=next_pid:start2=$start,delta=common_timestamp-$start:onmatch(sched.sched_waking).trace(first,$start2,common_timestamp,next_pid,$delta)' > events/sched/sched_switch/trigger The histogram works as expected. What I found out is that the expressions clear out the value once it is resolved. As the variables are resolved in the order listed, when processing: delta=common_timestamp-$start The $start is cleared. When it gets to "start2=$start", it errors out with "unresolved symbol" (which is silent as this happens at the location of the trace), and the histogram is dropped. When processing the histogram for variable references, instead of adding a new reference for a variable used twice, use the same reference. That way, not only is it more efficient, but the order will no longer matter in processing of the variables. From Tom Zanussi: "Just to clarify some more about what the problem was is that without your patch, we would have two separate references to the same variable, and during resolve_var_refs(), they'd both want to be resolved separately, so in this case, since the first reference to start wasn't part of an expression, it wouldn't get the read-once flag set, so would be read normally, and then the second reference would do the read-once read and also be read but using read-once. So everything worked and you didn't see a problem: from: start2=$start,delta=common_timestamp-$start In the second case, when you switched them around, the first reference would be resolved by doing the read-once, and following that the second reference would try to resolve and see that the variable had already been read, so failed as unset, which caused it to short-circuit out and not do the trigger action to generate the synthetic event: to: delta=common_timestamp-$start,start2=$start With your patch, we only have the single resolution which happens correctly the one time it's resolved, so this can't happen." Link: https://lore.kernel.org/r/20200116154216.58ca08eb@gandalf.local.home Cc: stable@vger.kernel.org Fixes: `067fe038e7` ("tracing: Add variable reference handling to hist triggers") Reviewed-by: Tom Zanuss <zanussi@kernel.org> Tested-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2020-01-20 16:11:47 -05:00
Olof Johansson	684415d0de	cmdq: - clean ups of unused code and debuggability - add cmdq_instruction to make the function call interface more readable - add functions for polling and providing info for the user of cmdq scpsys: - add bindings for MT6765 -----BEGIN PGP SIGNATURE----- iQJLBAABCAA1FiEEUdvKHhzqrUYPB/u8L21+TfbCqH4FAl4cPekXHG1hdHRoaWFz LmJnZ0BnbWFpbC5jb20ACgkQL21+TfbCqH6YDA/+MRYiMb26fu+Ac3pF1SW6ao0A fCI0JH2L8Zj4nklg3Y/hoi2PG7DJL2vaG71Ng3iQxQqKLWJdS1WbF/DBl8JdozQx mJXXZ0JM5cc/gIvrG8PLAauWRTDUIPDC2ZZ66mO4CiR6T7Kp/LpnmZF5ZR4qxu/W KNAYxSeyPJ36WQ3HCmubBWOn4JaetOeBFID6z3roKhsGyFXLmCo0Djw4c0Lpk79t 3qVpo7TcC3hintZSC8ufIjQZA/KU3kwy3pS2O+HKMDb4EwORl14Grd6/cN9rAf0p twprn4Y9Jx4cBy0jGyZShKMbIn6wT6zF8GqpoV71vvXTgknyr8XoaiepnjMd2Enm BXw7R8kUt3w4mHnMwX9aUPAP48S1DVMm0OsbwOOqRcSXyJWnQBFB85LnkKeSfzRW iWfeXQBaESCQziYkJSkXz2D4epV/Rzq/L1y5IrddQuwIlpQ3eeqYUWBDYhuV3VlY K4r/AawG0NKPWTvH0bgLZAiWybnzXPZUvVJzwkdwRBjpiyhsl0FX1oZb1H8Vf6mK yrCUgaTIVp6z0pMd8kaxPTyi2AXryKODOXp+XJ2JFmdLFuBPQMSLCiG+B6sDa2Sw eYMcZpFtL5B/p/NpNuSUlGZ2HXTxknS3RnIDTUJ583BZLeItShHnBW3P/8qjx7Wm 8oGsLh0ZLF0G5iqdeuM= =Q9/6 -----END PGP SIGNATURE----- Merge tag 'v5.5-next-soc' of https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux into arm/drivers cmdq: - clean ups of unused code and debuggability - add cmdq_instruction to make the function call interface more readable - add functions for polling and providing info for the user of cmdq scpsys: - add bindings for MT6765 * tag 'v5.5-next-soc' of https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux: dt-bindings: mediatek: add MT6765 power dt-bindings soc: mediatek: cmdq: delete not used define soc: mediatek: cmdq: add cmdq_dev_get_client_reg function soc: mediatek: cmdq: add polling function soc: mediatek: cmdq: define the instruction struct soc: mediatek: cmdq: remove OR opertaion from err return Link: https://lore.kernel.org/r/9b365e76-e346-f813-d750-d7cfd0d16e4e@gmail.com Signed-off-by: Olof Johansson <olof@lixom.net>	2020-01-20 11:26:10 -08:00
Linus Torvalds	d96d875ef5	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAl4laHsACgkQnJ2qBz9k QNnyrQf/W6+QU/BTP0K478PvXOzglNz/UdJf8Y5bzr11Cpx9oV4Mh5MdePViyo+Q +pqVmNmNsSpFoTt4K+b/TkU8z81jB+uYnxYZgFUUVrpKh1913EriGjna0F94ZlvL b607o6cfq79J6w8Ddf64Yq415328syxVsmZiK03T04ENHHcSW1zuBiuG2iO1l4lt SM+QNfSgJe33+fRcbI9Rr7Pywhm1FYZ6EIrymTeWZTGDuU8tN0o3m5vJ9Y0AuHsf u8V/3TX2bWI/TVFWtFzOQvhq2cCVATmBesgRzaPO7brNMvyGjAvtg/gGSfnPaWPs ZOSuUuIp2aL5Z4I5ZAwca0lHErkV8w== =b/h6 -----END PGP SIGNATURE----- Merge tag 'fixes_for_v5.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull reiserfs fix from Jan Kara: "A fixup of a recently merged reiserfs fix which has caused problem when xattrs were not compiled in" * tag 'fixes_for_v5.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: reiserfs: fix handling of -EOPNOTSUPP in reiserfs_for_each_xattr	2020-01-20 11:24:13 -08:00
Olof Johansson	96b34bac41	This pull request contains Broadcom SoCs MAINTAINERS file updates for 5.6, please pull the following: - Nicolas adds an entry for the Broadcom STB PCIe Root Complex files for both BCM7xxx (actual STB SoCs) and BCM2711 (Raspberry Pi 4). -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEm+Rq3+YGJdiR9yuFh9CWnEQHBwQFAl4iepcACgkQh9CWnEQH BwT3jg/+K9+5zXLbbbmd2hkMTtOaNcKBX3vVTQoYCENSv+PXqhIc3z8be/wH+2bT wOWwYL1EP0Fjhnpe5JM/r1Z7wpGukNCCFmwu33ly5gc25albr4XLEI8/kXgFZPSQ Tg/IdFttY0VaIjXrUDh6lCTb3OJkR2vuprivZQpKUZO0pi2wfNNgmQVJOsdVtvEB 9H6OUIV3CP79tJbEPchM7AV0aoOXMY3tfubKNgBzgW6hr9p4MsQWQiQfiLmCqB+H EcMTogFJ63Z239cnH/bCd363PGKJacI3BMB4FQ+7otI8Pl2pN1e39iYKF8Ch5+vt JFdh5eOfmVnJ/FInDKYt/u+lsPGYMt1aseZrJZlgwO1ZSR8/kpyr0bzsy5E9Koyh LW7NOIF1xXhmOip2CXArAeyfIV5toGyVIFvqLB21Mv5u9Xgzo7kq5nJi7AtnVEns 1LWKMG7qD9dVQT3nVKlxGeE0SXuxANVxgRI7OJ+++6l1XUeSuG0ZU5hRIRGBwKbg 6zFSyp94VolF/cx3BripJVYNWBBDeRDiE2fSAcvxGO5almRB0k0V7Cf4r/c8uYtC NYtk7LUZb2Gz8kUUoJv7Uoa8G6oGECYRZ92kBtsk9vRKPUQWNYakrRCxXbTLRDkj 1/l98A1iHUqDpEOfSVO+0RdFCbmNAkmyPBVz1eS4wBxUWYHk29M= =OIMI -----END PGP SIGNATURE----- Merge tag 'arm-soc/for-5.6/maintainers' of https://github.com/Broadcom/stblinux into arm/drivers This pull request contains Broadcom SoCs MAINTAINERS file updates for 5.6, please pull the following: - Nicolas adds an entry for the Broadcom STB PCIe Root Complex files for both BCM7xxx (actual STB SoCs) and BCM2711 (Raspberry Pi 4). * tag 'arm-soc/for-5.6/maintainers' of https://github.com/Broadcom/stblinux: MAINTAINERS: Add brcmstb PCIe controller entry Link: https://lore.kernel.org/r/20200118032935.1346-1-f.fainelli@gmail.com Signed-off-by: Olof Johansson <olof@lixom.net>	2020-01-20 11:22:49 -08:00
Olof Johansson	1342a6aa4a	Samsung defconfig changes for v5.6 1. Bring back explicitly wanted options which were removed through `make savedefconfig`. savedefconfig removes options selected by other symbol, however developers of this other symbol can remove anytime 'select' statement. 2. Enable NFS v4.1 and v4.2, useful in testing/CI systems. 3. Enable thermal throttling through devfreq framework. -----BEGIN PGP SIGNATURE----- iQJEBAABCgAuFiEE3dJiKD0RGyM7briowTdm5oaLg9cFAl4l5+AQHGtyemtAa2Vy bmVsLm9yZwAKCRDBN2bmhouD16FrEACQzSD76RPfzB5VuVVFlwlnDsan/lhKvCG5 jBHVTX5O2eTT+rS5Xo5JjTBIAlkh09hkowbvV57puoXqbVb+b3Mf8xwKrsS+J7Lz hTQZB/tIhMmpWluLJUpK1+IkqaaozdcctqdQm/L6lu5iRzILSrsdf7Pic1zgdQ5V 0dGeEw7H5wp5Yy0OrjSCo+mE4FaxR7MNFe41AJIwe3neGSLsF1R/Kc6w5L3YkOAu aX/g+odWpS64AWlCkdomSSs2Qoj4dd/EXIeuAck/e4hnpDSBfHHTQ2KvxqHfkG5x G2aquZ1n2QSg4zyXN4K0gu0GfVkI/nvoS7tHblQVh+9e3lGSKJA9iJ948tWMyR4k Ovwt5qkevR5poQlMfcivhfn/4rJFrXm8VUugywyDGdJEvUDFz355hoyoEIrVSq96 UdxHyIWKO9HiNnvlsEZrr1B1Gf9J+++oFT/lR60JzCNvFJcuEw8j51jLV1N3WKAo ndQZfB2VOUSL0iBs4L7+7J/TuwDgRYohgJ5TvyXQJjcryBpJAVoJgKaCqLJMnrPm 5eet8vjEciHg6XiM9nMEWskhQfD2+K63gukRsPuAze47xYisd3vmHtns1LQ7A2zH W4b1fmSQY5XzxfH79uIVcSdH2kL/DtqcSCR09czhkZTLbHtq/h59uSrv/El3shTu 4KAMvyjmnQ== =d+tH -----END PGP SIGNATURE----- Merge tag 'samsung-defconfig-5.6' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into arm/defconfig Samsung defconfig changes for v5.6 1. Bring back explicitly wanted options which were removed through `make savedefconfig`. savedefconfig removes options selected by other symbol, however developers of this other symbol can remove anytime 'select' statement. 2. Enable NFS v4.1 and v4.2, useful in testing/CI systems. 3. Enable thermal throttling through devfreq framework. * tag 'samsung-defconfig-5.6' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux: ARM: multi_v7_defconfig: Enable devfreq thermal integration ARM: exynos_defconfig: Enable devfreq thermal integration ARM: multi_v7_defconfig: Enable NFS v4.1 and v4.2 ARM: exynos_defconfig: Enable NFS v4.1 and v4.2 ARM: exynos_defconfig: Bring back explicitly wanted options Link: https://lore.kernel.org/r/20200120180227.9061-1-krzk@kernel.org Signed-off-by: Olof Johansson <olof@lixom.net>	2020-01-20 11:20:23 -08:00
Kevin Hao	0f394daef8	irqdomain: Fix a memory leak in irq_domain_push_irq() Fix a memory leak reported by kmemleak: unreferenced object 0xffff000bc6f50e80 (size 128): comm "kworker/23:2", pid 201, jiffies 4294894947 (age 942.132s) hex dump (first 32 bytes): 00 00 00 00 41 00 00 00 86 c0 03 00 00 00 00 00 ....A........... 00 a0 b2 c6 0b 00 ff ff 40 51 fd 10 00 80 ff ff ........@Q...... backtrace: [<00000000e62d2240>] kmem_cache_alloc_trace+0x1a4/0x320 [<00000000279143c9>] irq_domain_push_irq+0x7c/0x188 [<00000000d9f4c154>] thunderx_gpio_probe+0x3ac/0x438 [<00000000fd09ec22>] pci_device_probe+0xe4/0x198 [<00000000d43eca75>] really_probe+0xdc/0x320 [<00000000d3ebab09>] driver_probe_device+0x5c/0xf0 [<000000005b3ecaa0>] __device_attach_driver+0x88/0xc0 [<000000004e5915f5>] bus_for_each_drv+0x7c/0xc8 [<0000000079d4db41>] __device_attach+0xe4/0x140 [<00000000883bbda9>] device_initial_probe+0x18/0x20 [<000000003be59ef6>] bus_probe_device+0x98/0xa0 [<0000000039b03d3f>] deferred_probe_work_func+0x74/0xa8 [<00000000870934ce>] process_one_work+0x1c8/0x470 [<00000000e3cce570>] worker_thread+0x1f8/0x428 [<000000005d64975e>] kthread+0xfc/0x128 [<00000000f0eaa764>] ret_from_fork+0x10/0x18 Fixes: `495c38d300` ("irqdomain: Add irq_domain_{push,pop}_irq() functions") Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200120043547.22271-1-haokexin@gmail.com	2020-01-20 19:10:05 +00:00
Joakim Zhang	2fbb13961e	irqchip: Add NXP INTMUX interrupt multiplexer support The Interrupt Multiplexer (INTMUX) expands the number of peripherals that can interrupt the core: * The INTMUX has 8 channels that are assigned to 8 NVIC interrupt slots. * Each INTMUX channel can receive up to 32 interrupt sources and has 1 interrupt output. * The INTMUX routes the interrupt sources to the interrupt outputs. Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200117060653.27485-3-qiangqing.zhang@nxp.com	2020-01-20 19:10:05 +00:00
Joakim Zhang	618ea6275b	dt-bindings: interrupt-controller: Add binding for NXP INTMUX interrupt multiplexer This patch adds the DT bindings for the NXP INTMUX interrupt multiplexer for i.MX8 family SoCs. Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20200117060653.27485-2-qiangqing.zhang@nxp.com	2020-01-20 19:10:05 +00:00
Hyunki Koo	b74416dba3	irqchip: Define EXYNOS_IRQ_COMBINER This patch is written to clean up dependency of ARCH_EXYNOS Not all exynos device have IRQ_COMBINER, especially aarch64 EXYNOS but it is built for all exynos devices. Thus add the config for EXYNOS_IRQ_COMBINER remove direct dependency between ARCH_EXYNOS and exynos-combiner.c and only selected on the aarch32 devices Signed-off-by: Hyunki Koo <hyunki00.koo@samsung.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Link: https://lore.kernel.org/r/20191224211108.7128-1-hyunki00.koo@gmail.com	2020-01-20 19:10:05 +00:00
Qianggui Song	8f78bd62bd	irqchip/meson-gpio: Add support for meson a1 SoCs The meson a1 Socs have some changes compared with previous chips. For A113L, it contains 62 pins and can be spied on: - 62:128 undefined - 61:50 12 pins on bank A - 49:37 13 pins on bank F - 36:20 17 pins on bank X - 19:13 7 pins on bank B - 12:0 13 pins on bank P There are five relative registers for gpio interrupt controller, details are as below: - PADCTRL_GPIO_IRQ_CTRL0 bit[31]: enable/disable the whole irq lines bit[16-23]: both edge trigger bit[8-15]: single edge trigger bit[0-7]: pol trigger - PADCTRL_GPIO_IRQ_CTRL[X] bit[0-6]: 7 bits to choose gpio source for irq line 2[X] - 2 bit[16-22]: 7 bits to choose gpio source for irq line 2[X] - 1 where X =1,2,3,4 Signed-off-by: Qianggui Song <qianggui.song@amlogic.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20191216123645.10099-4-qianggui.song@amlogic.com	2020-01-20 19:10:05 +00:00
Qianggui Song	e2514165f3	irqchip/meson-gpio: Rework meson irqchip driver to support meson-A1 SoCs Since Meson-A1 SoCs register layout of gpio interrupt controller has difference with previous chips, registers to decide irq line and offset of trigger method are all changed, the current driver should be modified. Signed-off-by: Qianggui Song <qianggui.song@amlogic.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20191216123645.10099-3-qianggui.song@amlogic.com	2020-01-20 19:10:05 +00:00
Qianggui Song	fd6765b4c9	dt-bindings: interrupt-controller: New binding for Meson-A1 SoCs Update dt-binding document for GPIO interrupt controller of Meson-A1 SoCs Signed-off-by: Qianggui Song <qianggui.song@amlogic.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20191216123645.10099-2-qianggui.song@amlogic.com	2020-01-20 19:10:05 +00:00
John Garry	d6152e6ec9	irqchip/mbigen: Set driver .suppress_bind_attrs to avoid remove problems The following crash can be seen for setting CONFIG_DEBUG_TEST_DRIVER_REMOVE=y for DT FW (which some people still use): Hisilicon MBIGEN-V2 60080000.interrupt-controller: Failed to create mbi-gen irqdomain Hisilicon MBIGEN-V2: probe of 60080000.interrupt-controller failed with error -12 [...] Unable to handle kernel paging request at virtual address 0000000000005008 Mem abort info: ESR = 0x96000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=0000041fb9990000 [0000000000005008] pgd=0000000000000000 Internal error: Oops: 96000004 [#1] PREEMPT SMP Modules linked in: CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc6-00002-g3fc42638a506-dirty #1622 Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 IT21 Nemo 2.0 RC0 04/18/2018 pstate: 40000085 (nZcv daIf -PAN -UAO) pc : mbigen_set_type+0x38/0x60 lr : __irq_set_trigger+0x6c/0x188 sp : ffff800014b4b400 x29: ffff800014b4b400 x28: 0000000000000007 x27: 0000000000000000 x26: 0000000000000000 x25: ffff041fd83bd0d4 x24: ffff041fd83bd188 x23: 0000000000000000 x22: ffff80001193ce00 x21: 0000000000000004 x20: 0000000000000000 x19: ffff041fd83bd000 x18: ffffffffffffffff x17: 0000000000000000 x16: 0000000000000000 x15: ffff8000119098c8 x14: ffff041fb94ec91c x13: ffff041fb94ec1a1 x12: 0000000000000030 x11: 0101010101010101 x10: 0000000000000040 x9 : 0000000000000000 x8 : ffff041fb98c6680 x7 : ffff800014b4b380 x6 : ffff041fd81636c8 x5 : 0000000000000000 x4 : 000000000000025f x3 : 0000000000005000 x2 : 0000000000005008 x1 : 0000000000000004 x0 : 0000000080000000 Call trace: mbigen_set_type+0x38/0x60 __setup_irq+0x744/0x900 request_threaded_irq+0xe0/0x198 pcie_pme_probe+0x98/0x118 pcie_port_probe_service+0x38/0x78 really_probe+0xa0/0x3e0 driver_probe_device+0x58/0x100 __device_attach_driver+0x90/0xb0 bus_for_each_drv+0x64/0xc8 __device_attach+0xd8/0x138 device_initial_probe+0x10/0x18 bus_probe_device+0x90/0x98 device_add+0x4c4/0x770 device_register+0x1c/0x28 pcie_port_device_register+0x1e4/0x4f0 pcie_portdrv_probe+0x34/0xd8 local_pci_probe+0x3c/0xa0 pci_device_probe+0x128/0x1c0 really_probe+0xa0/0x3e0 driver_probe_device+0x58/0x100 __device_attach_driver+0x90/0xb0 bus_for_each_drv+0x64/0xc8 __device_attach+0xd8/0x138 device_attach+0x10/0x18 pci_bus_add_device+0x4c/0xb8 pci_bus_add_devices+0x38/0x88 pci_host_probe+0x3c/0xc0 pci_host_common_probe+0xf0/0x208 hisi_pcie_almost_ecam_probe+0x24/0x30 platform_drv_probe+0x50/0xa0 really_probe+0xa0/0x3e0 driver_probe_device+0x58/0x100 device_driver_attach+0x6c/0x90 __driver_attach+0x84/0xc8 bus_for_each_dev+0x74/0xc8 driver_attach+0x20/0x28 bus_add_driver+0x148/0x1f0 driver_register+0x60/0x110 __platform_driver_register+0x40/0x48 hisi_pcie_almost_ecam_driver_init+0x1c/0x24 The specific problem here is that the mbigen driver real probe has failed as the mbigen_of_create_domain()->of_platform_device_create() call fails, the reason for that being that we never destroyed the platform device created during the remove test dry run and there is some conflict. Since we generally would never want to unbind this driver, and to save adding a driver tear down path for that, just set the driver .suppress_bind_attrs member to avoid this possibility. Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Link: https://lore.kernel.org/r/1579196323-180137-1-git-send-email-john.garry@huawei.com	2020-01-20 19:10:04 +00:00
Eddie James	04f605906f	irqchip: Add Aspeed SCU interrupt controller The Aspeed SOCs provide some interrupts through the System Control Unit registers. Add an interrupt controller that provides these interrupts to the system. Signed-off-by: Eddie James <eajames@linux.ibm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Jeffery <andrew@aj.id.au> Link: https://lore.kernel.org/r/1579123790-6894-3-git-send-email-eajames@linux.ibm.com	2020-01-20 19:10:04 +00:00
Eddie James	5350a237b4	dt-bindings: interrupt-controller: Add Aspeed SCU interrupt controller Document the Aspeed SCU interrupt controller and add an include file for the interrupts it provides. Signed-off-by: Eddie James <eajames@linux.ibm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> Acked-by: Andrew Jeffery <andrew@aj.id.au> Link: https://lore.kernel.org/r/1579123790-6894-2-git-send-email-eajames@linux.ibm.com	2020-01-20 19:10:03 +00:00
Yash Shah	96868dce64	gpio/sifive: Add GPIO driver for SiFive SoCs Adds the GPIO driver for SiFive RISC-V SoCs. Signed-off-by: Wesley W. Terpstra <wesley@sifive.com> [Atish: Various fixes and code cleanup] Signed-off-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Yash Shah <yash.shah@sifive.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/1575976274-13487-6-git-send-email-yash.shah@sifive.com	2020-01-20 19:10:03 +00:00
Enric Balletbo i Serra	3d7610e8da	regulator: core: Fix exported symbols to the exported GPL version Change the exported symbols introduced by commit `e915331149` ("regulator: vctrl-regulator: Avoid deadlock getting and setting the voltage") from EXPORT_SYMBOL() to EXPORT_SYMBOL_GPL(), like is used for all the core parts. Fixes: `e915331149` ("regulator: vctrl-regulator: Avoid deadlock getting and setting the voltage") Reported-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Link: https://lore.kernel.org/r/20200120123921.1204339-1-enric.balletbo@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org>	2020-01-20 19:09:48 +00:00
Gustavo A. R. Silva	c878465715	remoteproc: use struct_size() helper One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct fw_rsc_vdev { ... struct fw_rsc_vdev_vring vring[0]; } __packed; Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. So, replace the following form: sizeof(rsc) + rsc->num_of_vrings sizeof(struct fw_rsc_vdev_vring) with: struct_size(rsc, vring, rsc->num_of_vrings) This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Link: https://lore.kernel.org/r/20190830151406.GA23274@embeddedor Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>	2020-01-20 10:50:54 -08:00
Eric Biggers	50d9fad73a	ubifs: use IS_ENCRYPTED() instead of ubifs_crypt_is_encrypted() There's no need for the ubifs_crypt_is_encrypted() function anymore. Just use IS_ENCRYPTED() instead, like ext4 and f2fs do. IS_ENCRYPTED() checks the VFS-level flag instead of the UBIFS-specific flag, but it shouldn't change any behavior since the flags are kept in sync. Link: https://lore.kernel.org/r/20191209212721.244396-1-ebiggers@kernel.org Acked-by: Richard Weinberger <richard@nod.at> Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-01-20 10:43:46 -08:00
Brandon Maier	a8f40111d1	remoteproc: Initialize rproc_class before use The remoteproc_core and remoteproc drivers all initialize with module_init(). However remoteproc drivers need the rproc_class during their probe. If one of the remoteproc drivers runs init and gets through probe before remoteproc_init() runs, a NULL pointer access of rproc_class's `glue_dirs` spinlock occurs. > Unable to handle kernel NULL pointer dereference at virtual address 000000dc > pgd = c0004000 > [000000dc] *pgd=00000000 > Internal error: Oops: 5 [#1] PREEMPT ARM > Modules linked in: > CPU: 0 PID: 1 Comm: swapper Tainted: G W 4.14.106-rt56 #1 > Hardware name: Generic OMAP36xx (Flattened Device Tree) > task: c6050000 task.stack: c604a000 > PC is at rt_spin_lock+0x40/0x6c > LR is at rt_spin_lock+0x28/0x6c > pc : [<c0523c90>] lr : [<c0523c78>] psr: 60000013 > sp : c604bdc0 ip : 00000000 fp : 00000000 > r10: 00000000 r9 : c61c7c10 r8 : c6269c20 > r7 : c0905888 r6 : c6269c20 r5 : 00000000 r4 : 000000d4 > r3 : 000000dc r2 : c6050000 r1 : 00000002 r0 : 000000d4 > Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none ... > [<c0523c90>] (rt_spin_lock) from [<c03b65a4>] (get_device_parent+0x54/0x17c) > [<c03b65a4>] (get_device_parent) from [<c03b6bec>] (device_add+0xe0/0x5b4) > [<c03b6bec>] (device_add) from [<c042adf4>] (rproc_add+0x18/0xd8) > [<c042adf4>] (rproc_add) from [<c01110e4>] (my_rproc_probe+0x158/0x204) > [<c01110e4>] (my_rproc_probe) from [<c03bb6b8>] (platform_drv_probe+0x34/0x70) > [<c03bb6b8>] (platform_drv_probe) from [<c03b9dd4>] (driver_probe_device+0x2c8/0x420) > [<c03b9dd4>] (driver_probe_device) from [<c03ba02c>] (__driver_attach+0x100/0x11c) > [<c03ba02c>] (__driver_attach) from [<c03b7d08>] (bus_for_each_dev+0x7c/0xc0) > [<c03b7d08>] (bus_for_each_dev) from [<c03b910c>] (bus_add_driver+0x1cc/0x264) > [<c03b910c>] (bus_add_driver) from [<c03ba714>] (driver_register+0x78/0xf8) > [<c03ba714>] (driver_register) from [<c010181c>] (do_one_initcall+0x100/0x190) > [<c010181c>] (do_one_initcall) from [<c0800de8>] (kernel_init_freeable+0x130/0x1d0) > [<c0800de8>] (kernel_init_freeable) from [<c051eee8>] (kernel_init+0x8/0x114) > [<c051eee8>] (kernel_init) from [<c01175b0>] (ret_from_fork+0x14/0x24) > Code: e2843008 e3c2203f f5d3f000 e5922010 (e193cf9f) > ---[ end trace 0000000000000002 ]--- Signed-off-by: Brandon Maier <brandon.maier@rockwellcollins.com> Link: https://lore.kernel.org/r/20190530225223.136420-1-brandon.maier@rockwellcollins.com Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>	2020-01-20 10:41:34 -08:00
Pi-Hsun Shih	7017996951	rpmsg: add rpmsg support for mt8183 SCP. Add a simple rpmsg support for mt8183 SCP, that use IPI / IPC directly. Signed-off-by: Pi-Hsun Shih <pihsun@chromium.org> Link: https://lore.kernel.org/r/20191112110330.179649-4-pihsun@chromium.org Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>	2020-01-20 10:29:56 -08:00
Erin Lo	63c13d61ea	remoteproc/mediatek: add SCP support for mt8183 Provide a basic driver to control Cortex M4 co-processor Signed-off-by: Erin Lo <erin.lo@mediatek.com> Signed-off-by: Nicolas Boichat <drinkcat@chromium.org> Signed-off-by: Pi-Hsun Shih <pihsun@chromium.org> Link: https://lore.kernel.org/r/20191112110330.179649-3-pihsun@chromium.org Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>	2020-01-20 10:29:54 -08:00
Erin Lo	e47e98877b	dt-bindings: Add a binding for Mediatek SCP Add a DT binding documentation of SCP for the MT8183 SoC from Mediatek. Signed-off-by: Erin Lo <erin.lo@mediatek.com> Signed-off-by: Pi-Hsun Shih <pihsun@chromium.org> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20191112110330.179649-2-pihsun@chromium.org Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>	2020-01-20 10:29:51 -08:00
Kadlecsik József	32c72165db	netfilter: ipset: use bitmap infrastructure completely The bitmap allocation did not use full unsigned long sizes when calculating the required size and that was triggered by KASAN as slab-out-of-bounds read in several places. The patch fixes all of them. Reported-by: syzbot+fabca5cbf5e54f3fe2de@syzkaller.appspotmail.com Reported-by: syzbot+827ced406c9a1d9570ed@syzkaller.appspotmail.com Reported-by: syzbot+190d63957b22ef673ea5@syzkaller.appspotmail.com Reported-by: syzbot+dfccdb2bdb4a12ad425e@syzkaller.appspotmail.com Reported-by: syzbot+df0d0f5895ef1f41a65b@syzkaller.appspotmail.com Reported-by: syzbot+b08bd19bb37513357fd4@syzkaller.appspotmail.com Reported-by: syzbot+53cdd0ec0bbabd53370a@syzkaller.appspotmail.com Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-01-20 17:41:45 +01:00
Xiaochen Shen	32ada3b9e0	x86/resctrl: Clean up unused function parameter in mkdir path Commit `334b0f4e9b` ("x86/resctrl: Fix a deadlock due to inaccurate reference") changed the argument to rdtgroup_kn_lock_live()/rdtgroup_kn_unlock() within mkdir_rdt_prepare(). That change resulted in an unused function parameter to mkdir_rdt_prepare(). Clean up the unused function parameter in mkdir_rdt_prepare() and its callers rdtgroup_mkdir_mon() and rdtgroup_mkdir_ctrl_mon(). Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1578500886-21771-5-git-send-email-xiaochen.shen@intel.com	2020-01-20 17:00:41 +01:00
Jessica Yu	708e0ada19	module: avoid setting info->name early in case we can fall back to info->mod->name In setup_load_info(), info->name (which contains the name of the module, mostly used for early logging purposes before the module gets set up) gets unconditionally assigned if .modinfo is missing despite the fact that there is an if (!info->name) check near the end of the function. Avoid assigning a placeholder string to info->name if .modinfo doesn't exist, so that we can fall back to info->mod->name later on. Fixes: `5fdc7db644` ("module: setup load info before module_sig_check()") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Jessica Yu <jeyu@kernel.org>	2020-01-20 16:59:39 +01:00
Xiaochen Shen	334b0f4e9b	x86/resctrl: Fix a deadlock due to inaccurate reference There is a race condition which results in a deadlock when rmdir and mkdir execute concurrently: $ ls /sys/fs/resctrl/c1/mon_groups/m1/ cpus cpus_list mon_data tasks Thread 1: rmdir /sys/fs/resctrl/c1 Thread 2: mkdir /sys/fs/resctrl/c1/mon_groups/m1 3 locks held by mkdir/48649: #0: (sb_writers#17){.+.+}, at: [<ffffffffb4ca2aa0>] mnt_want_write+0x20/0x50 #1: (&type->i_mutex_dir_key#8/1){+.+.}, at: [<ffffffffb4c8c13b>] filename_create+0x7b/0x170 #2: (rdtgroup_mutex){+.+.}, at: [<ffffffffb4a4389d>] rdtgroup_kn_lock_live+0x3d/0x70 4 locks held by rmdir/48652: #0: (sb_writers#17){.+.+}, at: [<ffffffffb4ca2aa0>] mnt_want_write+0x20/0x50 #1: (&type->i_mutex_dir_key#8/1){+.+.}, at: [<ffffffffb4c8c3cf>] do_rmdir+0x13f/0x1e0 #2: (&type->i_mutex_dir_key#8){++++}, at: [<ffffffffb4c86d5d>] vfs_rmdir+0x4d/0x120 #3: (rdtgroup_mutex){+.+.}, at: [<ffffffffb4a4389d>] rdtgroup_kn_lock_live+0x3d/0x70 Thread 1 is deleting control group "c1". Holding rdtgroup_mutex, kernfs_remove() removes all kernfs nodes under directory "c1" recursively, then waits for sub kernfs node "mon_groups" to drop active reference. Thread 2 is trying to create a subdirectory "m1" in the "mon_groups" directory. The wrapper kernfs_iop_mkdir() takes an active reference to the "mon_groups" directory but the code drops the active reference to the parent directory "c1" instead. As a result, Thread 1 is blocked on waiting for active reference to drop and never release rdtgroup_mutex, while Thread 2 is also blocked on trying to get rdtgroup_mutex. Thread 1 (rdtgroup_rmdir) Thread 2 (rdtgroup_mkdir) (rmdir /sys/fs/resctrl/c1) (mkdir /sys/fs/resctrl/c1/mon_groups/m1) ------------------------- ------------------------- kernfs_iop_mkdir /* * kn: "m1", parent_kn: "mon_groups", * prgrp_kn: parent_kn->parent: "c1", * * "mon_groups", parent_kn->active++: 1 / kernfs_get_active(parent_kn) kernfs_iop_rmdir / "c1", kn->active++ / kernfs_get_active(kn) rdtgroup_kn_lock_live atomic_inc(&rdtgrp->waitcount) / "c1", kn->active-- / kernfs_break_active_protection(kn) mutex_lock rdtgroup_rmdir_ctrl free_all_child_rdtgrp sentry->flags = RDT_DELETED rdtgroup_ctrl_remove rdtgrp->flags = RDT_DELETED kernfs_get(kn) kernfs_remove(rdtgrp->kn) __kernfs_remove / "mon_groups", sub_kn / atomic_add(KN_DEACTIVATED_BIAS, &sub_kn->active) kernfs_drain(sub_kn) / * sub_kn->active == KN_DEACTIVATED_BIAS + 1, * waiting on sub_kn->active to drop, but it * never drops in Thread 2 which is blocked * on getting rdtgroup_mutex. / Thread 1 hangs here ----> wait_event(sub_kn->active == KN_DEACTIVATED_BIAS) ... rdtgroup_mkdir rdtgroup_mkdir_mon(parent_kn, prgrp_kn) mkdir_rdt_prepare(parent_kn, prgrp_kn) rdtgroup_kn_lock_live(prgrp_kn) atomic_inc(&rdtgrp->waitcount) / * "c1", prgrp_kn->active-- * * The active reference on "c1" is * dropped, but not matching the * actual active reference taken * on "mon_groups", thus causing * Thread 1 to wait forever while * holding rdtgroup_mutex. / kernfs_break_active_protection( prgrp_kn) / * Trying to get rdtgroup_mutex * which is held by Thread 1. */ Thread 2 hangs here ----> mutex_lock ... The problem is that the creation of a subdirectory in the "mon_groups" directory incorrectly releases the active protection of its parent directory instead of itself before it starts waiting for rdtgroup_mutex. This is triggered by the rdtgroup_mkdir() flow calling rdtgroup_kn_lock_live()/rdtgroup_kn_unlock() with kernfs node of the parent control group ("c1") as argument. It should be called with kernfs node "mon_groups" instead. What is currently missing is that the kn->priv of "mon_groups" is NULL instead of pointing to the rdtgrp. Fix it by pointing kn->priv to rdtgrp when "mon_groups" is created. Then it could be passed to rdtgroup_kn_lock_live()/rdtgroup_kn_unlock() instead. And then it operates on the same rdtgroup structure but handles the active reference of kernfs node "mon_groups" to prevent deadlock. The same changes are also made to the "mon_data" directories. This results in some unused function parameters that will be cleaned up in follow-up patch as the focus here is on the fix only in support of backporting efforts. Fixes: `c7d9aac613` ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring") Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1578500886-21771-4-git-send-email-xiaochen.shen@intel.com	2020-01-20 16:57:53 +01:00
Xiaochen Shen	074fadee59	x86/resctrl: Fix use-after-free due to inaccurate refcount of rdtgroup There is a race condition in the following scenario which results in an use-after-free issue when reading a monitoring file and deleting the parent ctrl_mon group concurrently: Thread 1 calls atomic_inc() to take refcount of rdtgrp and then calls kernfs_break_active_protection() to drop the active reference of kernfs node in rdtgroup_kn_lock_live(). In Thread 2, kernfs_remove() is a blocking routine. It waits on all sub kernfs nodes to drop the active reference when removing all subtree kernfs nodes recursively. Thread 2 could block on kernfs_remove() until Thread 1 calls kernfs_break_active_protection(). Only after kernfs_remove() completes the refcount of rdtgrp could be trusted. Before Thread 1 calls atomic_inc() and kernfs_break_active_protection(), Thread 2 could call kfree() when the refcount of rdtgrp (sentry) is 0 instead of 1 due to the race. In Thread 1, in rdtgroup_kn_unlock(), referring to earlier rdtgrp memory (rdtgrp->waitcount) which was already freed in Thread 2 results in use-after-free issue. Thread 1 (rdtgroup_mondata_show) Thread 2 (rdtgroup_rmdir) -------------------------------- ------------------------- rdtgroup_kn_lock_live /* * kn active protection until * kernfs_break_active_protection(kn) / rdtgrp = kernfs_to_rdtgroup(kn) rdtgroup_kn_lock_live atomic_inc(&rdtgrp->waitcount) mutex_lock rdtgroup_rmdir_ctrl free_all_child_rdtgrp / * sentry->waitcount should be 1 * but is 0 now due to the race. / kfree(sentry)[1] /* * Only after kernfs_remove() * completes, the refcount of * rdtgrp could be trusted. / atomic_inc(&rdtgrp->waitcount) / kn->active-- / kernfs_break_active_protection(kn) rdtgroup_ctrl_remove rdtgrp->flags = RDT_DELETED / * Blocking routine, wait for * all sub kernfs nodes to drop * active reference in * kernfs_break_active_protection. / kernfs_remove(rdtgrp->kn) rdtgroup_kn_unlock mutex_unlock atomic_dec_and_test( &rdtgrp->waitcount) && (flags & RDT_DELETED) kernfs_unbreak_active_protection(kn) kfree(rdtgrp) mutex_lock mon_event_read rdtgroup_kn_unlock mutex_unlock / * Use-after-free: refer to earlier rdtgrp * memory which was freed in [1]. / atomic_dec_and_test(&rdtgrp->waitcount) && (flags & RDT_DELETED) / kn->active++ */ kernfs_unbreak_active_protection(kn) kfree(rdtgrp) Fix it by moving free_all_child_rdtgrp() to after kernfs_remove() in rdtgroup_rmdir_ctrl() to ensure it has the accurate refcount of rdtgrp. Fixes: `f3cbeacaa0` ("x86/intel_rdt/cqm: Add rmdir support") Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1578500886-21771-3-git-send-email-xiaochen.shen@intel.com	2020-01-20 16:56:11 +01:00
Xiaochen Shen	b8511ccc75	x86/resctrl: Fix use-after-free when deleting resource groups A resource group (rdtgrp) contains a reference count (rdtgrp->waitcount) that indicates how many waiters expect this rdtgrp to exist. Waiters could be waiting on rdtgroup_mutex or some work sitting on a task's workqueue for when the task returns from kernel mode or exits. The deletion of a rdtgrp is intended to have two phases: (1) while holding rdtgroup_mutex the necessary cleanup is done and rdtgrp->flags is set to RDT_DELETED, (2) after releasing the rdtgroup_mutex, the rdtgrp structure is freed only if there are no waiters and its flag is set to RDT_DELETED. Upon gaining access to rdtgroup_mutex or rdtgrp, a waiter is required to check for the RDT_DELETED flag. When unmounting the resctrl file system or deleting ctrl_mon groups, all of the subdirectories are removed and the data structure of rdtgrp is forcibly freed without checking rdtgrp->waitcount. If at this point there was a waiter on rdtgrp then a use-after-free issue occurs when the waiter starts running and accesses the rdtgrp structure it was waiting on. See kfree() calls in [1], [2] and [3] in these two call paths in following scenarios: (1) rdt_kill_sb() -> rmdir_all_sub() -> free_all_child_rdtgrp() (2) rdtgroup_rmdir() -> rdtgroup_rmdir_ctrl() -> free_all_child_rdtgrp() There are several scenarios that result in use-after-free issue in following: Scenario 1: ----------- In Thread 1, rdtgroup_tasks_write() adds a task_work callback move_myself(). If move_myself() is scheduled to execute after Thread 2 rdt_kill_sb() is finished, referring to earlier rdtgrp memory (rdtgrp->waitcount) which was already freed in Thread 2 results in use-after-free issue. Thread 1 (rdtgroup_tasks_write) Thread 2 (rdt_kill_sb) ------------------------------- ---------------------- rdtgroup_kn_lock_live atomic_inc(&rdtgrp->waitcount) mutex_lock rdtgroup_move_task __rdtgroup_move_task /* * Take an extra refcount, so rdtgrp cannot be freed * before the call back move_myself has been invoked / atomic_inc(&rdtgrp->waitcount) / Callback move_myself will be scheduled for later / task_work_add(move_myself) rdtgroup_kn_unlock mutex_unlock atomic_dec_and_test(&rdtgrp->waitcount) && (flags & RDT_DELETED) mutex_lock rmdir_all_sub / * sentry and rdtgrp are freed * without checking refcount / free_all_child_rdtgrp kfree(sentry)[1] kfree(rdtgrp)[2] mutex_unlock / * Callback is scheduled to execute * after rdt_kill_sb is finished / move_myself / * Use-after-free: refer to earlier rdtgrp * memory which was freed in [1] or [2]. / atomic_dec_and_test(&rdtgrp->waitcount) && (flags & RDT_DELETED) kfree(rdtgrp) Scenario 2: ----------- In Thread 1, rdtgroup_tasks_write() adds a task_work callback move_myself(). If move_myself() is scheduled to execute after Thread 2 rdtgroup_rmdir() is finished, referring to earlier rdtgrp memory (rdtgrp->waitcount) which was already freed in Thread 2 results in use-after-free issue. Thread 1 (rdtgroup_tasks_write) Thread 2 (rdtgroup_rmdir) ------------------------------- ------------------------- rdtgroup_kn_lock_live atomic_inc(&rdtgrp->waitcount) mutex_lock rdtgroup_move_task __rdtgroup_move_task / * Take an extra refcount, so rdtgrp cannot be freed * before the call back move_myself has been invoked / atomic_inc(&rdtgrp->waitcount) / Callback move_myself will be scheduled for later / task_work_add(move_myself) rdtgroup_kn_unlock mutex_unlock atomic_dec_and_test(&rdtgrp->waitcount) && (flags & RDT_DELETED) rdtgroup_kn_lock_live atomic_inc(&rdtgrp->waitcount) mutex_lock rdtgroup_rmdir_ctrl free_all_child_rdtgrp / * sentry is freed without * checking refcount / kfree(sentry)[3] rdtgroup_ctrl_remove rdtgrp->flags = RDT_DELETED rdtgroup_kn_unlock mutex_unlock atomic_dec_and_test( &rdtgrp->waitcount) && (flags & RDT_DELETED) kfree(rdtgrp) /* * Callback is scheduled to execute * after rdt_kill_sb is finished / move_myself / * Use-after-free: refer to earlier rdtgrp * memory which was freed in [3]. */ atomic_dec_and_test(&rdtgrp->waitcount) && (flags & RDT_DELETED) kfree(rdtgrp) If CONFIG_DEBUG_SLAB=y, Slab corruption on kmalloc-2k can be observed like following. Note that "0x6b" is POISON_FREE after kfree(). The corrupted bits "0x6a", "0x64" at offset 0x424 correspond to waitcount member of struct rdtgroup which was freed: Slab corruption (Not tainted): kmalloc-2k start=ffff9504c5b0d000, len=2048 420: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkjkkkkkkkkkkk Single bit error detected. Probably bad RAM. Run memtest86+ or a similar memory test tool. Next obj: start=ffff9504c5b0d800, len=2048 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Slab corruption (Not tainted): kmalloc-2k start=ffff9504c58ab800, len=2048 420: 6b 6b 6b 6b 64 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkdkkkkkkkkkkk Prev obj: start=ffff9504c58ab000, len=2048 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Fix this by taking reference count (waitcount) of rdtgrp into account in the two call paths that currently do not do so. Instead of always freeing the resource group it will only be freed if there are no waiters on it. If there are waiters, the resource group will have its flags set to RDT_DELETED. It will be left to the waiter to free the resource group when it starts running and finding that it was the last waiter and the resource group has been removed (rdtgrp->flags & RDT_DELETED) since. (1) rdt_kill_sb() -> rmdir_all_sub() -> free_all_child_rdtgrp() (2) rdtgroup_rmdir() -> rdtgroup_rmdir_ctrl() -> free_all_child_rdtgrp() Fixes: `f3cbeacaa0` ("x86/intel_rdt/cqm: Add rmdir support") Fixes: `60cf5e101f` ("x86/intel_rdt: Add mkdir to resctrl file system") Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1578500886-21771-2-git-send-email-xiaochen.shen@intel.com	2020-01-20 16:45:43 +01:00
Anand Jain	a69976bc69	btrfs: device stats, log when stats are zeroed We had a report indicating that some read errors aren't reported by the device stats in the userland. It is important to have the errors reported in the device stat as user land scripts might depend on it to take the reasonable corrective actions. But to debug these issue we need to be really sure that request to reset the device stat did not come from the userland itself. So log an info message when device error reset happens. For example: BTRFS info (device sdc): device stats zeroed by btrfs(9223) Reported-by: philip@philip-seeger.de Link: https://www.spinics.net/lists/linux-btrfs/msg96528.html Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-01-20 16:41:02 +01:00
Josef Bacik	556755a8a9	btrfs: fix improper setting of scanned for range cyclic write cache pages We noticed that we were having regular CG OOM kills in cases where there was still enough dirty pages to avoid OOM'ing. It turned out there's this corner case in btrfs's handling of range_cyclic where files that were being redirtied were not getting fully written out because of how we do range_cyclic writeback. We unconditionally were setting scanned = 1; the first time we found any pages in the inode. This isn't actually what we want, we want it to be set if we've scanned the entire file. For range_cyclic we could be starting in the middle or towards the end of the file, so we could write one page and then not write any of the other dirty pages in the file because we set scanned = 1. Fix this by not setting scanned = 1 if we find pages. The rules for setting scanned should be 1) !range_cyclic. In this case we have a specified range to write out. 2) range_cyclic && index == 0. In this case we've started at the beginning and there is no need to loop around a second time. 3) range_cyclic && we started at index > 0 and we've reached the end of the file without satisfying our nr_to_write. This patch fixes both of our writepages implementations to make sure these rules hold true. This fixed our over zealous CG OOMs in production. Fixes: `d1310b2e0c` ("Btrfs: Split the extent_map code into two parts") Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add comment ] Signed-off-by: David Sterba <dsterba@suse.com>	2020-01-20 16:41:02 +01:00
David Sterba	4babad1019	btrfs: safely advance counter when looking up bio csums Dan's smatch tool reports fs/btrfs/file-item.c:295 btrfs_lookup_bio_sums() warn: should this be 'count == -1' which points to the while (count--) loop. With count == 0 the check itself could decrement it to -1. There's a WARN_ON a few lines below that has never been seen in practice though. It turns out that the value of page_bytes_left matches the count (by sectorsize multiples). The loop never reaches the state where count would go to -1, because page_bytes_left == 0 is found first and this breaks out. For clarity, use only plain check on count (and only for positive value), decrement safely inside the loop. Any other discrepancy after the whole bio list processing should be reported by the exising WARN_ON_ONCE as well. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-01-20 16:41:01 +01:00
David Sterba	94f8c46566	btrfs: remove unused member btrfs_device::work This is a leftover from recently removed bio scheduling framework. Fixes: `ba8a9d0795` ("Btrfs: delete the entire async bio submission framework") Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-01-20 16:41:01 +01:00
Johannes Thumshirn	ef0a82da81	btrfs: remove unnecessary wrapper get_alloc_profile btrfs_get_alloc_profile() is a simple wrapper over get_alloc_profile(). The only difference is btrfs_get_alloc_profile() is visible to other functions in btrfs while get_alloc_profile() is static and thus only visible to functions in block-group.c. Let's just fold get_alloc_profile() into btrfs_get_alloc_profile() to get rid of the unnecessary second function. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Johannes Thumshirn <jth@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-01-20 16:41:01 +01:00
Dennis Zhou	81b29a3bf7	btrfs: add correction to handle -1 edge case in async discard From Dave's testing described below, it's possible to drive a file system to have bogus values of discardable_extents and _bytes. As btrfs_discard_calc_delay() is the only user of discardable_extents, we can correct here for any negative discardable_extents/discardable_bytes. The problem is not reliably reproducible. The workload that created it was based on linux git tree, switching between release tags, then everytihng deleted followed by a full rebalance. At this state the values of discardable_bytes was 16K and discardable_extents was -1, expected values 0 and 0. Repeating the workload again did not correct the bogus values so the offset seems to be stable once it happens. Reported-by: David Sterba <dsterba@suse.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: David Sterba <dsterba@suse.com>	2020-01-20 16:41:01 +01:00
Dennis Zhou	27f0afc737	btrfs: ensure removal of discardable_* in free_bitmap() Most callers of free_bitmap() only call it if bitmap_info->bytes is 0. However, there are certain cases where we may free the free space cache via __btrfs_remove_free_space_cache(). This exposes a path where free_bitmap() is called regardless. This may result in a bad accounting situation for discardable_bytes and discardable_extents. So, remove the stats and call btrfs_discard_update_discardable(). Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: David Sterba <dsterba@suse.com>	2020-01-20 16:41:01 +01:00
Dennis Zhou	f9bb615af2	btrfs: make smaller extents more likely to go into bitmaps It's less than ideal for small extents to eat into our extent budget, so force extents <= 32KB into the bitmaps save for the first handful. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: David Sterba <dsterba@suse.com>	2020-01-20 16:41:00 +01:00
Dennis Zhou	5d90c5c757	btrfs: increase the metadata allowance for the free_space_cache Currently, there is no way for the free space cache to recover from being serviced by purely bitmaps because the extent threshold is set to 0 in recalculate_thresholds() when we surpass the metadata allowance. This adds a recovery mechanism by keeping large extents out of the bitmaps and increases the metadata upper bound to 64KB. The recovery mechanism bypasses this upper bound, thus making it a soft upper bound. But, with the bypass being 1MB or greater, it shouldn't add unbounded overhead. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-01-20 16:41:00 +01:00
Dennis Zhou	dbc2a8c927	btrfs: add async discard implementation overview Give a brief overview for how async discard is implemented. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-01-20 16:41:00 +01:00
Dennis Zhou	9ddf648f9c	btrfs: keep track of discard reuse stats Keep track of how much we are discarding and how often we are reusing with async discard. The discard_*_bytes values don't need any special protection because the work item provides the single threaded access. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-01-20 16:41:00 +01:00
Dennis Zhou	5cb0724e1b	btrfs: only keep track of data extents for async discard As mentioned earlier, discarding data can be done either by issuing an explicit discard or implicitly by reusing the LBA. Metadata block_groups see much more frequent reuse due to well it being metadata. So instead of explicitly discarding metadata block_groups, just leave them be and let the latter implicit discarding be done for them. For mixed block_groups, block_groups which contain both metadata and data, we let them be as higher fragmentation is expected. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-01-20 16:41:00 +01:00
Dennis Zhou	7fe6d45e40	btrfs: have multiple discard lists Non-block group destruction discarding currently only had a single list with no minimum discard length. This can lead to caravaning more meaningful discards behind a heavily fragmented block group. This adds support for multiple lists with minimum discard lengths to prevent the caravan effect. We promote block groups back up when we exceed the BTRFS_ASYNC_DISCARD_MAX_FILTER size, currently we support only 2 lists with filters of 1MB and 32KB respectively. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-01-20 16:41:00 +01:00
Dennis Zhou	19b2a2c719	btrfs: make max async discard size tunable Expose max_discard_size as a tunable via sysfs and switch the current fixed maximum to the default value. Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-01-20 16:40:59 +01:00
Dennis Zhou	4aa9ad5203	btrfs: limit max discard size for async discard Throttle the maximum size of a discard so that we can provide an upper bound for the rate of async discard. While the block layer is able to split discards into the appropriate sized discards, we want to be able to account more accurately the rate at which we are consuming NCQ slots as well as limit the upper bound of work for a discard. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-01-20 16:40:59 +01:00
Dennis Zhou	e93591bb6e	btrfs: add kbps discard rate limit for async discard Provide the ability to rate limit based on kbps in addition to iops as additional guides for the target discard rate. The delay used ends up being max(kbps_delay, iops_delay). Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-01-20 16:40:59 +01:00
Dennis Zhou	a230930084	btrfs: calculate discard delay based on number of extents An earlier patch keeps track of discardable_extents. These are undiscarded extents managed by the free space cache. Here, we will use this to dynamically calculate the discard delay interval. There are 3 rate to consider. The first is the target convergence rate, the rate to discard all discardable_extents over the BTRFS_DISCARD_TARGET_MSEC time frame. This is clamped by the lower limit, the iops limit or BTRFS_DISCARD_MIN_DELAY (1ms), and the upper limit, BTRFS_DISCARD_MAX_DELAY (1s). We reevaluate this delay every transaction commit. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-01-20 16:40:59 +01:00
Dennis Zhou	5dc7c10b87	btrfs: keep track of discardable_bytes for async discard Keep track of this metric so that we can understand how ahead or behind we are in discarding rate. This uses the same accounting method as discardable_extents, deltas between previous/current values and propagating them up. Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>	2020-01-20 16:40:59 +01:00
Dennis Zhou	dfb79ddb13	btrfs: track discardable extents for async discard The number of discardable extents will serve as the rate limiting metric for how often we should discard. This keeps track of discardable extents in the free space caches by maintaining deltas and propagating them to the global count. The deltas are calculated from 2 values stored in PREV and CURR entries, then propagated up to the global discard ctl. The current counter value becomes the previous counter value after update. Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>	2020-01-20 16:40:58 +01:00

... 60 61 62 63 64 ...

900375 commits