linux-xiaomi-chiron

Author	SHA1	Message	Date
Ramuthevar Vadivel Murugan	9227942383	phy: intel-lgm-emmc: Add support for eMMC PHY Add support for eMMC PHY on Intel's Lightning Mountain SoC. Signed-off-by: Ramuthevar Vadivel Murugan <vadivel.muruganx.ramuthevar@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>	2020-01-14 10:50:19 +05:30
Ramuthevar Vadivel Murugan	5bc9991080	dt-bindings: phy: intel-emmc-phy: Add YAML schema for LGM eMMC PHY Add a YAML schema to use the host controller driver with the eMMC PHY on Intel's Lightning Mountain SoC. Signed-off-by: Ramuthevar Vadivel Murugan <vadivel.muruganx.ramuthevar@linux.intel.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>	2020-01-14 10:50:19 +05:30
Kishon Vijay Abraham I	091876cc35	phy: ti: j721e-wiz: Add support for WIZ module present in TI J721E SoC Add support for WIZ module present in TI's J721E SoC. WIZ is a SERDES wrapper used to configure some of the input signals to the SERDES. It is used with both Sierra(16G) and Torrent(10G) SERDES. This driver configures three clock selects (pll0, pll1, dig), two divider clocks and supports resets for each of the lanes. [jsarha@ti.com: Add support for Torrent(10G) SERDES wrapper] Signed-off-by: Jyri Sarha <jsarha@ti.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>	2020-01-14 10:50:19 +05:30
Kishon Vijay Abraham I	ad044f01c2	dt-bindings: phy: Document WIZ (SERDES wrapper) bindings Add DT binding documentation for WIZ (SERDES wrapper). WIZ is NOT a PHY but a wrapper used to configure some of the input signals to the SERDES. It is used with both Sierra(16G) and Torrent(10G) serdes. Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> [jsarha@ti.com: Add separate compatible for Sierra(16G) and Torrent(10G) SERDES] Signed-off-by: Jyri Sarha <jsarha@ti.com> Reviewed-by: Rob Herring <robh@kernel.org>	2020-01-14 10:50:19 +05:30
Johan Hovold	a112adafcb	NFC: pn533: fix bulk-message timeout The driver was doing a synchronous uninterruptible bulk-transfer without using a timeout. This could lead to the driver hanging on probe due to a malfunctioning (or malicious) device until the device is physically disconnected. While sleeping in probe the driver prevents other devices connected to the same hub from being added to (or removed from) the bus. An arbitrary limit of five seconds should be more than enough. Fixes: `dbafc28955` ("NFC: pn533: don't send USB data off of the stack") Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-01-13 18:50:18 -08:00
Kristian Evensen	a9ff44f0e6	qmi_wwan: Add support for Quectel RM500Q RM500Q is a 5G module from Quectel, supporting both standalone and non-standalone modes. The normal Quectel quirks apply (DTR and dynamic interface numbers). Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-01-13 18:43:39 -08:00
Linus Torvalds	63d264fe08	Intel ID: PSIRT-TA-201910-001 CVEID: CVE-2019-14615 Summary of Vulnerability ------------------------ Insufficient control flow in certain data structures for some Intel(R) Processors with Intel Processor Graphics may allow an unauthenticated user to potentially enable information disclosure via local access Products affected: ------------------ Intel CPU’s with Gen7, Gen7.5 and Gen9 Graphics. Public Disclosure Schedule: --------------------------- Intel is pursuing a coordinated disclosure of this vulnerability. The targeted public disclosure date is January 14 2020 Mitigation Summary ------------------ This patch provides mitigation for Gen9 hardware only. Patches for Gen7 and Gen7.5 will be provided later. Note that Gen8 is not impacted due to a previously implemented workaround. The mitigation involves using an existing hardware feature to forcibly clear down all EU state at each context switch. -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJeGHjXAAoJEID/Kx9323OZezwH/iLlbczb6HW7AbloQVa7KRNL cZ4VHHXmMEQPSprxFuOS21/hVW1rKZzbjTGGI0qbm4qNT3LiK92E0dcoMs1Tp9Xd eElZpkeO36pqdxc/a256N3xrpmhiMnmk33F36k4qGpt6YUxvFUyZ50re0e3pO03j wGJ1cMIbAKJQmMC23yQdD44y1TH32fGeUQvwbLgktHAS/r1DxqyaZZq1hSpOiZdV TqhFLQAXUw2Cxy3FmF7KgcedcZfii1Rq5Gz7iQeyix3CbNM9r+1UGqsjGacDcXS9 /GxhBCSKf35pOj7ZxgtLPCCdL5mSAtvQO/E+yLx3F9axG9bzzNGkLpEsWeCshp8= =3jTf -----END PGP SIGNATURE----- Merge tag 'Intel-CVE-2019-14615' from bundle by Akeem Abodunrin. Merge Intel Gen9 graphics fix from Akeem Abodunrin: "Insufficient control flow in certain data structures for some Intel Processors with Intel Processor Graphics may allow an unauthenticated user to potentially enable information disclosure via local access This provides mitigation for Gen9 hardware. Note that Gen8 is not impacted due to a previously implemented workaround. The mitigation involves using an existing hardware feature to forcibly clear down all EU state at each context switch" * tag 'Intel-CVE-2019-14615' of emailed bundle from Akeem G Abodunrin <akeem.g.abodunrin@intel.com>: drm/i915/gen9: Clear residual context state on context switch	2020-01-13 18:40:57 -08:00
Milind Parab	fd2a89146a	net: macb: fix for fixed-link mode This patch fix the issue with fixed link. With fixed-link device opening fails due to macb_phylink_connect not handling fixed-link mode, in which case no MAC-PHY connection is needed and phylink_connect return success (0), however in current driver attempt is made to search and connect to PHY even for fixed-link. Fixes: `7897b071ac` ("net: macb: convert to phylink") Signed-off-by: Milind Parab <mparab@cadence.com> Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-01-13 18:37:42 -08:00
Jakub Kicinski	76ccf5288c	Merge branch 'stmmac-ETF-support' Jose Abreu says: ==================== net: stmmac: ETF support This series adds the support for ETF scheduler in stmmac. 1) Starts adding the support by implementing Enhanced Descriptors in stmmac main core. This is needed for ETF feature in XGMAC and QoS cores. 2) Integrates the ETF logic into stmmac TC core. 3) and 4) adds the HW specific support for ETF in XGMAC and QoS cores. The IP feature is called TBS (Time Based Scheduling). 5) Enables ETF in GMAC5 IPK PCI entry for all Queues except Queue 0. 6) Adds the new TBS feature and even more information into the debugFS HW features file. ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-01-13 18:32:31 -08:00
Jose Abreu	28c1cf73c9	net: stmmac: selftests: Add a test for TBS feature Add a new test for TBS feature which is used in ETF scheduler. In this test, we send a packet with a launch time specified as now + 500ms and check if the packet was transmitted on that time frame. Changes from v2: - Use the TBS bitfield - Remove debug message Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-01-13 18:31:49 -08:00
Jose Abreu	05373e31ba	net: stmmac: selftests: Switch to dev_direct_xmit() In the upcoming commit for TBS selftest we will need to send a packet on a specific Queue. As stmmac fallsback to netdev_pick_tx() on the select Queue callback, we need to switch all selftests logic to dev_direct_xmit() so that we can send the given SKB on a specific Queue. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-01-13 18:31:48 -08:00
Jose Abreu	44e6547570	net: stmmac: Add missing information in DebugFS capabilities file Adds more information regarding HW Capabilities in the corresponding DebugFS file. Changes from v2: - Remove the TX/RX queues in use (Jakub) Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-01-13 18:31:48 -08:00
Jose Abreu	7eadf57290	net: stmmac: pci: Enable TBS on GMAC5 IPK PCI entry Enable TBS support on GMAC5 PCI entry for all Queues except Queue 0. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-01-13 18:31:48 -08:00
Jose Abreu	58ae928140	net: stmmac: gmac4+: Add TBS support Adds all the necessary HW hooks to support TBS feature in QoS cores. Changes from v1: - Remove unneeded LT shift as the IP already does this. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-01-13 18:31:48 -08:00
Jose Abreu	6a549b9f0d	net: stmmac: xgmac: Add TBS support Adds all the necessary HW hooks to support TBS feature in XGMAC cores. Changes from v1: - Remove unneeded LT shift as the IP already does this. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-01-13 18:31:48 -08:00
Jose Abreu	430b383c73	net: stmmac: tc: Add support for ETF Scheduler using TBS Adds the support for ETF scheduler using TBS feature which is available in XGMAC and QoS IPs. Changes from v2: - Fix checkpatch issues (Jakub) - Use the TBS bitfield Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-01-13 18:31:48 -08:00
Jose Abreu	579a25a854	net: stmmac: Initial support for TBS Adds the initial hooks for TBS support. This needs a 32 byte descriptor in order for it to work with current HW. Adds all the logic for Enhanced Descriptors in main core but no HW related logic for now. Changes from v2: - Use bitfield for TBS status / support (Jakub) - Remove unneeded cache alignment (Jakub) - Fix checkpatch issues Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-01-13 18:31:48 -08:00
Jens Axboe	74566df3a7	io_uring: don't setup async context for read/write fixed We don't need it, and if we have it, then the retry handler will attempt to copy the non-existent iovec with the inline iovec, with a segment count that doesn't make sense. Fixes: `f67676d160` ("io_uring: ensure async punted read/write requests copy iovec") Reported-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-01-13 19:25:29 -07:00
Chen Zhou	ab9837b5ed	amd-xgbe: remove unnecessary conversion to bool The conversion to bool is not needed, remove it. Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-01-13 18:22:17 -08:00
Yang Shi	554913f600	mm: khugepaged: add trace status description for SCAN_PAGE_HAS_PRIVATE Commit `99cb0dbd47` ("mm,thp: add read-only THP support for (non-shmem) FS") introduced a new khugepaged scan result: SCAN_PAGE_HAS_PRIVATE, but the corresponding description for trace events were not added. Link: http://lkml.kernel.org/r/1574793844-2914-1-git-send-email-yang.shi@linux.alibaba.com Fixes: `99cb0dbd47` ("mm,thp: add read-only THP support for (non-shmem) FS") Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Cc: Song Liu <songliubraving@fb.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-01-13 18:19:02 -08:00
Adrian Huang	2fe20210fc	mm: memcg/slab: call flush_memcg_workqueue() only if memcg workqueue is valid When booting with amd_iommu=off, the following WARNING message appears: AMD-Vi: AMD IOMMU disabled on kernel command-line ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at kernel/workqueue.c:2772 flush_workqueue+0x42e/0x450 Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.5.0-rc3-amd-iommu #6 Hardware name: Lenovo ThinkSystem SR655-2S/7D2WRCZ000, BIOS D8E101L-1.00 12/05/2019 RIP: 0010:flush_workqueue+0x42e/0x450 Code: ff 0f 0b e9 7a fd ff ff 4d 89 ef e9 33 fe ff ff 0f 0b e9 7f fd ff ff 0f 0b e9 bc fd ff ff 0f 0b e9 a8 fd ff ff e8 52 2c fe ff <0f> 0b 31 d2 48 c7 c6 e0 88 c5 95 48 c7 c7 d8 ad f0 95 e8 19 f5 04 Call Trace: kmem_cache_destroy+0x69/0x260 iommu_go_to_state+0x40c/0x5ab amd_iommu_prepare+0x16/0x2a irq_remapping_prepare+0x36/0x5f enable_IR_x2apic+0x21/0x172 default_setup_apic_routing+0x12/0x6f apic_intr_mode_init+0x1a1/0x1f1 x86_late_time_init+0x17/0x1c start_kernel+0x480/0x53f secondary_startup_64+0xb6/0xc0 ---[ end trace 30894107c3749449 ]--- x2apic: IRQ remapping doesn't support X2APIC mode x2apic disabled The warning is caused by the calling of 'kmem_cache_destroy()' in free_iommu_resources(). Here is the call path: free_iommu_resources kmem_cache_destroy flush_memcg_workqueue flush_workqueue The root cause is that the IOMMU subsystem runs before the workqueue subsystem, which the variable 'wq_online' is still 'false'. This leads to the statement 'if (WARN_ON(!wq_online))' in flush_workqueue() is 'true'. Since the variable 'memcg_kmem_cache_wq' is not allocated during the time, it is unnecessary to call flush_memcg_workqueue(). This prevents the WARNING message triggered by flush_workqueue(). Link: http://lkml.kernel.org/r/20200103085503.1665-1-ahuang12@lenovo.com Fixes: `92ee383f6d` ("mm: fix race between kmem_cache destroy, create and deactivate") Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Reported-by: Xiaochun Lee <lixc17@lenovo.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-01-13 18:19:02 -08:00
Wen Yang	0a5d1a7f64	mm/page-writeback.c: improve arithmetic divisions Use div64_ul() instead of do_div() if the divisor is unsigned long, to avoid truncation to 32-bit on 64-bit platforms. Link: http://lkml.kernel.org/r/20200102081442.8273-4-wenyang@linux.alibaba.com Signed-off-by: Wen Yang <wenyang@linux.alibaba.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Qian Cai <cai@lca.pw> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-01-13 18:19:02 -08:00
Wen Yang	d3ac946ec9	mm/page-writeback.c: use div64_ul() for u64-by-unsigned-long divide The two variables 'numerator' and 'denominator', though they are declared as long, they should actually be unsigned long (according to the implementation of the fprop_fraction_percpu() function) And do_div() does a 64-by-32 division, while the divisor 'denominator' is unsigned long, thus 64-bit on 64-bit platforms. Hence the proper function to call is div64_ul(). Link: http://lkml.kernel.org/r/20200102081442.8273-3-wenyang@linux.alibaba.com Signed-off-by: Wen Yang <wenyang@linux.alibaba.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Qian Cai <cai@lca.pw> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-01-13 18:19:02 -08:00
Wen Yang	6d9e8c651d	mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio() Patch series "use div64_ul() instead of div_u64() if the divisor is unsigned long". We were first inspired by commit `b0ab99e773` ("sched: Fix possible divide by zero in avg_atom () calculation"), then refer to the recently analyzed mm code, we found this suspicious place. 201 if (min) { 202 min *= this_bw; 203 do_div(min, tot_bw); 204 } And we also disassembled and confirmed it: /usr/src/debug/kernel-4.9.168-016.ali3000/linux-4.9.168-016.ali3000.alios7.x86_64/mm/page-writeback.c: 201 0xffffffff811c37da <__wb_calc_thresh+234>: xor %r10d,%r10d 0xffffffff811c37dd <__wb_calc_thresh+237>: test %rax,%rax 0xffffffff811c37e0 <__wb_calc_thresh+240>: je 0xffffffff811c3800 <__wb_calc_thresh+272> /usr/src/debug/kernel-4.9.168-016.ali3000/linux-4.9.168-016.ali3000.alios7.x86_64/mm/page-writeback.c: 202 0xffffffff811c37e2 <__wb_calc_thresh+242>: imul %r8,%rax /usr/src/debug/kernel-4.9.168-016.ali3000/linux-4.9.168-016.ali3000.alios7.x86_64/mm/page-writeback.c: 203 0xffffffff811c37e6 <__wb_calc_thresh+246>: mov %r9d,%r10d ---> truncates it to 32 bits here 0xffffffff811c37e9 <__wb_calc_thresh+249>: xor %edx,%edx 0xffffffff811c37eb <__wb_calc_thresh+251>: div %r10 0xffffffff811c37ee <__wb_calc_thresh+254>: imul %rbx,%rax 0xffffffff811c37f2 <__wb_calc_thresh+258>: shr $0x2,%rax 0xffffffff811c37f6 <__wb_calc_thresh+262>: mul %rcx 0xffffffff811c37f9 <__wb_calc_thresh+265>: shr $0x2,%rdx 0xffffffff811c37fd <__wb_calc_thresh+269>: mov %rdx,%r10 This series uses div64_ul() instead of div_u64() if the divisor is unsigned long, to avoid truncation to 32-bit on 64-bit platforms. This patch (of 3): The variables 'min' and 'max' are unsigned long and do_div truncates them to 32 bits, which means it can test non-zero and be truncated to zero for division. Fix this issue by using div64_ul() instead. Link: http://lkml.kernel.org/r/20200102081442.8273-2-wenyang@linux.alibaba.com Fixes: `693108a8a6` ("writeback: make bdi->min/max_ratio handling cgroup writeback aware") Signed-off-by: Wen Yang <wenyang@linux.alibaba.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Qian Cai <cai@lca.pw> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-01-13 18:19:02 -08:00
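The truncation described in the three mm/page-writeback.c commits above can be modelled in userspace. The following is a minimal sketch, not kernel code: `div_64_by_32()`, `div_64_by_64()` and `truncates_to_zero()` are hypothetical helper names, and the first one only imitates the narrowing of the divisor to 32 bits that the commit messages attribute to do_div().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model of a 64-by-32 divide: the divisor is narrowed to
 * 32 bits before dividing. This imitates the behaviour described in
 * the commit message; it is not the kernel's do_div() macro.
 */
static uint64_t div_64_by_32(uint64_t n, uint64_t divisor)
{
    uint32_t base = (uint32_t)divisor;  /* upper 32 bits are dropped */

    assert(base != 0);                  /* a real divide would fault here */
    return n / base;
}

/* Model of a full-width divide, as div64_ul() performs on 64-bit platforms. */
static uint64_t div_64_by_64(uint64_t n, uint64_t divisor)
{
    assert(divisor != 0);
    return n / divisor;
}

/*
 * Returns 1 when a non-zero divisor becomes zero after narrowing,
 * which is the "tests non-zero, then divides by zero" case.
 */
static int truncates_to_zero(uint64_t divisor)
{
    return divisor != 0 && (uint32_t)divisor == 0;
}
```

With a divisor of 2^33 + 4, the narrowed divide silently divides by 4 instead, and a divisor of exactly 2^32 passes a non-zero check yet narrows to zero.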
Vlastimil Babka	8e57f8acbb	mm, debug_pagealloc: don't rely on static keys too early Commit `96a2b03f28` ("mm, debug_pagelloc: use static keys to enable debugging") has introduced a static key to reduce overhead when debug_pagealloc is compiled in but not enabled. It relied on the assumption that jump_label_init() is called before parse_early_param() as in start_kernel(), so when the "debug_pagealloc=on" option is parsed, it is safe to enable the static key. However, it turns out multiple architectures call parse_early_param() earlier from their setup_arch(). x86 also calls jump_label_init() even earlier, so no issue was found while testing the commit, but same is not true for e.g. ppc64 and s390 where the kernel would not boot with debug_pagealloc=on as found by our QA. To fix this without tricky changes to init code of multiple architectures, this patch partially reverts the static key conversion from `96a2b03f28`. Init-time and non-fastpath calls (such as in arch code) of debug_pagealloc_enabled() will again test a simple bool variable. Fastpath mm code is converted to a new debug_pagealloc_enabled_static() variant that relies on the static key, which is enabled in a well-defined point in mm_init() where it's guaranteed that jump_label_init() has been called, regardless of architecture. [sfr@canb.auug.org.au: export _debug_pagealloc_enabled_early] Link: http://lkml.kernel.org/r/20200106164944.063ac07b@canb.auug.org.au Link: http://lkml.kernel.org/r/20191219130612.23171-1-vbabka@suse.cz Fixes: `96a2b03f28` ("mm, debug_pagelloc: use static keys to enable debugging") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Qian Cai <cai@lca.pw> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-01-13 18:19:02 -08:00
Roman Gushchin	4a87e2a25d	mm: memcg/slab: fix percpu slab vmstats flushing Currently slab percpu vmstats are flushed twice: during the memcg offlining and just before freeing the memcg structure. Each time percpu counters are summed, added to the atomic counterparts and propagated up by the cgroup tree. The second flushing is required due to how recursive vmstats are implemented: counters are batched in percpu variables on a local level, and once a percpu value is crossing some predefined threshold, it spills over to atomic values on the local and each ascendant levels. It means that without flushing some numbers cached in percpu variables will be dropped on floor each time a cgroup is destroyed. And with uptime the error on upper levels might become noticeable. The first flushing aims to make counters on ancestor levels more precise. Dying cgroups may resume in the dying state for a long time. After kmem_cache reparenting which is performed during the offlining slab counters of the dying cgroup don't have any chances to be updated, because any slab operations will be performed on the parent level. It means that the inaccuracy caused by percpu batching will not decrease up to the final destruction of the cgroup. By the original idea flushing slab counters during the offlining should minimize the visible inaccuracy of slab counters on the parent level. The problem is that percpu counters are not zeroed after the first flushing. So every cached percpu value is summed twice. It creates a small error (up to 32 pages per cpu, but usually less) which accumulates on parent cgroup level. After creating and destroying of thousands of child cgroups, slab counter on parent level can be way off the real value. For now, let's just stop flushing slab counters on memcg offlining. It can't be done correctly without scheduling a work on each cpu: reading and zeroing it during css offlining can race with an asynchronous update, which doesn't expect values to be changed underneath. With this change, slab counters on parent level will become eventually consistent. Once all dying children are gone, values are correct. And if not, the error is capped by 32 * NR_CPUS pages per dying cgroup. It's not perfect, as slab are reparented, so any updates after the reparenting will happen on the parent level. It means that if a slab page was allocated, a counter on child level was bumped, then the page was reparented and freed, the annihilation of positive and negative counter values will not happen until the child cgroup is released. It makes slab counters different from others, and it might want us to implement flushing in a correct form again. But it's also a question of performance: scheduling a work on each cpu isn't free, and it's an open question if the benefit of having more accurate counters is worth it. We might also consider flushing all counters on offlining, not only slab counters. So let's fix the main problem now: make the slab counters eventually consistent, so at least the error won't grow with uptime (or more precisely the number of created and destroyed cgroups). And think about the accuracy of counters separately. Link: http://lkml.kernel.org/r/20191220042728.1045881-1-guro@fb.com Fixes: `bee07b33db` ("mm: memcontrol: flush percpu slab vmstats on kmem offlining") Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-01-13 18:19:02 -08:00
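The double counting described in the commit above (per-CPU values summed into the total twice because they are not zeroed after the first flush) can be illustrated with a toy model. This is a sketch under stated assumptions, not the memcg implementation: `struct slab_stat_model`, `NR_CPUS_MODEL`, `flush_without_zeroing()` and `cached_sum()` are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4  /* hypothetical CPU count for this model */

/* Toy stand-in for one slab counter: batched per-CPU deltas plus a total. */
struct slab_stat_model {
    long percpu[NR_CPUS_MODEL]; /* values still cached per CPU */
    long atomic_total;          /* value propagated to the parent level */
};

/*
 * Add every cached per-CPU value to the total WITHOUT zeroing it,
 * which is the behaviour the commit message identifies as the problem.
 */
static void flush_without_zeroing(struct slab_stat_model *s)
{
    assert(s != NULL);
    for (size_t cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
        s->atomic_total += s->percpu[cpu];
}

/* Sum of the values currently cached per CPU (the amount a single flush adds). */
static long cached_sum(const struct slab_stat_model *s)
{
    long sum = 0;

    for (size_t cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
        sum += s->percpu[cpu];
    return sum;
}
```

Calling `flush_without_zeroing()` twice on the same structure leaves `atomic_total` at twice `cached_sum()`, which mirrors the per-cgroup error that the commit avoids by dropping the flush at offlining time.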
Kirill A. Shutemov	991589974d	mm/shmem.c: thp, shmem: fix conflict of above-47bit hint address and PMD alignment Shmem/tmpfs tries to provide THP-friendly mappings if huge pages are enabled. But it doesn't work well with above-47bit hint address. Normally, the kernel doesn't create userspace mappings above 47-bit, even if the machine allows this (such as with 5-level paging on x86-64). Not all user space is ready to handle wide addresses. It's known that at least some JIT compilers use higher bits in pointers to encode their information. Userspace can ask for allocation from full address space by specifying hint address (with or without MAP_FIXED) above 47-bits. If the application doesn't need a particular address, but wants to allocate from whole address space it can specify -1 as a hint address. Unfortunately, this trick breaks THP alignment in shmem/tmp: shmem_get_unmapped_area() would not try to allocate PMD-aligned area if any hint address specified. This can be fixed by requesting the aligned area if the we failed to allocated at user-specified hint address. The request with inflated length will also take the user-specified hint address. This way we will not lose an allocation request from the full address space. [kirill@shutemov.name: fold in a fixup] Link: http://lkml.kernel.org/r/20191223231309.t6bh5hkbmokihpfu@box Link: http://lkml.kernel.org/r/20191220142548.7118-3-kirill.shutemov@linux.intel.com Fixes: `b569bab78d` ("x86/mm: Prepare to expose larger address space to userspace") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "Willhalm, Thomas" <thomas.willhalm@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "Bruggeman, Otto G" <otto.g.bruggeman@intel.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-01-13 18:19:01 -08:00
Kirill A. Shutemov	97d3d0f9a1	mm/huge_memory.c: thp: fix conflict of above-47bit hint address and PMD alignment Patch series "Fix two above-47bit hint address vs. THP bugs". The two get_unmapped_area() implementations have to be fixed to provide THP-friendly mappings if above-47bit hint address is specified. This patch (of 2): Filesystems use thp_get_unmapped_area() to provide THP-friendly mappings. For DAX in particular. Normally, the kernel doesn't create userspace mappings above 47-bit, even if the machine allows this (such as with 5-level paging on x86-64). Not all user space is ready to handle wide addresses. It's known that at least some JIT compilers use higher bits in pointers to encode their information. Userspace can ask for allocation from full address space by specifying hint address (with or without MAP_FIXED) above 47-bits. If the application doesn't need a particular address, but wants to allocate from whole address space it can specify -1 as a hint address. Unfortunately, this trick breaks thp_get_unmapped_area(): the function would not try to allocate PMD-aligned area if any hint address specified. Modify the routine to handle it correctly: - Try to allocate the space at the specified hint address with length padding required for PMD alignment. - If failed, retry without length padding (but with the same hint address); - If the returned address matches the hint address return it. - Otherwise, align the address as required for THP and return. The user specified hint address is passed down to get_unmapped_area() so above-47bit hint address will be taken into account without breaking alignment requirements. Link: http://lkml.kernel.org/r/20191220142548.7118-2-kirill.shutemov@linux.intel.com Fixes: `b569bab78d` ("x86/mm: Prepare to expose larger address space to userspace") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Thomas Willhalm <thomas.willhalm@intel.com> Tested-by: Dan Williams <dan.j.williams@intel.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: "Bruggeman, Otto G" <otto.g.bruggeman@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-01-13 18:19:01 -08:00
David Hildenbrand	8068df3b60	mm/memory_hotplug: don't free usage map when removing a re-added early section When we remove an early section, we don't free the usage map, as the usage maps of other sections are placed into the same page. Once the section is removed, it is no longer an early section (especially, the memmap is freed). When we re-add that section, the usage map is reused, however, it is no longer an early section. When removing that section again, we try to kfree() a usage map that was allocated during early boot - bad. Let's check against PageReserved() to see if we are dealing with an usage map that was allocated during boot. We could also check against !(PageSlab(usage_page) || PageCompound(usage_page)), but PageReserved() is cleaner. Can be triggered using memtrace under ppc64/powernv: $ mount -t debugfs none /sys/kernel/debug/ $ echo 0x20000000 > /sys/kernel/debug/powerpc/memtrace/enable $ echo 0x20000000 > /sys/kernel/debug/powerpc/memtrace/enable ------------[ cut here ]------------ kernel BUG at mm/slub.c:3969! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: CPU: 0 PID: 154 Comm: sh Not tainted 5.5.0-rc2-next-20191216-00005-g0be1dba7b7c0 #61 NIP kfree+0x338/0x3b0 LR section_deactivate+0x138/0x200 Call Trace: section_deactivate+0x138/0x200 __remove_pages+0x114/0x150 arch_remove_memory+0x3c/0x160 try_remove_memory+0x114/0x1a0 __remove_memory+0x20/0x40 memtrace_enable_set+0x254/0x850 simple_attr_write+0x138/0x160 full_proxy_write+0x8c/0x110 __vfs_write+0x38/0x70 vfs_write+0x11c/0x2a0 ksys_write+0x84/0x140 system_call+0x5c/0x68 ---[ end trace 4b053cbd84e0db62 ]--- The first invocation will offline+remove memory blocks. The second invocation will first add+online them again, in order to offline+remove them again (usually we are lucky and the exact same memory blocks will get "reallocated"). Tested on powernv with boot memory: The usage map will not get freed. Tested on x86-64 with DIMMs: The usage map will get freed. Using Dynamic Memory under a Power DLAPR can trigger it easily. Triggering removal (I assume after previously removed+re-added) of memory from the HMC GUI can crash the kernel with the same call trace and is fixed by this patch. Link: http://lkml.kernel.org/r/20191217104637.5509-1-david@redhat.com Fixes: `326e1b8f83` ("mm/sparsemem: introduce a SECTION_IS_EARLY flag") Signed-off-by: David Hildenbrand <david@redhat.com> Tested-by: Pingfan Liu <piliu@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-01-13 18:19:01 -08:00
Vlastimil Babka	cc638f329e	mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations THP page faults now attempt a __GFP_THISNODE allocation first, which should only compact existing free memory, followed by another attempt that can allocate from any node using reclaim/compaction effort specified by global defrag setting and madvise. This patch makes the following changes to the scheme: - Before the patch, the first allocation relies on a check for pageblock order and __GFP_IO to prevent excessive reclaim. This however affects also the second attempt, which is not limited to single node. Instead of that, reuse the existing check for costly order __GFP_NORETRY allocations, and make sure the first THP attempt uses __GFP_NORETRY. As a side-effect, all costly order __GFP_NORETRY allocations will bail out if compaction needs reclaim, while previously they only bailed out when compaction was deferred due to previous failures. This should be still acceptable within the __GFP_NORETRY semantics. - Before the patch, the second allocation attempt (on all nodes) was passing __GFP_NORETRY. This is redundant as the check for pageblock order (discussed above) was stronger. It's also contrary to madvise(MADV_HUGEPAGE) which means some effort to allocate THP is requested. After this patch, the second attempt doesn't pass __GFP_THISNODE nor __GFP_NORETRY. To sum up, THP page faults now try the following attempts: 1. local node only THP allocation with no reclaim, just compaction. 2. for madvised VMA's or when synchronous compaction is enabled always - THP allocation from any node with effort determined by global defrag setting and VMA madvise 3. fallback to base pages on any node Link: http://lkml.kernel.org/r/08a3f4dd-c3ce-0009-86c5-9ee51aba8557@suse.cz Fixes: `b39d0ee263` ("mm, page_alloc: avoid expensive reclaim when compaction may not succeed") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-01-13 18:19:01 -08:00
Jesper Dangaard Brouer	0eac8ce95b	ptr_ring: add include of linux/mm.h Commit `0bf7800f17` ("ptr_ring: try vmalloc() when kmalloc() fails") started to use kvmalloc_array and kvfree, which are defined in mm.h, in place of the previous functions kcalloc and kfree, which are defined in slab.h. Add the missing include of linux/mm.h. This went unnoticed as other include files happened to include mm.h. Fixes: `0bf7800f17` ("ptr_ring: try vmalloc() when kmalloc() fails") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-01-13 18:16:43 -08:00
Lorenzo Bianconi	1657adccaa	net: mvneta: change page pool nid to NUMA_NO_NODE With commit `44768decb7` ("page_pool: handle page recycle for NUMA_NO_NODE condition") we can safely change nid to NUMA_NO_NODE and accommodate future NUMA-aware hardware using the mvneta network interface. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-01-13 18:14:01 -08:00
Andrii Nakryiko	9c01546d26	tools/bpf: Add runqslower tool to tools/bpf Convert one of BCC tools (runqslower [0]) to BPF CO-RE + libbpf. It matches its BCC-based counterpart 1-to-1, supporting all the same parameters and functionality. runqslower tool utilizes BPF skeleton, auto-generated from BPF object file, as well as memory-mapped interface to global (read-only, in this case) data. Its Makefile also ensures auto-generation of "relocatable" vmlinux.h, which is necessary for BTF-typed raw tracepoints with direct memory access. [0] `11bf5d02c8/tools/runqslower.py` Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200113073143.1779940-6-andriin@fb.com	2020-01-13 17:48:13 -08:00
Andrii Nakryiko	1cf5b23988	bpftool: Apply preserve_access_index attribute to all types in BTF dump This patch makes structs and unions, emitted through BTF dump, automatically CO-RE-relocatable (unless disabled with `#define BPF_NO_PRESERVE_ACCESS_INDEX`, specified before including generated header file). This effectively turns usual bpf_probe_read() call into equivalent of bpf_core_read(), by automatically applying builtin_preserve_access_index to any field accesses of types in generated C types header. This is especially useful for tp_btf/fentry/fexit BPF program types. They allow direct memory access, so BPF C code just uses straightforward a->b->c access pattern to read data from kernel. But without kernel structs marked as CO-RE relocatable through preserve_access_index attribute, one has to enclose all the data reads into a special __builtin_preserve_access_index code block, like so: __builtin_preserve_access_index(({ x = p->pid; /* where p is struct task_struct *, for example */ })); This is very inconvenient and obscures the logic quite a bit. By marking all auto-generated types with preserve_access_index attribute the above code is reduced to just a clean and natural `x = p->pid;`. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200113073143.1779940-5-andriin@fb.com	2020-01-13 17:48:13 -08:00
Andrii Nakryiko	2cc51d34d9	selftests/bpf: Conform selftests/bpf Makefile output to libbpf and bpftool Bring selftest/bpf's Makefile output to the same format used by libbpf and bpftool: 2 spaces of padding on the left + 8-character left-aligned build step identifier. Also, hide feature detection output by default. Can be enabled back by setting V=1. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200113073143.1779940-4-andriin@fb.com	2020-01-13 17:48:13 -08:00
Andrii Nakryiko	292e1d73b1	libbpf: Clean up bpf_helper_defs.h generation output bpf_helpers_doc.py script, used to generate bpf_helper_defs.h, unconditionally emits one informational message to stderr. Remove it and preserve stderr to contain only relevant errors. Also make sure script invocations command is muted by default in libbpf's Makefile. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200113073143.1779940-3-andriin@fb.com	2020-01-13 17:48:13 -08:00
Andrii Nakryiko	533420a415	tools: Sync uapi/linux/if_link.h Sync uapi/linux/if_link.h into tools to avoid out of sync warnings during libbpf build. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200113073143.1779940-2-andriin@fb.com	2020-01-13 17:48:12 -08:00
Masahiro Yamada	f26661e127	initramfs: make initramfs compression choice non-optional Currently, the choice of the initramfs compression mode is too complex because users are allowed to not specify the compression mode at all. I think it makes more sense to require users to choose the compression mode explicitly, and delete the fallback defaults of INITRAMFS_COMPRESSION. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2020-01-14 10:42:44 +09:00
Masahiro Yamada	ddd09bcc89	initramfs: make compression options not depend on INITRAMFS_SOURCE Even if INITRAMFS_SOURCE is empty, usr/gen_initramfs.sh generates a tiny default initramfs, which is embedded in vmlinux. So, defining INITRAMFS_COMPRESSION* options should be valid irrespective of INITRAMFS_SOURCE. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2020-01-14 10:42:44 +09:00
Masahiro Yamada	be1859bdc6	initramfs: remove redundant dependency on BLK_DEV_INITRD init/Kconfig includes usr/Kconfig inside the "if BLK_DEV_INITRD" ... "endif" block: if BLK_DEV_INITRD source "usr/Kconfig" endif Hence, all the defines in usr/Kconfig depend on BLK_DEV_INITRD. Remove the redundant "depends on BLK_DEV_INITRD". Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Greg Thelen <gthelen@google.com>	2020-01-14 10:42:44 +09:00
Masahiro Yamada	80e715a06c	initramfs: rename gen_initramfs_list.sh to gen_initramfs.sh The comments in usr/Makefile wrongly refer to the script name (twice). Line 37: # The dependency list is generated by gen_initramfs.sh -l Line 54: # 4) Arguments to gen_initramfs.sh changes There does not exist such a script. I was going to fix the comments, but after some consideration, I thought "gen_initramfs.sh" would be more suitable than "gen_initramfs_list.sh" because it generates an initramfs image in the common usage. The script generates a list that can be fed to gen_init_cpio only when it is directly run without -o or -l option. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2020-01-14 10:42:44 +09:00
Masahiro Yamada	a4c968e70f	gen_initramfs_list.sh: fix the tool name in the comment There is no tool named "gen_initramfs". The correct name is "gen_init_cpio". Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Greg Thelen <gthelen@google.com>	2020-01-14 10:42:44 +09:00
Masahiro Yamada	9a04dc5298	gen_initramfs_list.sh: remove unused variable 'default_list' This is assigned, but not referenced. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Greg Thelen <gthelen@google.com>	2020-01-14 10:42:44 +09:00
Masahiro Yamada	a2183c0437	initramfs: replace klibcdirs in Makefile with FORCE 'klibcdirs' was added by commit `d39a206bc3` ("kbuild: rebuild initramfs if content of initramfs changes"). If this is just a matter of forcing execution of the recipe line, we can replace it with FORCE. The following code is currently useless: $(deps_initramfs): klibcdirs The original intent could be a hook for the klibc integration into the kernel tree, but klibc is a separate project, which can be built independently. Clean it up. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Greg Thelen <gthelen@google.com>	2020-01-14 10:42:44 +09:00
Michał Mirosław	9945722afd	builddeb: make headers package thinner Remove a bunch of files not used during external module builds: - foreign architecture headers - subtree Makefiles - Kconfig files - perl scripts On an amd64 system this loses a third of the resulting .deb size. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2020-01-14 10:42:44 +09:00
Pingfan Liu	fbee6ba2dc	powerpc/pseries: Advance pfn if section is not present in lmb_is_removable() In lmb_is_removable(), if a section is not present, it should continue to test the rest of the sections in the block. But the current code fails to do so. Fixes: `51925fb3c5` ("powerpc/pseries: Implement memory hotplug remove in the kernel") Cc: stable@vger.kernel.org # v4.1+ Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1578632042-12415-1-git-send-email-kernelfans@gmail.com	2020-01-14 11:41:56 +10:00
Sukadev Bhattiprolu	c2a20711fc	powerpc/xmon: don't access ASDR in VMs ASDR is HV-privileged and must only be accessed in HV-mode. Fixes a Program Check (0x700) when xmon in a VM dumps SPRs. Fixes: `d1e1b351f5` ("powerpc/xmon: Add ISA v3.0 SPRs to SPR dump") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200107021633.GB29843@us.ibm.com	2020-01-14 11:04:08 +10:00
Jens Axboe	7454049eb7	Merge branch 'md-next' of git://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.6/drivers Pull MD changes from Song. * 'md-next' of git://git.kernel.org/pub/scm/linux/kernel/git/song/md: md/raid1: introduce wait_for_serialization md/raid1: use bucket based mechanism for IO serialization md: introduce a new struct for IO serialization md: don't destroy serial_info_pool if serialize_policy is true raid1: serialize the overlap write md: reorgnize mddev_create/destroy_serial_pool md: add serialize_policy sysfs node for raid1 md: prepare for enable raid1 io serialization md: fix a typo s/creat/create md: rename wb stuffs raid5: remove worker_cnt_per_group argument from alloc_thread_groups md/raid6: fix algorithm choice under larger PAGE_SIZE raid6/test: fix a compilation warning raid6/test: fix a compilation error md-bitmap: small cleanups	2020-01-13 17:27:12 -07:00
Dan Carpenter	eb368de6de	power: supply: sbs-battery: Fix a signedness bug in sbs_get_battery_capacity() The "mode" variable is an enum and in this context GCC treats it as an unsigned int so the error handling is never triggered. Fixes: `51d0756604` ("bq20z75: Add support for charge properties") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>	2020-01-14 01:26:14 +01:00
Sven Van Asbroeck	a60ec78d30	power: supply: ltc2941-battery-gauge: fix use-after-free This driver's remove path calls cancel_delayed_work(). However, that function does not wait until the work function finishes. This could mean that the work function is still running after the driver's remove function has finished, which would result in a use-after-free. Fix by calling cancel_delayed_work_sync(), which ensures that the work is properly cancelled, no longer running, and unable to re-schedule itself. This issue was detected with the help of Coccinelle. Cc: stable <stable@vger.kernel.org> Signed-off-by: Sven Van Asbroeck <TheSven73@gmail.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>	2020-01-14 01:23:20 +01:00


900375 commits