Commit graph

1105317 commits

Author SHA1 Message Date
Yosry Ahmed
a3622a53e6 selftests: cgroup: fix alloc_anon_noexit() instantly freeing memory
Currently, alloc_anon_noexit() calls alloc_anon() which instantly frees
the allocated memory. alloc_anon_noexit() is usually used with
cg_run_nowait() to run a process in the background that allocates
memory. It makes sense for the background process to keep the memory
allocated and not instantly free it (otherwise there is no point of
running it in the background).

Link: https://lkml.kernel.org/r/20220425190040.2475377-4-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Chen Wandun <chenwandun@huawei.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29 14:36:59 -07:00
Yosry Ahmed
6c26df84e1 selftests: cgroup: return -errno from cg_read()/cg_write() on failure
Currently, cg_read()/cg_write() returns 0 on success and -1 on failure.
Modify them to return the -errno on failure.

Link: https://lkml.kernel.org/r/20220425190040.2475377-3-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Chen Wandun <chenwandun@huawei.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29 14:36:59 -07:00
Shakeel Butt
94968384dd memcg: introduce per-memcg reclaim interface
This patch series adds a memory.reclaim proactive reclaim interface.
The rationale behind the interface and how it works are in the first
patch.


This patch (of 4):

Introduce a memcg interface to trigger memory reclaim on a memory cgroup.

Use case: Proactive Reclaim
---------------------------

A userspace proactive reclaimer can continuously probe the memcg to
reclaim a small amount of memory.  This gives more accurate and up-to-date
workingset estimation as the LRUs are continuously sorted and can
potentially provide more deterministic memory overcommit behavior.  The
memory overcommit controller can provide more proactive response to the
changing behavior of the running applications instead of being reactive.

A userspace reclaimer's purpose in this case is not a complete replacement
for kswapd or direct reclaim, it is to proactively identify memory savings
opportunities and reclaim some amount of cold pages set by the policy to
free up the memory for more demanding jobs or scheduling new jobs.

A user space proactive reclaimer is used in Google data centers. 
Additionally, Meta's TMO paper recently referenced a very similar
interface used for user space proactive reclaim:
https://dl.acm.org/doi/pdf/10.1145/3503222.3507731

Benefits of a user space reclaimer:
-----------------------------------

1) More flexible on who should be charged for the cpu of the memory
   reclaim.  For proactive reclaim, it makes more sense to be centralized.

2) More flexible on dedicating the resources (like cpu).  The memory
   overcommit controller can balance the cost between the cpu usage and
   the memory reclaimed.

3) Provides a way to the applications to keep their LRUs sorted, so,
   under memory pressure better reclaim candidates are selected.  This
   also gives more accurate and uptodate notion of working set for an
   application.

Why memory.high is not enough?
------------------------------

- memory.high can be used to trigger reclaim in a memcg and can
  potentially be used for proactive reclaim.  However there is a big
  downside in using memory.high.  It can potentially introduce high
  reclaim stalls in the target application as the allocations from the
  processes or the threads of the application can hit the temporary
  memory.high limit.

- Userspace proactive reclaimers usually use feedback loops to decide
  how much memory to proactively reclaim from a workload.  The metrics
  used for this are usually either refaults or PSI, and these metrics will
  become messy if the application gets throttled by hitting the high
  limit.

- memory.high is a stateful interface, if the userspace proactive
  reclaimer crashes for any reason while triggering reclaim it can leave
  the application in a bad state.

- If a workload is rapidly expanding, setting memory.high to proactively
  reclaim memory can result in actually reclaiming more memory than
  intended.

The benefits of such interface and shortcomings of existing interface were
further discussed in this RFC thread:
https://lore.kernel.org/linux-mm/5df21376-7dd1-bf81-8414-32a73cea45dd@google.com/

Interface:
----------

Introducing a very simple memcg interface 'echo 10M > memory.reclaim' to
trigger reclaim in the target memory cgroup.

The interface is introduced as a nested-keyed file to allow for future
optional arguments to be easily added to configure the behavior of
reclaim.

Possible Extensions:
--------------------

- This interface can be extended with an additional parameter or flags
  to allow specifying one or more types of memory to reclaim from (e.g.
  file, anon, ..).

- The interface can also be extended with a node mask to reclaim from
  specific nodes. This has use cases for reclaim-based demotion in memory
  tiering systens.

- A similar per-node interface can also be added to support proactive
  reclaim and reclaim-based demotion in systems without memcg.

- Add a timeout parameter to make it easier for user space to call the
  interface without worrying about being blocked for an undefined amount
  of time.

For now, let's keep things simple by adding the basic functionality.

[yosryahmed@google.com: worked on versions v2 onwards, refreshed to
current master, updated commit message based on recent
discussions and use cases]
Link: https://lkml.kernel.org/r/20220425190040.2475377-1-yosryahmed@google.com
Link: https://lkml.kernel.org/r/20220425190040.2475377-2-yosryahmed@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Co-developed-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Wei Xu <weixugc@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Chen Wandun <chenwandun@huawei.com>
Cc: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29 14:36:59 -07:00
Brian Geffon
30226b69f8 zram: add a huge_idle writeback mode
Today it's only possible to write back as a page, idle, or huge.  A user
might want to writeback pages which are huge and idle first as these idle
pages do not require decompression and make a good first pass for
writeback.

Idle writeback specifically has the advantage that a refault is unlikely
given that the page has been swapped for some amount of time without being
refaulted.

Huge writeback has the advantage that you're guaranteed to get the maximum
benefit from a single page writeback, that is, you're reclaiming one full
page of memory.  Pages which are compressed in zram being written back
result in some benefit which is always less than a page size because of
the fact that it was compressed.

The primary use of this is for minimizing refaults in situations where the
device has to be sensitive to storage endurance.  On ChromeOS we have
devices with slow eMMC and repeated writes and refaults can negatively
affect performance and endurance.

Link: https://lkml.kernel.org/r/20220322215821.1196994-1-bgeffon@google.com
Signed-off-by: Brian Geffon <bgeffon@google.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29 14:36:59 -07:00
Chen Wandun
d137a7cb9b mm/page_alloc: simplify update of pgdat in wake_all_kswapds
There is no need to update last_pgdat for each zone, only update
last_pgdat when iterating the first zone of a node.

Link: https://lkml.kernel.org/r/20220322115635.2708989-1-chenwandun@huawei.com
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29 14:36:59 -07:00
Andrey Konovalov
ec2a0f9c8b kasan: mark KASAN_VMALLOC flags as kasan_vmalloc_flags_t
Fix sparse warning:

mm/kasan/shadow.c:496:15: warning: restricted kasan_vmalloc_flags_t degrades to integer

Link: https://lkml.kernel.org/r/52d8fccdd3a48d4bdfd0ff522553bac2a13f1579.1649351254.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29 14:36:58 -07:00
Zqiang
07d067e4f2 kasan: fix sleeping function called from invalid context on RT kernel
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
preempt_count: 1, expected: 0
...........
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.1-rt16-yocto-preempt-rt #22
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x60/0x8c
dump_stack+0x10/0x12
 __might_resched.cold+0x13b/0x173
rt_spin_lock+0x5b/0xf0
 ___cache_free+0xa5/0x180
qlist_free_all+0x7a/0x160
per_cpu_remove_cache+0x5f/0x70
smp_call_function_many_cond+0x4c4/0x4f0
on_each_cpu_cond_mask+0x49/0xc0
kasan_quarantine_remove_cache+0x54/0xf0
kasan_cache_shrink+0x9/0x10
kmem_cache_shrink+0x13/0x20
acpi_os_purge_cache+0xe/0x20
acpi_purge_cached_objects+0x21/0x6d
acpi_initialize_objects+0x15/0x3b
acpi_init+0x130/0x5ba
do_one_initcall+0xe5/0x5b0
kernel_init_freeable+0x34f/0x3ad
kernel_init+0x1e/0x140
ret_from_fork+0x22/0x30

When the kmem_cache_shrink() was called, the IPI was triggered, the
___cache_free() is called in IPI interrupt context, the local-lock or
spin-lock will be acquired.  On PREEMPT_RT kernel, these locks are
replaced with sleepbale rt-spinlock, so the above problem is triggered. 
Fix it by moving the qlist_free_allfrom() from IPI interrupt context to
task context when PREEMPT_RT is enabled.

[akpm@linux-foundation.org: reduce ifdeffery]
Link: https://lkml.kernel.org/r/20220401134649.2222485-1-qiang1.zhang@intel.com
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29 14:36:58 -07:00
Baolin Wang
9c8bbfaca1 mm: hugetlb: add missing cache flushing in hugetlb_unshare_all_pmds()
Missed calling flush_cache_range() before removing the sharing PMD
entrires, otherwise data consistence issue may be occurred on some
architectures whose caches are strict and require a virtual>physical
translation to exist for a virtual address.  Thus add it.

Now no architectures enabling PMD sharing will be affected, since they do
not have a VIVT cache.  That means this issue can not be happened in
practice so far.

Link: https://lkml.kernel.org/r/47441086affcabb6ecbe403173e9283b0d904b38.1650956489.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/419b0e777c9e6d1454dcd906e0f5b752a736d335.1650781755.git.baolin.wang@linux.alibaba.com
Fixes: 6dfeaff93b ("hugetlb/userfaultfd: unshare all pmds for hugetlbfs when register wp")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29 14:36:58 -07:00
xu xin
25fa414ada mm/khugepaged: use vma_is_anonymous
Clean up the vma->vm_ops usage.  Use vma_is_anonymous instead of
vma->vm_ops to make it more understandable.

Link: https://lkml.kernel.org/r/20220424071642.3234971-1-xu.xin16@zte.com.cn
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29 14:36:58 -07:00
Peng Liu
30a514002d mm: use for_each_online_node and node_online instead of open coding
Use more generic functions to deal with issues related to online nodes. 
The changes will make the code simplified.

Link: https://lkml.kernel.org/r/20220429030218.644635-1-liupeng256@huawei.com
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Suggested-by: Davidlohr Bueso <dave@stgolabs.net>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29 14:36:58 -07:00
Peng Liu
f81f6e4b5e hugetlb: fix return value of __setup handlers
When __setup() return '0', using invalid option values causes the entire
kernel boot option string to be reported as Unknown.  Hugetlb calls
__setup() and will return '0' when set invalid parameter string.

The following phenomenon is observed:
 cmdline:
  hugepagesz=1Y hugepages=1
 dmesg:
  HugeTLB: unsupported hugepagesz=1Y
  HugeTLB: hugepages=1 does not follow a valid hugepagesz, ignoring
  Unknown kernel command line parameters "hugepagesz=1Y hugepages=1"

Since hugetlb will print warning/error information before return for
invalid parameter string, just use return '1' to avoid print again.

Link: https://lkml.kernel.org/r/20220413032915.251254-4-liupeng256@huawei.com
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liu Yuntao <liuyuntao10@huawei.com>
Cc: Zhenguo Yao <yaozhenguo1@gmail.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29 14:36:57 -07:00
Peng Liu
f87442f407 hugetlb: fix hugepages_setup when deal with pernode
Hugepages can be specified to pernode since "hugetlbfs: extend the
definition of hugepages parameter to support node allocation", but the
following problem is observed.

Confusing behavior is observed when both 1G and 2M hugepage is set
after "numa=off".
 cmdline hugepage settings:
  hugepagesz=1G hugepages=0:3,1:3
  hugepagesz=2M hugepages=0:1024,1:1024
 results:
  HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
  HugeTLB registered 2.00 MiB page size, pre-allocated 1024 pages

Furthermore, confusing behavior can be also observed when an invalid node
behind a valid node.  To fix this, never allocate any typical hugepage
when an invalid parameter is received.

Link: https://lkml.kernel.org/r/20220413032915.251254-3-liupeng256@huawei.com
Fixes: b5389086ad ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liu Yuntao <liuyuntao10@huawei.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Zhenguo Yao <yaozhenguo1@gmail.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29 14:36:57 -07:00
Peng Liu
0a7a0f6f7f hugetlb: fix wrong use of nr_online_nodes
Patch series "hugetlb: Fix some incorrect behavior", v3.

This series fix three bugs of hugetlb:
1) Invalid use of nr_online_nodes;
2) Inconsistency between 1G hugepage and 2M hugepage;
3) Useless information in dmesg.


This patch (of 4):

Certain systems are designed to have sparse/discontiguous nodes.  In this
case, nr_online_nodes can not be used to walk through numa node.  Also, a
valid node may be greater than nr_online_nodes.

However, in hugetlb, it is assumed that nodes are contiguous.

For sparse/discontiguous nodes, the current code may treat a valid node
as invalid, and will fail to allocate all hugepages on a valid node that
"nid >= nr_online_nodes".

As David suggested:

	if (tmp >= nr_online_nodes)
		goto invalid;

Just imagine node 0 and node 2 are online, and node 1 is offline. 
Assuming that "node < 2" is valid is wrong.  

Recheck all the places that use nr_online_nodes, and repair them one by
one.

[liupeng256@huawei.com: v4]
  Link: https://lkml.kernel.org/r/20220416103526.3287348-1-liupeng256@huawei.com
Link: https://lkml.kernel.org/r/20220413032915.251254-1-liupeng256@huawei.com
Link: https://lkml.kernel.org/r/20220413032915.251254-2-liupeng256@huawei.com
Fixes: 4178158ef8 ("hugetlbfs: fix issue of preallocation of gigantic pages can't work")
Fixes: b5389086ad ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
Fixes: e79ce98323 ("hugetlbfs: fix a truncation issue in hugepages parameter")
Fixes: f9317f77a6 ("hugetlb: clean up potential spectre issue warnings")
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Zhenguo Yao <yaozhenguo1@gmail.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Liu Yuntao <liuyuntao10@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29 14:36:57 -07:00
Daniele Ceraolo Spurio
59a4752895 drm/i915: Xe_HP SDV and DG2 have up to 4 CCS engines
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>  # mesa anvil & iris
Link: https://patchwork.freedesktop.org/patch/msgid/20220428041926.1483683-5-matthew.d.roper@intel.com
2022-04-29 14:30:27 -07:00
Matt Roper
ecf8eca51f drm/i915/xehp: Add compute engine ABI
We're now ready to start exposing compute engines to userspace.

v2:
 - Move kerneldoc for other engine classes to a separate patch.  (Andi)

Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Szymon Morek <szymon.morek@intel.com>
UMD (mesa): https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14395
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>  # mesa anvil & iris
Link: https://patchwork.freedesktop.org/patch/msgid/20220428041926.1483683-4-matthew.d.roper@intel.com
2022-04-29 14:30:27 -07:00
Heiko Schocher
9ff9236394 drm/panel: simple: Add Startek KD070WVFPA043-C069A panel support
Add Startek KD070WVFPA043-C069A 7" TFT LCD panel support.

Signed-off-by: Heiko Schocher <hs@denx.de>
[fabio: passed .flags and .bus_flags]
Signed-off-by: Fabio Estevam <festevam@denx.de>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Marek Vasut <marex@denx.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20220429172056.3499563-2-festevam@gmail.com
2022-04-29 23:30:22 +02:00
Matt Roper
97e17a0906 drm/i915/xehp: Add register for compute engine's MMIO-based TLB invalidation
Compute engines have a separate register that the driver should use to
perform MMIO-based TLB invalidation.

Note that the term "context" in this register's bspec description is
used to refer to the engine instance (in the same way "context" is used
on bspec 46167).

Bspec: 43930
Cc: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220428041926.1483683-3-matthew.d.roper@intel.com
2022-04-29 14:30:21 -07:00
Fabio Estevam
1e69a83a5e dt-bindings: display: simple: Add Startek KD070WVFPA043-C069A panel
Add Startek KD070WVFPA043-C069A 7" TFT LCD panel compatible string.

Signed-off-by: Fabio Estevam <festevam@denx.de>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Marek Vasut <marex@denx.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20220429172056.3499563-1-festevam@gmail.com
2022-04-29 23:30:21 +02:00
Matt Roper
991b4de327 drm/i915/uapi: Add kerneldoc for engine class enum
We'll be adding a new type of engine soon.  Let's document the existing
engine classes first to help make it clear what each type of engine is
used for.

Cc: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220428041926.1483683-2-matthew.d.roper@intel.com
2022-04-29 14:27:37 -07:00
Arnd Bergmann
adee8aa22a Revert "arm: dts: at91: Fix boolean properties with values"
This reverts commit 0dc23d1a8e, which caused another regression
as the pinctrl code actually expects an integer value of 0 or 1
rather than a simple boolean property.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-04-29 23:09:49 +02:00
Minghao Chi
ab7671282b drm/nouveau: simplify the return expression of nouveau_debugfs_init()
Simplify the return expression.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Reviewed-by: Lyude Paul <lyude@redhat.com>
[fixed an indenting error before pushing]
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220429090309.3853003-1-chi.minghao@zte.com.cn
2022-04-29 15:44:54 -04:00
Jakub Kicinski
4f159a7c4d linux-can-fixes-for-5.18-20220429
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEBsvAIBsPu6mG7thcrX5LkNig010FAmJruWQTHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRCtfkuQ2KDTXekLCACB+A9hMgX3swUDJ5vqBuvWjOZIgcgC
 7sCOjIuO479IgeEhztXNOdAwQ+7BEg6UOOAeqXqSbqVcs/1mLgCHBaUlj7ENGsaI
 7QSqmbf//1is6tqrujOpcKwd45w92ys96xX1TEbohFvjGvVE18DSt5pWuH0fEWs3
 Qw3t+AfqP9aW1l7JwxN5fo6ktWnwXItCcB4fjeWjcV3AR2RG8HD7JXTq8vQrMA+3
 suEvdEbh9nPTw0ll3N+RLyRHq69sGb1EmOi8OxE5UBu1WT7QytzS1a+oZkowg2bt
 Xp2ULRtf66imf+9P5ojYA9j1Pd+FtMJRcn07jEyD+EoDQD7oFzVVg/wO
 =R5GF
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-5.18-20220429' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2022-04-29

The first patch is by Oliver Hartkopp and removes the ability to
re-binding bounds sockets from the ISOTP. It turned out to be not
needed and brings unnecessary complexity.

The last 4 patches all target the grcan driver. Duoming Zhou's patch
fixes a potential dead lock in the grcan_close() function. Daniel
Hellstrom's patch fixes the dma_alloc_coherent() to use the correct
device. Andreas Larsson's 1st patch fixes a broken system id check,
the 2nd patch fixes the NAPI poll budget usage.

* tag 'linux-can-fixes-for-5.18-20220429' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: grcan: only use the NAPI poll budget for RX
  can: grcan: grcan_probe(): fix broken system id check for errata workaround needs
  can: grcan: use ofdev->dev when allocating DMA memory
  can: grcan: grcan_close(): fix deadlock
  can: isotp: remove re-binding of bound socket
====================

Link: https://lore.kernel.org/r/20220429125612.1792561-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-29 12:33:55 -07:00
Paolo Bonzini
f751d8eac1 KVM: x86: work around QEMU issue with synthetic CPUID leaves
Synthesizing AMD leaves up to 0x80000021 caused problems with QEMU,
which assumes the *host* CPUID[0x80000000].EAX is higher or equal
to what KVM_GET_SUPPORTED_CPUID reports.

This causes QEMU to issue bogus host CPUIDs when preparing the input
to KVM_SET_CPUID2.  It can even get into an infinite loop, which is
only terminated by an abort():

   cpuid_data is full, no space for cpuid(eax:0x8000001d,ecx:0x3e)

To work around this, only synthesize those leaves if 0x8000001d exists
on the host.  The synthetic 0x80000021 leaf is mostly useful on Zen2,
which satisfies the condition.

Fixes: f144c49e8c ("KVM: x86: synthesize CPUID leaf 0x80000021h if useful")
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29 15:24:58 -04:00
Sargun Dhillon
662340ef92 selftests/seccomp: Ensure that notifications come in FIFO order
When multiple notifications are waiting, ensure they show up in order, as
defined by the (predictable) seccomp notification ID. This ensures FIFO
ordering of notification delivery as notification ids are monitonic and
decided when the notification is generated (as opposed to received).

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Cc: linux-kselftest@vger.kernel.org
Acked-by: Tycho Andersen <tycho@tycho.pizza>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220428015447.13661-2-sargun@sargun.me
2022-04-29 11:49:18 -07:00
Mark Brown
c8220e8721
ASoC: SOF: Miscellaneous preparatory patches for IPC4
Merge series from Ranjani Sridharan <ranjani.sridharan@linux.intel.com>:

This series includes last few remaining miscellaneous patches to prepare
for the introduction of new IPC version, IPC4, in the SOF driver. The changes
include new IPC ops for topology parsing to set up the volume table, prepare
the widgets for set up and free the routes. The remaining patches introduce
new fields in the existing data structures for use in IPC4 and align the flows
for widget/route set up so that they are common for both IPC3 and IPC4.
2022-04-29 19:48:47 +01:00
Linus Torvalds
3e71713c9e perf tools fixes for v5.18: 4th batch
- Fix Intel PT (Processor Trace) timeless decoding with perf.data directory.
 
 - ARM SPE (Statistical Profiling Extensions) address fixes, for synthesized
   events and for SPE events with physical addresses.  Add a simple 'perf test'
   entry to make sure this doesn't regress.
 
 - Remove arch specific processing of kallsyms data to fixup symbol end address,
   fixing excessive memory consumption in the annotation code.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYmvfpwAKCRCyPKLppCJ+
 JysNAQDtEIvGuRtjANnFqDQqyhrffvAg5BFkLg1HDYAttdsT0AD/bveO3Be5AoVH
 ocyoL9W5qoGo0pgxS5qfB13o5bvhwAE=
 =UlpT
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-fixes-for-v5.18-2022-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix Intel PT (Processor Trace) timeless decoding with perf.data
   directory.

 - ARM SPE (Statistical Profiling Extensions) address fixes, for
   synthesized events and for SPE events with physical addresses. Add a
   simple 'perf test' entry to make sure this doesn't regress.

 - Remove arch specific processing of kallsyms data to fixup symbol end
   address, fixing excessive memory consumption in the annotation code.

* tag 'perf-tools-fixes-for-v5.18-2022-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf symbol: Remove arch__symbols__fixup_end()
  perf symbol: Update symbols__fixup_end()
  perf symbol: Pass is_kallsyms to symbols__fixup_end()
  perf test: Add perf_event_attr test for Arm SPE
  perf arm-spe: Fix SPE events with phys addresses
  perf arm-spe: Fix addresses of synthesized SPE events
  perf intel-pt: Fix timeless decoding with perf.data directory
2022-04-29 11:34:07 -07:00
Sargun Dhillon
4cbf6f6211 seccomp: Use FIFO semantics to order notifications
Previously, the seccomp notifier used LIFO semantics, where each
notification would be added on top of the stack, and notifications
were popped off the top of the stack. This could result one process
that generates a large number of notifications preventing other
notifications from being handled. This patch moves from LIFO (stack)
semantics to FIFO (queue semantics).

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220428015447.13661-1-sargun@sargun.me
2022-04-29 11:30:54 -07:00
Yang Guang
95a126d981 selftests/seccomp: Add SKIP for failed unshare()
Running the seccomp tests under the kernel with "defconfig"
shouldn't fail. Because the CONFIG_USER_NS is not supported
in "defconfig". Skipping this case instead of failing it is
better.

Signed-off-by: Yang Guang <yang.guang5@zte.com.cn>
Signed-off-by: David Yang <davidcomponentone@gmail.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/7f7687696a5c0a2d040a24474616e945c7cf2bb5.1648599460.git.yang.guang5@zte.com.cn
2022-04-29 11:28:43 -07:00
Jann Horn
d250a3e4e5 selftests/seccomp: Test PTRACE_O_SUSPEND_SECCOMP without CAP_SYS_ADMIN
Add a test to check that PTRACE_O_SUSPEND_SECCOMP can't be set without
CAP_SYS_ADMIN through PTRACE_SEIZE or PTRACE_SETOPTIONS.

Signed-off-by: Jann Horn <jannh@google.com>
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2022-04-29 11:28:42 -07:00
Jann Horn
2bfed7d2ff selftests/seccomp: Don't call read() on TTY from background pgrp
Since commit 92d25637a3 ("kselftest: signal all child processes"), tests
are executed in background process groups. This means that trying to read
from stdin now throws SIGTTIN when stdin is a TTY, which breaks some
seccomp selftests that try to use read(0, NULL, 0) as a dummy syscall.

The simplest way to fix that is probably to just use -1 instead of 0 as
the dummy read()'s FD.

Fixes: 92d25637a3 ("kselftest: signal all child processes")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220319010011.1374622-1-jannh@google.com
2022-04-29 11:28:41 -07:00
Alexandru Elisei
18f3976fdb KVM: arm64: uapi: Add kvm_debug_exit_arch.hsr_high
When userspace is debugging a VM, the kvm_debug_exit_arch part of the
kvm_run struct contains arm64 specific debug information: the ESR_EL2
value, encoded in the field "hsr", and the address of the instruction
that caused the exception, encoded in the field "far".

Linux has moved to treating ESR_EL2 as a 64-bit register, but unfortunately
kvm_debug_exit_arch.hsr cannot be changed because that would change the
memory layout of the struct on big endian machines:

Current layout:			| Layout with "hsr" extended to 64 bits:
				|
offset 0: ESR_EL2[31:0] (hsr)   | offset 0: ESR_EL2[61:32] (hsr[61:32])
offset 4: padding		| offset 4: ESR_EL2[31:0]  (hsr[31:0])
offset 8: FAR_EL2[61:0] (far)	| offset 8: FAR_EL2[61:0]  (far)

which breaks existing code.

The padding is inserted by the compiler because the "far" field must be
aligned to 8 bytes (each field must be naturally aligned - aapcs64 [1],
page 18), and the struct itself must be aligned to 8 bytes (the struct must
be aligned to the maximum alignment of its fields - aapcs64, page 18),
which means that "hsr" must be aligned to 8 bytes as it is the first field
in the struct.

To avoid changing the struct size and layout for the existing fields, add a
new field, "hsr_high", which replaces the existing padding. "hsr_high" will
be used to hold the ESR_EL2[61:32] bits of the register. The memory layout,
both on big and little endian machine, becomes:

offset 0: ESR_EL2[31:0]  (hsr)
offset 4: ESR_EL2[61:32] (hsr_high)
offset 8: FAR_EL2[61:0]  (far)

The padding that the compiler inserts for the current struct layout is
unitialized. To prevent an updated userspace running on an old kernel
mistaking the padding for a valid "hsr_high" value, add a new flag,
KVM_DEBUG_ARCH_HSR_HIGH_VALID, to kvm_run->flags to let userspace know that
"hsr_high" holds a valid ESR_EL2[61:32] value.

[1] https://github.com/ARM-software/abi-aa/releases/download/2021Q3/aapcs64.pdf

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220425114444.368693-6-alexandru.elisei@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-29 19:26:27 +01:00
Alexandru Elisei
0b12620fdd KVM: arm64: Treat ESR_EL2 as a 64-bit register
ESR_EL2 was defined as a 32-bit register in the initial release of the
ARM Architecture Manual for Armv8-A, and was later extended to 64 bits,
with bits [63:32] RES0. ARMv8.7 introduced FEAT_LS64, which makes use of
bits [36:32].

KVM treats ESR_EL1 as a 64-bit register when saving and restoring the
guest context, but ESR_EL2 is handled as a 32-bit register. Start
treating ESR_EL2 as a 64-bit register to allow KVM to make use of the
most significant 32 bits in the future.

The type chosen to represent ESR_EL2 is u64, as that is consistent with the
notation KVM overwhelmingly uses today (u32), and how the rest of the
registers are declared.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220425114444.368693-5-alexandru.elisei@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-29 19:26:27 +01:00
Alexandru Elisei
8d56e5c5a9 arm64: Treat ESR_ELx as a 64-bit register
In the initial release of the ARM Architecture Reference Manual for
ARMv8-A, the ESR_ELx registers were defined as 32-bit registers. This
changed in 2018 with version D.a (ARM DDI 0487D.a) of the architecture,
when they became 64-bit registers, with bits [63:32] defined as RES0. In
version G.a, a new field was added to ESR_ELx, ISS2, which covers bits
[36:32].  This field is used when the Armv8.7 extension FEAT_LS64 is
implemented.

As a result of the evolution of the register width, Linux stores it as
both a 64-bit value and a 32-bit value, which hasn't affected correctness
so far as Linux only uses the lower 32 bits of the register.

Make the register type consistent and always treat it as 64-bit wide. The
register is redefined as an "unsigned long", which is an unsigned
double-word (64-bit quantity) for the LP64 machine (aapcs64 [1], Table 1,
page 14). The type was chosen because "unsigned int" is the most frequent
type for ESR_ELx and because FAR_ELx, which is used together with ESR_ELx
in exception handling, is also declared as "unsigned long". The 64-bit type
also makes adding support for architectural features that use fields above
bit 31 easier in the future.

The KVM hypervisor will receive a similar update in a subsequent patch.

[1] https://github.com/ARM-software/abi-aa/releases/download/2021Q3/aapcs64.pdf

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220425114444.368693-4-alexandru.elisei@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-29 19:26:27 +01:00
Alexandru Elisei
3fed9e5514 arm64: compat: Do not treat syscall number as ESR_ELx for a bad syscall
If a compat process tries to execute an unknown system call above the
__ARM_NR_COMPAT_END number, the kernel sends a SIGILL signal to the
offending process. Information about the error is printed to dmesg in
compat_arm_syscall() -> arm64_notify_die() -> arm64_force_sig_fault() ->
arm64_show_signal().

arm64_show_signal() interprets a non-zero value for
current->thread.fault_code as an exception syndrome and displays the
message associated with the ESR_ELx.EC field (bits 31:26).
current->thread.fault_code is set in compat_arm_syscall() ->
arm64_notify_die() with the bad syscall number instead of a valid ESR_ELx
value. This means that the ESR_ELx.EC field has the value that the user set
for the syscall number and the kernel can end up printing bogus exception
messages*. For example, for the syscall number 0x68000000, which evaluates
to ESR_ELx.EC value of 0x1A (ESR_ELx_EC_FPAC) the kernel prints this error:

[   18.349161] syscall[300]: unhandled exception: ERET/ERETAA/ERETAB, ESR 0x68000000, Oops - bad compat syscall(2) in syscall[10000+50000]
[   18.350639] CPU: 2 PID: 300 Comm: syscall Not tainted 5.18.0-rc1 #79
[   18.351249] Hardware name: Pine64 RockPro64 v2.0 (DT)
[..]

which is misleading, as the bad compat syscall has nothing to do with
pointer authentication.

Stop arm64_show_signal() from printing exception syndrome information by
having compat_arm_syscall() set the ESR_ELx value to 0, as it has no
meaning for an invalid system call number. The example above now becomes:

[   19.935275] syscall[301]: unhandled exception: Oops - bad compat syscall(2) in syscall[10000+50000]
[   19.936124] CPU: 1 PID: 301 Comm: syscall Not tainted 5.18.0-rc1-00005-g7e08006d4102 #80
[   19.936894] Hardware name: Pine64 RockPro64 v2.0 (DT)
[..]

which although shows less information because the syscall number,
wrongfully advertised as the ESR value, is missing, it is better than
showing plainly wrong information. The syscall number can be easily
obtained with strace.

*A 32-bit value above or equal to 0x8000_0000 is interpreted as a negative
integer in compat_arm_syscal() and the condition scno < __ARM_NR_COMPAT_END
evaluates to true; the syscall will exit to userspace in this case with the
ENOSYS error code instead of arm64_notify_die() being called.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220425114444.368693-3-alexandru.elisei@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-29 19:26:27 +01:00
Alexandru Elisei
a99ef9cb4b arm64: Make ESR_ELx_xVC_IMM_MASK compatible with assembly
ESR_ELx_xVC_IMM_MASK is used as a mask for the immediate value for the
HVC/SMC instructions. The header file is included by assembly files (like
entry.S) and ESR_ELx_xVC_IMM_MASK is not conditioned on __ASSEMBLY__ being
undefined. Use the UL() macro for defining the constant's size, as that is
compatible with both C code and assembly, whereas the UL suffix only works
for C code.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220425114444.368693-2-alexandru.elisei@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-29 19:26:26 +01:00
Chengming Zhou
c4a0ebf87c arm64/ftrace: Make function graph use ftrace directly
As we do in commit 0c0593b45c ("x86/ftrace: Make function graph
use ftrace directly"), we don't need special hook for graph tracer,
but instead we use graph_ops:func function to install return_hooker.

Since commit 3b23e4991f ("arm64: implement ftrace with regs") add
implementation for FTRACE_WITH_REGS on arm64, we can easily adopt
the same cleanup on arm64.

And this cleanup only changes the FTRACE_WITH_REGS implementation,
so the mcount-based implementation is unaffected.

While in theory it would be possible to make a similar cleanup for
!FTRACE_WITH_REGS, this will require rework of the core code, and
so for now we only change the FTRACE_WITH_REGS implementation.

Tested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Link: https://lore.kernel.org/r/20220420160006.17880-2-zhouchengming@bytedance.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-29 19:21:12 +01:00
Chengming Zhou
e999995c84 ftrace: cleanup ftrace_graph_caller enable and disable
The ftrace_[enable,disable]_ftrace_graph_caller() are used to do
special hooks for graph tracer, which are not needed on some ARCHs
that use graph_ops:func function to install return_hooker.

So introduce the weak version in ftrace core code to cleanup
in x86.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220420160006.17880-1-zhouchengming@bytedance.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-29 19:21:12 +01:00
Michal Suchanek
d43fae7c4d testing: nvdimm: asm/mce.h is not needed in nfit.c
asm/mce.h is not available on arm, and it is not needed to build nfit.c.
Remove the include.

It was likely needed for COPY_MC_TEST

Fixes: 3adb776384 ("x86, libnvdimm/test: Remove COPY_MC_TEST")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Link: https://lore.kernel.org/r/20220429074334.21771-1-msuchanek@suse.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-04-29 11:00:10 -07:00
Michal Suchanek
dccfbc73a9 testing: nvdimm: iomap: make __nfit_test_ioremap a macro
The ioremap passed as argument to __nfit_test_ioremap can be a macro so
it cannot be passed as function argument. Make __nfit_test_ioremap into
a macro so that ioremap can be passed as untyped macro argument.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Fixes: 6bc756193f ("tools/testing/nvdimm: libnvdimm unit test infrastructure")
Link: https://lore.kernel.org/r/20220429134039.18252-1-msuchanek@suse.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-04-29 10:59:39 -07:00
Linus Torvalds
2d0de93ca2 RISC-V Fixes for 5.18-rc6
* A fix to properly ensure a single CPU is running during patch_text().
 * A defconfig update to include RPMSG_CTRL when RPMSG_CHAR was set,
   necessary after a recent refactoring.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmJsAHYTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiZP7D/9ASxIyJFLpiyVisC/5WnCJHmWUXWJC
 DVlXXfR0U6mjFZcvWYvVTIpBa37qUb6p1w85NOK1+zE3gURyTWdAsA5dXxK3PWGr
 zZhVaNPGlQx7nzI675z1JKFEp+EcE1UjzLKL+jdkwY+ep/BSc2vMN0Kj5cbW3L/j
 /Bsi1HK8LdMxG9uTCxdDYs3qKlIh/ZgW0Lvnj3sjV8tqW4D4iXMqcL8l/ydpDnwL
 amYrc4erQyOKA7kbYDPMxUx/A8SVrXepHowUz6lpyJLCC8tdmVwI7rjImb5FzLJK
 idTX7ZAnUiT68Jom3VOhqGYVRS/uZHxr3kLtkNtPAm+yHafvNRnYPrqmhTpTV4BP
 eZpvUq5a90NoX+Xi1s0ou+m0LGbIzXOKVcC8ffnxd6I3EGQiWYvJyjwi1LEGFLMB
 iP+iT5GlzWbrAsrXaHGGVvc2TymfaxL4VglapBjRYSLV6Evg0QorZbyLrNepBKaL
 XO61miNN6FFETkKZ29vd1KAzEmT8/Gf1APJwRKsRtcfAE1ndm0lJ/j5CRF5rP45K
 cp9zl/i/jthZSBR7NjQDCIXNlK+MvAs9HtwuDIP+vgVyfTQDZJcNfeTnSJdf46KV
 dHtn9eu00fMNEIiFvFbYyWGfcHyKwGziIUWDxe7OtLbmyiHWzKdVZMR29WTmJHpI
 0KMTBoUya4yAbA==
 =BK1v
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - A fix to properly ensure a single CPU is running during patch_text().

 - A defconfig update to include RPMSG_CTRL when RPMSG_CHAR was set,
   necessary after a recent refactoring.

* tag 'riscv-for-linus-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: configs: Configs that had RPMSG_CHAR now get RPMSG_CTRL
  riscv: patch_text: Fixup last cpu should be master
2022-04-29 10:44:58 -07:00
Linus Torvalds
66c2112b74 arm64 fix for -rc5
- Rename and reallocate the PT_ARM_MEMTAG_MTE ELF segment type
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmJryRkQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNOS/B/43vhg0Bz0sH9poU0r3YE6TfEeCTd2HwCBq
 Caf6VAM6K9Y7JwPSFkAgnmW2DOG1hif83mCaX47Dkb4bJZaUQpmzJ+ZsK0nCe/Wo
 GIBzTdjRLrl6LQAqcW860s2eE4Ac212LszDxdp5DXFp4HjzbSfsteUFWipnpyXEs
 KtA4Z991fbsj+FsWY42zqhFkMnnPO+6twT5okYN/S6tONrojCTZhRTKupBr4VRAk
 94xD+pJI/8EfyakEhNN2PabhCvlq0bEI/EB6pJMrL6GjMs5t0pbpowXHzMoctZuf
 /SkRgyILDNUPbAa6wGCJoV4ZcgFH7PDy80iDMwNscb/DgAO8xo6t
 =hafH
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fix from Will Deacon:
 "Rename and reallocate the PT_ARM_MEMTAG_MTE ELF segment type.

  This is a fix to the MTE ELF ABI for a bug that was added during the
  most recent merge window as part of the coredump support.

  The issue is that the value assigned to the new PT_ARM_MEMTAG_MTE
  segment type has already been allocated to PT_AARCH64_UNWIND by the
  ELF ABI, so we've bumped the value and changed the name of the
  identifier to be better aligned with the existing one"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  elf: Fix the arm64 MTE ELF segment name and value
2022-04-29 10:36:47 -07:00
Lai Jiangshan
84e5ffd045 KVM: X86/MMU: Fix shadowing 5-level NPT for 4-level NPT L1 guest
When shadowing 5-level NPT for 4-level NPT L1 guest, the root_sp is
allocated with role.level = 5 and the guest pagetable's root gfn.

And root_sp->spt[0] is also allocated with the same gfn and the same
role except role.level = 4.  Luckily that they are different shadow
pages, but only root_sp->spt[0] is the real translation of the guest
pagetable.

Here comes a problem:

If the guest switches from gCR4_LA57=0 to gCR4_LA57=1 (or vice verse)
and uses the same gfn as the root page for nested NPT before and after
switching gCR4_LA57.  The host (hCR4_LA57=1) might use the same root_sp
for the guest even the guest switches gCR4_LA57.  The guest will see
unexpected page mapped and L2 may exploit the bug and hurt L1.  It is
lucky that the problem can't hurt L0.

And three special cases need to be handled:

The root_sp should be like role.direct=1 sometimes: its contents are
not backed by gptes, root_sp->gfns is meaningless.  (For a normal high
level sp in shadow paging, sp->gfns is often unused and kept zero, but
it could be relevant and meaningful if sp->gfns is used because they
are backed by concrete gptes.)

For such root_sp in the case, root_sp is just a portal to contribute
root_sp->spt[0], and root_sp->gfns should not be used and
root_sp->spt[0] should not be dropped if gpte[0] of the guest root
pagetable is changed.

Such root_sp should not be accounted too.

So add role.passthrough to distinguish the shadow pages in the hash
when gCR4_LA57 is toggled and fix above special cases by using it in
kvm_mmu_page_{get|set}_gfn() and sp_has_gptes().

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Message-Id: <20220420131204.2850-3-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29 12:50:00 -04:00
Lai Jiangshan
767d8d8d50 KVM: X86/MMU: Add sp_has_gptes()
Add sp_has_gptes() which equals to !sp->role.direct currently.

Shadow page having gptes needs to be write-protected, accounted and
responded to kvm_mmu_pte_write().

Use it in these places to replace !sp->role.direct and rename
for_each_gfn_indirect_valid_sp.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Message-Id: <20220420131204.2850-2-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29 12:50:00 -04:00
Suravee Suthikulpanit
9f084f7c2e KVM: SVM: Introduce trace point for the slow-path of avic_kic_target_vcpus
This can help identify potential performance issues when handles
AVIC incomplete IPI due vCPU not running.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220420154954.19305-3-suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29 12:50:00 -04:00
Suravee Suthikulpanit
7223fd2d53 KVM: SVM: Use target APIC ID to complete AVIC IRQs when possible
Currently, an AVIC-enabled VM suffers from performance bottleneck
when scaling to large number of vCPUs for I/O intensive workloads.

In such case, a vCPU often executes halt instruction to get into idle state
waiting for interrupts, in which KVM would de-schedule the vCPU from
physical CPU.

When AVIC HW tries to deliver interrupt to the halting vCPU, it would
result in AVIC incomplete IPI #vmexit to notify KVM to reschedule
the target vCPU into running state.

Investigation has shown the main hotspot is in the kvm_apic_match_dest()
in the following call stack where it tries to find target vCPUs
corresponding to the information in the ICRH/ICRL registers.

  - handle_exit
    - svm_invoke_exit_handler
      - avic_incomplete_ipi_interception
        - kvm_apic_match_dest

However, AVIC provides hints in the #vmexit info, which can be used to
retrieve the destination guest physical APIC ID.

In addition, since QEMU defines guest physical APIC ID to be the same as
vCPU ID, it can be used to quickly identify the target vCPU to deliver IPI,
and avoid the overhead from searching through all vCPUs to match the target
vCPU.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220420154954.19305-2-suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29 12:49:59 -04:00
Paolo Bonzini
347a0d0ded KVM: x86/mmu: replace direct_map with root_role.direct
direct_map is always equal to the direct field of the root page's role:

- for shadow paging, direct_map is true if CR0.PG=0 and root_role.direct is
copied from cpu_role.base.direct

- for TDP, it is always true and root_role.direct is also always true

- for shadow TDP, it is always false and root_role.direct is also always
false

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29 12:49:59 -04:00
Paolo Bonzini
4d25502aa1 KVM: x86/mmu: replace root_level with cpu_role.base.level
Remove another duplicate field of struct kvm_mmu.  This time it's
the root level for page table walking; the separate field is
always initialized as cpu_role.base.level, so its users can look
up the CPU mode directly instead.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29 12:49:58 -04:00
Paolo Bonzini
a972e29c1d KVM: x86/mmu: replace shadow_root_level with root_role.level
root_role.level is always the same value as shadow_level:

- it's kvm_mmu_get_tdp_level(vcpu) when going through init_kvm_tdp_mmu

- it's the level argument when going through kvm_init_shadow_ept_mmu

- it's assigned directly from new_role.base.level when going
  through shadow_mmu_init_context

Remove the duplication and get the level directly from the role.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29 12:49:58 -04:00
Paolo Bonzini
a7f1de9b60 KVM: x86/mmu: pull CPU mode computation to kvm_init_mmu
Do not lead init_kvm_*mmu into the temptation of poking
into struct kvm_mmu_role_regs, by passing to it directly
the CPU mode.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29 12:49:57 -04:00
Paolo Bonzini
56b321f9e3 KVM: x86/mmu: simplify and/or inline computation of shadow MMU roles
Shadow MMUs compute their role from cpu_role.base, simply by adjusting
the root level.  It's one line of code, so do not place it in a separate
function.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29 12:49:57 -04:00