linux-xiaomi-chiron

Author	SHA1	Message	Date
Isabella Basso	5427d3d772	test_hash.c: split test_hash_init Split up test_hash_init so that it calls each test more explicitly insofar it is possible without rewriting the entire file. This aims at improving readability. Split tests performed on string_or as they don't interfere with those performed in hash_or. Also separate pr_info calls about skipped tests as they're not part of the tests themselves, but only warn about (un)defined arch-specific hash functions. Link: https://lkml.kernel.org/r/20211208183711.390454-4-isabbasso@riseup.net Reviewed-by: David Gow <davidgow@google.com> Tested-by: David Gow <davidgow@google.com> Signed-off-by: Isabella Basso <isabbasso@riseup.net> Cc: Augusto Durães Camargo <augusto.duraes33@gmail.com> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Daniel Latypov <dlatypov@google.com> Cc: Enzo Ferreira <ferreiraenzoa@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: kernel test robot <lkp@intel.com> Cc: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:54 +02:00
Isabella Basso	ae7880676b	test_hash.c: split test_int_hash into arch-specific functions Split the test_int_hash function to keep its mainloop separate from arch-specific chunks, which are only compiled as needed. This aims at improving readability. Link: https://lkml.kernel.org/r/20211208183711.390454-3-isabbasso@riseup.net Reviewed-by: David Gow <davidgow@google.com> Tested-by: David Gow <davidgow@google.com> Signed-off-by: Isabella Basso <isabbasso@riseup.net> Cc: Augusto Durães Camargo <augusto.duraes33@gmail.com> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Daniel Latypov <dlatypov@google.com> Cc: Enzo Ferreira <ferreiraenzoa@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: kernel test robot <lkp@intel.com> Cc: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:54 +02:00
Isabella Basso	fd0a146240	hash.h: remove unused define directive Patch series "test_hash.c: refactor into KUnit", v3. We refactored the lib/test_hash.c file into KUnit as part of the student group LKCAMP [1] introductory hackathon for kernel development. This test was pointed to our group by Daniel Latypov [2], so its full conversion into a pure KUnit test was our goal in this patch series, but we ran into many problems relating to it not being split as unit tests, which complicated matters a bit, as the reasoning behind the original tests is quite cryptic for those unfamiliar with hash implementations. Some interesting developments we'd like to highlight are: - In patch 1/5 we noticed that there was an unused define directive that could be removed. - In patch 4/5 we noticed how stringhash and hash tests are all under the lib/test_hash.c file, which might cause some confusion, and we also broke those kernel config entries up. Overall KUnit developments have been made in the other patches in this series: In patches 2/5, 3/5 and 5/5 we refactored the lib/test_hash.c file so as to make it more compatible with the KUnit style, whilst preserving the original idea of the maintainer who designed it (i.e. George Spelvin), which might be undesirable for unit tests, but we assume it is enough for a first patch. This patch (of 5): Currently, there exist hash_32() and __hash_32() functions, which were introduced in a patch [1] targeting architecture specific optimizations. These functions can be overridden on a per-architecture basis to achieve such optimizations. They must set their corresponding define directive (HAVE_ARCH_HASH_32 and HAVE_ARCH__HASH_32, respectively) so that header files can deal with these overrides properly. As the supported 32-bit architectures that have their own hash function implementation (i.e. m68k, Microblaze, H8/300, pa-risc) have only been making use of the (more general) __hash_32() function (which only lacks a right shift operation when compared to the hash_32() function), remove the define directive corresponding to the arch-specific hash_32() implementation. [1] https://lore.kernel.org/lkml/20160525073311.5600.qmail@ns.sciencehorizons.net/ [akpm@linux-foundation.org: hash_32_generic() becomes hash_32()] Link: https://lkml.kernel.org/r/20211208183711.390454-1-isabbasso@riseup.net Link: https://lkml.kernel.org/r/20211208183711.390454-2-isabbasso@riseup.net Reviewed-by: David Gow <davidgow@google.com> Tested-by: David Gow <davidgow@google.com> Co-developed-by: Augusto Durães Camargo <augusto.duraes33@gmail.com> Signed-off-by: Augusto Durães Camargo <augusto.duraes33@gmail.com> Co-developed-by: Enzo Ferreira <ferreiraenzoa@gmail.com> Signed-off-by: Enzo Ferreira <ferreiraenzoa@gmail.com> Signed-off-by: Isabella Basso <isabbasso@riseup.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Daniel Latypov <dlatypov@google.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:54 +02:00
Zhen Lei	a31f9336ed	lib/list_debug.c: print more list debugging context in __list_del_entry_valid() Currently, the entry->prev and entry->next are considered to be valid as long as they are not LIST_POISON{1\|2}. However, the memory may be corrupted. The prev->next is invalid probably because 'prev' is invalid, not because prev->next's content is illegal. Unfortunately, the printk and its subfunctions will modify the registers that hold the 'prev' and 'next', and we don't see this valuable information in the BUG context. So print the contents of 'entry->prev' and 'entry->next'. Here's an example: list_del corruption. prev->next should be c0ecbf74, but was c08410dc kernel BUG at lib/list_debug.c:53! ... ... PC is at __list_del_entry_valid+0x58/0x98 LR is at __list_del_entry_valid+0x58/0x98 psr: 60000093 sp : c0ecbf30 ip : 00000000 fp : 00000001 r10: c08410d0 r9 : 00000001 r8 : c0825e0c r7 : 20000013 r6 : c08410d0 r5 : c0ecbf74 r4 : c0ecbf74 r3 : c0825d08 r2 : 00000000 r1 : df7ce6f4 r0 : 00000044 ... ... Stack: (0xc0ecbf30 to 0xc0ecc000) bf20: c0ecbf74 c0164fd0 c0ecbf70 c0165170 bf40: c0eca000 c0840c00 c0840c00 c0824500 c0825e0c c0189bbc c088f404 60000013 bf60: 60000013 c0e85100 000004ec 00000000 c0ebcdc0 c0ecbf74 c0ecbf74 c0825d08 bf80: c0e807c0 c018965c 00000000 c013f2a0 c0e807c0 c013f154 00000000 00000000 bfa0: 00000000 00000000 00000000 c01001b0 00000000 00000000 00000000 00000000 bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000 (__list_del_entry_valid) from (__list_del_entry+0xc/0x20) (__list_del_entry) from (finish_swait+0x60/0x7c) (finish_swait) from (rcu_gp_kthread+0x560/0xa20) (rcu_gp_kthread) from (kthread+0x14c/0x15c) (kthread) from (ret_from_fork+0x14/0x24) At first, I thought prev->next was overwritten. Later, I carefully analyzed the RCU code and the disassembly code. The error occurred when deleting a node from the list rcu_state.gp_wq. The System.map shows that the address of rcu_state is c0840c00. Then I use gdb to obtain the offset of rcu_state.gp_wq.task_list. (gdb) p &((struct rcu_state )0)->gp_wq.task_list $1 = (struct list_head ) 0x4dc Again: list_del corruption. prev->next should be c0ecbf74, but was c08410dc c08410dc = c0840c00 + 0x4dc = &rcu_state.gp_wq.task_list Because rcu_state.gp_wq has at most one node, so I can guess that "prev = &rcu_state.gp_wq.task_list". But for other scenes, maybe I wasn't so lucky, I cannot figure out the value of 'prev'. Link: https://lkml.kernel.org/r/20211207025835.1909-1-thunder.leizhen@huawei.com Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:53 +02:00
Andy Shevchenko	0425473037	list: introduce list_is_head() helper and re-use it in list.h Introduce list_is_head() in the similar () way as it's done for list_entry_is_head(). Make use of it in the list.h. ) it's done as inliner and not a macro to be aligned with other list_is_*() APIs; while at it, make all three to have the same style. Link: https://lkml.kernel.org/r/20211201141824.81400-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:53 +02:00
Alexey Dobriyan	70ac69928e	kstrtox: uninline everything I've made a mistake of looking into lib/kstrtox.o code generation. The only function remotely performance critical is _parse_integer() (via /proc//map_files/), everything else is not. Uninline everything, shrink lib/kstrtox.o by ~20 % ! Space savings on x86_64: add/remove: 0/0 grow/shrink: 0/23 up/down: 0/-1269 (-1269 !!!) Function old new delta kstrtoull 16 13 -3 kstrtouint 59 48 -11 kstrtou8 60 49 -11 kstrtou16 61 50 -11 _kstrtoul 46 35 -11 kstrtoull_from_user 95 83 -12 kstrtoul_from_user 95 83 -12 kstrtoll 93 80 -13 kstrtouint_from_user 124 83 -41 kstrtou8_from_user 125 83 -42 kstrtou16_from_user 126 83 -43 kstrtos8 101 50 -51 kstrtos16 102 51 -51 kstrtoint 100 49 -51 _kstrtol 93 35 -58 kstrtobool_from_user 156 75 -81 kstrtoll_from_user 165 83 -82 kstrtol_from_user 165 83 -82 kstrtoint_from_user 172 83 -89 kstrtos8_from_user 173 83 -90 kstrtos16_from_user 174 83 -91 _parse_integer 136 10 -126 _kstrtoull 308 101 -207 Total: Before=3421236, After=3419967, chg -0.04% Link: https://lkml.kernel.org/r/YZDsFDhHst4m2Pnt@localhost.localdomain Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:53 +02:00
Randy Dunlap	26d98e9f78	get_maintainer: don't remind about no git repo when --nogit is used When --nogit is used with scripts/get_maintainer.pl, the script spews 4 lines of unnecessary information (noise). Do not print those lines when --nogit is specified. This change removes the printing of these 4 lines: ./scripts/get_maintainer.pl: No supported VCS found. Add --nogit to options? Using a git repository produces better results. Try Linus Torvalds' latest git repository using: git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Link: https://lkml.kernel.org/r/20220102031424.3328-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:53 +02:00
Davidlohr Bueso	7f8ca0edfe	kernel/sys.c: only take tasklist_lock for get/setpriority(PRIO_PGRP) PRIO_PGRP needs the tasklist_lock mainly to serialize vs setpgid(2), to protect against any concurrent change_pid(PIDTYPE_PGID) that can move the task from one hlist to another while iterating. However, the remaining can only rely only on RCU: PRIO_PROCESS only does the task lookup and never iterates over tasklist and we already have an rcu-aware stable pointer. PRIO_USER is already racy vs setuid(2) so with creds being rcu protected, we can end up seeing stale data. When removing the tasklist_lock there can be a race with (i) fork but this is benign as the child's nice is inherited and the new task is not observable by the user yet either, hence the return semantics do not differ. And (ii) a race with exit, which is a small window and can cause us to miss a task which was removed from the list and it had the highest nice. Similarly change the buggy do_each_thread/while_each_thread combo in PRIO_USER for the rcu-safe for_each_process_thread flavor, which doesn't make use of next_thread/p->thread_group. [akpm@linux-foundation.org: coding style fixes] Link: https://lkml.kernel.org/r/20211210182250.43734-1-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:53 +02:00
Yafang Shao	d6986ce24f	kthread: dynamically allocate memory to store kthread's full name When I was implementing a new per-cpu kthread cfs_migration, I found the comm of it "cfs_migration/%u" is truncated due to the limitation of TASK_COMM_LEN. For example, the comm of the percpu thread on CPU10~19 all have the same name "cfs_migration/1", which will confuse the user. This issue is not critical, because we can get the corresponding CPU from the task's Cpus_allowed. But for kthreads corresponding to other hardware devices, it is not easy to get the detailed device info from task comm, for example, jbd2/nvme0n1p2- xfs-reclaim/sdf Currently there are so many truncated kthreads: rcu_tasks_kthre rcu_tasks_rude_ rcu_tasks_trace poll_mpt3sas0_s ext4-rsv-conver xfs-reclaim/sd{a, b, c, ...} xfs-blockgc/sd{a, b, c, ...} xfs-inodegc/sd{a, b, c, ...} audit_send_repl ecryptfs-kthrea vfio-irqfd-clea jbd2/nvme0n1p2- ... We can shorten these names to work around this problem, but it may be not applied to all of the truncated kthreads. Take 'jbd2/nvme0n1p2-' for example, it is a nice name, and it is not a good idea to shorten it. One possible way to fix this issue is extending the task comm size, but as task->comm is used in lots of places, that may cause some potential buffer overflows. Another more conservative approach is introducing a new pointer to store kthread's full name if it is truncated, which won't introduce too much overhead as it is in the non-critical path. Finally we make a dicision to use the second approach. See also the discussions in this thread: https://lore.kernel.org/lkml/20211101060419.4682-1-laoar.shao@gmail.com/ After this change, the full name of these truncated kthreads will be displayed via /proc/[pid]/comm: rcu_tasks_kthread rcu_tasks_rude_kthread rcu_tasks_trace_kthread poll_mpt3sas0_statu ext4-rsv-conversion xfs-reclaim/sdf1 xfs-blockgc/sdf1 xfs-inodegc/sdf1 audit_send_reply ecryptfs-kthread vfio-irqfd-cleanup jbd2/nvme0n1p2-8 Link: https://lkml.kernel.org/r/20211120112850.46047-1-laoar.shao@gmail.com Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Suggested-by: Petr Mladek <pmladek@suse.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:53 +02:00
Yafang Shao	3087c61ed2	tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN As the sched:sched_switch tracepoint args are derived from the kernel, we'd better make it same with the kernel. So the macro TASK_COMM_LEN is converted to type enum, then all the BPF programs can get it through BTF. The BPF program which wants to use TASK_COMM_LEN should include the header vmlinux.h. Regarding the test_stacktrace_map and test_tracepoint, as the type defined in linux/bpf.h are also defined in vmlinux.h, so we don't need to include linux/bpf.h again. Link: https://lkml.kernel.org/r/20211120112738.45980-8-laoar.shao@gmail.com Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:53 +02:00
Yafang Shao	4cfb943537	tools/bpf/bpftool/skeleton: replace bpf_probe_read_kernel with bpf_probe_read_kernel_str to get task comm bpf_probe_read_kernel_str() will add a nul terminator to the dst, then we don't care about if the dst size is big enough. Link: https://lkml.kernel.org/r/20211120112738.45980-7-laoar.shao@gmail.com Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:53 +02:00
Yafang Shao	d068144d3b	samples/bpf/test_overhead_kprobe_kern: replace bpf_probe_read_kernel with bpf_probe_read_kernel_str to get task comm bpf_probe_read_kernel_str() will add a nul terminator to the dst, then we don't care about if the dst size is big enough. This patch also replaces the hard-coded 16 with TASK_COMM_LEN to make it grepable. Link: https://lkml.kernel.org/r/20211120112738.45980-6-laoar.shao@gmail.com Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:53 +02:00
Yafang Shao	95af469c4f	fs/binfmt_elf: replace open-coded string copy with get_task_comm It is better to use get_task_comm() instead of the open coded string copy as we do in other places. struct elf_prpsinfo is used to dump the task information in userspace coredump or kernel vmcore. Below is the verification of vmcore, crash> ps PID PPID CPU TASK ST %MEM VSZ RSS COMM 0 0 0 ffffffff9d21a940 RU 0.0 0 0 [swapper/0] > 0 0 1 ffffa09e40f85e80 RU 0.0 0 0 [swapper/1] > 0 0 2 ffffa09e40f81f80 RU 0.0 0 0 [swapper/2] > 0 0 3 ffffa09e40f83f00 RU 0.0 0 0 [swapper/3] > 0 0 4 ffffa09e40f80000 RU 0.0 0 0 [swapper/4] > 0 0 5 ffffa09e40f89f80 RU 0.0 0 0 [swapper/5] 0 0 6 ffffa09e40f8bf00 RU 0.0 0 0 [swapper/6] > 0 0 7 ffffa09e40f88000 RU 0.0 0 0 [swapper/7] > 0 0 8 ffffa09e40f8de80 RU 0.0 0 0 [swapper/8] > 0 0 9 ffffa09e40f95e80 RU 0.0 0 0 [swapper/9] > 0 0 10 ffffa09e40f91f80 RU 0.0 0 0 [swapper/10] > 0 0 11 ffffa09e40f93f00 RU 0.0 0 0 [swapper/11] > 0 0 12 ffffa09e40f90000 RU 0.0 0 0 [swapper/12] > 0 0 13 ffffa09e40f9bf00 RU 0.0 0 0 [swapper/13] > 0 0 14 ffffa09e40f98000 RU 0.0 0 0 [swapper/14] > 0 0 15 ffffa09e40f9de80 RU 0.0 0 0 [swapper/15] It works well as expected. Some comments are added to explain why we use the hard-coded 16. Link: https://lkml.kernel.org/r/20211120112738.45980-5-laoar.shao@gmail.com Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:53 +02:00
Yafang Shao	7b6397d7e5	drivers/infiniband: replace open-coded string copy with get_task_comm We'd better use the helper get_task_comm() rather than the open-coded strlcpy() to get task comm. As the comment above the hard-coded 16, we can replace it with TASK_COMM_LEN. Link: https://lkml.kernel.org/r/20211120112738.45980-4-laoar.shao@gmail.com Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:53 +02:00
Yafang Shao	503471ac36	fs/exec: replace strncpy with strscpy_pad in __get_task_comm If the dest buffer size is smaller than sizeof(tsk->comm), the buffer will be without null ternimator, that may cause problem. Using strscpy_pad() instead of strncpy() in __get_task_comm() can make the string always nul ternimated and zero padded. Link: https://lkml.kernel.org/r/20211120112738.45980-3-laoar.shao@gmail.com Suggested-by: Kees Cook <keescook@chromium.org> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:53 +02:00
Yafang Shao	06c5088aee	fs/exec: replace strlcpy with strscpy_pad in __set_task_comm Patch series "task comm cleanups", v2. This patchset is part of the patchset "extend task comm from 16 to 24"[1]. Now we have different opinion that dynamically allocates memory to store kthread's long name into a separate pointer, so I decide to take the useful cleanups apart from the original patchset and send it separately[2]. These useful cleanups can make the usage around task comm less error-prone. Furthermore, it will be useful if we want to extend task comm in the future. [1]. https://lore.kernel.org/lkml/20211101060419.4682-1-laoar.shao@gmail.com/ [2]. https://lore.kernel.org/lkml/CALOAHbAx55AUo3bm8ZepZSZnw7A08cvKPdPyNTf=E_tPqmw5hw@mail.gmail.com/ This patch (of 7): strlcpy() can trigger out-of-bound reads on the source string[1], we'd better use strscpy() instead. To make it be robust against full tsk->comm copies that got noticed in other places, we should make sure it's zero padded. [1] https://github.com/KSPP/linux/issues/89 Link: https://lkml.kernel.org/r/20211120112738.45980-1-laoar.shao@gmail.com Link: https://lkml.kernel.org/r/20211120112738.45980-2-laoar.shao@gmail.com Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:53 +02:00
Andy Shevchenko	40cbf09f06	kernel.h: include a note to discourage people from including it in headers Include a note at the top to discourage people from including it in headers. Link: https://lkml.kernel.org/r/20211209150803.4473-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:53 +02:00
Andy Shevchenko	22c033989c	include/linux/unaligned: replace kernel.h with the necessary inclusions When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. The rest of the changes are induced by the above and may not be split. Link: https://lkml.kernel.org/r/20211209123823.20425-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com> [brcmfmac] Acked-by: Kalle Valo <kvalo@kernel.org> Cc: Arend van Spriel <aspriel@gmail.com> Cc: Franky Lin <franky.lin@broadcom.com> Cc: Hante Meuleman <hante.meuleman@broadcom.com> Cc: Chi-hsien Lin <chi-hsien.lin@infineon.com> Cc: Wright Feng <wright.feng@infineon.com> Cc: Chung-hsien Hsu <chung-hsien.hsu@infineon.com> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: David S. Miller <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:53 +02:00
luo penghao	7080cead5d	sysctl: remove redundant ret assignment Subsequent if judgments will assign new values to ret, so the statement here should be deleted The clang_analyzer complains as follows: fs/proc/proc_sysctl.c: Value stored to 'ret' is never read Link: https://lkml.kernel.org/r/20211230063622.586360-1-luo.penghao@zte.com.cn Signed-off-by: luo penghao <luo.penghao@zte.com.cn> Reported-by: Zeal Robot <zealci@zte.com.cn> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:52 +02:00
Geert Uytterhoeven	153ee1c41a	sysctl: fix duplicate path separator in printed entries sysctl_print_dir() always terminates the printed path name with a slash, so printing a slash before the file part causes a duplicate like in sysctl duplicate entry: /kernel//perf_user_access Fix this by dropping the extra slash. Link: https://lkml.kernel.org/r/e3054d605dc56f83971e4b6d2f5fa63a978720ad.1641551872.git.geert+renesas@glider.be Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Iurii Zaikin <yzaikin@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:52 +02:00
Qi Zheng	51a1873440	proc: convert the return type of proc_fd_access_allowed() to be boolean Convert return type of proc_fd_access_allowed() and the 'allowed' in it to be boolean since the return type of ptrace_may_access() is boolean. Link: https://lkml.kernel.org/r/20211219024404.29779-1-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Kees Cook <keescook@chromium.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:52 +02:00
Hans de Goede	ae62fbe299	proc: make the proc_create[_data]() stubs static inlines Change the proc_create[_data]() stubs which are used when CONFIG_PROC_FS is not set from #defines to a static inline stubs. This should fix clang -Werror builds failing due to errors like this: drivers/platform/x86/thinkpad_acpi.c:918:30: error: unused variable 'dispatch_proc_ops' [-Werror,-Wunused-const-variable] Fixing this in include/linux/proc_fs.h should ensure that the same issue is also fixed in any other drivers hitting the same -Werror issue. [akpm@linux-foundation.org: fix CONFIG_PROC_FS=n] [akpm@linux-foundation.org: fix arch/sparc/kernel/led.c] [akpm@linux-foundation.org: fix build] Link: https://lkml.kernel.org/r/20211116131112.508304-1-hdegoede@redhat.com Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Hans de Goede <hdegoede@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:52 +02:00
David Hildenbrand	25bc5b0de9	proc/vmcore: don't fake reading zeroes on surprise vmcore_cb unregistration In commit `cc5f2704c9` ("proc/vmcore: convert oldmem_pfn_is_ram callback to more generic vmcore callbacks"), we added detection of surprise vmcore_cb unregistration after the vmcore was already opened. Once detected, we warn the user and simulate reading zeroes from that point on when accessing the vmcore. The basic reason was that unexpected unregistration, for example, by manually unbinding a driver from a device after opening the vmcore, is not supported and could result in reading oldmem the vmcore_cb would have actually prohibited while registered. However, something like that can similarly be trigger by a user that's really looking for trouble simply by unbinding the relevant driver before opening the vmcore -- or by disallowing loading the driver in the first place. So it's actually of limited help. Currently, unregistration can only be triggered via virtio-mem when manually unbinding the driver from the device inside the VM; there is no way to trigger it from the hypervisor, as hypervisors don't allow for unplugging virtio-mem devices -- ripping out system RAM from a VM without coordination with the guest is usually not a good idea. The important part is that unbinding the driver and unregistering the vmcore_cb while concurrently reading the vmcore won't crash the system, and that is handled by the rwsem. To make the mechanism more future proof, let's remove the "read zero" part, but leave the warning in place. For example, we could have a future driver (like virtio-balloon) that will contact the hypervisor to figure out if we already populated a page for a given PFN. Hotunplugging such a device and consequently unregistering the vmcore_cb could be triggered from the hypervisor without harming the system even while kdump is running. In that case, we don't want to silently end up with a vmcore that contains wrong data, because the user inside the VM might be unaware of the hypervisor action and might easily miss the warning in the log. Link: https://lkml.kernel.org/r/20211111192243.22002-1-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Philipp Rudo <prudo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:52 +02:00
Kefeng Wang	20c0357646	mm: percpu: add generic pcpu_populate_pte() function With NEED_PER_CPU_PAGE_FIRST_CHUNK enabled, we need a function to populate pte, this patch adds a generic pcpu populate pte function, pcpu_populate_pte(), which is marked __weak and used on most architectures, but it is overridden on x86, which has its own implementation. Link: https://lkml.kernel.org/r/20211216112359.103822-5-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:52 +02:00
Kefeng Wang	23f917169e	mm: percpu: add generic pcpu_fc_alloc/free funciton With the previous patch, we could add a generic pcpu first chunk allocate and free function to cleanup the duplicated definations on each architecture. Link: https://lkml.kernel.org/r/20211216112359.103822-4-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:52 +02:00
Kefeng Wang	1ca3fb3abd	mm: percpu: add pcpu_fc_cpu_to_node_fn_t typedef Add pcpu_fc_cpu_to_node_fn_t and pass it into pcpu_fc_alloc_fn_t, pcpu first chunk allocation will call it to alloc memblock on the corresponding node by it, this is prepare for the next patch. Link: https://lkml.kernel.org/r/20211216112359.103822-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:52 +02:00
Kefeng Wang	7ecd19cfdf	mm: percpu: generalize percpu related config Patch series "mm: percpu: Cleanup percpu first chunk function". When supporting page mapping percpu first chunk allocator on arm64, we found there are lots of duplicated codes in percpu embed/page first chunk allocator. This patchset is aimed to cleanup them and should no function change. The currently supported status about 'embed' and 'page' in Archs shows below, embed: NEED_PER_CPU_PAGE_FIRST_CHUNK page: NEED_PER_CPU_EMBED_FIRST_CHUNK embed page ------------------------ arm64 Y Y mips Y N powerpc Y Y riscv Y N sparc Y Y x86 Y Y ------------------------ There are two interfaces about percpu first chunk allocator, extern int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size, size_t atom_size, pcpu_fc_cpu_distance_fn_t cpu_distance_fn, - pcpu_fc_alloc_fn_t alloc_fn, - pcpu_fc_free_fn_t free_fn); + pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn); extern int __init pcpu_page_first_chunk(size_t reserved_size, - pcpu_fc_alloc_fn_t alloc_fn, - pcpu_fc_free_fn_t free_fn, - pcpu_fc_populate_pte_fn_t populate_pte_fn); + pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn); The pcpu_fc_alloc_fn_t/pcpu_fc_free_fn_t is killed, we provide generic pcpu_fc_alloc() and pcpu_fc_free() function, which are called in the pcpu_embed/page_first_chunk(). 1) For pcpu_embed_first_chunk(), pcpu_fc_cpu_to_node_fn_t is needed to be provided when archs supported NUMA. 2) For pcpu_page_first_chunk(), the pcpu_fc_populate_pte_fn_t is killed too, a generic pcpu_populate_pte() which marked '__weak' is provided, if you need a different function to populate pte on the arch(like x86), please provide its own implementation. [1] https://github.com/kevin78/linux.git percpu-cleanup This patch (of 4): The HAVE_SETUP_PER_CPU_AREA/NEED_PER_CPU_EMBED_FIRST_CHUNK/ NEED_PER_CPU_PAGE_FIRST_CHUNK/USE_PERCPU_NUMA_NODE_ID configs, which have duplicate definitions on platforms that subscribe it. Move them into mm, drop these redundant definitions and instead just select it on applicable platforms. Link: https://lkml.kernel.org/r/20211216112359.103822-1-wangkefeng.wang@huawei.com Link: https://lkml.kernel.org/r/20211216112359.103822-2-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Cc: Will Deacon <will@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-20 08:52:52 +02:00
Steve French	51620150ca	cifs: update internal module number To 2.35 Signed-off-by: Steve French <stfrench@microsoft.com>	2022-01-19 23:14:34 -06:00
Steve French	52d005337b	smb3: send NTLMSSP version information For improved debugging it can be helpful to send version information as other clients do during NTLMSSP negotiation. See protocol document MS-NLMP section 2.2.1.1 Set the major and minor versions based on the kernel version, and the BuildNumber based on the internal cifs.ko module version number, and following the recommendation in the protocol documentation (MS-NLMP section 2.2.10) we set the NTLMRevisionCurrent field to 15. Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-01-19 23:14:15 -06:00
kernel test robot	20aa49541a	riscv: fix boolconv.cocci warnings arch/riscv/mm/init.c:48:11-16: WARNING: conversion to bool not needed here Remove unneeded conversion to bool Semantic patch information: Relational and logical operators evaluate to bool, explicit conversion is overly verbose and unneeded. Generated by: scripts/coccinelle/misc/boolconv.cocci Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: kernel test robot <lkp@intel.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2022-01-19 19:52:30 -08:00
Palmer Dabbelt	0c34e79e52	RISC-V: Introduce sv48 support without relocatable kernel This patchset allows to have a single kernel for sv39 and sv48 without being relocatable. The idea comes from Arnd Bergmann who suggested to do the same as x86, that is mapping the kernel to the end of the address space, which allows the kernel to be linked at the same address for both sv39 and sv48 and then does not require to be relocated at runtime. This implements sv48 support at runtime. The kernel will try to boot with 4-level page table and will fallback to 3-level if the HW does not support it. Folding the 4th level into a 3-level page table has almost no cost at runtime. Note that kasan region had to be moved to the end of the address space since its location must be known at compile-time and then be valid for both sv39 and sv48 (and sv57 that is coming). * riscv-sv48-v3: riscv: Explicit comment about user virtual address space size riscv: Use pgtable_l4_enabled to output mmu_type in cpuinfo riscv: Implement sv48 support asm-generic: Prepare for riscv use of pud_alloc_one and pud_free riscv: Allow to dynamically define VA_BITS riscv: Introduce functions to switch pt_ops riscv: Split early kasan mapping to prepare sv48 introduction riscv: Move KASAN mapping next to the kernel mapping riscv: Get rid of MAXPHYSMEM configs Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2022-01-19 19:37:44 -08:00
Alexandre Ghiti	c774de22c4	riscv: Explicit comment about user virtual address space size Define precisely the size of the user accessible virtual space size for sv32/39/48 mmu types and explain why the whole virtual address space is split into 2 equal chunks between kernel and user space. Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2022-01-19 17:54:11 -08:00
Alexandre Ghiti	73c7c8f68e	riscv: Use pgtable_l4_enabled to output mmu_type in cpuinfo Now that the mmu type is determined at runtime using SATP characteristic, use the global variable pgtable_l4_enabled to output mmu type of the processor through /proc/cpuinfo instead of relying on device tree infos. Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2022-01-19 17:54:10 -08:00
Alexandre Ghiti	e8a62cc26d	riscv: Implement sv48 support By adding a new 4th level of page table, give the possibility to 64bit kernel to address 2^48 bytes of virtual address: in practice, that offers 128TB of virtual address space to userspace and allows up to 64TB of physical memory. If the underlying hardware does not support sv48, we will automatically fallback to a standard 3-level page table by folding the new PUD level into PGDIR level. In order to detect HW capabilities at runtime, we use SATP feature that ignores writes with an unsupported mode. Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2022-01-19 17:54:09 -08:00
Alexandre Ghiti	60639f74c2	asm-generic: Prepare for riscv use of pud_alloc_one and pud_free In the following commits, riscv will almost use the generic versions of pud_alloc_one and pud_free but an additional check is required since those functions are only relevant when using at least a 4-level page table, which will be determined at runtime on riscv. So move the content of those functions into other functions that riscv can use without duplicating code. Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2022-01-19 17:54:08 -08:00
Alexandre Ghiti	3270bfdb9e	riscv: Allow to dynamically define VA_BITS With 4-level page table folding at runtime, we don't know at compile time the size of the virtual address space so we must set VA_BITS dynamically so that sparsemem reserves the right amount of memory for struct pages. Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2022-01-19 17:54:07 -08:00
Alexandre Ghiti	840125a97a	riscv: Introduce functions to switch pt_ops This simply gathers the different pt_ops initialization in functions where a comment was added to explain why the page table operations must be changed along the boot process. Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2022-01-19 17:54:06 -08:00
Alexandre Ghiti	2efad17e57	riscv: Split early kasan mapping to prepare sv48 introduction Now that kasan shadow region is next to the kernel, for sv48, this region won't be aligned on PGDIR_SIZE and then when populating this region, we'll need to get down to lower levels of the page table. So instead of reimplementing the page table walk for the early population, take advantage of the existing functions used for the final population. Note that kasan swapper initialization must also be split since memblock is not initialized at this point and as the last PGD is shared with the kernel, we'd need to allocate a PUD so postpone the kasan final population after the kernel population is done. Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2022-01-19 17:54:05 -08:00
Alexandre Ghiti	f7ae02333d	riscv: Move KASAN mapping next to the kernel mapping Now that KASAN_SHADOW_OFFSET is defined at compile time as a config, this value must remain constant whatever the size of the virtual address space, which is only possible by pushing this region at the end of the address space next to the kernel mapping. Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2022-01-19 17:54:04 -08:00
Alexandre Ghiti	db1503d355	riscv: Get rid of MAXPHYSMEM configs CONFIG_MAXPHYSMEM_* are actually never used, even the nommu defconfigs selecting the MAXPHYSMEM_2GB had no effects on PAGE_OFFSET since it was preempted by !MMU case right before. In addition, the move of the kernel mapping at the end of the address space broke the use of MAXPHYSMEM_2G with MMU since it defines PAGE_OFFSET at the same address as the kernel mapping. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Fixes: `2bfc6cd81b` ("riscv: Move kernel mapping outside of linear mapping") Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Conor Dooley <Conor.Dooley@microchip.com> Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2022-01-19 15:12:32 -08:00
Brian Foster	6191cf3ad5	xfs: flush inodegc workqueue tasks before cancel The xfs_inodegc_stop() helper performs a high level flush of pending work on the percpu queues and then runs a cancel_work_sync() on each of the percpu work tasks to ensure all work has completed before returning. While cancel_work_sync() waits for wq tasks to complete, it does not guarantee work tasks have started. This means that the _stop() helper can queue and instantly cancel a wq task without having completed the associated work. This can be observed by tracepoint inspection of a simple "rm -f <file>; fsfreeze -f <mnt>" test: xfs_destroy_inode: ... ino 0x83 ... xfs_inode_set_need_inactive: ... ino 0x83 ... xfs_inodegc_stop: ... ... xfs_inodegc_start: ... xfs_inodegc_worker: ... xfs_inode_inactivating: ... ino 0x83 ... The first few lines show that the inode is removed and need inactive state set, but the inactivation work has not completed before the inodegc mechanism stops. The inactivation doesn't actually occur until the fs is unfrozen and the gc mechanism starts back up. Note that this test requires fsfreeze to reproduce because xfs_freeze indirectly invokes xfs_fs_statfs(), which calls xfs_inodegc_flush(). When this occurs, the workqueue try_to_grab_pending() logic first tries to steal the pending bit, which does not succeed because the bit has been set by queue_work_on(). Subsequently, it checks for association of a pool workqueue from the work item under the pool lock. This association is set at the point a work item is queued and cleared when dequeued for processing. If the association exists, the work item is removed from the queue and cancel_work_sync() returns true. If the pwq association is cleared, the remove attempt assumes the task is busy and retries (eventually returning false to the caller after waiting for the work task to complete). To avoid this race, we can flush each work item explicitly before cancel. However, since the _queue_all() already schedules each underlying work item, the workqueue level helpers are sufficient to achieve the same ordering effect. E.g., the inodegc enabled flag prevents scheduling any further work in the _stop() case. Use the drain_workqueue() helper in this particular case to make the intent a bit more self explanatory. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-01-19 14:58:26 -08:00
Alexei Starovoitov	4e950747ba	Merge branch 'bpf: allow cgroup progs to export custom retval to userspace' YiFei Zhu says: ==================== Right now, most cgroup hooks are best used for permission checks. They can only reject a syscall with -EPERM, so a cause of a rejection, if the rejected by eBPF cgroup hooks, is ambiguous to userspace. Additionally, if the syscalls are implemented in eBPF, all permission checks and the implementation has to happen within the same filter, as programs executed later in the series of progs are unaware of the return values return by the previous progs. This patch series adds two helpers, bpf_get_retval and bpf_set_retval, that allows hooks to get/set the return value of syscall to userspace. This also allows later progs to retrieve retval set by previous progs. For legacy programs that rejects a syscall without setting the retval, for backwards compatibility, if a prog rejects without itself or a prior prog setting retval to an -err, the retval is set by the kernel to -EPERM. For getsockopt hooks that has ctx->retval, this variable mirrors that that accessed by the helpers. Additionally, the following user-visible behavior for getsockopt hooks has changed: - If a prior filter rejected the syscall, it will be visible in ctx->retval. - Attempting to change the retval arbitrarily is now allowed and will not cause an -EFAULT. - If kernel rejects a getsockopt syscall before running the hooks, the error will be visible in ctx->retval. Returning 0 from the prog will not overwrite the error to -EPERM unless there is an explicit call of bpf_set_retval(-EPERM) Tests have been added in this series to test the behavior of the helper with cgroup setsockopt getsockopt hooks. Patch 1 changes the API of macros to prepare for the next patch and should be a no-op. Patch 2 moves ctx->retval to a struct pointed to by current task_struct. Patch 3 implements the helpers. Patch 4 tests the behaviors of the helpers. Patch 5 updates a test after the test broke due to the visible changes. v1 -> v2: - errno -> retval - split one helper to get & set helpers - allow retval to be set arbitrarily in the general case - made the helper retval and context retval mirror each other ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-01-19 13:05:30 -08:00
YiFei Zhu	1080ef5cc0	selftests/bpf: Update sockopt_sk test to the use bpf_set_retval The tests would break without this patch, because at one point it calls getsockopt(fd, SOL_TCP, TCP_ZEROCOPY_RECEIVE, &buf, &optlen) This getsockopt receives the kernel-set -EINVAL. Prior to this patch series, the eBPF getsockopt hook's -EPERM would override kernel's -EINVAL, however, after this patch series, return 0's automatic -EPERM will not; the eBPF prog has to explicitly bpf_set_retval(-EPERM) if that is wanted. I also removed the explicit mentions of EPERM in the comments in the prog. Signed-off-by: YiFei Zhu <zhuyifei@google.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/4f20b77cb46812dbc2bdcd7e3fa87c7573bde55e.1639619851.git.zhuyifei@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-01-19 13:05:13 -08:00
YiFei Zhu	b8bff6f890	selftests/bpf: Test bpf_{get,set}_retval behavior with cgroup/sockopt The tests checks how different ways of interacting with the helpers (getting retval, setting EUNATCH, EISCONN, and legacy reject returning 0 without setting retval), produce different results in both the setsockopt syscall and the retval returned by the helper. A few more tests verify the interaction between the retval of the helper and the retval in getsockopt context. Signed-off-by: YiFei Zhu <zhuyifei@google.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/43ec60d679ae3f4f6fd2460559c28b63cb93cd12.1639619851.git.zhuyifei@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-01-19 12:51:30 -08:00
YiFei Zhu	b44123b4a3	bpf: Add cgroup helpers bpf_{get,set}_retval to get/set syscall return value The helpers continue to use int for retval because all the hooks are int-returning rather than long-returning. The return value of bpf_set_retval is int for future-proofing, in case in the future there may be errors trying to set the retval. After the previous patch, if a program rejects a syscall by returning 0, an -EPERM will be generated no matter if the retval is already set to -err. This patch change it being forced only if retval is not -err. This is because we want to support, for example, invoking bpf_set_retval(-EINVAL) and return 0, and have the syscall return value be -EINVAL not -EPERM. For BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY, the prior behavior is that, if the return value is NET_XMIT_DROP, the packet is silently dropped. We preserve this behavior for backward compatibility reasons, so even if an errno is set, the errno does not return to caller. However, setting a non-err to retval cannot propagate so this is not allowed and we return a -EFAULT in that case. Signed-off-by: YiFei Zhu <zhuyifei@google.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/b4013fd5d16bed0b01977c1fafdeae12e1de61fb.1639619851.git.zhuyifei@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-01-19 12:51:30 -08:00
YiFei Zhu	c4dcfdd406	bpf: Move getsockopt retval to struct bpf_cg_run_ctx The retval value is moved to struct bpf_cg_run_ctx for ease of access in different prog types with different context structs layouts. The helper implementation (to be added in a later patch in the series) can simply perform a container_of from current->bpf_ctx to retrieve bpf_cg_run_ctx. Unfortunately, there is no easy way to access the current task_struct via the verifier BPF bytecode rewrite, aside from possibly calling a helper, so a pointer to current task is added to struct bpf_sockopt_kern so that the rewritten BPF bytecode can access struct bpf_cg_run_ctx with an indirection. For backward compatibility, if a getsockopt program rejects a syscall by returning 0, an -EPERM will be generated, by having the BPF_PROG_RUN_ARRAY_CG family macros automatically set the retval to -EPERM. Unlike prior to this patch, this -EPERM will be visible to ctx->retval for any other hooks down the line in the prog array. Additionally, the restriction that getsockopt filters can only set the retval to 0 is removed, considering that certain getsockopt implementations may return optlen. Filters are now able to set the value arbitrarily. Signed-off-by: YiFei Zhu <zhuyifei@google.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/73b0325f5c29912ccea7ea57ec1ed4d388fc1d37.1639619851.git.zhuyifei@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-01-19 12:51:30 -08:00
YiFei Zhu	f10d059661	bpf: Make BPF_PROG_RUN_ARRAY return -err instead of allow boolean Right now BPF_PROG_RUN_ARRAY and related macros return 1 or 0 for whether the prog array allows or rejects whatever is being hooked. The caller of these macros then return -EPERM or continue processing based on thw macro's return value. Unforunately this is inflexible, since -EPERM is the only err that can be returned. This patch should be a no-op; it prepares for the next patch. The returning of the -EPERM is moved to inside the macros, so the outer functions are directly returning what the macros returned if they are non-zero. Signed-off-by: YiFei Zhu <zhuyifei@google.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/788abcdca55886d1f43274c918eaa9f792a9f33b.1639619851.git.zhuyifei@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-01-19 12:51:30 -08:00
Jens Axboe	73031f761c	io-wq: delete dead lock shuffling code We used to have more code around the work loop, but now the goto and lock juggling just makes it less readable than it should. Get rid of it. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-01-19 13:11:58 -07:00
Sam Shih	b4966a7dc0	clk: mediatek: relicense mt7986 clock driver to GPL-2.0 The previous mt7986 clock drivers were incorrectly marked as GPL-1.0. This patch changes the driver to the standard GPL-2.0 license. Signed-off-by: Sam Shih <sam.shih@mediatek.com> Link: https://lore.kernel.org/r/20220119123658.10095-2-sam.shih@mediatek.com Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>	2022-01-19 12:05:07 -08:00
Kui-Feng Lee	d81283d272	libbpf: Improve btf__add_btf() with an additional hashmap for strings. Add a hashmap to map the string offsets from a source btf to the string offsets from a target btf to reduce overheads. btf__add_btf() calls btf__add_str() to add strings from a source to a target btf. It causes many string comparisons, and it is a major hotspot when adding a big btf. btf__add_str() uses strcmp() to check if a hash entry is the right one. The extra hashmap here compares offsets of strings, that are much cheaper. It remembers the results of btf__add_str() for later uses to reduce the cost. We are parallelizing BTF encoding for pahole by creating separated btf instances for worker threads. These per-thread btf instances will be added to the btf instance of the main thread by calling btf__add_str() to deduplicate and write out. With this patch and -j4, the running time of pahole drops to about 6.0s from 6.6s. The following lines are the summary of 'perf stat' w/o the change. 6.668126396 seconds time elapsed 13.451054000 seconds user 0.715520000 seconds sys The following lines are the summary w/ the change. 5.986973919 seconds time elapsed 12.939903000 seconds user 0.724152000 seconds sys V4 fixes a bug of error checking against the pointer returned by hashmap__new(). [v3] https://lore.kernel.org/bpf/20220118232053.2113139-1-kuifeng@fb.com/ [v2] https://lore.kernel.org/bpf/20220114193713.461349-1-kuifeng@fb.com/ Signed-off-by: Kui-Feng Lee <kuifeng@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220119180214.255634-1-kuifeng@fb.com	2022-01-19 10:57:27 -08:00

... 60 61 62 63 64 ...

1075150 commits