When building an external module, if users don't need to separate the
compilation output and source code, they run the following command:
"make -C $(LINUX_SRC_DIR) M=$(PWD)". At this point, "$(KBUILD_EXTMOD)"
and "$(src)" are the same.
If they need to separate them, they run "make -C $(KERNEL_SRC_DIR)
O=$(KERNEL_OUT_DIR) M=$(OUT_DIR) src=$(PWD)". Before running the
command, they need to copy "Kbuild" or "Makefile" to "$(OUT_DIR)" to
prevent compilation failure.
So the kernel should change the included path to avoid the copy operation.
Signed-off-by: Jing Leng <jleng@ambarella.com>
[masahiro: I do not think "M=$(OUT_DIR) src=$(PWD)" is the official way,
but this patch is a nice clean up anyway.]
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Make the default terminal emulator configurable so e.g.
Debian can set it to x-terminal-emulator instead of the
current default of xterm.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Tested-by: Ritesh Raj Sarraf <ritesh@collabora.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
set. This change improves DM's hipri bio polling (REQ_POLLED)
performance by 7 - 20% depending on the system.
- Update DM core to use jump_labels to further reduce cost of unlikely
branches for zoned block devices, dm-stats and swap_bios throttling.
- Various DM core changes to reduce bio-based DM overhead and simplify
IO accounting.
- Fundamental DM core improvements to dm_io reference counting and the
elimination of using bio_split()+bio_chain() -- instead DM's
bio-based IO accounting is updated to account that a split occurred.
- Improve DM core's abnormal bio processing to do less work.
- Improve DM core's hipri polling support to use a single list rather
than an hlist.
- Update DM core to pass NULL bdev to bio_alloc_clone() so that
initialization that isn't useful for DM can be elided.
- Add cond_resched to DM stats' various loops that loop over all
entries.
- Fix incorrect error code return from DM integrity's constructor.
- Make DM crypt's printing of the key constant-time.
- Update bio-based DM multipath to provide high-resolution timer to
the Historical Service Time (HST) path selector.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmKOiAwACgkQxSPxCi2d
A1qCxgf/VLmiywUR7zCIDiPyJkc547z2MlPVaTCzpclklyLZ5wgfTtqsb8RnEqRs
KIXrvKbakbfZtrG0k9ZYdbIIVO4jHQavCXtVZH0Mj4XqQArDhHxZzlBx8i6BirO5
aya7HKWHcXj+s7GF146hRcPtD53LbSNDrB/bVaiZcHxOemthSgtht2xwmZfzoCcS
JtorpWkA97lgDbdSAcR0kn1zkHDWFkQMJgnUL9FotnlhOVXZVUvAbgb4OjfxB3fG
bCYlytDhs+6VwO+/CZXcOA7LNNtkLJPEkxUxjrChS5do64SgJUlDvY66iB6+7K0w
0Jri5D/NZvjwYVeLgXn5UwyQJ75cjQ==
=zlaE
-----END PGP SIGNATURE-----
Merge tag 'for-5.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Enable DM core bioset's per-cpu bio cache if QUEUE_FLAG_POLL set.
This change improves DM's hipri bio polling (REQ_POLLED) performance
by 7 - 20% depending on the system.
- Update DM core to use jump_labels to further reduce cost of unlikely
branches for zoned block devices, dm-stats and swap_bios throttling.
- Various DM core changes to reduce bio-based DM overhead and simplify
IO accounting.
- Fundamental DM core improvements to dm_io reference counting and the
elimination of using bio_split()+bio_chain() -- instead DM's
bio-based IO accounting is updated to account that a split occurred.
- Improve DM core's abnormal bio processing to do less work.
- Improve DM core's hipri polling support to use a single list rather
than an hlist.
- Update DM core to pass NULL bdev to bio_alloc_clone() so that
initialization that isn't useful for DM can be elided.
- Add cond_resched to DM stats' various loops that loop over all
entries.
- Fix incorrect error code return from DM integrity's constructor.
- Make DM crypt's printing of the key constant-time.
- Update bio-based DM multipath to provide high-resolution timer to the
Historical Service Time (HST) path selector.
* tag 'for-5.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (26 commits)
dm: pass NULL bdev to bio_alloc_clone
dm cache metadata: remove unnecessary variable in __dump_mapping
dm mpath: provide high-resolution timer to HST for bio-based
dm crypt: make printing of the key constant-time
dm integrity: fix error code in dm_integrity_ctr()
dm stats: add cond_resched when looping over entries
dm: improve abnormal bio processing
dm: simplify bio-based IO accounting further
dm: put all polled dm_io instances into a single list
dm: improve dm_io reference counting
dm: don't grab target io reference in dm_zone_map_bio
dm: improve bio splitting and associated IO accounting
dm: switch to bdev based IO accounting interfaces
dm: pass dm_io instance to dm_io_acct directly
dm: don't pass bio to __dm_start_io_acct and dm_end_io_acct
dm: use bio_sectors in dm_aceept_partial_bio
dm: simplify basic targets
dm: conditionally enable branching for less used features
dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio
dm: move hot dm_io members to same cacheline as dm_target_io
...
Small collection of incremental improvement patches:
- Minor code cleanup patches, comment improvements, etc from static tools
- Clean the some of the kernel caps, reducing the historical stealth uAPI
leftovers
- Bug fixes and minor changes for rdmavt, hns, rxe, irdma
- Remove unimplemented cruft from rxe
- Reorganize UMR QP code in mlx5 to avoid going through the IB verbs layer
- flush_workqueue(system_unbound_wq) removal
- Ensure rxe waits for objects to be unused before allowing the core to
free them
- Several rc quality bug fixes for hfi1
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCYo+NxgAKCRCFwuHvBreF
YbqSAQDJ+QolaATUvOQUPLbuLopUCJLe95VS15Kl3SNXiVUUFAEA8DLL1s6+WShd
AgypUxGHipx5BAytrn45/WiwuDeEbQ8=
=jgTl
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"Small collection of incremental improvement patches:
- Minor code cleanup patches, comment improvements, etc from static
tools
- Clean the some of the kernel caps, reducing the historical stealth
uAPI leftovers
- Bug fixes and minor changes for rdmavt, hns, rxe, irdma
- Remove unimplemented cruft from rxe
- Reorganize UMR QP code in mlx5 to avoid going through the IB verbs
layer
- flush_workqueue(system_unbound_wq) removal
- Ensure rxe waits for objects to be unused before allowing the core
to free them
- Several rc quality bug fixes for hfi1"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (67 commits)
RDMA/rtrs-clt: Fix one kernel-doc comment
RDMA/hfi1: Remove all traces of diagpkt support
RDMA/hfi1: Consolidate software versions
RDMA/hfi1: Remove pointless driver version
RDMA/hfi1: Fix potential integer multiplication overflow errors
RDMA/hfi1: Prevent panic when SDMA is disabled
RDMA/hfi1: Prevent use of lock before it is initialized
RDMA/rxe: Fix an error handling path in rxe_get_mcg()
IB/core: Fix typo in comment
RDMA/core: Fix typo in comment
IB/hf1: Fix typo in comment
IB/qib: Fix typo in comment
IB/iser: Fix typo in comment
RDMA/mlx4: Avoid flush_scheduled_work() usage
IB/isert: Avoid flush_scheduled_work() usage
RDMA/mlx5: Remove duplicate pointer assignment in mlx5_ib_alloc_implicit_mr()
RDMA/qedr: Remove unnecessary synchronize_irq() before free_irq()
RDMA/hns: Use hr_reg_read() instead of remaining roce_get_xxx()
RDMA/hns: Use hr_reg_xxx() instead of remaining roce_set_xxx()
RDMA/irdma: Add SW mechanism to generate completions on error
...
We introduce "courteous server" in this release. Previously NFSD
would purge open and lock state for an unresponsive client after
one lease period (typically 90 seconds). Now, after one lease
period, another client can open and lock those files and the
unresponsive client's lease is purged; otherwise if the unrespon-
sive client's open and lock state is uncontended, the server retains
that open and lock state for up to 24 hours, allowing the client's
workload to resume after a lengthy network partition.
A longstanding issue with NFSv4 file creation is also addressed.
Previously a file creation can fail internally, returning an error
to the client, but leave the newly created file in place as an
artifact. The file creation code path has been reorganized so that
internal failures and race conditions are less likely to result in
an unwanted file creation.
A fault injector has been added to help exercise paths that are run
during kernel metadata cache invalidation. These caches contain
information maintained by user space about exported filesystems.
Many of our test workloads do not trigger cache invalidation.
There is one patch that is needed to support PREEMPT_RT and a fix
for an ancient "sleep while spin-locked" splat that seems to have
become easier to hit since v5.18-rc3.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmKPliAACgkQM2qzM29m
f5dB3BAAorPa2L8xu5P1Ge1oTNogNSOVRkLPDzEkfEwK07ZM2qvz78eMZGkMziJ/
strorvBWl3SWBlVtTePgNpJUjgYQ75MRRwaX7Qh2WuHeRKm1JlZm0/NId3+zKgbh
N40QI20jdswWcNDuhidxVFFWurd09GlcM4z1cu8gZLbfthkiUOjZoPiLkXeNcvhk
7wC9GiueWxHefYQQDAKh1nQS/L0GG1EkzJdJo7WUVAldZ9qVY9LpmJVMRqrBBbta
XrFYfpeY1zFFDY4Qolyz5PUJSeQuDj9PctlhoZ6B1hp56PD/6yaqVhYXiPxtlALj
tITtktfiekULZkgfvfvyzssCv+wkbYiaEBZcSSCauR7dkGOmBmajO+cf7vpsERgE
fbCU8DWGk78SMeehdCrO+26cV37VP+8c2t2Txq/rG5Eq4ZoCi++Hj5poRboFLqb+
oom+0Ee0LfcAKXkxH5gWTPTblHo49GzGitPZtRzTgZ9uFnVwvEaJ4+t0ij0J8JpL
HuVtWrg5/REhqpEvOSwF0sRmkYWLTu7KdueGn/iZ8xUi7GHEue01NsVkClohKJcR
WOjWrbNCNF/LJaG88MX0z5u7IO7s9bOHphd7PJ92vR+4YsehW3uRhk+rNi2ZBqQz
hzULfu8BiaicV9fdB/hDcMmKQD6U6due2AVVPtxTf5XY+CHQNRY=
=phE1
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"We introduce 'courteous server' in this release. Previously NFSD would
purge open and lock state for an unresponsive client after one lease
period (typically 90 seconds). Now, after one lease period, another
client can open and lock those files and the unresponsive client's
lease is purged; otherwise if the unresponsive client's open and lock
state is uncontended, the server retains that open and lock state for
up to 24 hours, allowing the client's workload to resume after a
lengthy network partition.
A longstanding issue with NFSv4 file creation is also addressed.
Previously a file creation can fail internally, returning an error to
the client, but leave the newly created file in place as an artifact.
The file creation code path has been reorganized so that internal
failures and race conditions are less likely to result in an unwanted
file creation.
A fault injector has been added to help exercise paths that are run
during kernel metadata cache invalidation. These caches contain
information maintained by user space about exported filesystems. Many
of our test workloads do not trigger cache invalidation.
There is one patch that is needed to support PREEMPT_RT and a fix for
an ancient 'sleep while spin-locked' splat that seems to have become
easier to hit since v5.18-rc3"
* tag 'nfsd-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (36 commits)
NFSD: nfsd_file_put() can sleep
NFSD: Add documenting comment for nfsd4_release_lockowner()
NFSD: Modernize nfsd4_release_lockowner()
NFSD: Fix possible sleep during nfsd4_release_lockowner()
nfsd: destroy percpu stats counters after reply cache shutdown
nfsd: Fix null-ptr-deref in nfsd_fill_super()
nfsd: Unregister the cld notifier when laundry_wq create failed
SUNRPC: Use RMW bitops in single-threaded hot paths
NFSD: Clean up the show_nf_flags() macro
NFSD: Trace filecache opens
NFSD: Move documenting comment for nfsd4_process_open2()
NFSD: Fix whitespace
NFSD: Remove dprintk call sites from tail of nfsd4_open()
NFSD: Instantiate a struct file when creating a regular NFSv4 file
NFSD: Clean up nfsd_open_verified()
NFSD: Remove do_nfsd_create()
NFSD: Refactor NFSv4 OPEN(CREATE)
NFSD: Refactor NFSv3 CREATE
NFSD: Refactor nfsd_create_setattr()
NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create()
...
In qdisc_run_begin(), smp_mb__before_atomic() used before test_bit()
does not provide any ordering guarantee as test_bit() is not an atomic
operation. This, added to the fact that the spin_trylock() call at
the beginning of qdisc_run_begin() does not guarantee acquire
semantics if it does not grab the lock, makes it possible for the
following statement :
if (test_bit(__QDISC_STATE_MISSED, &qdisc->state))
to be executed before an enqueue operation called before
qdisc_run_begin().
As a result the following race can happen :
CPU 1 CPU 2
qdisc_run_begin() qdisc_run_begin() /* true */
set(MISSED) .
/* returns false */ .
. /* sees MISSED = 1 */
. /* so qdisc not empty */
. __qdisc_run()
. .
. pfifo_fast_dequeue()
----> /* may be done here */ .
| . clear(MISSED)
| . .
| . smp_mb __after_atomic();
| . .
| . /* recheck the queue */
| . /* nothing => exit */
| enqueue(skb1)
| .
| qdisc_run_begin()
| .
| spin_trylock() /* fail */
| .
| smp_mb__before_atomic() /* not enough */
| .
---- if (test_bit(MISSED))
return false; /* exit */
In the above scenario, CPU 1 and CPU 2 both try to grab the
qdisc->seqlock at the same time. Only CPU 2 succeeds and enters the
bypass code path, where it emits its skb then calls __qdisc_run().
CPU1 fails, sets MISSED and goes down the traditionnal enqueue() +
dequeue() code path. But when executing qdisc_run_begin() for the
second time, after enqueuing its skbuff, it sees the MISSED bit still
set (by itself) and consequently chooses to exit early without setting
it again nor trying to grab the spinlock again.
Meanwhile CPU2 has seen MISSED = 1, cleared it, checked the queue
and found it empty, so it returned.
At the end of the sequence, we end up with skb1 enqueued in the
backlog, both CPUs out of __dev_xmit_skb(), the MISSED bit not set,
and no __netif_schedule() called made. skb1 will now linger in the
qdisc until somebody later performs a full __qdisc_run(). Associated
to the bypass capacity of the qdisc, and the ability of the TCP layer
to avoid resending packets which it knows are still in the qdisc, this
can lead to serious traffic "holes" in a TCP connection.
We fix this by replacing the smp_mb__before_atomic() / test_bit() /
set_bit() / smp_mb__after_atomic() sequence inside qdisc_run_begin()
by a single test_and_set_bit() call, which is more concise and
enforces the needed memory barriers.
Fixes: 89837eb4b2 ("net: sched: add barrier to ensure correct ordering for lockless qdisc")
Signed-off-by: Vincent Ray <vray@kalrayinc.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220526001746.2437669-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
At the moment, if devm_of_phy_get() returns an error the serdes
simply isn't set. While it is bad to ignore an error in general, there
is a particular bug that network isn't working if the serdes driver is
compiled as a module. In that case, devm_of_phy_get() returns
-EDEFER_PROBE and the error is silently ignored.
The serdes is optional, it is not there if the port is using RGMII, in
which case devm_of_phy_get() returns -ENODEV. Rearrange the error
handling so that -ENODEV will be handled but other error codes will
abort the probing.
Fixes: d28d6d2e37 ("net: lan966x: add port module support")
Signed-off-by: Michael Walle <michael@walle.cc>
Link: https://lore.kernel.org/r/20220525231239.1307298-1-michael@walle.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Fix UAF when creating non-stateful expression in set.
2) Set limit cost when cloning expression accordingly, from Phil Sutter.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_limit: Clone packet limits' cost value
netfilter: nf_tables: disallow non-stateful expression in sets earlier
====================
Link: https://lore.kernel.org/r/20220526205411.315136-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The parameter name in comments of event_trigger_separate_filter() is
inconsistent with actual parameter name, fix it.
Link: https://lkml.kernel.org/r/20220526072957.165655-1-sunliming@kylinos.cn
Signed-off-by: sunliming <sunliming@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Commit:
4b9a8dca0e ("x86/idt: Remove the tracing IDT completely")
removed the 'tracing IDT' from arch/x86/kernel/tracepoint.c,
but left related comment. So that the comment become anachronistic.
Just remove the comment.
Link: https://lkml.kernel.org/r/20220526110831.175743-1-sunliming@kylinos.cn
Signed-off-by: sunliming <sunliming@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Commit 4b9a8dca0e ("x86/idt: Remove the tracing IDT completely")
removed the tracing IDT from the file arch/x86/kernel/tracepoint.c,
but left the related headers unused, remove it.
Link: https://lkml.kernel.org/r/20220525012827.93464-1-sunliming@kylinos.cn
Signed-off-by: sunliming <sunliming@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The name in comments of parameter "filter_string" in function
create_filter is annotated as "filter_str", just fix it.
Link: https://lkml.kernel.org/r/20220524063937.52873-1-sunliming@kylinos.cn
Signed-off-by: sunliming <sunliming@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Functions in trace_preemptirq.c could be invoked from early interrupt
code that bypasses kcov trace function's in_task() check. Disable kcov
on this file to reduce random code coverage.
Link: https://lkml.kernel.org/r/20220523063033.1778974-1-liu3101@purdue.edu
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Congyu Liu <liu3101@purdue.edu>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Initialize the integer variable to 0 to fix the clang scan warning:
Undefined or garbage value returned to caller
[core.uninitialized.UndefReturn]
return ret;
Link: https://lkml.kernel.org/r/20220522061826.1751-1-gautammenghani201@gmail.com
Cc: stable@vger.kernel.org
Fixes: 8993665abc ("tracing/boot: Support multiple handlers for per-event histogram")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Gautam Menghani <gautammenghani201@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
All instances of the function ftrace_arch_modify_prepare() and
ftrace_arch_modify_post_process() return zero. There's no point in
checking their return value. Just have them be void functions.
Link: https://lkml.kernel.org/r/20220518023639.4065-1-kunyu@nfschina.com
Signed-off-by: Li kunyu <kunyu@nfschina.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The "char []" string form declares a single variable. It is better
than "char *" which creates two variables in the final assembly.
Link: https://lkml.kernel.org/r/20220512143230.28796-1-liqiong@nfschina.com
Signed-off-by: liqiong <liqiong@nfschina.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
There is no need to wakeup the timerlat/ thread if stop tracing is hit
at the timerlat's IRQ handler.
Return before waking up timerlat's thread.
Link: https://lkml.kernel.org/r/b392356c91b56aedd2b289513cc56a84cf87e60d.1652175637.git.bristot@kernel.org
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
If print_stack and stop_tracing_us are set, and stop_tracing_us is hit
with latency higher than or equal to print_stack, print the
stack at the IRQ handler as it is useful to define the root cause for
the IRQ latency.
Link: https://lkml.kernel.org/r/fd04530ce98ae9270e41bb124ee5bf67b05ecfed.1652175637.git.bristot@kernel.org
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Currently, the notification of a new max latency is sent from
timerlat's IRQ handler anytime a new max latency is found.
While this behavior is not wrong, the send IPI overhead itself
will increase the thread latency and that is not the desired
effect (tracing overhead).
Moreover, the thread will notify a new max latency again because
the thread latency as it is always higher than the IRQ latency
that woke it up.
The only case in which it is helpful to notify a new max latency
from IRQ is when stop tracing (for the IRQ) is set, as in this
case, the thread will not be dispatched.
Notify a new max latency from the IRQ handler only if stop tracing is
set for the IRQ handler.
Link: https://lkml.kernel.org/r/2c2d9a56c0886c8402ba320de32856cbbb10c2bb.1652175637.git.bristot@kernel.org
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Reported-by: Clark Williams <williams@redhat.com>
Fixes: a955d7eac1 ("trace: Add timerlat tracer")
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Max Filippov reported:
When building kernel with CONFIG_KRETPROBES=n kernel/kprobes.c
compilation fails with the following messages:
kernel/kprobes.c: In function ‘recycle_rp_inst’:
kernel/kprobes.c:1273:32: error: implicit declaration of function
‘get_kretprobe’
kernel/kprobes.c: In function ‘kprobe_flush_task’:
kernel/kprobes.c:1299:35: error: ‘struct task_struct’ has no member
named ‘kretprobe_instances’
This came from the commit d741bf41d7 ("kprobes: Remove
kretprobe hash") which introduced get_kretprobe() and
kretprobe_instances member in task_struct when CONFIG_KRETPROBES=y,
but did not make recycle_rp_inst() and kprobe_flush_task()
depending on CONFIG_KRETPORBES.
Since those functions are only used for kretprobe, move those
functions into #ifdef CONFIG_KRETPROBE area.
Link: https://lkml.kernel.org/r/165163539094.74407.3838114721073251225.stgit@devnote2
Reported-by: Max Filippov <jcmvbkbc@gmail.com>
Fixes: d741bf41d7 ("kprobes: Remove kretprobe hash")
Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S . Miller" <davem@davemloft.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Setting set_event_pid with trailing whitespace lead to endless write
system calls like below.
$ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid
execve("/usr/bin/echo", ["echo", "123 "], ...) = 0
...
write(1, "123 \n", 5) = 4
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
....
This is because, the result of trace_get_user's are not returned when it
read at least one pid. To fix it, update read variable even if
parser->idx == 0.
The result of applied patch is below.
$ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid
execve("/usr/bin/echo", ["echo", "123 "], ...) = 0
...
write(1, "123 \n", 5) = 5
close(1) = 0
Link: https://lkml.kernel.org/r/20220503050546.288911-1-vvghjk1234@gmail.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Baik Song An <bsahn@etri.re.kr>
Cc: Hong Yeon Kim <kimhy@etri.re.kr>
Cc: Taeung Song <taeung@reallinux.co.kr>
Cc: linuxgeek@linuxgeek.io
Cc: stable@vger.kernel.org
Fixes: 4909010788 ("tracing: Add set_event_pid directory for future use")
Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
In create_var_ref(), init_var_ref() is called to initialize the fields
of variable ref_field, which is allocated in the previous function call
to create_hist_field(). Function init_var_ref() allocates the
corresponding fields such as ref_field->system, but frees these fields
when the function encounters an error. The caller later calls
destroy_hist_field() to conduct error handling, which frees the fields
and the variable itself. This results in double free of the fields which
are already freed in the previous function.
Fix this by storing NULL to the corresponding fields when they are freed
in init_var_ref().
Link: https://lkml.kernel.org/r/20220425063739.3859998-1-keitasuzuki.park@sslab.ics.keio.ac.jp
Fixes: 067fe038e7 ("tracing: Add variable reference handling to hist triggers")
CC: stable@vger.kernel.org
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Keita Suzuki <keitasuzuki.park@sslab.ics.keio.ac.jp>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The tracing_set_trace_write() function just removes the trailing whitespace
from the user supplied tracer name, but the leading whitespace should also
be removed.
In addition, if the user supplied tracer name contains only a few
whitespace characters, the first one will not be removed using the current
method, which results it a single whitespace character left in the buf.
To fix all of these issues, we use strim() to correctly remove both the
leading and trailing whitespace.
Link: https://lkml.kernel.org/r/20220121095623.1826679-1-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The ftrace_process_locs() function may return -ENOMEM error code, which
should be handled by the callers.
Link: https://lkml.kernel.org/r/20220120065949.1813231-1-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Creating tracefs entries with tracefs_create_file() followed by pr_warn()
is tedious and repetitive, we can use trace_create_file() to simplify
this process and make the code more readable.
Link: https://lkml.kernel.org/r/20220114131052.534382-1-ytcoode@gmail.com
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The LARP patchset added an awkward coupling point between libxfs and
what would be libxlog, if the XFS log were actually its own library.
Move the code that sets up logged xattr updates out of libxfs and into
xfs_xattr.c so that libxfs no longer has to know about xlog_* functions.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The LARP patchset added an awkward coupling point between libxfs and
what would be libxlog, if the XFS log were actually its own library.
Move the code that enables logged xattr updates out of "lib"xlog and into
xfs_xattr.c so that it no longer has to know about xlog_* functions.
While we're at it, give xfs_xattr.c its own header file.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Since LARP is an experimental debug-only feature, we should try to warn
about it being in use once per mount, not once per reboot.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Currently, we don't have a consistent story around logging when an
EXPERIMENTAL feature gets turned on at runtime -- online fsck and shrink
log a message once per day across all mounts, and the recently merged
LARP mode only ever does it once per insmod cycle or reboot.
Because EXPERIMENTAL tags are supposed to go away eventually, convert
the existing daily warnings into state flags that travel with the mount,
and warn once per mount. Making this an opstate flag means that we'll
be able to capture the experimental usage in the ftrace output too.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
There's no need to spam the logs every time we clear the log incompat
flags -- if someone is periodically using one of these features, they'll
be cleared every time the log tries to clean itself, which can get
pretty chatty:
$ dmesg | grep -i clear
[ 5363.894711] XFS (sdd): Clearing log incompat feature flags.
[ 5365.157516] XFS (sdd): Clearing log incompat feature flags.
[ 5369.388543] XFS (sdd): Clearing log incompat feature flags.
[ 5371.281246] XFS (sdd): Clearing log incompat feature flags.
These aren't high value messages either -- nothing's gone wrong, and
nobody's trying anything tricky.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Fixups and enhancements for OpenRISC:
- A few sparse warning fixups and other cleanups I noticed when working
on a recent TLB bug found on a new OpenRISC core bring up.
- A few fixup's from me and Jason A Donenfeld to help shutdown OpenRISC
platforms when running CI tests
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE2cRzVK74bBA6Je/xw7McLV5mJ+QFAmKP+24ACgkQw7McLV5m
J+Q59w//eiFQv/L//IPZ//fDhDKYILs0JYnJGmAY2IBFpI/m/KkSWOeo32L5qyXe
IlvbxyrM3RVzDQVS/q1DG7vz9HuEI52asuefPEUeCmDSEcZH6g8ZzRHOrYqqDcM6
nKBoi2CIv9BpfkOdoewgUdaMoijhIWsT5QSZpLjkfPZvc2E78tlHvWiaZffZAJiR
XliE16XeuuKN6/vu5MNK2qExwsO9ayAiylv1K6bfN/0KkJEysY2FscRifHCbwscs
bGi367C7XfHE8Tks31SyUDqNlD760FjjL+ygAMG63v20mswhQggwEb/dApjzHj1l
MwrRBOG6CGYHE+AATsQLrd4N+fM9NNMfIARhtnbQYQlicAQpwGWNIPbpe1WOnBcw
KfXaiJwUcntf3bfBLyfZVvWs0ZYMgfUD8NGX4nDtTomywdxSUlTPefj2KP7UKiaV
NRwCOgZaT5iRzuJT832wCIRU75CjbFLvZ3JgMVwZtAlmGKTUopAikxvHTKRDDybn
O9ghq1b34hJf0BmnPTWF0pELTu2z3Vpt8qrUdNtfy3ICNiMDJV0e3aUH1gFJ1LtA
++bh98wErqaxVsNzQXS7PUwiB2xmUB8YGMfJoQIN/spysJ3mAlOjVPxFKLVXHopi
aHatlCvIV/Lc4ygt7no8PHtjvt8JckwCRpG0BLnrw09TokAQkp0=
=JxMd
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of https://github.com/openrisc/linux
Pull OpenRISC updates from Stafford Horne:
- A few sparse warning fixups and other cleanups I noticed when working
on a recent TLB bug found on a new OpenRISC core bring up.
- A few fixup's from me and Jason A Donenfeld to help shutdown OpenRISC
platforms when running CI tests
* tag 'for-linus' of https://github.com/openrisc/linux:
openrisc: Allow power off handler overriding
openrisc: Remove unused IMMU tlb workardound
openrisc/fault: Fix symbol scope warnings
openrisc/delay: Add include to fix symbol not declared warning
openrisc/time: Fix symbol scope warnings
openrisc/traps: Declare unhandled_exception for asmlinkage
openrisc/traps: Remove die_if_kernel function
openrisc/traps: Declare file scope symbols as static
openrisc: Update litex defconfig to support glibc userland
openrisc: Pretty print show_registers memory dumps
openrisc: Add syscall details to emergency syscall debugging
openrisc: Add support for liteuart emergency printing
openrisc: Cleanup emergency print handling
openrisc: Add gcc machine instruction flag configuration
openrisc: define nop command for simulator reboot
openrisc: remove bogus nops and shutdowns
openrisc: fix typos in comments
While we're messing around with how recovery allocates and frees the
buffer cancellation table, convert the allocation to use kmalloc_array
instead of the old kmem_alloc APIs, and make it handle a null return,
even though that's not likely.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Move the code that allocates and frees the buffer cancellation tables
used by log recovery into the file that actually uses the tables. This
is a precursor to some cleanups and a memory leak fix.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The recent patch to improve btree cycle checking caused a regression
when I rebased the in-memory btree branch atop the 5.19 for-next branch,
because in-memory short-pointer btrees do not have AG numbers. This
produced the following complaint from kmemleak:
unreferenced object 0xffff88803d47dde8 (size 264):
comm "xfs_io", pid 4889, jiffies 4294906764 (age 24.072s)
hex dump (first 32 bytes):
90 4d 0b 0f 80 88 ff ff 00 a0 bd 05 80 88 ff ff .M..............
e0 44 3a a0 ff ff ff ff 00 df 08 06 80 88 ff ff .D:.............
backtrace:
[<ffffffffa0388059>] xfbtree_dup_cursor+0x49/0xc0 [xfs]
[<ffffffffa029887b>] xfs_btree_dup_cursor+0x3b/0x200 [xfs]
[<ffffffffa029af5d>] __xfs_btree_split+0x6ad/0x820 [xfs]
[<ffffffffa029b130>] xfs_btree_split+0x60/0x110 [xfs]
[<ffffffffa029f6da>] xfs_btree_make_block_unfull+0x19a/0x1f0 [xfs]
[<ffffffffa029fada>] xfs_btree_insrec+0x3aa/0x810 [xfs]
[<ffffffffa029fff3>] xfs_btree_insert+0xb3/0x240 [xfs]
[<ffffffffa02cb729>] xfs_rmap_insert+0x99/0x200 [xfs]
[<ffffffffa02cf142>] xfs_rmap_map_shared+0x192/0x5f0 [xfs]
[<ffffffffa02cf60b>] xfs_rmap_map_raw+0x6b/0x90 [xfs]
[<ffffffffa0384a85>] xrep_rmap_stash+0xd5/0x1d0 [xfs]
[<ffffffffa0384dc0>] xrep_rmap_visit_bmbt+0xa0/0xf0 [xfs]
[<ffffffffa0384fb6>] xrep_rmap_scan_iext+0x56/0xa0 [xfs]
[<ffffffffa03850d8>] xrep_rmap_scan_ifork+0xd8/0x160 [xfs]
[<ffffffffa0385195>] xrep_rmap_scan_inode+0x35/0x80 [xfs]
[<ffffffffa03852ee>] xrep_rmap_find_rmaps+0x10e/0x270 [xfs]
I noticed that xfs_btree_insrec has a bunch of debug code that return
out of the function immediately, without freeing the "new" btree cursor
that can be returned when _make_block_unfull calls xfs_btree_split. Fix
the error return in this function to free the btree cursor.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
xfs/434 and xfs/436 have been reporting occasional memory leaks of
xfs_dquot objects. These tests themselves were the messenger, not the
culprit, since they unload the xfs module, which trips the slub
debugging code while tearing down all the xfs slab caches:
=============================================================================
BUG xfs_dquot (Tainted: G W ): Objects remaining in xfs_dquot on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
Slab 0xffffea000606de00 objects=30 used=5 fp=0xffff888181b78a78 flags=0x17ff80000010200(slab|head|node=0|zone=2|lastcpupid=0xfff)
CPU: 0 PID: 3953166 Comm: modprobe Tainted: G W 5.18.0-rc6-djwx #rc6 d5824be9e46a2393677bda868f9b154d917ca6a7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20171121_152543-x86-ol7-builder-01.us.oracle.com-4.el7.1 04/01/2014
Since we don't generally rmmod the xfs module between fstests, this
means that xfs/434 is really just the canary in the coal mine --
something leaked a dquot, but we don't know who. After days of pounding
on fstests with kmemleak enabled, I finally got it to spit this out:
unreferenced object 0xffff8880465654c0 (size 536):
comm "u10:4", pid 88, jiffies 4294935810 (age 29.512s)
hex dump (first 32 bytes):
60 4a 56 46 80 88 ff ff 58 ea e4 5c 80 88 ff ff `JVF....X..\....
00 e0 52 49 80 88 ff ff 01 00 01 00 00 00 00 00 ..RI............
backtrace:
[<ffffffffa0740f6c>] xfs_dquot_alloc+0x2c/0x530 [xfs]
[<ffffffffa07443df>] xfs_qm_dqread+0x6f/0x330 [xfs]
[<ffffffffa07462a2>] xfs_qm_dqget+0x132/0x4e0 [xfs]
[<ffffffffa0756bb0>] xfs_qm_quotacheck_dqadjust+0xa0/0x3e0 [xfs]
[<ffffffffa075724d>] xfs_qm_dqusage_adjust+0x35d/0x4f0 [xfs]
[<ffffffffa06c9068>] xfs_iwalk_ag_recs+0x348/0x5d0 [xfs]
[<ffffffffa06c95d3>] xfs_iwalk_run_callbacks+0x273/0x540 [xfs]
[<ffffffffa06c9e8d>] xfs_iwalk_ag+0x5ed/0x890 [xfs]
[<ffffffffa06ca22f>] xfs_iwalk_ag_work+0xff/0x170 [xfs]
[<ffffffffa06d22c9>] xfs_pwork_work+0x79/0x130 [xfs]
[<ffffffff81170bb2>] process_one_work+0x672/0x1040
[<ffffffff81171b1b>] worker_thread+0x59b/0xec0
[<ffffffff8118711e>] kthread+0x29e/0x340
[<ffffffff810032bf>] ret_from_fork+0x1f/0x30
Now we know that quotacheck is at fault, but even this report was
canaryish -- it was triggered by xfs/494, which doesn't actually mount
any filesystems. (kmemleak can be a little slow to notice leaks, even
with fstests repeatedly whacking it to look for them.) Looking at the
*previous* fstest, however, showed that the test run before xfs/494 was
xfs/117. The tipoff to the problem is in this excerpt from dmesg:
XFS (sda4): Quotacheck needed: Please wait.
XFS (sda4): Metadata corruption detected at xfs_dinode_verify.part.0+0xdb/0x7b0 [xfs], inode 0x119 dinode
XFS (sda4): Unmount and run xfs_repair
XFS (sda4): First 128 bytes of corrupted metadata buffer:
00000000: 49 4e 81 a4 03 02 00 00 00 00 00 00 00 00 00 00 IN..............
00000010: 00 00 00 01 00 00 00 00 00 90 57 54 54 1a 4c 68 ..........WTT.Lh
00000020: 81 f9 7d e1 6d ee 16 00 34 bd 7d e1 6d ee 16 00 ..}.m...4.}.m...
00000030: 34 bd 7d e1 6d ee 16 00 00 00 00 00 00 00 00 00 4.}.m...........
00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000050: 00 00 00 02 00 00 00 00 00 00 00 00 96 80 f3 ab ................
00000060: ff ff ff ff da 57 7b 11 00 00 00 00 00 00 00 03 .....W{.........
00000070: 00 00 00 01 00 00 00 10 00 00 00 00 00 00 00 08 ................
XFS (sda4): Quotacheck: Unsuccessful (Error -117): Disabling quotas.
The dinode verifier decided that the inode was corrupt, which causes
iget to return with EFSCORRUPTED. Since this happened during
quotacheck, it is obvious that the kernel aborted the inode walk on
account of the corruption error and disabled quotas. Unfortunately, we
neglect to purge the dquot cache before doing that, which is how the
dquots leaked.
The problems started 10 years ago in commit b84a3a, when the dquot lists
were converted to a radix tree, but the error handling behavior was not
correctly preserved -- in that commit, if the bulkstat failed and
usrquota was enabled, the bulkstat failure code would be overwritten by
the result of flushing all the dquots to disk. As long as that
succeeds, we'd continue the quota mount as if everything were ok, but
instead we're now operating with a corrupt inode and incorrect quota
usage counts. I didn't notice this bug in 2019 when I wrote commit
ebd126a, which changed quotacheck to skip the dqflush when the scan
doesn't complete due to inode walk failures.
Introduced-by: b84a3a9675 ("xfs: remove the per-filesystem list of dquots")
Fixes: ebd126a651 ("xfs: convert quotacheck to use the new iwalk functions")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
xfs/538 on a 1kB block filesystem failed with this assert:
XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_ino.allocated == 0 || xfs_is_shutdown(cur->bc_mp), file: fs/xfs/libxfs/xfs_btree.c, line: 448
The problem was that an allocation failed unexpectedly in
xfs_bmbt_alloc_block() after roughly 150,000 minlen allocation error
injections, resulting in an EFSCORRUPTED error being returned to
xfs_bmapi_write(). The error occurred on extent-to-btree format
conversion allocating the new root block:
RIP: 0010:xfs_bmbt_alloc_block+0x177/0x210
Call Trace:
<TASK>
xfs_btree_new_iroot+0xdf/0x520
xfs_btree_make_block_unfull+0x10d/0x1c0
xfs_btree_insrec+0x364/0x790
xfs_btree_insert+0xaa/0x210
xfs_bmap_add_extent_hole_real+0x1fe/0x9a0
xfs_bmapi_allocate+0x34c/0x420
xfs_bmapi_write+0x53c/0x9c0
xfs_alloc_file_space+0xee/0x320
xfs_file_fallocate+0x36b/0x450
vfs_fallocate+0x148/0x340
__x64_sys_fallocate+0x3c/0x70
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa
Why the allocation failed at this point is unknown, but is likely
that we ran the transaction out of reserved space and filesystem out
of space with bmbt blocks because of all the minlen allocations
being done causing worst case fragmentation of a large allocation.
Regardless of the cause, we've then called xfs_bmapi_finish() which
calls xfs_btree_del_cursor(cur, error) to tear down the cursor.
So we have a failed operation, error != 0, cur->bc_ino.allocated > 0
and the filesystem is still up. The assert fails to take into
account that allocation can fail with an error and the transaction
teardown will shut the filesystem down if necessary. i.e. the
assert needs to check "|| error != 0" as well, because at this point
shutdown is pending because the current transaction is dirty....
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Not fatal, the assert is there to catch developer attention. I'm
seeing this occasionally during recoveryloop testing after a
shutdown, and I don't want this to stop an overnight recoveryloop
run as it is currently doing.
Convert the ASSERT to a XFS_IS_CORRUPT() check so it will dump a
corruption report into the log and cause a test failure that way,
but it won't stop the machine dead.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Commit dc04db2aa7 has caused a small aim7 regression, showing a
small increase in CPU usage in __xfs_btree_check_sblock() as a
result of the extra checking.
This is likely due to the endian conversion of the sibling poitners
being unconditional instead of relying on the compiler to endian
convert the NULL pointer at compile time and avoiding the runtime
conversion for this common case.
Rework the checks so that endian conversion of the sibling pointers
is only done if they are not null as the original code did.
.... and these need to be "inline" because the compiler completely
fails to inline them automatically like it should be doing.
$ size fs/xfs/libxfs/xfs_btree.o*
text data bss dec hex filename
51874 240 0 52114 cb92 fs/xfs/libxfs/xfs_btree.o.orig
51562 240 0 51802 ca5a fs/xfs/libxfs/xfs_btree.o.inline
Just when you think the tools have advanced sufficiently we don't
have to care about stuff like this anymore, along comes a reminder
that *our tools still suck*.
Fixes: dc04db2aa7 ("xfs: detect self referencing btree sibling pointers")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
As promised, for v5.19 I queued up quite a bit of work for modules, but
still with a pretty conservative eye. These changes have been soaking on
modules-next (and so linux-next) for quite some time, the code shift was
merged onto modules-next on March 22, and the last patch was queued on May
5th.
The following are the highlights of what bells and whistles we will get for
v5.19:
1) It was time to tidy up kernel/module.c and one way of starting with
that effort was to split it up into files. At my request Aaron Tomlin
spearheaded that effort with the goal to not introduce any
functional at all during that endeavour. The penalty for the split
is +1322 bytes total, +112 bytes in data, +1210 bytes in text while
bss is unchanged. One of the benefits of this other than helping
make the code easier to read and review is summoning more help on review
for changes with livepatching so kernel/module/livepatch.c is now
pegged as maintained by the live patching folks.
The before and after with just the move on a defconfig on x86-64:
$ size kernel/module.o
text data bss dec hex filename
38434 4540 104 43078 a846 kernel/module.o
$ size -t kernel/module/*.o
text data bss dec hex filename
4785 120 0 4905 1329 kernel/module/kallsyms.o
28577 4416 104 33097 8149 kernel/module/main.o
1158 8 0 1166 48e kernel/module/procfs.o
902 108 0 1010 3f2 kernel/module/strict_rwx.o
3390 0 0 3390 d3e kernel/module/sysfs.o
832 0 0 832 340 kernel/module/tree_lookup.o
39644 4652 104 44400 ad70 (TOTALS)
2) Aaron added module unload taint tracking (MODULE_UNLOAD_TAINT_TRACKING),
so to enable tracking unloaded modules which did taint the kernel.
3) Christophe Leroy added CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
which lets architectures to request having modules data in vmalloc
area instead of module area. There are three reasons why an
architecture might want this:
a) On some architectures (like book3s/32) it is not possible to protect
against execution on a page basis. The exec stuff can be mapped by
different arch segment sizes (on book3s/32 that is 256M segments). By
default the module area is in an Exec segment while vmalloc area is in
a NoExec segment. Using vmalloc lets you muck with module data as
NoExec on those architectures whereas before you could not.
b) By pushing more module data to vmalloc you also increase the
probability of module text to remain within a closer distance
from kernel core text and this reduces trampolines, this has been
reported on arm first and powerpc folks are following that lead.
c) Free'ing module_alloc() (Exec by default) area leaves this
exposed as Exec by default, some architectures have some
security enhancements to set this as NoExec on free, and splitting
module data with text let's future generic special allocators
be added to the kernel without having developers try to grok
the tribal knowledge per arch. Work like Rick Edgecombe's
permission vmalloc interface [0] becomes easier to address over
time.
[0] https://lore.kernel.org/lkml/20201120202426.18009-1-rick.p.edgecombe@intel.com/#r
4) Masahiro Yamada's symbol search enhancements
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmKOnHkSHG1jZ3JvZkBr
ZXJuZWwub3JnAAoJEM4jHQowkoinFw4P/1ADdvfj+b6wbAxou6tPa2ZKnx/ImEnE
0T1P/n2guWg+2Q8oYjqifTpadGzr8td4c/PaGb5UpfdEOdBIyIGklrVZpQ+xkqfT
X4KIvqsf4ajL24OKxOSNtvL8RXEIDUhJ4Veq6BImBk8CPrPjsUBlNyAIlvV0aom2
BsFROQ2pMTSCiFY47gkMKLBlBny1l7zktoF0lhWTzHimw8VSDbTJFlu+fZvspd0o
lCqiHTkpiBSJDSEEjqk0lT6wIb27fvdzjmjy+Ur71bBKiPIEPiL5XNUufkGe6oB3
mnTOPow+wPTQc0dtkTpCHQYXE/a70Sbkwp1JfkbSYeHzJLlFru/tkmKiwN0RUo9l
0mY7VPEKuQWmxsOkLqvwcPBGx5JOSWOJKrbgpFmH+RLgeEgEa8t7uQDURK2KeIj8
P7ZzN5M2klKIHHA4vjfekYOJAb1Tii9Ibp7iGeiYxf93mPJBqwvRwbtBXBZpB4ce
FoDrxwEq812KPW7P2O1kgOvq7Fn1KWh0wVeKc8iBGxFxJhzOQY86H1ZRWDLAxRss
Rr1PMLt2TbTLUBt7MzR4vrg0NoQvpLYyf2jGFjWyZDRHU8nLeHkOlQot3xRDAtq9
Bpx5mSlM9BGfPibd1Kw4BaxBha5vVCQ+AcleT+NWnCjw4I0wLoFi9RLUSyItn9No
tlHLgdrM2a54
=cxtr
-----END PGP SIGNATURE-----
Merge tag 'modules-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull modules updates from Luis Chamberlain:
- It was time to tidy up kernel/module.c and one way of starting with
that effort was to split it up into files. At my request Aaron Tomlin
spearheaded that effort with the goal to not introduce any functional
at all during that endeavour. The penalty for the split is +1322
bytes total, +112 bytes in data, +1210 bytes in text while bss is
unchanged. One of the benefits of this other than helping make the
code easier to read and review is summoning more help on review for
changes with livepatching so kernel/module/livepatch.c is now pegged
as maintained by the live patching folks.
The before and after with just the move on a defconfig on x86-64:
$ size kernel/module.o
text data bss dec hex filename
38434 4540 104 43078 a846 kernel/module.o
$ size -t kernel/module/*.o
text data bss dec hex filename
4785 120 0 4905 1329 kernel/module/kallsyms.o
28577 4416 104 33097 8149 kernel/module/main.o
1158 8 0 1166 48e kernel/module/procfs.o
902 108 0 1010 3f2 kernel/module/strict_rwx.o
3390 0 0 3390 d3e kernel/module/sysfs.o
832 0 0 832 340 kernel/module/tree_lookup.o
39644 4652 104 44400 ad70 (TOTALS)
- Aaron added module unload taint tracking (MODULE_UNLOAD_TAINT_TRACKING),
to enable tracking unloaded modules which did taint the kernel.
- Christophe Leroy added CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
which lets architectures to request having modules data in vmalloc
area instead of module area. There are three reasons why an
architecture might want this:
a) On some architectures (like book3s/32) it is not possible to
protect against execution on a page basis. The exec stuff can be
mapped by different arch segment sizes (on book3s/32 that is 256M
segments). By default the module area is in an Exec segment while
vmalloc area is in a NoExec segment. Using vmalloc lets you muck
with module data as NoExec on those architectures whereas before
you could not.
b) By pushing more module data to vmalloc you also increase the
probability of module text to remain within a closer distance
from kernel core text and this reduces trampolines, this has been
reported on arm first and powerpc folks are following that lead.
c) Free'ing module_alloc() (Exec by default) area leaves this
exposed as Exec by default, some architectures have some security
enhancements to set this as NoExec on free, and splitting module
data with text let's future generic special allocators be added
to the kernel without having developers try to grok the tribal
knowledge per arch. Work like Rick Edgecombe's permission vmalloc
interface [0] becomes easier to address over time.
[0] https://lore.kernel.org/lkml/20201120202426.18009-1-rick.p.edgecombe@intel.com/#r
- Masahiro Yamada's symbol search enhancements
* tag 'modules-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (33 commits)
module: merge check_exported_symbol() into find_exported_symbol_in_section()
module: do not binary-search in __ksymtab_gpl if fsa->gplok is false
module: do not pass opaque pointer for symbol search
module: show disallowed symbol name for inherit_taint()
module: fix [e_shstrndx].sh_size=0 OOB access
module: Introduce module unload taint tracking
module: Move module_assert_mutex_or_preempt() to internal.h
module: Make module_flags_taint() accept a module's taints bitmap and usable outside core code
module.h: simplify MODULE_IMPORT_NS
powerpc: Select ARCH_WANTS_MODULES_DATA_IN_VMALLOC on book3s/32 and 8xx
module: Remove module_addr_min and module_addr_max
module: Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
module: Introduce data_layout
module: Prepare for handling several RB trees
module: Always have struct mod_tree_root
module: Rename debug_align() as strict_align()
module: Rework layout alignment to avoid BUG_ON()s
module: Move module_enable_x() and frob_text() in strict_rwx.c
module: Make module_enable_x() independent of CONFIG_ARCH_HAS_STRICT_MODULE_RWX
module: Move version support into a separate file
...
For two kernel releases now kernel/sysctl.c has been being cleaned up
slowly, since the tables were grossly long, sprinkled with tons of #ifdefs and
all this caused merge conflicts with one susbystem or another.
This tree was put together to help try to avoid conflicts with these cleanups
going on different trees at time. So nothing exciting on this pull request,
just cleanups.
I actually had this sysctl-next tree up since v5.18 but I missed sending a
pull request for it on time during the last merge window. And so these changes
have been being soaking up on sysctl-next and so linux-next for a while.
The last change was merged May 4th.
Most of the compile issues were reported by 0day and fixed.
To help avoid a conflict with bpf folks at Daniel Borkmann's request
I merged bpf-next/pr/bpf-sysctl into sysctl-next to get the effor which
moves the BPF sysctls from kernel/sysctl.c to BPF core.
Possible merge conflicts and known resolutions as per linux-next:
bfp:
https://lkml.kernel.org/r/20220414112812.652190b5@canb.auug.org.au
rcu:
https://lkml.kernel.org/r/20220420153746.4790d532@canb.auug.org.au
powerpc:
https://lkml.kernel.org/r/20220520154055.7f964b76@canb.auug.org.au
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmKOq8ASHG1jZ3JvZkBr
ZXJuZWwub3JnAAoJEM4jHQowkoinDAkQAJVo5YVM9f74UwYp4PQhTpjxJBCjRoZD
z1u9bp5rMj2ujTC8Fr7VmzKaHrb8+r1C1WvCvZtIzemYNB4lZUrHpVDYfXuXiPRB
ihPmEjhlPO5PFBx6cVCpI3cu9bEhG00rLc1QXnABx/pXwNPcOTJAGZJVamZvqubk
chjgZrb7N+adHPfvS55v1+zpwdeKfpp5U3zuu5qlT/nn0GS0HCVzOj5fj4oC4wtJ
IqfUubo+FX50Ga58yQABWNrjaPD9Crykz5ohVazy3ElQl0hJ4VsK65ct3blqc2vz
1Bb8kPpWuv6aZ5nr1lCVE8qvF4ZIL33ySvpg5BSdWLQEDrBbSpzvJe9Yn7wgR+eq
y7fhpO24+zRM82EoDMEvyxX9u1n1RsvoXRtf3ds9BGf63MUxk8a1cgjlU6vuyO2U
JhDmfM1xzdKvPoY4COOnHzcAiIqzItTqKd09N5y0cahmYstROU8lvp9huhTAHqk1
SjQMbLIZG7OnX8ZeQcR1EB8sq/IOPZT48ejj0iJmQ8FyMaep71MOQLYyLPAq4lgh
JHXm8P6QdB57jfJbqAeNSyZoK0qdxOUR/83Zcah7Jjns6vkju1DNatEsaEEI2y2M
4n7/rkHeZ3TyFHBUX4e9FomKvGLsAalDBRiqsuxLSOPMU8rGrNLAslOAtKwvp90X
4ht3M2VP098l
=btwh
-----END PGP SIGNATURE-----
Merge tag 'sysctl-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull sysctl updates from Luis Chamberlain:
"For two kernel releases now kernel/sysctl.c has been being cleaned up
slowly, since the tables were grossly long, sprinkled with tons of
#ifdefs and all this caused merge conflicts with one susbystem or
another.
This tree was put together to help try to avoid conflicts with these
cleanups going on different trees at time. So nothing exciting on this
pull request, just cleanups.
Thanks a lot to the Uniontech and Huawei folks for doing some of this
nasty work"
* tag 'sysctl-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (28 commits)
sched: Fix build warning without CONFIG_SYSCTL
reboot: Fix build warning without CONFIG_SYSCTL
kernel/kexec_core: move kexec_core sysctls into its own file
sysctl: minor cleanup in new_dir()
ftrace: fix building with SYSCTL=y but DYNAMIC_FTRACE=n
fs/proc: Introduce list_for_each_table_entry for proc sysctl
mm: fix unused variable kernel warning when SYSCTL=n
latencytop: move sysctl to its own file
ftrace: fix building with SYSCTL=n but DYNAMIC_FTRACE=y
ftrace: Fix build warning
ftrace: move sysctl_ftrace_enabled to ftrace.c
kernel/do_mount_initrd: move real_root_dev sysctls to its own file
kernel/delayacct: move delayacct sysctls to its own file
kernel/acct: move acct sysctls to its own file
kernel/panic: move panic sysctls to its own file
kernel/lockdep: move lockdep sysctls to its own file
mm: move page-writeback sysctls to their own file
mm: move oom_kill sysctls to their own file
kernel/reboot: move reboot sysctls to its own file
sched: Move energy_aware sysctls to topology.c
...
- qcom: log pending irq during resume
minor cosmetic changes
- omap: use pm_runtime_resume_and_get
- imx: use pm_runtime_resume_and_get
remove redundant initializer
- mtk: added GCE header for MT8186
enable support for MT8186
- tegra: remove redundant NULL check
added hsp_sm_ops for send/recv api
support shared mailboxes
- stm: remove unsupported "wakeup" irq
- pcc: sanitize mbox allocated memory before use
- misc: documentation fixes for arm_mhu and qcom-ipcc
-
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE6EwehDt/SOnwFyTyf9lkf8eYP5UFAmKO3ZYACgkQf9lkf8eY
P5VYug/8DfK0Ang0Sgw7DkT0w1TVY5sAUvq9EgA5z65HL2Vf6hdCMN5JAAVOBuqg
L8BcKegci3/aa5rxbRXpjAcNAFVG9RiaZ6i0qrXL2QGpNGoqXDFH4z7thREjgCYz
LgwSILyNneOovEuNqCFas7zyJSzzaIUCOybJgHA8ZVpCOnCKNHW5mZhi8tlCIbJ9
aY0rLD7Kc9ZLQ3N4JfvjtevaQ5EQR8EpMCQSIksdm5U9k8ej8dJPfDibUzM0PICU
o1DAYG9WxsqibufkJjFYFKf3uxW6dlcWzhUB4QVP3gnmvSIicNfaHTNvOgYOeLYw
d9yXz/ixxakVH2zGTCz6WELG8YQSzvzkjwOFAeDI7vNYgKE23v6l0Hj7dnc+akYW
08wP4Nyo5HRUPk4KnIOiRpNMw7QQKfS7Lufp8L6uwHVlf0uLBvoT49Px9d5ZdIO9
4pR8DLfuSfhU5swuhepipCs7z+NGfvMp2eeqqv1urfvlgjh9d2Y2A/g5qhC4/Zjt
CK27rKIFTpL0Wn2r1pV6+1rLXc0x3o5CXJBFMNLYZEfMJ91ZvyS3InZqebIcUk7f
0yUzLGCSY0a86Xriq8lvkYs+roQxl4Gqm2Jwn9RQisQQ2Q1OOW0g2ZqaG7mZGQf3
v5/gq5w3x1rbHU5Cn8yuFMw8D9O0kJ6ExNdqHtfiNLkmcPMM8Vc=
=f51u
-----END PGP SIGNATURE-----
Merge tag 'mailbox-v5.19' of git://git.linaro.org/landing-teams/working/fujitsu/integration
Pull mailbox updates from Jassi Brar:
"api:
- hrtimer fix
qcom:
- log pending irq during resume
- minor cosmetic changes
omap:
- use pm_runtime_resume_and_get
imx:
- use pm_runtime_resume_and_get
- remove redundant initializer
mtk:
- added GCE header for MT8186
- enable support for MT8186
tegra:
- remove redundant NULL check
- added hsp_sm_ops for send/recv api
- support shared mailboxes
stm:
- remove unsupported "wakeup" irq
pcc:
- sanitize mbox allocated memory before use
misc:
- documentation fixes for arm_mhu and qcom-ipcc"
* tag 'mailbox-v5.19' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
mailbox: qcom-ipcc: Fix -Wunused-function with CONFIG_PM_SLEEP=n
mailbox: forward the hrtimer if not queued and under a lock
mailbox: qcom-ipcc: Log the pending interrupt during resume
mailbox: pcc: Fix an invalid-load caught by the address sanitizer
dt-bindings: mailbox: remove the IPCC "wakeup" IRQ
mailbox: correct kerneldoc
mailbox: omap: using pm_runtime_resume_and_get to simplify the code
mailbox:imx: using pm_runtime_resume_and_get
mailbox: mediatek: support mt8186 adsp mailbox
dt-bindings: mailbox: mtk,adsp-mbox: add mt8186 compatible name
mailbox: tegra-hsp: Add 128-bit shared mailbox support
dt-bindings: tegra186-hsp: add type for shared mailboxes
mailbox: tegra-hsp: Add tegra_hsp_sm_ops
dt-bindings: gce: add the GCE header file for MT8186
mailbox: remove an unneeded NULL check on list iterator
mailbox: imx: remove redundant initializer
dt-bindings: mailbox: qcom-ipcc: simplify the example
- use ioread()/iowrite() interfaces instead of raw inb()/outb() in drivers
- make irqchips immutable due to the new warning popping up when drivers try to
modify the irqchip structures
- add new compatibles to dt-bindings for realtek-otto, renesas-rcar and pca95xx
- add support for new models to gpio-rcar, gpio-pca953x & gpio-realtek-otto
- allow parsing of GPIO hogs represented as children nodes of gpio-uniphier
- define a set of common GPIO consumer strings in dt-bindings
- shrink code in gpio-ml-ioh by using more devres interfaces
- pass arguments to devm_kcalloc() in correct order in gpio-sim
- add new helpers for iterating over GPIO firmware nodes and descriptors to
gpiolib core and use it in several drivers
- drop unused syscon_regmap_lookup_by_compatible() function
- correct format specifiers and signedness of variables in GPIO ACPI
- drop unneeded error checks in gpio-ftgpio
- stop using the deprecated of_gpio.h header in gpio-zevio
- drop platform_data support in gpio-max732x
- simplify Kconfig dependencies in gpio-vf610
- use raw spinlocks where needed to make PREEMPT_RT happy
- fix return values in board files using gpio-pcf857x
- convert more drivers to using fwnode instead of of_node
- minor fixes and improvements in gpiolib core
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAmKOU9gACgkQEacuoBRx
13L6+g//axfOCo1VFtrIZR3Sh8F3Zt6t5DfdWn7DU5OqKL9xQzF7o6dB5nqOa4ua
k8cgXAlMj3EnlFAxWalArCw3Lu3ntInXfl2EAgFOfhya3DLCjQRayoz7EGTMlGrA
XLZWNc+kUEDNOfN0L+fLHRopgi9jOtlS4XODaVMJKH31jVxufAwoQrFZF4d7pMvW
XC4vuSYmRfLrNCm77CqznBjw5hD44v5bxxkGyGmKhE+VmuFcLX1feSTKkttZ+ZMC
CP/Rp0/KSzJU4/1+9uPPrNY8NJGsBN9Uo+BQzH6nuSQrrO2MuOj5JA6UqgR+MHjI
9a/b/iftiPnsxSzbE8PKj/jWcswmScp7tvGqwCa0Q7Fh502+p+8RtEVKugy5nYEG
xNPONhQusu21Hw2ySHZZVjuxfQKi09uDEIZN55V5etsURHXiUB6RtZJwlgXWOYp5
8/h/TPemAZsfvAs/9OZwck171oQBUnX1K+gdbQ/5t4QoW+VxQCuGP0uTPB5kTxSV
yfVjeD2tpEdpjEAwmKrSLug4xLRlB4ed17DeEstFbFARUdOQZSLBiXln2KBg/d6Y
ofnAPvqZyMf0/MFSKBoXIb/aKw8svbTKDwy0HvU3Tf0lOGix4F/w9ih6VzUo/yVt
Aj9oqQyfK9E7EvPEQnZqn3h/3fmXvqDGrryABmuOmiy3N6RFGho=
=tj/B
-----END PGP SIGNATURE-----
Merge tag 'gpio-updates-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio updates from Bartosz Golaszewski:
"We have lots of small changes all over the place, but no huge reworks
or new drivers:
- use ioread()/iowrite() interfaces instead of raw inb()/outb() in
drivers
- make irqchips immutable due to the new warning popping up when
drivers try to modify the irqchip structures
- add new compatibles to dt-bindings for realtek-otto, renesas-rcar
and pca95xx
- add support for new models to gpio-rcar, gpio-pca953x &
gpio-realtek-otto
- allow parsing of GPIO hogs represented as children nodes of
gpio-uniphier
- define a set of common GPIO consumer strings in dt-bindings
- shrink code in gpio-ml-ioh by using more devres interfaces
- pass arguments to devm_kcalloc() in correct order in gpio-sim
- add new helpers for iterating over GPIO firmware nodes and
descriptors to gpiolib core and use it in several drivers
- drop unused syscon_regmap_lookup_by_compatible() function
- correct format specifiers and signedness of variables in GPIO ACPI
- drop unneeded error checks in gpio-ftgpio
- stop using the deprecated of_gpio.h header in gpio-zevio
- drop platform_data support in gpio-max732x
- simplify Kconfig dependencies in gpio-vf610
- use raw spinlocks where needed to make PREEMPT_RT happy
- fix return values in board files using gpio-pcf857x
- convert more drivers to using fwnode instead of of_node
- minor fixes and improvements in gpiolib core"
* tag 'gpio-updates-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (55 commits)
gpio: sifive: Make the irqchip immutable
gpio: rcar: Make the irqchip immutable
gpio: pcf857x: Make the irqchip immutable
gpio: pca953x: Make the irqchip immutable
gpio: dwapb: Make the irqchip immutable
gpio: sim: Use correct order for the parameters of devm_kcalloc()
gpio: ml-ioh: Convert to use managed functions pcim* and devm_*
gpio: ftgpio: Remove unneeded ERROR check before clk_disable_unprepare
gpio: ws16c48: Utilize iomap interface
gpio: gpio-mm: Utilize iomap interface
gpio: 104-idio-16: Utilize iomap interface
gpio: 104-idi-48: Utilize iomap interface
gpio: 104-dio-48e: Utilize iomap interface
gpio: zevio: drop of_gpio.h header
gpio: max77620: Make the irqchip immutable
dt-bindings: gpio: pca95xx: add entry for pca6408
gpio: pca953xx: Add support for pca6408
gpio: max732x: Drop unused support for irq and setup code via platform data
gpio: vf610: drop the SOC_VF610 dependency for GPIO_VF610
gpio: syscon: Remove usage of syscon_regmap_lookup_by_compatible
...
This fixes a handful of issues with the XIP support, which has bit
rotted some lately.
* palmer/riscv-xip:
RISC-V: Fix the XIP build
RISC-V: Split out the XIP fixups into their own file
RISC-V: ignore xipImage
RISC-V: Avoid empty create_*_mapping definitions