From 999340d51174ce4141dd723105d4cef872b13ee9 Mon Sep 17 00:00:00 2001 From: Chen Jun Date: Mon, 22 Feb 2021 13:58:40 +0000 Subject: [PATCH 1/8] ftrace: Have recordmcount use w8 to read relp->r_info in arm64_is_fake_mcount On little endian system, Use aarch64_be(gcc v7.3) downloaded from linaro.org to build image with CONFIG_CPU_BIG_ENDIAN = y, CONFIG_FTRACE = y, CONFIG_DYNAMIC_FTRACE = y. gcc will create symbols of _mcount but recordmcount can not create mcount_loc for *.o. aarch64_be-linux-gnu-objdump -r fs/namei.o | grep mcount 00000000000000d0 R_AARCH64_CALL26 _mcount ... 0000000000007190 R_AARCH64_CALL26 _mcount The reason is than funciton arm64_is_fake_mcount can not work correctly. A symbol of _mcount in *.o compiled with big endian compiler likes: 00 00 00 2d 00 00 01 1b w(rp->r_info) will return 0x2d instead of 0x011b. Because w() takes uint32_t as parameter, which truncates rp->r_info. Use w8() instead w() to read relp->r_info Link: https://lkml.kernel.org/r/20210222135840.56250-1-chenjun102@huawei.com Fixes: ea0eada45632 ("recordmcount: only record relocation of type R_AARCH64_CALL26 on arm64.") Acked-by: Will Deacon Signed-off-by: Chen Jun Signed-off-by: Steven Rostedt (VMware) --- scripts/recordmcount.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/recordmcount.c b/scripts/recordmcount.c index b9c2ee7ab43f..cce12e1971d8 100644 --- a/scripts/recordmcount.c +++ b/scripts/recordmcount.c @@ -438,7 +438,7 @@ static int arm_is_fake_mcount(Elf32_Rel const *rp) static int arm64_is_fake_mcount(Elf64_Rel const *rp) { - return ELF64_R_TYPE(w(rp->r_info)) != R_AARCH64_CALL26; + return ELF64_R_TYPE(w8(rp->r_info)) != R_AARCH64_CALL26; } /* 64-bit EM_MIPS has weird ELF64_Rela.r_info. From 70d443d8463339869f371e77fa594b850f374565 Mon Sep 17 00:00:00 2001 From: "Yordan Karadzhov (VMware)" Date: Thu, 4 Mar 2021 11:23:48 +0200 Subject: [PATCH 2/8] tracing: Remove duplicate declaration from trace.h A declaration of function "int trace_empty(struct trace_iterator *iter)" shows up twice in the header file kernel/trace/trace.h Link: https://lkml.kernel.org/r/20210304092348.208033-1-y.karadz@gmail.com Signed-off-by: Yordan Karadzhov (VMware) Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/trace.h | 1 - 1 file changed, 1 deletion(-) diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index dec13ff66077..a6446c03cfbc 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -605,7 +605,6 @@ void trace_graph_function(struct trace_array *tr, void trace_latency_header(struct seq_file *m); void trace_default_header(struct seq_file *m); void print_trace_header(struct seq_file *m, struct trace_iterator *iter); -int trace_empty(struct trace_iterator *iter); void trace_graph_return(struct ftrace_graph_ret *trace); int trace_graph_entry(struct ftrace_graph_ent *trace); From 69268094a1c16f3f44b369f9da78ce98bab5f244 Mon Sep 17 00:00:00 2001 From: Rolf Eike Beer Date: Tue, 2 Mar 2021 09:49:28 +0100 Subject: [PATCH 3/8] tracing: Fix help text of TRACEPOINT_BENCHMARK in Kconfig It's "cond_resched()" not "cond_sched()". Link: https://lkml.kernel.org/r/1863065.aFVDpXsuPd@devpool47 Signed-off-by: Rolf Eike Beer Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 9c266b93cbc0..7fa82778c3e6 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -694,7 +694,7 @@ config TRACEPOINT_BENCHMARK help This option creates the tracepoint "benchmark:benchmark_event". When the tracepoint is enabled, it kicks off a kernel thread that - goes into an infinite loop (calling cond_sched() to let other tasks + goes into an infinite loop (calling cond_resched() to let other tasks run), and calls the tracepoint. Each iteration will record the time it took to write to the tracepoint and the next iteration that data will be passed to the tracepoint itself. That is, the tracepoint From 6f6be606e763f2da9fc21de00538c97fe4ca1492 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Wed, 3 Mar 2021 18:03:52 -0500 Subject: [PATCH 4/8] ring-buffer: Force before_stamp and write_stamp to be different on discard Part of the logic of the new time stamp code depends on the before_stamp and the write_stamp to be different if the write_stamp does not match the last event on the buffer, as it will be used to calculate the delta of the next event written on the buffer. The discard logic depends on this, as the next event to come in needs to inject a full timestamp as it can not rely on the last event timestamp in the buffer because it is unknown due to events after it being discarded. But by changing the write_stamp back to the time before it, it forces the next event to use a full time stamp, instead of relying on it. The issue came when a full time stamp was used for the event, and rb_time_delta() returns zero in that case. The update to the write_stamp (which subtracts delta) made it not change. Then when the event is removed from the buffer, because the before_stamp and write_stamp still match, the next event written would calculate its delta from the write_stamp, but that would be wrong as the write_stamp is of the time of the event that was discarded. In the case that the delta change being made to write_stamp is zero, set the before_stamp to zero as well, and this will force the next event to inject a full timestamp and not use the current write_stamp. Cc: stable@vger.kernel.org Fixes: a389d86f7fd09 ("ring-buffer: Have nested events still record running time stamp") Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/ring_buffer.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index b9dad3500041..342f49c3ccc5 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -2814,6 +2814,17 @@ rb_try_to_discard(struct ring_buffer_per_cpu *cpu_buffer, write_stamp, write_stamp - delta)) return 0; + /* + * It's possible that the event time delta is zero + * (has the same time stamp as the previous event) + * in which case write_stamp and before_stamp could + * be the same. In such a case, force before_stamp + * to be different than write_stamp. It doesn't + * matter what it is, as long as its different. + */ + if (!delta) + rb_time_set(&cpu_buffer->before_stamp, 0); + /* * If an event were to come in now, it would see that the * write_stamp and the before_stamp are different, and assume From 6549de1fe34162d7ace8b870ae11ca6cae5b8609 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Wed, 3 Mar 2021 18:23:40 -0500 Subject: [PATCH 5/8] ring-buffer: Add a little more information and a WARN when time stamp going backwards is detected When the CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS is enabled, and the time stamps are detected as not being valid, it reports information about the write stamp, but does not show the before_stamp which is still useful information. Also, it should give a warning once, such that tests detect this happening. Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/ring_buffer.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 342f49c3ccc5..68744c51517e 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -3318,9 +3318,13 @@ static void check_buffer(struct ring_buffer_per_cpu *cpu_buffer, goto out; } atomic_inc(&cpu_buffer->record_disabled); - pr_warn("[CPU: %d]TIME DOES NOT MATCH expected:%lld actual:%lld delta:%lld after:%lld\n", - cpu_buffer->cpu, - ts + info->delta, info->ts, info->delta, info->after); + /* There's some cases in boot up that this can happen */ + WARN_ON_ONCE(system_state != SYSTEM_BOOTING); + pr_warn("[CPU: %d]TIME DOES NOT MATCH expected:%lld actual:%lld delta:%lld before:%lld after:%lld%s\n", + cpu_buffer->cpu, + ts + info->delta, info->ts, info->delta, + info->before, info->after, + full ? " (full)" : ""); dump_buffer_page(bpage, info, tail); atomic_dec(&ts_dump); /* Do not re-enable checking */ From f40fc799afc598b3d130d5a0ada994c9d4fb6cf8 Mon Sep 17 00:00:00 2001 From: Vamshi K Sthambamkadi Date: Thu, 4 Mar 2021 15:15:24 +0530 Subject: [PATCH 6/8] tracing: Fix memory leak in __create_synth_event() kmemleak report: unreferenced object 0xc5a6f708 (size 8): comm "ftracetest", pid 1209, jiffies 4294911500 (age 6.816s) hex dump (first 8 bytes): 00 c1 3d 60 14 83 1f 8a ..=`.... backtrace: [] __kmalloc_track_caller+0x2a6/0x460 [<7d3d60a6>] kstrndup+0x37/0x70 [<45a0e739>] argv_split+0x1c/0x120 [] __create_synth_event+0x192/0xb00 [<0708b8a3>] create_synth_event+0xbb/0x150 [<3d1941e1>] create_dyn_event+0x5c/0xb0 [<5cf8b9e3>] trace_parse_run_command+0xa7/0x140 [<04deb2ef>] dyn_event_write+0x10/0x20 [<8779ac95>] vfs_write+0xa9/0x3c0 [] ksys_write+0x89/0xc0 [] __ia32_sys_write+0x15/0x20 [<7ce02d85>] __do_fast_syscall_32+0x45/0x80 [] do_fast_syscall_32+0x29/0x60 [<2467454a>] do_SYSENTER_32+0x15/0x20 [<9beaa61d>] entry_SYSENTER_32+0xa9/0xfc unreferenced object 0xc5a6f078 (size 8): comm "ftracetest", pid 1209, jiffies 4294911500 (age 6.816s) hex dump (first 8 bytes): 08 f7 a6 c5 00 00 00 00 ........ backtrace: [] __kmalloc+0x2b6/0x470 [] argv_split+0x82/0x120 [] __create_synth_event+0x192/0xb00 [<0708b8a3>] create_synth_event+0xbb/0x150 [<3d1941e1>] create_dyn_event+0x5c/0xb0 [<5cf8b9e3>] trace_parse_run_command+0xa7/0x140 [<04deb2ef>] dyn_event_write+0x10/0x20 [<8779ac95>] vfs_write+0xa9/0x3c0 [] ksys_write+0x89/0xc0 [] __ia32_sys_write+0x15/0x20 [<7ce02d85>] __do_fast_syscall_32+0x45/0x80 [] do_fast_syscall_32+0x29/0x60 [<2467454a>] do_SYSENTER_32+0x15/0x20 [<9beaa61d>] entry_SYSENTER_32+0xa9/0xfc In __create_synth_event(), while iterating field/type arguments, the argv_split() will return array of atleast 2 elements even when zero arguments(argc=0) are passed. for e.g. when there is double delimiter or string ends with delimiter To fix call argv_free() even when argc=0. Link: https://lkml.kernel.org/r/20210304094521.GA1826@cosmos Signed-off-by: Vamshi K Sthambamkadi Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/trace_events_synth.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c index 2979a96595b4..8d71e6c83f10 100644 --- a/kernel/trace/trace_events_synth.c +++ b/kernel/trace/trace_events_synth.c @@ -1225,8 +1225,10 @@ static int __create_synth_event(const char *name, const char *raw_fields) goto err; } - if (!argc) + if (!argc) { + argv_free(argv); continue; + } n_fields_this_loop = 0; consumed = 0; From ee666a185558ac9a929e53b902a568442ed62416 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Mon, 1 Mar 2021 10:49:35 -0500 Subject: [PATCH 7/8] tracing: Skip selftests if tracing is disabled If tracing is disabled for some reason (traceoff_on_warning, command line, etc), the ftrace selftests are guaranteed to fail, as their results are defined by trace data in the ring buffers. If the ring buffers are turned off, the tests will fail, due to lack of data. Because tracing being disabled is for a specific reason (warning, user decided to, etc), it does not make sense to enable tracing to run the self tests, as the test output may corrupt the reason for the tracing to be disabled. Instead, simply skip the self tests and report that they are being skipped due to tracing being disabled. Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/trace.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index e295c413580e..eccb4e1187cc 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -1929,6 +1929,12 @@ static int run_tracer_selftest(struct tracer *type) if (!selftests_can_run) return save_selftest(type); + if (!tracing_is_on()) { + pr_warn("Selftest for tracer %s skipped due to tracing disabled\n", + type->name); + return 0; + } + /* * Run a selftest on this tracer. * Here we reset the trace buffer, and set the current From f9f344479d8b40b3b001c913fb992d85d19261d0 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Fri, 26 Feb 2021 14:09:15 -0500 Subject: [PATCH 8/8] tracing: Fix comment about the trace_event_call flags In the declaration of the struct trace_event_call, the flags has the bits defined in the comment above it. But these bits are also defined by the TRACE_EVENT_FL_* enums just above the declaration of the struct. As the comment about the flags in the struct has become stale and incorrect, just replace it with a reference to the TRACE_EVENT_FL_* enum above. Signed-off-by: Steven Rostedt (VMware) --- include/linux/trace_events.h | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 7077fec653bb..28e7af1406f2 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -349,15 +349,8 @@ struct trace_event_call { struct event_filter *filter; void *mod; void *data; - /* - * bit 0: filter_active - * bit 1: allow trace by non root (cap any) - * bit 2: failed to apply filter - * bit 3: trace internal event (do not enable) - * bit 4: Event was enabled by module - * bit 5: use call filter rather than file filter - * bit 6: Event is a tracepoint - */ + + /* See the TRACE_EVENT_FL_* flags above */ int flags; /* static flags of different events */ #ifdef CONFIG_PERF_EVENTS