It was only set, and never looked at, so drop the thing as no one cares
about it.
Cc: Lidza Louina <lidza.louina@gmail.com>
Cc: Mark Hounschell <markh@compro.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It was only set, and never looked at, so drop the thing as no one cares
about it.
Cc: Lidza Louina <lidza.louina@gmail.com>
Cc: Mark Hounschell <markh@compro.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move the few #defines we actually use into the dgnc_driver.c file, no
need to have a .h file for a driver that is only used by a single .c
file.
Cc: Lidza Louina <lidza.louina@gmail.com>
Cc: Mark Hounschell <markh@compro.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As the Neo devices were never actually supported by the driver, delete
the files that purported to control that type of device as they are not
needed. This will let us shrink the driver a lot over time.
Cc: Lidza Louina <lidza.louina@gmail.com>
Cc: Mark Hounschell <markh@compro.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The "Neo" devices are never allowed to be bound to this driver, as they
are not in the valid list of PCI ids, so delete them from the driver as
the code in the probe function has no way to ever run.
Also delete the defines that are no longer used in relation to the Neo
devices, as they are not needed anymore.
Cc: Lidza Louina <lidza.louina@gmail.com>
Cc: Mark Hounschell <markh@compro.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the CPU has support for 5-level paging enabled and the IOMMU also
supports 5-level paging then enable the 5-level paging mode for first-
level translations - used when SVM is enabled.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add a check to verify IOMMU 5-level paging support. If the CPU supports
supports 5-level paging but the IOMMU does not support it then disable
SVM by not allocating PASID tables.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add a check to verify IOMMU 1GB page support. If the CPU supports 1GB
pages but the IOMMU does not support it then disable SVM by not
allocating PASID tables.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Update the IOMMU default domain address width to 57 bits. This would
enable the IOMMU to do upto 5-levels of paging for second level
translations - IOVA translation requests without PASID.
Even though the maximum supported address width is being increased to
57, __iommu_calculate_agaw() would set the actual supported address
width to the maximum support available in IOMMU hardware.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
kvm_valid_sregs() should use X86_CR0_PG and X86_CR4_PAE to check bit
status rather than X86_CR0_PG_BIT and X86_CR4_PAE_BIT. This patch is
to fix it.
Fixes: f29810335965a(KVM/x86: Check input paging mode when cs.l is set)
Reported-by: Jeremi Piotrowski <jeremi.piotrowski@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Three more fixes for v4.15 fixing incorrect huge page mappings on systems using
the contigious hint for hugetlbfs; supporting an alternative GICv4 init
sequence; and correctly implementing the ARM SMCC for HVC and SMC handling.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJaXi9yAAoJEEtpOizt6ddymb4H/R6Q7uPSNY31d/wcMHg8qYS7
foDW76r7mKliRVmCJoq9oqLqC7BLpQszfZ8dFjPSfdLA4xVMsuZ3GG3S7jlghiuN
9+rZK+ZZX8g5uQNsqVITC3WrXmozBj+VEs/uH2Z1pu0g+siPTp7J2iv5+A5tvM3A
NCySqgEjefQyy7Zs2r7TuvM+E3p9MY7jZih9E2o8mn2TQipVKrcnHRN3IjNNtI4u
C17x70OQ1ZY7bwnmPnuPPqnX3H1fQ6+UgwtfDCu3KP7DAFVjqAz03X6wbf1nCLAB
zzKok/SnIFWpr56JUSOzMpHWG8sOFscdVXxW97a2Ova0ur0rHW2iPiucTb8jOjQ=
=gJL6
-----END PGP SIGNATURE-----
Merge tag 'kvm-arm-fixes-for-v4.15-3-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm
KVM/ARM Fixes for v4.15, Round 3 (v2)
Three more fixes for v4.15 fixing incorrect huge page mappings on systems using
the contigious hint for hugetlbfs; supporting an alternative GICv4 init
sequence; and correctly implementing the ARM SMCC for HVC and SMC handling.
With the previous patch to switch to 64-bit mode after returning from
RTAS and before doing any memory accesses, the RMA limit need not be
clamped to 1GB to avoid RTAS bugs.
Keep the 1GB limit for older firmware (although this is more of a kernel
concern than RTAS), and remove it starting with POWER9.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
With the previous patch to switch to 64-bit mode after returning from
RTAS and before doing any memory accesses, the RMA limit need not be
clamped to 1GB to avoid RTAS bugs.
Keep the 1GB limit for older firmware (although this is more of a kernel
concern than RTAS), and remove it starting with POWER9.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 177ba7c647 ("powerpc/mm/radix: Limit paca allocation in radix")
limited the paca allocation address to 1G on pSeries because RTAS return
accesses the paca in 32-bit mode:
On return from RTAS we access the paca variables and we have 64 bit
disabled. This requires us to limit paca in 32 bit range.
Fix this by setting ppc64_rma_size to first_memblock_size/1G range.
Avoid this limit by switching to 64-bit mode before accessing any memory.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The radix guest is not subject to the paravirtualized HPT VRMA limit,
so remove that from ppc64_rma_size calculation for that platform.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This removes the RMA limit on powernv platform, which constrains
early allocations such as PACAs and stacks. There are still other
restrictions that must be followed, such as bolted SLB limits, but
real mode addressing has no constraints.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There are several cases outside the normal address space management
where a CPU's entire local TLB is to be flushed:
1. Booting the kernel, in case something has left stale entries in
the TLB (e.g., kexec).
2. Machine check, to clean corrupted TLB entries.
One other place where the TLB is flushed, is waking from deep idle
states. The flush is a side-effect of calling ->cpu_restore with the
intention of re-setting various SPRs. The flush itself is unnecessary
because in the first case, the TLB should not acquire new corrupted
TLB entries as part of sleep/wake (though they may be lost).
This type of TLB flush is coded inflexibly, several times for each CPU
type, and they have a number of problems with ISA v3.0B:
- The current radix mode of the MMU is not taken into account, it is
always done as a hash flushn For IS=2 (LPID-matching flush from host)
and IS=3 with HV=0 (guest kernel flush), tlbie(l) is undefined if
the R field does not match the current radix mode.
- ISA v3.0B hash must flush the partition and process table caches as
well.
- ISA v3.0B radix must flush partition and process scoped translations,
partition and process table caches, and also the page walk cache.
So consolidate the flushing code and implement it in C and inline asm
under the mm/ directory with the rest of the flush code. Add ISA v3.0B
cases for radix and hash, and use the radix flush in radix environment.
Provide a way for IS=2 (LPID flush) to specify the radix mode of the
partition. Have KVM pass in the radix mode of the guest.
Take out the flushes from early cputable/dt_cpu_ftrs detection hooks,
and move it later in the boot process after, the MMU registers are set
up and before relocation is first turned on.
The TLB flush is no longer called when restoring from deep idle states.
This was not be done as a separate step because booting secondaries
uses the same cpu_restore as idle restore, which needs the TLB flush.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The die() oops path contains a serializing lock to prevent oops
messages from being interleaved. In the case of a system reset
initiated oops (e.g., qemu nmi command), __die was being called
which lacks that synchronisation and oops reports could be
interleaved across CPUs.
A recent patch 4388c9b3a6 ("powerpc: Do not send system reset
request through the oops path") changed this to __die to avoid
the debugger() call, but there is no real harm to calling it twice
if the first time fell through. So go back to using die() here.
This was observed to fix the problem.
Fixes: 4388c9b3a6 ("powerpc: Do not send system reset request through the oops path")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Removing the early device registration hook overlooked the fact that
it only ran conditionally on a compatible device being present in the
DT. With exynos_iommu_init() now running as an unconditional initcall,
problems arise on non-Exynos systems when other IOMMU drivers find
themselves unable to install their ops on the platform bus, or at worst
the Exynos ops get called with someone else's domain and all hell breaks
loose.
The global ops/cache setup could probably all now be triggered from the
first IOMMU probe, as with dma_dev assigment, but for the time being the
simplest fix is to resurrect the logic from commit a7b67cd5d9
("iommu/exynos: Play nice in multi-platform builds") to explicitly check
the DT for the presence of an Exynos IOMMU before trying anything.
Fixes: 928055a01b ("iommu/exynos: Remove custom platform device registration code")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Turn gfs2_block_truncate_page into a function that zeroes a range within
a block rather than only the end of a block. This will be used for
cleaning the end of the first partial block and the start of the last
partial block when punching a hole in a file.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
In rare cases, the current non-recursive delete algorithm doesn't
deallocate empty intermediary indirect blocks. This should have very
little practical effect, but deallocating all blocks correctly should
still be preferable as it is cleaner and easier to validate.
The fix consists of using the first block to deallocate to compute the
start marker of the truncate point instead of the last block that needs
to be kept. With that change, computing which indirect blocks are still
needed becomes relatively easy.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
The metadata read-ahead algorithm broke when switching from recursive to
non-recursive delete: the current algorithm reads ahead blocks at height
N - 1 while deallocating the blocks at hight N. However, deallocating
the blocks at height N requires a complete walk of the metadata tree,
not only down to height N - 1. Consequently, all blocks below height
N - 1 will be accessed without read-ahead.
Fix this by issuing read-aheads as early as possible, after each
metapath lookup.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Split out the entire lookup loop from lookup_metapath and
fillup_metapath. Make both functions return the actual height in
mp->mp_aheight, and return 0 on success. Handle lookup errors properly
in trunc_dealloc.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
First, this function truncates the file in chunks. When the original
file size isn't block aligned, each chunk that is truncated will remain
be misaligned. This is inefficient.
Second, this function doesn't recognize where holes are, so it loops
through them. For each chunk of a hole, it creates a new transaction.
At least avoid creating another transactions whe the current one is
still empty. (An better fix would be to skip large holes, of course.)
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
The current transaction is being dereferenced before asserting that is
not NULL; that isn't going to help.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Document when to use gfs2_blk2rgrpd for "inexact" resource group
matching. Based on that, fix an incorrect use of gfs2_blk2rgrpd in
sweep_bh_for_rgrps.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
In x86 architecture dependend part function get_cpuid_str() mallocs a
128 byte buffer, but does not check if the memory allocation succeeded
or not.
When the memory allocation fails, function __get_cpuid() is called with
first parameter being a NULL pointer. However this function references
its first parameter and operates on a NULL pointer which might cause
core dumps.
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20180117131611.34319-1-tmricht@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Previously it was only allowed to use at most 10 time slices in 'perf
script --time'.
This patch removes this limitation.
For example, following command line is OK (12 time slices)
perf script --time 1%/1,1%/2,1%/3,1%/4,1%/5,1%/6,1%/7,1%/8,1%/9,1%/10,1%/11,1%/12
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1515596433-24653-9-git-send-email-yao.jin@linux.intel.com
[ No need to check for NULL to call free, use zfree ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Previously it was only allowed to use at most 10 time slices in 'perf
report --time'.
This patch removes this limitation.
For example, following command line is OK (12 time slices)
perf report --stdio --time 1%/1,1%/2,1%/3,1%/4,1%/5,1%/6,1%/7,1%/8,1%/9,1%/10,1%/11,1%/12
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1515596433-24653-8-git-send-email-yao.jin@linux.intel.com
[ No need to check for NULL to call free, use zfree ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Previously we use a magic number 10 to limit the number of time slices.
It's not very good.
This patch creates a new function perf_time__range_alloc() to allocate
time slices buffer. The number of buffer entries is determined by the
number of comma in string but at least it will allocate one entry even
if no comma is found.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1515596433-24653-7-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a time slices indication to the perf report header.
For example,
# perf report --stdio --time 10%
# Total Lost Samples: 0
#
# Samples: 9K of event 'cycles:ppp' (time slices: 10%)
# Event count (approx.): 8951288803
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Suggested--by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1515596433-24653-6-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Previously, the time percent slice needs an index to specify which one
the user wants.
It may be easier to use if the index can be omitted. So with this
patch, for example,
perf report --stdio --time 10%/1 should be equivalent to
perf report --stdio --time 10%
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1515596433-24653-5-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The command line like 'perf report --stdio --time 1abc%/1' could be
accepted by perf. It looks not very good.
This patch uses strtod() to replace original atof() and check the entire
string. Now for the same command line, it would return error message
"Invalid time string".
root@skl:/tmp# perf report --stdio --time 1abc%/1
Invalid time string
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1515596433-24653-4-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The following message will be returned to user when executing 'perf
script --time' if perf data file doesn't contain the first/last sample
time.
"HINT: no first/last sample time found in perf data.
Please use latest perf binary to execute 'perf record'
(if '--buildid-all' is enabled, needs to set '--timestamp-boundary')."
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1515596433-24653-3-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The following message will be returned to user when executing
'perf report --time' if perf data file doesn't contain the
first/last sample time.
"HINT: no first/last sample time found in perf data.
Please use latest perf binary to execute 'perf record'
(if '--buildid-all' is enabled, needs to set '--timestamp-boundary')."
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1515596433-24653-2-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When exposing data access through debugfs, the correct
debugfs_create_*() functions must be used, depending on data type.
Remove all casts from data pointers passed to debugfs_create_*()
functions, as such casts prevent the compiler from flagging bugs.
omap_iommu.nr_tlb_entries is "int", hence casting to "u8 *" exposes only
a part of it. Fix this by using debugfs_create_u32() instead.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When we use a global DWARF setting as in:
perf record --call-graph dwarf
According to 5c0cf22477 ("perf record: Store data mmaps for dwarf unwind") we need
to set up some extra perf_event_attr bits.
But when we instead do a per event dwarf setting:
perf record -e cycles/call-graph=dwarf/
This was not being done, make them equivalent.
This didn't produce any output changes in my tests while fixing up loose
ends in the per-event settings, I found it just by comparing the
perf_event_attr fields trying to find an explanation for those problems.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hendrick Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Noel Grandin <noelgrandin@gmail.com>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-6476r53h2o38skbs9qa4ust4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If we use:
perf trace --max-stack=4
then the syscall events will use DWARF callchains, when available
(libunwind enabled in the build) and the printing will stop at 4 levels.
When we introduced support for tracepoint events this ended up not
applying for them, fix it.
Before:
# perf trace --call-graph=dwarf --no-syscalls -e probe_libc:inet_pton ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.058 ms
--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.058/0.058/0.058/0.000 ms
0.000 probe_libc:inet_pton:(7fc6c2a16350))
#
After:
# perf trace --call-graph=dwarf --no-syscalls -e probe_libc:inet_pton ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.087 ms
--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.087/0.087/0.087/0.000 ms
0.000 probe_libc:inet_pton:(7fbf9a041350))
__inet_pton (inlined)
gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
__GI_getaddrinfo (inlined)
[0xffffaa947cb67f3f] (/usr/bin/ping)
__libc_start_main (/usr/lib64/libc-2.26.so)
[0xffffaa947cb68379] (/usr/bin/ping)
#
Reported-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hendrick Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-afsu9eegd43ppihiuafhh9qv@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When setting up DWARF callchains on specific events, without using
'record' or 'trace' --call-graph, but instead doing it like:
perf trace -e cycles/call-graph=dwarf/
The unwind__prepare_access() call in thread__insert_map() when we
process PERF_RECORD_MMAP(2) metadata events were not being performed,
precluding us from using per-event DWARF callchains, handling them just
when we asked for all events to be DWARF, using "--call-graph dwarf".
We do it in the PERF_RECORD_MMAP because we have to look at one of the
executable maps to figure out the executable type (64-bit, 32-bit) of
the DSO laid out in that mmap. Also to look at the architecture where
the perf.data file was recorded.
All this probably should be deferred to when we process a sample for
some thread that has callchains, so that we do this processing only for
the threads with samples, not for all of them.
For now, fix using DWARF on specific events.
Before:
# perf trace --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.048 ms
--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.048/0.048/0.048/0.000 ms
0.000 probe_libc:inet_pton:(7fe9597bb350))
Problem processing probe_libc:inet_pton callchain, skipping...
#
After:
# perf trace --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.060 ms
--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.060/0.060/0.060/0.000 ms
0.000 probe_libc:inet_pton:(7fd4aa930350))
__inet_pton (inlined)
gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
__GI_getaddrinfo (inlined)
[0xffffaa804e51af3f] (/usr/bin/ping)
__libc_start_main (/usr/lib64/libc-2.26.so)
[0xffffaa804e51b379] (/usr/bin/ping)
#
# perf trace --call-graph=dwarf --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.057 ms
--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.057/0.057/0.057/0.000 ms
0.000 probe_libc:inet_pton:(7f9363b9e350))
__inet_pton (inlined)
gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
__GI_getaddrinfo (inlined)
[0xffffa9e8a14e0f3f] (/usr/bin/ping)
__libc_start_main (/usr/lib64/libc-2.26.so)
[0xffffa9e8a14e1379] (/usr/bin/ping)
#
# perf trace --call-graph=fp --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.077 ms
--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.077/0.077/0.077/0.000 ms
0.000 probe_libc:inet_pton:(7f4947e1c350))
__inet_pton (inlined)
gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
__GI_getaddrinfo (inlined)
[0xffffaa716d88ef3f] (/usr/bin/ping)
__libc_start_main (/usr/lib64/libc-2.26.so)
[0xffffaa716d88f379] (/usr/bin/ping)
#
# perf trace --no-syscalls -e probe_libc:inet_pton/call-graph=fp/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.078 ms
--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.078/0.078/0.078/0.000 ms
0.000 probe_libc:inet_pton:(7fa157696350))
__GI___inet_pton (/usr/lib64/libc-2.26.so)
getaddrinfo (/usr/lib64/libc-2.26.so)
[0xffffa9ba39c74f40] (/usr/bin/ping)
#
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hendrick Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/r/20180116182650.GE16107@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When setting the "dwarf" unwinder for a specific event and not
specifying the max-stack, the attr.sample_max_stack ended up using an
uninitialized callchain_param.max_stack, fix it by using designated
initializers for that callchain_param variable, zeroing all non
explicitely initialized struct members.
Here is what happened:
# perf trace -vv --no-syscalls --max-stack 4 -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
callchain: type DWARF
callchain: stack dump size 8192
perf_event_attr:
type 2
size 112
config 0x730
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|ADDR|CALLCHAIN|CPU|PERIOD|RAW|REGS_USER|STACK_USER|DATA_SRC
exclude_callchain_user 1
{ wakeup_events, wakeup_watermark } 1
sample_regs_user 0xff0fff
sample_stack_user 8192
sample_max_stack 50656
sys_perf_event_open failed, error -75
Value too large for defined data type
# perf trace -vv --no-syscalls --max-stack 4 -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
callchain: type DWARF
callchain: stack dump size 8192
perf_event_attr:
type 2
size 112
config 0x730
sample_type IP|TID|TIME|ADDR|CALLCHAIN|CPU|PERIOD|RAW|REGS_USER|STACK_USER|DATA_SRC
exclude_callchain_user 1
sample_regs_user 0xff0fff
sample_stack_user 8192
sample_max_stack 30448
sys_perf_event_open failed, error -75
Value too large for defined data type
#
Now the attr.sample_max_stack is set to zero and the above works as
expected:
# perf trace --no-syscalls --max-stack 4 -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.072 ms
--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.072/0.072/0.072/0.000 ms
0.000 probe_libc:inet_pton:(7feb7a998350))
__inet_pton (inlined)
gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
__GI_getaddrinfo (inlined)
[0xffffaa39b6108f3f] (/usr/bin/ping)
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hendrick Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-is9tramondqa9jlxxsgcm9iz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
'perf record' and 'perf report --dump-raw-trace' supported in this
release.
Example usage:
# perf record -e arm_spe/ts_enable=1,pa_enable=1/ dd if=/dev/zero of=/dev/null count=10000
# perf report --dump-raw-trace
Note that the perf.data file is portable, so the report can be run on
another architecture host if necessary.
Output will contain raw SPE data and its textual representation, such
as:
0x5c8 [0x30]: PERF_RECORD_AUXTRACE size: 0x200000 offset: 0 ref: 0x1891ad0e idx: 1 tid: 2227 cpu: 1
.
. ... ARM SPE data: size 2097152 bytes
. 00000000: 49 00 LD
. 00000002: b2 c0 3b 29 0f 00 00 ff ff VA 0xffff00000f293bc0
. 0000000b: b3 c0 eb 24 fb 00 00 00 80 PA 0xfb24ebc0 ns=1
. 00000014: 9a 00 00 LAT 0 XLAT
. 00000017: 42 16 EV RETIRED L1D-ACCESS TLB-ACCESS
. 00000019: b0 00 c4 15 08 00 00 ff ff PC 0xff00000815c400 el3 ns=1
. 00000022: 98 00 00 LAT 0 TOT
. 00000025: 71 36 6c 21 2c 09 00 00 00 TS 39395093558
. 0000002e: 49 00 LD
. 00000030: b2 80 3c 29 0f 00 00 ff ff VA 0xffff00000f293c80
. 00000039: b3 80 ec 24 fb 00 00 00 80 PA 0xfb24ec80 ns=1
. 00000042: 9a 00 00 LAT 0 XLAT
. 00000045: 42 16 EV RETIRED L1D-ACCESS TLB-ACCESS
. 00000047: b0 f4 11 16 08 00 00 ff ff PC 0xff0000081611f4 el3 ns=1
. 00000050: 98 00 00 LAT 0 TOT
. 00000053: 71 36 6c 21 2c 09 00 00 00 TS 39395093558
. 0000005c: 48 00 INSN-OTHER
. 0000005e: 42 02 EV RETIRED
. 00000060: b0 2c ef 7f 08 00 00 ff ff PC 0xff0000087fef2c el3 ns=1
. 00000069: 98 00 00 LAT 0 TOT
. 0000006c: 71 d1 6f 21 2c 09 00 00 00 TS 39395094481
...
Other release notes:
- applies to acme's perf/{core,urgent} branches, likely elsewhere
- Report is self-contained within the tool.
Record requires enabling the kernel SPE driver by
setting CONFIG_ARM_SPE_PMU.
- The intel-bts implementation was used as a starting point; its
min/default/max buffer sizes and power of 2 pages granularity need to be
revisited for ARM SPE
- Recording across multiple SPE clusters/domains not supported
- Snapshot support (record -S), and conversion to native perf events
(e.g., via 'perf inject --itrace'), are also not supported
- Technically both cs-etm and spe can be used simultaneously, however
disabled for simplicity in this release
Signed-off-by: Kim Phillips <kim.phillips@arm.com>
Reviewed-by: Dongjiu Geng <gengdongjiu@huawei.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/20180114132850.0b127434b704a26bad13268f@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If a field is a dynamic string, get_field_str() returned just the
offset/size value and not the string. Have it parse the offset/size
correctly to return the actual string. Otherwise filtering fails when
trying to filter fields that are dynamic strings.
Reported-by: Gopanapalli Pradeep <prap_hai@yahoo.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20180112004823.146333275@goodmis.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently the FILTER_TRIVIAL_FALSE case has a missing break statement,
if the trivial type is FALSE, it will also run into the TRUE case, and
always be skipped as the TRUE statement will continue the loop on the
inverse condition of the FALSE statement.
Reported-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20180112004823.012918807@goodmis.org
Link: http://lkml.kernel.org/r/1493218540-12296-1-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It makes the code clearer and less error prone.
clearer:
- less code
- the code is now using the same format to create strings dynamically
less error prone:
- no magic number +2 +9 +5 to compute the size
- no copy&paste of the strings to compute the size and to concatenate
The function `asprintf` is not POSIX standard but the program
was already using it. Later it can be decided to use only POSIX
functions, then we can easly replace all the `asprintf(3)` with a local
implementation of that function.
Signed-off-by: Federico Vaga <federico.vaga@vaga.pv.it>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Federico Vaga <federico.vaga@vaga.pv.it>
Link: http://lkml.kernel.org/r/20170802221558.9684-2-federico.vaga@vaga.pv.it
Link: http://lkml.kernel.org/r/20180112004822.686281649@goodmis.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When a record has an unrecognized type, an error message is reported,
but it would also be helpful to see the contents of that record. At
least show what it is in hex, instead of just showing a blank line.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20180112004822.542204577@goodmis.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>