Commit graph

9009 commits

Author SHA1 Message Date
Paolo Bonzini
8b13c36493 KVM: introduce KVM_CAP_SET_GUEST_DEBUG2
This capability will allow the user to know which KVM_GUESTDBG_* bits
are supported.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210401135451.1004564-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:31:02 -04:00
Jakub Kicinski
a8b06e9d40 ethtool: add interface to read RMON stats
Most devices maintain RMON (RFC 2819) stats - particularly
the "histogram" of packets received by size. Unlike other
RFCs which duplicate IEEE stats, the short/oversized frame
counters in RMON don't seem to match IEEE stats 1-to-1 either,
so expose those, too. Do not expose basic packet, CRC errors
etc - those are already otherwise covered.

Because standard defines packet ranges only up to 1518, and
everything above that should theoretically be "oversized"
- devices often create their own ranges.

Going beyond what the RFC defines - expose the "histogram"
in the Tx direction (assume for now that the ranges will
be the same).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 16:59:20 -07:00
Jakub Kicinski
bfad2b979d ethtool: add interface to read standard MAC Ctrl stats
Number of devices maintains the standard-based MAC control
counters for control frames. Add a API for those.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 16:59:20 -07:00
Jakub Kicinski
ca2244547e ethtool: add interface to read standard MAC stats
Most of the MAC statistics are included in
struct rtnl_link_stats64, but some fields
are aggregated. Besides it's good to expose
these clearly hardware stats separately.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 16:59:20 -07:00
Jakub Kicinski
f09ea6fb12 ethtool: add a new command for reading standard stats
Add an interface for reading standard stats, including
stats which don't have a corresponding control interface.

Start with IEEE 802.3 PHY stats. There seems to be only
one stat to expose there.

Define API to not require user space changes when new
stats or groups are added. Groups are based on bitset,
stats have a string set associated.

v1: wrap stats in a nest

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 16:59:20 -07:00
Namhyung Kim
d0d1dd6285 perf core: Add PERF_COUNT_SW_CGROUP_SWITCHES event
This patch adds a new software event to count context switches
involving cgroup switches.  So it's counted only if cgroups of
previous and next tasks are different.  Note that it only checks the
cgroups in the perf_event subsystem.  For cgroup v2, it shouldn't
matter anyway.

One can argue that we can do this by using existing sched_switch event
with eBPF.  But some systems might not have eBPF for some reason so
I'd like to add this as a simple way.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210210083327.22726-2-namhyung@kernel.org
2021-04-16 18:58:52 +02:00
Joerg Roedel
49d11527e5 Merge branches 'iommu/fixes', 'arm/mediatek', 'arm/smmu', 'arm/exynos', 'unisoc', 'x86/vt-d', 'x86/amd' and 'core' into next 2021-04-16 17:16:03 +02:00
Marco Elver
97ba62b278 perf: Add support for SIGTRAP on perf events
Adds bit perf_event_attr::sigtrap, which can be set to cause events to
send SIGTRAP (with si_code TRAP_PERF) to the task where the event
occurred. The primary motivation is to support synchronous signals on
perf events in the task where an event (such as breakpoints) triggered.

To distinguish perf events based on the event type, the type is set in
si_errno. For events that are associated with an address, si_addr is
copied from perf_sample_data.

The new field perf_event_attr::sig_data is copied to si_perf, which
allows user space to disambiguate which event (of the same type)
triggered the signal. For example, user space could encode the relevant
information it cares about in sig_data.

We note that the choice of an opaque u64 provides the simplest and most
flexible option. Alternatives where a reference to some user space data
is passed back suffer from the problem that modification of referenced
data (be it the event fd, or the perf_event_attr) can race with the
signal being delivered (of course, the same caveat applies if user space
decides to store a pointer in sig_data, but the ABI explicitly avoids
prescribing such a design).

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://lore.kernel.org/lkml/YBv3rAT566k+6zjg@hirez.programming.kicks-ass.net/
2021-04-16 16:32:41 +02:00
Marco Elver
fb6cc127e0 signal: Introduce TRAP_PERF si_code and si_perf to siginfo
Introduces the TRAP_PERF si_code, and associated siginfo_t field
si_perf. These will be used by the perf event subsystem to send signals
(if requested) to the task where an event occurred.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Acked-by: Arnd Bergmann <arnd@arndb.de> # asm-generic
Link: https://lkml.kernel.org/r/20210408103605.1676875-6-elver@google.com
2021-04-16 16:32:41 +02:00
Marco Elver
2e498d0a74 perf: Add support for event removal on exec
Adds bit perf_event_attr::remove_on_exec, to support removing an event
from a task on exec.

This option supports the case where an event is supposed to be
process-wide only, and should not propagate beyond exec, to limit
monitoring to the original process image only.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210408103605.1676875-5-elver@google.com
2021-04-16 16:32:41 +02:00
Marco Elver
2b26f0aa00 perf: Support only inheriting events if cloned with CLONE_THREAD
Adds bit perf_event_attr::inherit_thread, to restricting inheriting
events only if the child was cloned with CLONE_THREAD.

This option supports the case where an event is supposed to be
process-wide only (including subthreads), but should not propagate
beyond the current process's shared environment.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/YBvj6eJR%2FDY2TsEB@hirez.programming.kicks-ass.net/
2021-04-16 16:32:40 +02:00
Jakub Kicinski
be85dbfeb3 ethtool: add FEC statistics
Similarly to pause statistics add stats for FEC.

The IEEE standard mandates two sets of counters:
 - 30.5.1.1.17 aFECCorrectedBlocks
 - 30.5.1.1.18 aFECUncorrectableBlocks
where block is a block of bits FEC operates on.
Each of these counters is defined per lane (PCS instance).

Multiple vendors provide number of corrected _bits_ rather
than/as well as blocks.

This set adds the 2 standard-based block counters and a extra
one for corrected bits.

Counters are exposed to user space via netlink in new attributes.
Each attribute carries an array of u64s, first element is
the total count, and the following ones are a per-lane break down.

Much like with pause stats the operation will not fail when driver
does not implement the get_fec_stats callback (nor can the driver
fail the operation by returning an error). If stats can't be
reported the relevant attributes will be empty.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15 17:08:29 -07:00
Linus Torvalds
c17a3066b4 dmaengine fixes for v5.12
Couple of dmaengine driver fixes for:
 - race and descriptor issue for xilinx driver
 - fix interrupt handling, wq state & cleanup, field sizes for
   completion, msix permissions for idxd driver
 - rumtim pm fix for tegra driver
 - double free fix in dma_async_device_register
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAmB3F20ACgkQfBQHDyUj
 g0d4GhAAw24ujgoKFdtV11h8eVyHwbsP45saRD8FVZ81oQ/3PI60rwnfuBNRnv4y
 KkWxQj0TsV1XezJ2t16Ovoq/9y8m/x5pWgP0LxwYj463P8PP40p31WZJ1qaAcSCp
 elau7TaoUSPW5HmD+Rv9IKpzVwLR9VLu0nqL22aJV62zouTJ0CCBFC/tJO+qrnmx
 vsa8kE3B85M+KzONUlmYstclgYEMaH2IROnrY0fhY68Txg2BENl9+mT69YaY6btb
 UABhKCUlfHYlXqCZNNsXlPo0E+n/jjVnTUceblCWWMwz3bPuYlc2KQLz49Nd3dWQ
 CRj2WQ9YWBXcPntUOicLMOFEUiAyUEC+bRssCfLofqKltRDXI5vyq6Wdo23L1Dz5
 eXLqI9DgccX4uoDQNp72AIPR7WO2pJnWjeLf7+tiXQ+htyr/SLfaZm2jciI4DcvK
 NLczhwMo0aWb50XYPI1iyN+Rhohq/s724xHPx5JrlxJlacuXwHIqUs27GrMpdEQv
 7Cpn4Hb0orPVjDeMQLFUcN8L7OmxC3M+vfTTl35ACI0EyAaeMW5Bm8SsKY8FjmFz
 f6OioKE7wDlfUnJqqQyYaZoeaW40ofGJoVSB6cs3XyJXH6PNYoVD+rKw5joS06Jt
 PbKkSJm3ZyV/m4Dd+GALRqntpHuYB4AaJbDI/blPzRzCJj4olJI=
 =GduB
 -----END PGP SIGNATURE-----

Merge tag 'dmaengine-fix-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine

Pull dmaengine fixes from Vinod Koul:
 "A couple of dmaengine driver fixes for:

   - race and descriptor issue for xilinx driver

   - fix interrupt handling, wq state & cleanup, field sizes for
     completion, msix permissions for idxd driver

   - runtime pm fix for tegra driver

   - double free fix in dma_async_device_register"

* tag 'dmaengine-fix-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
  dmaengine: idxd: fix wq cleanup of WQCFG registers
  dmaengine: idxd: clear MSIX permission entry on shutdown
  dmaengine: plx_dma: add a missing put_device() on error path
  dmaengine: tegra20: Fix runtime PM imbalance on error
  dmaengine: Fix a double free in dma_async_device_register
  dmaengine: dw: Make it dependent to HAS_IOMEM
  dmaengine: idxd: fix wq size store permission state
  dmaengine: idxd: fix opcap sysfs attribute output
  dmaengine: idxd: fix delta_rec and crc size field for completion record
  dmaengine: idxd: Fix clobbering of SWERR overflow bit on writeback
  dmaengine: xilinx: dpdma: Fix race condition in done IRQ
  dmaengine: xilinx: dpdma: Fix descriptor issuing on video group
2021-04-14 09:36:54 -07:00
Vivek Goyal
550a7d3bc0 fuse: add a flag FUSE_SETXATTR_ACL_KILL_SGID to kill SGID
When posix access ACL is set, it can have an effect on file mode and it can
also need to clear SGID if.

- None of caller's group/supplementary groups match file owner group.
AND
- Caller is not priviliged (No CAP_FSETID).

As of now fuser server is responsible for changing the file mode as
well. But it does not know whether to clear SGID or not.

So add a flag FUSE_SETXATTR_ACL_KILL_SGID and send this info with SETXATTR
to let file server know that sgid needs to be cleared as well.

Reported-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-14 10:40:57 +02:00
Vivek Goyal
52a4c95f4d fuse: extend FUSE_SETXATTR request
Fuse client needs to send additional information to file server when it
calls SETXATTR(system.posix_acl_access), so add extra flags field to the
structure.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-14 10:40:57 +02:00
Toke Høiland-Jørgensen
441e8c66b2 bpf: Return target info when a tracing bpf_link is queried
There is currently no way to discover the target of a tracing program
attachment after the fact. Add this information to bpf_link_info and return
it when querying the bpf_link fd.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210413091607.58945-1-toke@redhat.com
2021-04-13 18:18:57 -07:00
Daniel Vetter
213cc929cb Merge drm/drm-fixes into drm-next
msm-next pull request has a baseline with stuff from -fixes, roll
forward first.

Some simple conflicts in amdgpu, ttm and one in i915 where git gets
confused and tries to add the same function twice.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2021-04-13 23:15:09 +02:00
Peter Collingbourne
201698626f arm64: Introduce prctl(PR_PAC_{SET,GET}_ENABLED_KEYS)
This change introduces a prctl that allows the user program to control
which PAC keys are enabled in a particular task. The main reason
why this is useful is to enable a userspace ABI that uses PAC to
sign and authenticate function pointers and other pointers exposed
outside of the function, while still allowing binaries conforming
to the ABI to interoperate with legacy binaries that do not sign or
authenticate pointers.

The idea is that a dynamic loader or early startup code would issue
this prctl very early after establishing that a process may load legacy
binaries, but before executing any PAC instructions.

This change adds a small amount of overhead to kernel entry and exit
due to additional required instruction sequences.

On a DragonBoard 845c (Cortex-A75) with the powersave governor, the
overhead of similar instruction sequences was measured as 4.9ns when
simulating the common case where IA is left enabled, or 43.7ns when
simulating the uncommon case where IA is disabled. These numbers can
be seen as the worst case scenario, since in more realistic scenarios
a better performing governor would be used and a newer chip would be
used that would support PAC unlike Cortex-A75 and would be expected
to be faster than Cortex-A75.

On an Apple M1 under a hypervisor, the overhead of the entry/exit
instruction sequences introduced by this patch was measured as 0.3ns
in the case where IA is left enabled, and 33.0ns in the case where
IA is disabled.

Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Link: https://linux-review.googlesource.com/id/Ibc41a5e6a76b275efbaa126b31119dc197b927a5
Link: https://lore.kernel.org/r/d6609065f8f40397a4124654eb68c9f490b4d477.1616123271.git.pcc@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-04-13 17:31:44 +01:00
Zhang Yunkai
655cdafdec lightnvm: remove duplicate include in lightnvm.h
'linux/blkdev.h' and 'uapi/linux/lightnvm.h' included in 'lightnvm.h'
is duplicated.It is also included in the 5th and 7th line.

Signed-off-by: Zhang Yunkai <zhang.yunkai@zte.com.cn>
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Link: https://lore.kernel.org/r/20210413105257.159260-4-matias.bjorling@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-13 09:16:12 -06:00
Marc Zyngier
d8f37d291c Merge branch 'kvm-arm64/ptp' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-04-13 15:41:22 +01:00
Pedro Tammela
5c50732900 libbpf: Clarify flags in ringbuf helpers
In 'bpf_ringbuf_reserve()' we require the flag to '0' at the moment.

For 'bpf_ringbuf_{discard,submit,output}' a flag of '0' might send a
notification to the process if needed.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210412192434.944343-1-pctammela@mojatatu.com
2021-04-12 21:28:33 -07:00
Joe Stringer
f3c45326ee bpf: Document PROG_TEST_RUN limitations
Per net/bpf/test_run.c, particular prog types have additional
restrictions around the parameters that can be provided, so document
these in the header.

I didn't bother documenting the limitation on duration for raw
tracepoints since that's an output parameter anyway.

Tested with ./tools/testing/selftests/bpf/test_doc_build.sh.

Suggested-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210410174549.816482-1-joe@cilium.io
2021-04-12 17:18:05 +02:00
Dave Jiang
4ac823e9cd dmaengine: idxd: fix delta_rec and crc size field for completion record
The delta_rec_size and crc_val in the completion record should
be 32bits and not 16bits.

Fixes: bfe1d56091 ("dmaengine: idxd: Init and probe for Intel data accelerators")
Reported-by: Nikhil Rao <nikhil.rao@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/161645618572.2003490.14466173451736323035.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2021-04-12 13:26:03 +05:30
Greg Kroah-Hartman
14d34d2dbb Merge 5.12-rc7 into usb-next
We need the USB fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-12 08:15:27 +02:00
Jens Axboe
b69de288e9 io_uring: allow events and user_data update of running poll requests
This adds two new POLL_ADD flags, IORING_POLL_UPDATE_EVENTS and
IORING_POLL_UPDATE_USER_DATA. As with the other POLL_ADD flag, these are
masked into sqe->len. If set, the POLL_ADD will have the following
behavior:

- sqe->addr must contain the the user_data of the poll request that
  needs to be modified. This field is otherwise invalid for a POLL_ADD
  command.

- If IORING_POLL_UPDATE_EVENTS is set, sqe->poll_events must contain the
  new mask for the existing poll request. There are no checks for whether
  these are identical or not, if a matching poll request is found, then it
  is re-armed with the new mask.

- If IORING_POLL_UPDATE_USER_DATA is set, sqe->off must contain the new
  user_data for the existing poll request.

A POLL_ADD with any of these flags set may complete with any of the
following results:

1) 0, which means that we successfully found the existing poll request
   specified, and performed the re-arm procedure. Any error from that
   re-arm will be exposed as a completion event for that original poll
   request, not for the update request.
2) -ENOENT, if no existing poll request was found with the given
   user_data.
3) -EALREADY, if the existing poll request was already in the process of
   being removed/canceled/completing.
4) -EACCES, if an attempt was made to modify an internal poll request
   (eg not one originally issued ass IORING_OP_POLL_ADD).

The usual -EINVAL cases apply as well, if any invalid fields are set
in the sqe for this command type.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 17:42:00 -06:00
Jens Axboe
88e41cf928 io_uring: add multishot mode for IORING_OP_POLL_ADD
The default io_uring poll mode is one-shot, where once the event triggers,
the poll command is completed and won't trigger any further events. If
we're doing repeated polling on the same file or socket, then it can be
more efficient to do multishot, where we keep triggering whenever the
event becomes true.

This deviates from the usual norm of having one CQE per SQE submitted. Add
a CQE flag, IORING_CQE_F_MORE, which tells the application to expect
further completion events from the submitted SQE. Right now the only user
of this is POLL_ADD in multishot mode.

Since sqe->poll_events is using the space that we normally use for adding
flags to commands, use sqe->len for the flag space for POLL_ADD. Multishot
mode is selected by setting IORING_POLL_ADD_MULTI in sqe->len. An
application should expect more CQEs for the specificed SQE if the CQE is
flagged with IORING_CQE_F_MORE. In multishot mode, only cancelation or an
error will terminate the poll request, in which case the flag will be
cleared.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 17:41:59 -06:00
Vladyslav Tarasiuk
c781ff12a2 ethtool: Allow network drivers to dump arbitrary EEPROM data
Define get_module_eeprom_by_page() ethtool callback and implement
netlink infrastructure.

get_module_eeprom_by_page() allows network drivers to dump a part of
module's EEPROM specified by page and bank numbers along with offset and
length. It is effectively a netlink replacement for get_module_info()
and get_module_eeprom() pair, which is needed due to emergence of
complex non-linear EEPROM layouts.

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11 16:34:56 -07:00
Hang Lu
a7dc1e6f99 binder: tell userspace to dump current backtrace when detected oneway spamming
When async binder buffer got exhausted, some normal oneway transactions
will also be discarded and may cause system or application failures. By
that time, the binder debug information we dump may not be relevant to
the root cause. And this issue is difficult to debug if without the
backtrace of the thread sending spam.

This change will send BR_ONEWAY_SPAM_SUSPECT to userspace when oneway
spamming is detected, request to dump current backtrace. Oneway spamming
will be reported only once when exceeding the threshold (target process
dips below 80% of its oneway space, and current process is responsible for
either more than 50 transactions, or more than 50% of the oneway space).
And the detection will restart when the async buffer has returned to a
healthy state.

Acked-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Hang Lu <hangl@codeaurora.org>
Link: https://lore.kernel.org/r/1617961246-4502-3-git-send-email-hangl@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-10 10:52:04 +02:00
Johan Hovold
5a35b040d0 tty: add ASYNC_SPLIT_TERMIOS to deprecation mask
Callout devices are long-gone, but the ASYNC_SPLIT_TERMIOS flag was
never added to the deprecation mask.

Add it so that a warning is printed if it is ever used.

Fixes: 8a8ae62f82 ("tty: warn on deprecated serial flags")
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20210407095208.31838-7-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-10 10:36:34 +02:00
Johan Hovold
d09845e98a tty: actually undefine superseded ASYNC flags
Some kernel-internal ASYNC flags have been superseded by tty-port flags
and should no longer be used by kernel drivers.

Fix the misspelled "__KERNEL__" compile guards which failed their sole
purpose to break out-of-tree drivers that have not yet been updated.

Fixes: 5c0517fefc ("tty: core: Undefine ASYNC_* flags superceded by TTY_PORT* flags")
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20210407095208.31838-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-10 10:36:34 +02:00
Jakub Kicinski
8859a44ea0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

MAINTAINERS
 - keep Chandrasekar
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
 - simple fix + trust the code re-added to param.c in -next is fine
include/linux/bpf.h
 - trivial
include/linux/ethtool.h
 - trivial, fix kdoc while at it
include/linux/skmsg.h
 - move to relevant place in tcp.c, comment re-wrapped
net/core/skmsg.c
 - add the sk = sk // sk = NULL around calls
net/tipc/crypto.c
 - trivial

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-09 20:48:35 -07:00
Linus Torvalds
4e04e7513b Networking fixes for 5.12-rc7, including fixes from can, ipsec,
mac80211, wireless, and bpf trees. No scary regressions here
 or in the works, but small fixes for 5.12 changes keep coming.
 
 Current release - regressions:
 
  - virtio: do not pull payload in skb->head
 
  - virtio: ensure mac header is set in virtio_net_hdr_to_skb()
 
  - Revert "net: correct sk_acceptq_is_full()"
 
  - mptcp: revert "mptcp: provide subflow aware release function"
 
  - ethernet: lan743x: fix ethernet frame cutoff issue
 
  - dsa: fix type was not set for devlink port
 
  - ethtool: remove link_mode param and derive link params
             from driver
 
  - sched: htb: fix null pointer dereference on a null new_q
 
  - wireless: iwlwifi: Fix softirq/hardirq disabling in
                       iwl_pcie_enqueue_hcmd()
 
  - wireless: iwlwifi: fw: fix notification wait locking
 
  - wireless: brcmfmac: p2p: Fix deadlock introduced by avoiding
                             the rtnl dependency
 
 Current release - new code bugs:
 
  - napi: fix hangup on napi_disable for threaded napi
 
  - bpf: take module reference for trampoline in module
 
  - wireless: mt76: mt7921: fix airtime reporting and related
                            tx hangs
 
  - wireless: iwlwifi: mvm: rfi: don't lock mvm->mutex when sending
                                 config command
 
 Previous releases - regressions:
 
  - rfkill: revert back to old userspace API by default
 
  - nfc: fix infinite loop, refcount & memory leaks in LLCP sockets
 
  - let skb_orphan_partial wake-up waiters
 
  - xfrm/compat: Cleanup WARN()s that can be user-triggered
 
  - vxlan, geneve: do not modify the shared tunnel info when PMTU
                   triggers an ICMP reply
 
  - can: fix msg_namelen values depending on CAN_REQUIRED_SIZE
 
  - can: uapi: mark union inside struct can_frame packed
 
  - sched: cls: fix action overwrite reference counting
 
  - sched: cls: fix err handler in tcf_action_init()
 
  - ethernet: mlxsw: fix ECN marking in tunnel decapsulation
 
  - ethernet: nfp: Fix a use after free in nfp_bpf_ctrl_msg_rx
 
  - ethernet: i40e: fix receiving of single packets in xsk zero-copy
                    mode
 
  - ethernet: cxgb4: avoid collecting SGE_QBASE regs during traffic
 
 Previous releases - always broken:
 
  - bpf: Refuse non-O_RDWR flags in BPF_OBJ_GET
 
  - bpf: Refcount task stack in bpf_get_task_stack
 
  - bpf, x86: Validate computation of branch displacements
 
  - ieee802154: fix many similar syzbot-found bugs
     - fix NULL dereferences in netlink attribute handling
     - reject unsupported operations on monitor interfaces
     - fix error handling in llsec_key_alloc()
 
  - xfrm: make ipv4 pmtu check honor ip header df
 
  - xfrm: make hash generation lock per network namespace
 
  - xfrm: esp: delete NETIF_F_SCTP_CRC bit from features for esp
               offload
 
  - ethtool: fix incorrect datatype in set_eee ops
 
  - xdp: fix xdp_return_frame() kernel BUG throw for page_pool
         memory model
 
  - openvswitch: fix send of uninitialized stack memory in ct limit
                 reply
 
 Misc:
 
  - udp: add get handling for UDP_GRO sockopt
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmBwyfAACgkQMUZtbf5S
 IruJ/BAAnjghw2kWXRCKK3Tkm0pi0zjaKvTS30AcKCW2+GnqSxTdiWNv+mxqFgnm
 YdduPKiGwLoDkA2i2d4EF8/HK6m+Q6bHcUbZ2npEm1ElkKfxCYGmocor8n2kD+a9
 je94VGYV7zytnxXw85V6/jFLDqOXXwhBfHhlDMVBZP8OyzUfbDKGorWmyGuy9GJp
 81bvzqN2bHUGIM0cDr+ol3eYw2ituGWgiqNfnq7z+/NVcYmD0EPChDRbp0jtH1ng
 dcoONI6YlymDEDpu/9GmyKL1ken9lcWoVdvv/aDGtP62x6SYDt5HKe3wAtJ+Kjbq
 jIPADxPx5BymYIZRBtdNR0rP66LycA7hDtM/C/h1WoihDXwpGeNUU4g0aJ+hsP5Q
 ldwJI1DJo79VbwM2c3Kg73PaphLcPD4RdwF0/ovFsl0+bTDfj8i93ah4Wnzj0Qli
 EMiSDEDNb51e9nkW+xu+FjLWmxHJvLOL/+VgHV5bPJJBob2fqnjAMj2PkPEuEtXY
 TPWEh9y3zaEyp/9tNx0cstGOt6Gf5DQ5Nk6tX6hMpJT/BeL8mju1jm0yPLZhMJjF
 LlTrJgXftfP/cjltdSm4aVqSU5okjHNYDhmHlNgvzih5mt+NVslRJfzwq62Vudqy
 C0kpmVdQNFkOB0UcqQihevZg9mvem3m/dYl+v/MV7Uq6r4s4M2A=
 =SHL0
 -----END PGP SIGNATURE-----

Merge tag 'net-5.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes for 5.12-rc7, including fixes from can, ipsec,
  mac80211, wireless, and bpf trees.

  No scary regressions here or in the works, but small fixes for 5.12
  changes keep coming.

  Current release - regressions:

   - virtio: do not pull payload in skb->head

   - virtio: ensure mac header is set in virtio_net_hdr_to_skb()

   - Revert "net: correct sk_acceptq_is_full()"

   - mptcp: revert "mptcp: provide subflow aware release function"

   - ethernet: lan743x: fix ethernet frame cutoff issue

   - dsa: fix type was not set for devlink port

   - ethtool: remove link_mode param and derive link params from driver

   - sched: htb: fix null pointer dereference on a null new_q

   - wireless: iwlwifi: Fix softirq/hardirq disabling in
     iwl_pcie_enqueue_hcmd()

   - wireless: iwlwifi: fw: fix notification wait locking

   - wireless: brcmfmac: p2p: Fix deadlock introduced by avoiding the
     rtnl dependency

  Current release - new code bugs:

   - napi: fix hangup on napi_disable for threaded napi

   - bpf: take module reference for trampoline in module

   - wireless: mt76: mt7921: fix airtime reporting and related tx hangs

   - wireless: iwlwifi: mvm: rfi: don't lock mvm->mutex when sending
     config command

  Previous releases - regressions:

   - rfkill: revert back to old userspace API by default

   - nfc: fix infinite loop, refcount & memory leaks in LLCP sockets

   - let skb_orphan_partial wake-up waiters

   - xfrm/compat: Cleanup WARN()s that can be user-triggered

   - vxlan, geneve: do not modify the shared tunnel info when PMTU
     triggers an ICMP reply

   - can: fix msg_namelen values depending on CAN_REQUIRED_SIZE

   - can: uapi: mark union inside struct can_frame packed

   - sched: cls: fix action overwrite reference counting

   - sched: cls: fix err handler in tcf_action_init()

   - ethernet: mlxsw: fix ECN marking in tunnel decapsulation

   - ethernet: nfp: Fix a use after free in nfp_bpf_ctrl_msg_rx

   - ethernet: i40e: fix receiving of single packets in xsk zero-copy
     mode

   - ethernet: cxgb4: avoid collecting SGE_QBASE regs during traffic

  Previous releases - always broken:

   - bpf: Refuse non-O_RDWR flags in BPF_OBJ_GET

   - bpf: Refcount task stack in bpf_get_task_stack

   - bpf, x86: Validate computation of branch displacements

   - ieee802154: fix many similar syzbot-found bugs
       - fix NULL dereferences in netlink attribute handling
       - reject unsupported operations on monitor interfaces
       - fix error handling in llsec_key_alloc()

   - xfrm: make ipv4 pmtu check honor ip header df

   - xfrm: make hash generation lock per network namespace

   - xfrm: esp: delete NETIF_F_SCTP_CRC bit from features for esp
     offload

   - ethtool: fix incorrect datatype in set_eee ops

   - xdp: fix xdp_return_frame() kernel BUG throw for page_pool memory
     model

   - openvswitch: fix send of uninitialized stack memory in ct limit
     reply

  Misc:

   - udp: add get handling for UDP_GRO sockopt"

* tag 'net-5.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (182 commits)
  net: fix hangup on napi_disable for threaded napi
  net: hns3: Trivial spell fix in hns3 driver
  lan743x: fix ethernet frame cutoff issue
  net: ipv6: check for validity before dereferencing cfg->fc_nlinfo.nlh
  net: dsa: lantiq_gswip: Configure all remaining GSWIP_MII_CFG bits
  net: dsa: lantiq_gswip: Don't use PHY auto polling
  net: sched: sch_teql: fix null-pointer dereference
  ipv6: report errors for iftoken via netlink extack
  net: sched: fix err handler in tcf_action_init()
  net: sched: fix action overwrite reference counting
  Revert "net: sched: bump refcount for new action in ACT replace mode"
  ice: fix memory leak of aRFS after resuming from suspend
  i40e: Fix sparse warning: missing error code 'err'
  i40e: Fix sparse error: 'vsi->netdev' could be null
  i40e: Fix sparse error: uninitialized symbol 'ring'
  i40e: Fix sparse errors in i40e_txrx.c
  i40e: Fix parameters in aq_get_phy_register()
  nl80211: fix beacon head validation
  bpf, x86: Validate computation of branch displacements for x86-32
  bpf, x86: Validate computation of branch displacements for x86-64
  ...
2021-04-09 15:26:51 -07:00
David S. Miller
4438669eb7 bluetooth-next pull request for net-next:
- Proper support for BCM4330 and BMC4334
  - Various improvements for firmware download of Intel controllers
  - Update management interface revision to 20
  - Support for AOSP HCI vendor commands
  - Initial Virtio support
 
 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmBvL7UZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKU+eD/4qGiwnwQVia4hjnQ5PZ0c/
 sBmZvzzZE43RNRvy8gAwcVLypLaiIkEYml4ruVdF6RGoqaDA6kRB7D0tdk8EtZc4
 6h72LX0zefnyZNj/Fcl6+dKuq7bSHjdOK0G+oXFmpvYfScPsI6Oq0ntpYOFYwUE0
 cDdeZyb/sW4f6L5AXUgLjD9AIjzlBOJEU5HcRL/uY+E/fH1oe757/IhUDerXjTIh
 9EhY28mrI7KeoEcJ1NIgvjPnRCxPRLAKgrzc8NXkUsLeNZSIywrwjgRfTvCZ+qQf
 g/MmNv+n7yFlUzA7Fu2d8YAgRtgF9nZc45geqOSHqcSZURORSs0CzEoYinqwT7f0
 xdMBi/UoSzJsLksgv+OfXOyoQ4lPNJ47pcH/edmT/ZJ/Ai2yTqUUSODHQ3MLC4Dp
 kQj34thQvfTf5b6+9HwKmBjlfMK0QGPqWjp6dth2bzos/ugurDP+XimAv0urvFz+
 NkES5kCMUs/1z7Yh+Nj1MyU3Wfdwol0wOh+PXBTAw2fxSy6kmKYGyLw7HEdWegB0
 K2w/ZS58g0EGK89ZSiqlZKmXtAd22dQqS6JlZje+9YTtEtRWu+rAF1dddS8vJDjl
 Rq3VSQ1B6Qw0rdGpyfxdgm/sWwq5S1YgQQcAZgVaeo9ZiIhfscVD5v5L0Ol6k9ms
 3eCK/2brsX0IaV61eVnoOw==
 =jiCz
 -----END PGP SIGNATURE-----

Merge tag 'for-net-next-2021-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Luiz Augusto von Dentz says:

====================
bluetooth-next pull request for net-next:

 - Proper support for BCM4330 and BMC4334
 - Various improvements for firmware download of Intel controllers
 - Update management interface revision to 20
 - Support for AOSP HCI vendor commands
 - Initial Virtio support
====================

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08 14:19:32 -07:00
Marek Behún
53f111cbfa net: phy: add constants for 2.5G and 5G speed in PCS speed register
Add constants for 2.5G and 5G speed in PCS speed register into mdio.h.

Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08 13:15:33 -07:00
Vamsi Krishna
d84d13d6f6 nl80211: Add interface to indicate TDLS peer's HE capability
Enhance enum nl80211_tdls_peer_capability to configure TDLS peer's
support for HE mode. Userspace decodes the TDLS setup response frame
and confugures the HE mode support to driver if the peer has advertized
HE mode support in TDLS setup response frame. The driver uses this
information to decide whether to include HE operation IE in TDLS setup
confirmation frame.

Signed-off-by: Vamsi Krishna <vamsin@codeaurora.org>
Link: https://lore.kernel.org/r/1614696636-30144-1-git-send-email-vamsin@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-04-08 13:07:02 +02:00
Marcel Holtmann
afd2daa26c Bluetooth: Add support for virtio transport driver
This adds support for Bluetooth HCI transport over virtio.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2021-04-08 12:26:34 +02:00
James Prestwood
0750cfd8b7 nl80211: better document CMD_ROAM behavior
The docs were very sparse with how exactly CMD_ROAM should be
used. Specifically related to BSS information normally obtained
through a user space scan.

Signed-off-by: James Prestwood <prestwoj@gmail.com>
Link: https://lore.kernel.org/r/20210311230333.103934-1-prestwoj@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-04-08 12:04:08 +02:00
Johannes Berg
71826654ce rfkill: revert back to old userspace API by default
Recompiling with the new extended version of struct rfkill_event
broke systemd in *two* ways:
 - It used "sizeof(struct rfkill_event)" to read the event, but
   then complained if it actually got something != 8, this broke
   it on new kernels (that include the updated API);
 - It used sizeof(struct rfkill_event) to write a command, but
   didn't implement the intended expansion protocol where the
   kernel returns only how many bytes it accepted, and errored
   out due to the unexpected smaller size on kernels that didn't
   include the updated API.

Even though systemd has now been fixed, that fix may not be always
deployed, and other applications could potentially have similar
issues.

As such, in the interest of avoiding regressions, revert the
default API "struct rfkill_event" back to the original size.

Instead, add a new "struct rfkill_event_ext" that extends it by
the new field, and even more clearly document that applications
should be prepared for extensions in two ways:
 * write might only accept fewer bytes on older kernels, and
   will return how many to let userspace know which data may
   have been ignored;
 * read might return anything between 8 (the original size) and
   whatever size the application sized its buffer at, indicating
   how much event data was supported by the kernel.

Perhaps that will help avoid such issues in the future and we
won't have to come up with another version of the struct if we
ever need to extend it again.

Applications that want to take advantage of the new field will
have to be modified to use struct rfkill_event_ext instead now,
which comes with the danger of them having already been updated
to use it from 'struct rfkill_event', but I found no evidence
of that, and it's still relatively new.

Cc: stable@vger.kernel.org # 5.11
Reported-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v12.0.0-r4 (x86-64)
Link: https://lore.kernel.org/r/20210319232510.f1a139cfdd9c.Ic5c7c9d1d28972059e132ea653a21a427c326678@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-04-08 10:14:45 +02:00
Jakub Kicinski
d9c65de0c1 ethtool: fix kdoc in headers
Fix remaining issues with kdoc in the ethtool headers.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07 14:22:49 -07:00
Jakub Kicinski
83e5feeb38 ethtool: document reserved fields in the uAPI
Add a note on expected handling of reserved fields,
and references to all kdocs. This fixes a bunch
of kdoc warnings.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07 14:22:49 -07:00
Jakub Kicinski
f0ebc2b6b7 ethtool: un-kdocify extended link state
Extended link state structures and enums use kdoc headers
but then do not describe any of the members.

Convert to normal comments.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07 14:22:49 -07:00
Jianyong Wu
3bf725699b KVM: arm64: Add support for the KVM PTP service
Implement the hypervisor side of the KVM PTP interface.

The service offers wall time and cycle count from host to guest.
The caller must specify whether they want the host's view of
either the virtual or physical counter.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201209060932.212364-7-jianyong.wu@arm.com
2021-04-07 16:33:20 +01:00
Christoph Hellwig
b392a19891 vfio/pci: remove vfio_pci_nvlink2
This driver never had any open userspace (which for VFIO would include
VM kernel drivers) that use it, and thus should never have been added
by our normal userspace ABI rules.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Message-Id: <20210326061311.1497642-2-hch@lst.de>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06 11:54:13 -06:00
Stanimir Varbanov
1ad0de78e7 media: v4l: Add HDR10 static metadata controls
Introduce Content light level and Mastering display colour
volume Colorimetry compound controls with relevant payload
structures and validation.

Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-04-06 14:56:00 +02:00
Stanimir Varbanov
f31b9ffd96 media: v4l: Add new Colorimetry Class
Add Colorimetry control class for colorimetry controls

Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-04-06 14:55:06 +02:00
Stanimir Varbanov
b52051a409 media: v4l2-ctrl: Add decoder conceal color control
Add decoder v4l2 control to set conceal color.

Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-04-06 14:53:31 +02:00
Dikshita Agarwal
f15c54cf3f media: v4l2-ctrl: add controls for long term reference.
Long Term Reference (LTR) frames are the frames that are encoded
sometime in the past and stored in the DPB buffer list to be used
as reference to encode future frames.
This change adds controls to enable this feature.

Signed-off-by: Dikshita Agarwal <dikshita@codeaurora.org>
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-04-06 14:49:02 +02:00
Dafna Hirschfeld
b98fc6b6c0 media: rkisp1: uapi: document which flags/structs relate to statistics config
Modify the documentation to point out which flags and structs are
used to configure the statistics.

Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
Reviewed-by: Sebastian Fricke <sebastian.fricke@posteo.net>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-04-06 10:29:56 +02:00
Suzuki K Poulose
7dde51767c perf: aux: Add CoreSight PMU buffer formats
CoreSight PMU supports aux-buffer for the ETM tracing. The trace
generated by the ETM (associated with individual CPUs, like Intel PT)
is captured by a separate IP (CoreSight TMC-ETR/ETF until now).

The TMC-ETR applies formatting of the raw ETM trace data, as it
can collect traces from multiple ETMs, with the TraceID to indicate
the source of a given trace packet.

Arm Trace Buffer Extension is new "sink" IP, attached to individual
CPUs and thus do not provide additional formatting, like TMC-ETR.

Additionally, a system could have both TRBE *and* TMC-ETR for
the trace collection. e.g, TMC-ETR could be used as a single
trace buffer to collect data from multiple ETMs to correlate
the traces from different CPUs. It is possible to have a
perf session where some events end up collecting the trace
in TMC-ETR while the others in TRBE. Thus we need a way
to identify the type of the trace for each AUX record.

Define the trace formats exported by the CoreSight PMU.
We don't define the flags following the "ETM" as this
information is available to the user when issuing
the session. What is missing is the additional
formatting applied by the "sink" which is decided
at the runtime and the user may not have a control on.

So we define :
 - CORESIGHT format (indicates the Frame format)
 - RAW format (indicates the format of the source)

The default value is CORESIGHT format for all the records
(i,e == 0). Add the RAW format for others that use
raw format.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20210405164307.1720226-3-suzuki.poulose@arm.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-04-05 11:23:20 -06:00
Suzuki K Poulose
547b60988e perf: aux: Add flags for the buffer format
Allocate a byte for advertising the PMU specific format type
of the given AUX record. A PMU could end up providing hardware
trace data in multiple format in a single session.

e.g, The format of hardware buffer produced by CoreSight ETM
PMU depends on the type of the "sink" device used for collection
for an event (Traditional TMC-ETR/Bs with formatting or
TRBEs without any formatting).

 # Boring story of why this is needed. Goto The_End_of_Story for skipping.

CoreSight ETM trace allows instruction level tracing of Arm CPUs.
The ETM generates the CPU excecution trace and pumps it into CoreSight
AMBA Trace Bus and is collected by a different CoreSight component
(traditionally CoreSight TMC-ETR /ETB/ETF), called "sink".
Important to note that there is no guarantee that every CPU has
a dedicated sink.  Thus multiple ETMs could pump the trace data
into the same "sink" and thus they apply additional formatting
of the trace data for the user to decode it properly and attribute
the trace data to the corresponding ETM.

However, with the introduction of Arm Trace buffer Extensions (TRBE),
we now have a dedicated per-CPU architected sink for collecting the
trace. Since the TRBE is always per-CPU, it doesn't apply any formatting
of the trace. The support for this driver is under review [1].

Now a system could have a per-cpu TRBE and one or more shared
TMC-ETRs on the system. A user could choose a "specific" sink
for a perf session (e.g, a TMC-ETR) or the driver could automatically
select the nearest sink for a given ETM. It is possible that
some ETMs could end up using TMC-ETR (e.g, if the TRBE is not
usable on the CPU) while the others using TRBE in a single
perf session. Thus we now have "formatted" trace collected
from TMC-ETR and "unformatted" trace collected from TRBE.
However, we don't get into a situation where a single event
could end up using TMC-ETR & TRBE. i.e, any AUX buffer is
guaranteed to be either RAW or FORMATTED, but not a mix
of both.

As for perf decoding, we need to know the type of the data
in the individual AUX buffers, so that it can set up the
"OpenCSD" (library for decoding CoreSight trace) decoder
instance appropriately. Thus the perf.data file must conatin
the hints for the tool to decode the data correctly.

Since this is a runtime variable, and perf tool doesn't have
a control on what sink gets used (in case of automatic sink
selection), we need this information made available from
the PMU driver for each AUX record.

 # The_End_of_Story

Cc: Peter Ziljstra <peterz@infradead.org>
Cc: alexander.shishkin@linux.intel.com
Cc: mingo@redhat.com
Cc: will@kernel.org
Cc: mark.rutland@arm.com
Cc: mike.leach@linaro.org
Cc: acme@kernel.org
Cc: jolsa@redhat.com
Cc: Mathieu Poirier <mathieu.poirer@linaro.org>
Reviewed by: Mike Leach <mike.leach@linaro.org>
Acked-by: Peter Ziljstra <peterz@infradead.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20210405164307.1720226-2-suzuki.poulose@arm.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-04-05 11:20:56 -06:00