Commit graph

932869 commits

Author SHA1 Message Date
Yonghong Song
9c5f8a1008 bpf: Support variable length array in tracing programs
In /proc/net/ipv6_route, we have
  struct fib6_info {
    struct fib6_table *fib6_table;
    ...
    struct fib6_nh fib6_nh[0];
  }
  struct fib6_nh {
    struct fib_nh_common nh_common;
    struct rt6_info **rt6i_pcpu;
    struct rt6_exception_bucket *rt6i_exception_bucket;
  };
  struct fib_nh_common {
    ...
    u8 nhc_gw_family;
    ...
  }

The access:
  struct fib6_nh *fib6_nh = &rt->fib6_nh;
  ... fib6_nh->nh_common.nhc_gw_family ...

This patch ensures such an access is handled properly.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175916.2476853-1-yhs@fb.com
2020-05-09 17:05:27 -07:00
Yonghong Song
1d68f22b3d bpf: Handle spilled PTR_TO_BTF_ID properly when checking stack_boundary
This specifically to handle the case like below:
   // ptr below is a socket ptr identified by PTR_TO_BTF_ID
   u64 param[2] = { ptr, val };
   bpf_seq_printf(seq, fmt, sizeof(fmt), param, sizeof(param));

In this case, the 16 bytes stack for "param" contains:
   8 bytes for ptr with spilled PTR_TO_BTF_ID
   8 bytes for val as STACK_MISC

The current verifier will complain the ptr should not be visible
to the helper.
   ...
   16: (7b) *(u64 *)(r10 -64) = r2
   18: (7b) *(u64 *)(r10 -56) = r1
   19: (bf) r4 = r10
   ;
   20: (07) r4 += -64
   ; BPF_SEQ_PRINTF(seq, fmt1, (long)s, s->sk_protocol);
   21: (bf) r1 = r6
   22: (18) r2 = 0xffffa8d00018605a
   24: (b4) w3 = 10
   25: (b4) w5 = 16
   26: (85) call bpf_seq_printf#125
    R0=inv(id=0) R1_w=ptr_seq_file(id=0,off=0,imm=0)
    R2_w=map_value(id=0,off=90,ks=4,vs=144,imm=0) R3_w=inv10
    R4_w=fp-64 R5_w=inv16 R6=ptr_seq_file(id=0,off=0,imm=0)
    R7=ptr_netlink_sock(id=0,off=0,imm=0) R10=fp0 fp-56_w=mmmmmmmm
    fp-64_w=ptr_
   last_idx 26 first_idx 13
   regs=8 stack=0 before 25: (b4) w5 = 16
   regs=8 stack=0 before 24: (b4) w3 = 10
   invalid indirect read from stack off -64+0 size 16

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175915.2476783-1-yhs@fb.com
2020-05-09 17:05:27 -07:00
Yonghong Song
492e639f0c bpf: Add bpf_seq_printf and bpf_seq_write helpers
Two helpers bpf_seq_printf and bpf_seq_write, are added for
writing data to the seq_file buffer.

bpf_seq_printf supports common format string flag/width/type
fields so at least I can get identical results for
netlink and ipv6_route targets.

For bpf_seq_printf and bpf_seq_write, return value -EOVERFLOW
specifically indicates a write failure due to overflow, which
means the object will be repeated in the next bpf invocation
if object collection stays the same. Note that if the object
collection is changed, depending how collection traversal is
done, even if the object still in the collection, it may not
be visited.

For bpf_seq_printf, format %s, %p{i,I}{4,6} needs to
read kernel memory. Reading kernel memory may fail in
the following two cases:
  - invalid kernel address, or
  - valid kernel address but requiring a major fault
If reading kernel memory failed, the %s string will be
an empty string and %p{i,I}{4,6} will be all 0.
Not returning error to bpf program is consistent with
what bpf_trace_printk() does for now.

bpf_seq_printf may return -EBUSY meaning that internal percpu
buffer for memory copy of strings or other pointees is
not available. Bpf program can return 1 to indicate it
wants the same object to be repeated. Right now, this should not
happen on no-RT kernels since migrate_disable(), which guards
bpf prog call, calls preempt_disable().

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175914.2476661-1-yhs@fb.com
2020-05-09 17:05:26 -07:00
Yonghong Song
b121b341e5 bpf: Add PTR_TO_BTF_ID_OR_NULL support
Add bpf_reg_type PTR_TO_BTF_ID_OR_NULL support.
For tracing/iter program, the bpf program context
definition, e.g., for previous bpf_map target, looks like
  struct bpf_iter__bpf_map {
    struct bpf_iter_meta *meta;
    struct bpf_map *map;
  };

The kernel guarantees that meta is not NULL, but
map pointer maybe NULL. The NULL map indicates that all
objects have been traversed, so bpf program can take
proper action, e.g., do final aggregation and/or send
final report to user space.

Add btf_id_or_null_non0_off to prog->aux structure, to
indicate that if the context access offset is not 0,
set to PTR_TO_BTF_ID_OR_NULL instead of PTR_TO_BTF_ID.
This bit is set for tracing/iter program.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175912.2476576-1-yhs@fb.com
2020-05-09 17:05:26 -07:00
Yonghong Song
eaaacd2391 bpf: Add task and task/file iterator targets
Only the tasks belonging to "current" pid namespace
are enumerated.

For task/file target, the bpf program will have access to
  struct task_struct *task
  u32 fd
  struct file *file
where fd/file is an open file for the task.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175911.2476407-1-yhs@fb.com
2020-05-09 17:05:26 -07:00
Yonghong Song
138d0be35b net: bpf: Add netlink and ipv6_route bpf_iter targets
This patch added netlink and ipv6_route targets, using
the same seq_ops (except show() and minor changes for stop())
for /proc/net/{netlink,ipv6_route}.

The net namespace for these targets are the current net
namespace at file open stage, similar to
/proc/net/{netlink,ipv6_route} reference counting
the net namespace at seq_file open stage.

Since module is not supported for now, ipv6_route is
supported only if the IPV6 is built-in, i.e., not compiled
as a module. The restriction can be lifted once module
is properly supported for bpf_iter.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175910.2476329-1-yhs@fb.com
2020-05-09 17:05:26 -07:00
Yonghong Song
6086d29def bpf: Add bpf_map iterator
Implement seq_file operations to traverse all bpf_maps.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175909.2476096-1-yhs@fb.com
2020-05-09 17:05:26 -07:00
Yonghong Song
e5158d987b bpf: Implement common macros/helpers for target iterators
Macro DEFINE_BPF_ITER_FUNC is implemented so target
can define an init function to capture the BTF type
which represents the target.

The bpf_iter_meta is a structure holding meta data, common
to all targets in the bpf program.

Additional marker functions are called before or after
bpf_seq_read() show()/next()/stop() callback functions
to help calculate precise seq_num and whether call bpf_prog
inside stop().

Two functions, bpf_iter_get_info() and bpf_iter_run_prog(),
are implemented so target can get needed information from
bpf_iter infrastructure and can run the program.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175907.2475956-1-yhs@fb.com
2020-05-09 17:05:26 -07:00
Yonghong Song
367ec3e483 bpf: Create file bpf iterator
To produce a file bpf iterator, the fd must be
corresponding to a link_fd assocciated with a
trace/iter program. When the pinned file is
opened, a seq_file will be generated.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175906.2475893-1-yhs@fb.com
2020-05-09 17:05:26 -07:00
Yonghong Song
ac51d99bf8 bpf: Create anonymous bpf iterator
A new bpf command BPF_ITER_CREATE is added.

The anonymous bpf iterator is seq_file based.
The seq_file private data are referenced by targets.
The bpf_iter infrastructure allocated additional space
at seq_file->private before the space used by targets
to store some meta data, e.g.,
  prog:       prog to run
  session_id: an unique id for each opened seq_file
  seq_num:    how many times bpf programs are queried in this session
  done_stop:  an internal state to decide whether bpf program
              should be called in seq_ops->stop() or not

The seq_num will start from 0 for valid objects.
The bpf program may see the same seq_num more than once if
 - seq_file buffer overflow happens and the same object
   is retried by bpf_seq_read(), or
 - the bpf program explicitly requests a retry of the
   same object

Since module is not supported for bpf_iter, all target
registeration happens at __init time, so there is no
need to change bpf_iter_unreg_target() as it is used
mostly in error path of the init function at which time
no bpf iterators have been created yet.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175905.2475770-1-yhs@fb.com
2020-05-09 17:05:26 -07:00
Yonghong Song
fd4f12bc38 bpf: Implement bpf_seq_read() for bpf iterator
bpf iterator uses seq_file to provide a lossless
way to transfer data to user space. But we want to call
bpf program after all objects have been traversed, and
bpf program may write additional data to the
seq_file buffer. The current seq_read() does not work
for this use case.

Besides allowing stop() function to write to the buffer,
the bpf_seq_read() also fixed the buffer size to one page.
If any single call of show() or stop() will emit data
more than one page to cause overflow, -E2BIG error code
will be returned to user space.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175904.2475468-1-yhs@fb.com
2020-05-09 17:05:26 -07:00
Yonghong Song
2057c92bc9 bpf: Support bpf tracing/iter programs for BPF_LINK_UPDATE
Added BPF_LINK_UPDATE support for tracing/iter programs.
This way, a file based bpf iterator, which holds a reference
to the link, can have its bpf program updated without
creating new files.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175902.2475262-1-yhs@fb.com
2020-05-09 17:05:26 -07:00
Yonghong Song
de4e05cac4 bpf: Support bpf tracing/iter programs for BPF_LINK_CREATE
Given a bpf program, the step to create an anonymous bpf iterator is:
  - create a bpf_iter_link, which combines bpf program and the target.
    In the future, there could be more information recorded in the link.
    A link_fd will be returned to the user space.
  - create an anonymous bpf iterator with the given link_fd.

The bpf_iter_link can be pinned to bpffs mount file system to
create a file based bpf iterator as well.

The benefit to use of bpf_iter_link:
  - using bpf link simplifies design and implementation as bpf link
    is used for other tracing bpf programs.
  - for file based bpf iterator, bpf_iter_link provides a standard
    way to replace underlying bpf programs.
  - for both anonymous and free based iterators, bpf link query
    capability can be leveraged.

The patch added support of tracing/iter programs for BPF_LINK_CREATE.
A new link type BPF_LINK_TYPE_ITER is added to facilitate link
querying. Currently, only prog_id is needed, so there is no
additional in-kernel show_fdinfo() and fill_link_info() hook
is needed for BPF_LINK_TYPE_ITER link.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175901.2475084-1-yhs@fb.com
2020-05-09 17:05:26 -07:00
Yonghong Song
15d83c4d7c bpf: Allow loading of a bpf_iter program
A bpf_iter program is a tracing program with attach type
BPF_TRACE_ITER. The load attribute
  attach_btf_id
is used by the verifier against a particular kernel function,
which represents a target, e.g., __bpf_iter__bpf_map
for target bpf_map which is implemented later.

The program return value must be 0 or 1 for now.
  0 : successful, except potential seq_file buffer overflow
      which is handled by seq_file reader.
  1 : request to restart the same object

In the future, other return values may be used for filtering or
teminating the iterator.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175900.2474947-1-yhs@fb.com
2020-05-09 17:05:26 -07:00
Yonghong Song
ae24345da5 bpf: Implement an interface to register bpf_iter targets
The target can call bpf_iter_reg_target() to register itself.
The needed information:
  target:           target name
  seq_ops:          the seq_file operations for the target
  init_seq_private  target callback to initialize seq_priv during file open
  fini_seq_private  target callback to clean up seq_priv during file release
  seq_priv_size:    the private_data size needed by the seq_file
                    operations

The target name represents a target which provides a seq_ops
for iterating objects.

The target can provide two callback functions, init_seq_private
and fini_seq_private, called during file open/release time.
For example, /proc/net/{tcp6, ipv6_route, netlink, ...}, net
name space needs to be setup properly during file open and
released properly during file release.

Function bpf_iter_unreg_target() is also implemented to unregister
a particular target.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175859.2474669-1-yhs@fb.com
2020-05-09 17:05:25 -07:00
Sebastian Reichel
bf584e4dbd lib: Add linear ranges helper library and start using it
Series extracts a "linear ranges" helper out of the regulator
 framework. Linear ranges helper is intended to help converting
 real-world values to register values when conversion is linear. I
 suspect this is useful also for power subsystem and possibly for clk.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAl61lKwTHGJyb29uaWVA
 a2VybmVsLm9yZwAKCRAk1otyXVSH0HxeB/9dblIbC+28MvcEHXCcYBZBouOnrM8E
 bIOMXkgEj1uL78ozOm7tMCgEpaKgv6BHdDuClCBvjbr0uOtAi0qUiv0IBotuVrdo
 lq73l8l7OPz6TFFKIt8WsgwKnzdkuQC08+qrZasAdluRQnqnmkU2tvl2y9zaaaR4
 6hGw+Nwx/pgeCXCa3pu+rCYwA7g0Tf8a6DDC6LyQWZameBJ1ey/YDjhJEeSmY7P7
 306zs8YVxHhQMLUQ5T7DA6r/KWMNkO1SOueCqTjxWZc/XamGEcbsZG1cWrAnkoE2
 VKLXBtYC75coNxIiu8ZxnQwLLdz1EQPdtg0qmzSjXJ68QjbWWzf4K1ra
 =LG9G
 -----END PGP SIGNATURE-----

Merge tag 'tags/linear-ranges-lib' into psy-next

lib: Add linear ranges helper library and start using it

Series extracts a "linear ranges" helper out of the regulator
framework. Linear ranges helper is intended to help converting
real-world values to register values when conversion is linear. I
suspect this is useful also for power subsystem and possibly for clk.

Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2020-05-10 01:56:03 +02:00
Linus Torvalds
2e28f3b13a RISC-V Fixes for 5.7-rc5
This contains a smattering of fixes and cleanups that I'd like to target for
 5.7:
 
 * Dead code removal.
 * Exporting riscv_cpuid_to_hartid_mask for modules.
 * Per-CPU tracking of ISA features.
 * Setting max_pfn correctly when probing memory.
 * Adding a note to the VDSO so glibc can check the kernel's version without a
   uname().
 * A fix to force the bootloader to initialize the boot spin tables, which still
   get used as a fallback when SBI-0.1 is enabled.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAl61prsTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYia2UD/44ILoaQySVnLZ+ZzXaMXn3WwGHe8bS
 NVPQJB21ejkfbM8cDR5A8+w45FBrHquIRwhHnVkl5JU2AtvcdWh3tztmFx6Ejsu9
 FFBzcbHcXnYthkm1xLVPQASY0Pl6VOPdx47Mip9gvoLK79VetjQWNzUpFk4CBJdw
 nObgYgxE9twCQ7JOcK0VnPL9IpJ6E/lCcIyCi11NL9xRWtUyWk4hcmAFj/+tUegm
 DroT7QzKKxFS24eLaRkJgQGwAJ1jb0/b0ztl04U8NTOqVjgFXkGTC1Kuzd06Ch2U
 U34CYRL+A2sXwWnnNsIyjD7Epdalc/xx+JMEuD8dhnr0YK8WilvvG53gGwCwFgVc
 wpFhvsIuINYTw253Rv0q1oeRcDmMCKmV7bhOKSX4x0V1iGM1ognl/6zkCY4J0dQC
 7BCoeAGlpBTNbidatZ6jl5e32jes50ZRjhf3LxXe3mgrBd+diKXyOyLT01SVwqv/
 A1Sur/KquwoqT4RSx2Cel8JswPhfErhB0otL3CYoao8V7rxYGTKWKXg5SFAgwDHZ
 rib1UpYmyh2tjmoXb99ctlBpRHsYcVzXOZS9tG7B2ue7YhEwiZdV3249uwitAQgm
 NmGCH7tDe/nu5DLBoFyTjBJ64pZyn3YmE58M/uCmbXyMRVSGp2TXK83u3mfiw+gh
 kKNSRHJDAAl7Fg==
 =bGU8
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:
 "A smattering of fixes and cleanups:

   - Dead code removal.

   - Exporting riscv_cpuid_to_hartid_mask for modules.

   - Per-CPU tracking of ISA features.

   - Setting max_pfn correctly when probing memory.

   - Adding a note to the VDSO so glibc can check the kernel's version
     without a uname().

   - A fix to force the bootloader to initialize the boot spin tables,
     which still get used as a fallback when SBI-0.1 is enabled"

* tag 'riscv-for-linus-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: Remove unused code from STRICT_KERNEL_RWX
  riscv: force __cpu_up_ variables to put in data section
  riscv: add Linux note to vdso
  riscv: set max_pfn to the PFN of the last page
  RISC-V: Remove N-extension related defines
  RISC-V: Add bitmap reprensenting ISA features common across CPUs
  RISC-V: Export riscv_cpuid_to_hartid_mask() API
2020-05-09 16:24:16 -07:00
Hongbo Yao
1072ceada4 power: reset: ltc2952: remove unused variable
Fix gcc '-Wunused-but-set-variable' warning:
drivers/power/reset/ltc2952-poweroff.c:97:16: warning: variable
‘overruns’ set but not used [-Wunused-but-set-variable]
  unsigned long overruns;

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Hongbo Yao <yaohongbo@huawei.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2020-05-10 01:20:41 +02:00
Christophe JAILLET
934ed3847a power: supply: lp8788: Fix an error handling path in 'lp8788_charger_probe()'
In the probe function, in case of error, resources allocated in
'lp8788_setup_adc_channel()' must be released.

This can be achieved easily by using the devm_ variant of
'iio_channel_get()'.
This has the extra benefit to simplify the remove function and to axe the
'lp8788_release_adc_channel()' function which is now useless.

Fixes: 98a2766493 ("power_supply: Add new lp8788 charger driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2020-05-10 01:20:41 +02:00
Jakub Kicinski
02a5043b22 Merge branch 'mlxsw-spectrum-Enforce-some-HW-limitations-for-matchall-TC-offload'
Ido Schimmel says:

====================
mlxsw: spectrum: Enforce some HW limitations for matchall TC offload

Jiri says:

There are some limitations for TC matchall classifier offload that are
given by the mlxsw HW dataplane. It is not possible to do sampling on
egress and also the mirror/sample vs. ACL (flower) ordering is fixed. So
check this and forbid to offload incorrect setup.
====================

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09 16:03:39 -07:00
Jiri Pirko
aa7431123f selftests: mlxsw: tc_restrictions: add couple of test for the correct matchall-flower ordering
Make sure that the drive restricts incorrect order of inserted matchall
vs. flower rules.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09 16:02:43 -07:00
Jiri Pirko
240fe73457 selftests: mlxsw: tc_restrictions: add test to check sample action restrictions
Check that matchall rules with sample actions are not possible to be
inserted to egress.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09 16:02:43 -07:00
Jiri Pirko
b886dea37b selftests: mlxsw: rename tc_flower_restrictions.sh to tc_restrictions.sh
The file is about to contain matchall restrictions too, so change the
name to make it more generic.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09 16:02:43 -07:00
Jiri Pirko
67ed68fc0c mlxsw: spectrum_flower: Forbid to insert flower rules in collision with matchall rules
On ingress, the matchall rules doing mirroring and sampling are offloaded
into hardware blocks that are processed before any flower rules.
On egress, the matchall mirroring rules are offloaded into hardware
block that is processed after all flower rules.

Therefore check the priorities of inserted flower rules against
existing matchall rules and ensure the correct ordering.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09 16:02:43 -07:00
Jiri Pirko
18346b70ab mlxsw: spectrum_matchall: Forbid to insert matchall rules in collision with flower rules
On ingress, the matchall rules doing mirroring and sampling are offloaded
into hardware blocks that are processed before any flower rules.
On egress, the matchall mirroring rules are offloaded into hardware
block that is processed after all flower rules.

Therefore check the priorities of inserted matchall rules against
existing flower rules and ensure the correct ordering.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09 16:02:43 -07:00
Jiri Pirko
aed65285fb mlxsw: spectrum_matchall: Expose a function to get min and max rule priority
Introduce an infrastructure that allows to get minimum and maximum
rule priority for specified chain. This is going to be used by
a subsequent patch to enforce ordering between flower and
matchall filters.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09 16:02:43 -07:00
Jiri Pirko
5a2939b9d7 mlxsw: spectrum_matchall: Put matchall list into substruct of flow struct
As there are going to be other matchall specific fields in flow
structure, put the existing list field into matchall substruct.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09 16:02:43 -07:00
Jiri Pirko
593bb84379 mlxsw: spectrum_flower: Expose a function to get min and max rule priority
Introduce an infrastructure that allows to get minimum and maximum
rule priority for specified chain. This is going to be used by
a subsequent patch to enforce ordering between flower and
matchall filters.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09 16:02:43 -07:00
Jiri Pirko
18aa23b31f mlxsw: spectrum_matchall: Restrict sample action to be allowed only on ingress
HW supports packet sampling on ingress only. Check and fail if user
is adding sample on egress.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09 16:02:43 -07:00
Linus Torvalds
1a263ae60b gcc-10: avoid shadowing standard library 'free()' in crypto
gcc-10 has started warning about conflicting types for a few new
built-in functions, particularly 'free()'.

This results in warnings like:

   crypto/xts.c:325:13: warning: conflicting types for built-in function ‘free’; expected ‘void(void *)’ [-Wbuiltin-declaration-mismatch]

because the crypto layer had its local freeing functions called
'free()'.

Gcc-10 is in the wrong here, since that function is marked 'static', and
thus there is no chance of confusion with any standard library function
namespace.

But the simplest thing to do is to just use a different name here, and
avoid this gcc mis-feature.

[ Side note: gcc knowing about 'free()' is in itself not the
  mis-feature: the semantics of 'free()' are special enough that a
  compiler can validly do special things when seeing it.

  So the mis-feature here is that gcc thinks that 'free()' is some
  restricted name, and you can't shadow it as a local static function.

  Making the special 'free()' semantics be a function attribute rather
  than tied to the name would be the much better model ]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09 15:58:04 -07:00
Arnd Bergmann
99352c79af net: freescale: select CONFIG_FIXED_PHY where needed
I ran into a randconfig build failure with CONFIG_FIXED_PHY=m
and CONFIG_GIANFAR=y:

x86_64-linux-ld: drivers/net/ethernet/freescale/gianfar.o:(.rodata+0x418): undefined reference to `fixed_phy_change_carrier'

It seems the same thing can happen with dpaa and ucc_geth, so change
all three to do an explicit 'select FIXED_PHY'.

The fixed-phy driver actually has an alternative stub function that
theoretically allows building network drivers when fixed-phy is
disabled, but I don't see how that would help here, as the drivers
presumably would not work then.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09 15:47:21 -07:00
Linus Torvalds
adc7192096 gcc-10: disable 'restrict' warning for now
gcc-10 now warns about passing aliasing pointers to functions that take
restricted pointers.

That's actually a great warning, and if we ever start using 'restrict'
in the kernel, it might be quite useful.  But right now we don't, and it
turns out that the only thing this warns about is an idiom where we have
declared a few functions to be "printf-like" (which seems to make gcc
pick up the restricted pointer thing), and then we print to the same
buffer that we also use as an input.

And people do that as an odd concatenation pattern, with code like this:

    #define sysfs_show_gen_prop(buffer, fmt, ...) \
        snprintf(buffer, PAGE_SIZE, "%s"fmt, buffer, __VA_ARGS__)

where we have 'buffer' as both the destination of the final result, and
as the initial argument.

Yes, it's a bit questionable.  And outside of the kernel, people do have
standard declarations like

    int snprintf( char *restrict buffer, size_t bufsz,
                  const char *restrict format, ... );

where that output buffer is marked as a restrict pointer that cannot
alias with any other arguments.

But in the context of the kernel, that 'use snprintf() to concatenate to
the end result' does work, and the pattern shows up in multiple places.
And we have not marked our own version of snprintf() as taking restrict
pointers, so the warning is incorrect for now, and gcc picks it up on
its own.

If we do start using 'restrict' in the kernel (and it might be a good
idea if people find places where it matters), we'll need to figure out
how to avoid this issue for snprintf and friends.  But in the meantime,
this warning is not useful.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09 15:45:21 -07:00
Linus Torvalds
5a76021c2e gcc-10: disable 'stringop-overflow' warning for now
This is the final array bounds warning removal for gcc-10 for now.

Again, the warning is good, and we should re-enable all these warnings
when we have converted all the legacy array declaration cases to
flexible arrays. But in the meantime, it's just noise.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09 15:40:52 -07:00
Luo bin
72ef908bb3 hinic: add three net_device_ops of vf
adds ndo_set_vf_rate/ndo_set_vf_spoofchk/ndo_set_vf_link_state
to configure netdev of virtual function

Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09 15:28:21 -07:00
Keith Busch
92decf118f nvme: define constants for identification values
Improve code readability by defining the specification's constants that
the driver is using when decoding identification payloads.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Bart van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:18:36 -06:00
Chaitanya Kulkarni
d02abd1986 nvmet: align addrfam list to spec
With reference to the NVMeOF Specification (page 44, Figure 38)
discovery log page entry provides address family field. We do set the
transport type field but the adrfam field is not set when using loop
transport and also it doesn't have support in the nvme-cli. So when
reading discovery log page with a loop transport it leads to confusing
output.

As per the spec for adrfam value 254 is reserved for Intra Host
Transport i.e. loopback), we add a required macro in the protocol
header file, set default port disc addr entry's adrfam to
NVMF_ADDR_FAMILY_MAX, and update nvmet_addr_family configfs array for
show/store attribute.

Without this patch, setting adrfam to (ipv4/ipv6/ib/fc/loop/" ") we get
following output for nvme discover command from nvme-cli which is
confusing.
trtype:  loop
adrfam:  ipv4
trtype:  loop
adrfam:  ipv6
trtype:  loop
adrfam:  infiniband
trtype:  loop
adrfam:  fibre-channel
trtype:  loop		# ${CFGFS_HOME}/nvmet/ports/1/addr_adrfam = loop
adrfam:  pci            # <----- pci for loop
trtype:  loop		# ${CFGFS_HOME}/nvmet/ports/1/addr_adrfam = " "
adrfam:  pci            # <----- pci for unrecognized

This patch fixes above output :-
trtype:  loop
adrfam:  ipv4
trtype:  loop
adrfam:  ipv6
trtype:  loop
adrfam:  infiniband
trtype:  loop
adrfam:  fibre-channel
trtype:  loop           # ${CFGFS_HOME}/nvmet/ports/1/addr_adrfam = loop
adrfam:  loop           # <----- loop for loop
trtype:  loop		# ${CFGFS_HOME}/config/nvmet/ports/adrfam = " "
adrfam:  unrecognized   # <----- unrecognized when invalid value

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:18:36 -06:00
Chaitanya Kulkarni
3ecb5faa07 nvmet: centralize port enable access for configfs
The configfs attributes which are supposed to set when port is disable
such as addr[addrfam|portid|traddr|treq|trsvcid|inline_data_size|trtype]
has repetitive check and generic error message printing.

This patch creates centralize helper to check and print an error
message that also accepts caller as a parameter. This makes error
message easy to parse for the user, removes the duplicate code and
makes it available for futures such scenarios.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:18:36 -06:00
Chaitanya Kulkarni
87628e2851 nvmet: use type-name map for address treq
Currently nvmet_addr_treq_[store|show]() uses switch and if else
ladder for address transport requirements to string and reverse
mapping. With addtion of the generic nvmet_type_name_map structure
we can get rid of the switch and if else ladder with string
duplication.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:18:36 -06:00
Chaitanya Kulkarni
84b8d0d7aa nvmet: use type-name map for ana states
Now that we have a generic type to name map for configfs, get rid of
the nvmet_ana_state_names structure and replace it with newly added
nvmet_type_name_map.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:18:36 -06:00
Chaitanya Kulkarni
7e764179c8 nvmet: use type-name map for address family
Right now nvmet_addr_adrfam_[store|show]() uses switch and if else
ladder for address family to string and reverse mapping which also
repeats the strings in show and store function.

With addition of generic nvmet_type_name_map structure we can now get rid
of the switch and if else ladder and string duplication.

Also, we add a newline in before found label in nvmet_addr_trtype_store()
which keeps goto label code consistent with
nvmet_allowed_hosts_drop_link(), nvmet_port_subsys_drop_link() and
nvmet_ana_group_ana_state_store().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:18:36 -06:00
Chaitanya Kulkarni
45e2f3c2d2 nvmet: add generic type-name mapping
This patch adds a new type to name mapping generic structure. It
replaces nvmet_transport_name with new generic mapping structure
nvmet_transport.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:18:36 -06:00
Christoph Hellwig
7890b9701b nvme-multipath: stop using ->queuedata
nvme-multipath already uses the gendisk private data, not need to
also set up the request_queue queuedata and use it in one place only.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:18:36 -06:00
Sagi Grimberg
db5ad6b7f8 nvme-tcp: try to send request in queue_rq context
Today, nvme-tcp automatically schedules a send request
to a workqueue context, which is 1 more than we'd need
in case the socket buffer is wide open.

However, because we have async send activity (as a result
of r2t, or write_space callbacks), we need to synchronize
sends from possibly multiple contexts (ideally all running
on the same cpu though).

Thus, we only try to send directly from queue_rq in cases:
1. the send_list is empty
2. we can send it synchronously (i.e. not from the RX path)
3. we run on the same cpu as the queue->io_cpu to avoid
   contention on the send operation.

Proposed-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:18:36 -06:00
Sagi Grimberg
72e5d757c6 nvme-tcp: avoid scheduling io_work if we are already polling
When the user runs polled I/O, we shouldn't have to trigger
the workqueue to generate the receive work upon the .data_ready
upcall. This prevents a redundant context switch when the
application is already polling for completions.

Proposed-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:18:36 -06:00
Sagi Grimberg
386e5e6e1a nvme-tcp: use bh_lock in data_ready
data_ready may be invoked from send context or from
softirq, so need bh locking for that.

Fixes: 3f2304f8c6 ("nvme-tcp: add NVMe over TCP host driver")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:18:36 -06:00
Weiping Zhang
2a5bcfdd41 nvme-pci: align io queue count with allocted nvme_queue in nvme_probe
Since commit 147b27e4bd ("nvme-pci: allocate device queues storage
space at probe"), nvme_alloc_queue does not alloc the nvme queues
itself anymore.

If the write/poll_queues module parameters are changed at runtime to
values larger than the number of allocated queues in nvme_probe,
nvme_alloc_queue will access unallocated memory.

Add a new nr_allocated_queues member to struct nvme_dev to record how
many queues were alloctated in nvme_probe to avoid using more than the
allocated queues after a reset following a change to the
write/poll_queues module parameters.

Also add nr_write_queues and nr_poll_queues members to allow refreshing
the number of write and poll queues based on a change to the module
parameters when resetting the controller.

Fixes: 147b27e4bd ("nvme-pci: allocate device queues storage space at probe")
Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
[hch: add nvme_max_io_queues, update the commit message]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:18:36 -06:00
Keith Busch
54b2fcee1d nvme-pci: remove last_sq_tail
The nvme driver does not have enough tags to wrap the queue, and blk-mq
will no longer call commit_rqs() when there are no new submissions to
notify.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:18:36 -06:00
Keith Busch
74943d45ee nvme-pci: remove volatile cqes
The completion queue entry is not volatile once the phase is confirmed.
Remove the volatile keywords and check the phase using the appropriate
READ_ONCE() accessor, allowing the compiler to optimize the remaining
completion path.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:18:36 -06:00
Keith Busch
b04df85d9a nvme: flush scan work on passthrough commands
If a passthrough command causes the namespace inventory or capabilities
to change, flush the scan work that handles these changes so the driver
synchronizes with the user command's effects before returning the result
to user space.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:18:36 -06:00
Christoph Hellwig
6623c5b3df nvme: clean up error handling in nvme_init_ns_head
Use a common label for putting the nshead if needed and only convert
nvme status codes for the one case where it actually is needed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:18:36 -06:00