linux-xiaomi-chiron/kernel
Toke Høiland-Jørgensen 782347b6bc xdp: Add proper __rcu annotations to redirect map entries
XDP_REDIRECT works by a three-step process: the bpf_redirect() and
bpf_redirect_map() helpers will lookup the target of the redirect and store
it (along with some other metadata) in a per-CPU struct bpf_redirect_info.
Next, when the program returns the XDP_REDIRECT return code, the driver
will call xdp_do_redirect() which will use the information thus stored to
actually enqueue the frame into a bulk queue structure (that differs
slightly by map type, but shares the same principle). Finally, before
exiting its NAPI poll loop, the driver will call xdp_do_flush(), which will
flush all the different bulk queues, thus completing the redirect.

Pointers to the map entries will be kept around for this whole sequence of
steps, protected by RCU. However, there is no top-level rcu_read_lock() in
the core code; instead drivers add their own rcu_read_lock() around the XDP
portions of the code, but somewhat inconsistently as Martin discovered[0].
However, things still work because everything happens inside a single NAPI
poll sequence, which means it's between a pair of calls to
local_bh_disable()/local_bh_enable(). So Paul suggested[1] that we could
document this intention by using rcu_dereference_check() with
rcu_read_lock_bh_held() as a second parameter, thus allowing sparse and
lockdep to verify that everything is done correctly.

This patch does just that: we add an __rcu annotation to the map entry
pointers and remove the various comments explaining the NAPI poll assurance
strewn through devmap.c in favour of a longer explanation in filter.c. The
goal is to have one coherent documentation of the entire flow, and rely on
the RCU annotations as a "standard" way of communicating the flow in the
map code (which can additionally be understood by sparse and lockdep).

The RCU annotation replacements result in a fairly straight-forward
replacement where READ_ONCE() becomes rcu_dereference_check(), WRITE_ONCE()
becomes rcu_assign_pointer() and xchg() and cmpxchg() gets wrapped in the
proper constructs to cast the pointer back and forth between __rcu and
__kernel address space (for the benefit of sparse). The one complication is
that xskmap has a few constructions where double-pointers are passed back
and forth; these simply all gain __rcu annotations, and only the final
reference/dereference to the inner-most pointer gets changed.

With this, everything can be run through sparse without eliciting
complaints, and lockdep can verify correctness even without the use of
rcu_read_lock() in the drivers. Subsequent patches will clean these up from
the drivers.

[0] https://lore.kernel.org/bpf/20210415173551.7ma4slcbqeyiba2r@kafai-mbp.dhcp.thefacebook.com/
[1] https://lore.kernel.org/bpf/20210419165837.GA975577@paulmck-ThinkPad-P17-Gen-1/

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210624160609.292325-6-toke@redhat.com
2021-06-24 19:41:15 +02:00
..
bpf xdp: Add proper __rcu annotations to redirect map entries 2021-06-24 19:41:15 +02:00
cgroup cgroup: fix spelling mistakes 2021-05-24 12:45:26 -04:00
configs drivers/char: remove /dev/kmem for good 2021-05-07 00:26:34 -07:00
debug printk changes for 5.13 2021-04-27 18:09:44 -07:00
dma Merge branch 'stable/for-linus-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb 2021-05-04 10:58:49 -07:00
entry A trivial cleanup of typo fixes. 2021-04-26 09:41:15 -07:00
events Merge branch 'for-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-05-21 06:12:52 -10:00
gcov gcov: clang: drop support for clang-10 and older 2021-05-07 00:26:32 -07:00
irq gpio updates for v5.13 2021-05-05 12:39:29 -07:00
kcsan kcsan: Fix debugfs initcall return type 2021-05-18 10:58:02 -07:00
livepatch Livepatching changes for 5.13 2021-04-27 18:14:38 -07:00
locking locking/mutex: clear MUTEX_FLAGS if wait_list is empty due to signal 2021-05-18 12:53:51 +02:00
power PM: sleep: fix typos in comments 2021-04-08 19:37:21 +02:00
printk kernel/printk.c: Fixed mundane typos 2021-03-30 15:34:17 +02:00
rcu Merge branches 'bitmaprange.2021.03.08a', 'fixes.2021.03.15a', 'kvfree_rcu.2021.03.08a', 'mmdumpobj.2021.03.08a', 'nocb.2021.03.15a', 'poll.2021.03.24a', 'rt.2021.03.08a', 'tasks.2021.03.08a', 'torture.2021.03.08a' and 'torturescript.2021.03.22a' into HEAD 2021-03-24 17:20:18 -07:00
sched sched/fair: Fix clearing of has_idle_cores flag in select_idle_cpu() 2021-05-12 10:41:28 +02:00
time alarmtimer: Check RTC features instead of ops 2021-05-11 21:28:04 +02:00
trace bpf, lockdown, audit: Fix buggy SELinux lockdown permission checks 2021-06-02 21:59:22 +02:00
.gitignore .gitignore: prefix local generated files with a slash 2021-05-02 00:43:35 +09:00
acct.c kernel/acct.c: use #elif instead of #end and #elif 2020-12-15 22:46:15 -08:00
async.c kernel/async.c: remove async_unregister_domain() 2021-05-07 00:26:33 -07:00
audit.c lsm: separate security_task_getsecid() into subjective and objective variants 2021-03-22 15:23:32 -04:00
audit.h audit: avoid -Wempty-body warning 2021-03-24 12:11:48 -04:00
audit_fsnotify.c audit_alloc_mark(): don't open-code ERR_CAST() 2021-02-23 10:25:27 -05:00
audit_tree.c fsnotify: generalize handle_inode_event() 2020-12-03 14:58:35 +01:00
audit_watch.c fsnotify: generalize handle_inode_event() 2020-12-03 14:58:35 +01:00
auditfilter.c lsm: separate security_task_getsecid() into subjective and objective variants 2021-03-22 15:23:32 -04:00
auditsc.c audit/stable-5.13 PR 20210426 2021-04-27 13:50:58 -07:00
backtracetest.c treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD() 2020-07-30 11:15:58 -07:00
bounds.c
capability.c capability: handle idmapped mounts 2021-01-24 14:27:16 +01:00
cfi.c add support for Clang CFI 2021-04-08 16:04:20 -07:00
compat.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
configs.c
context_tracking.c context_tracking: Ensure that the critical path cannot be instrumented 2020-06-11 15:14:36 +02:00
cpu.c cpumask/hotplug: Fix cpu_dying() state tracking 2021-04-21 13:55:43 +02:00
cpu_pm.c notifier: Fix broken error handling pattern 2020-09-01 09:58:03 +02:00
crash_core.c kdump: append uts_namespace.name offset to VMCOREINFO 2020-12-15 22:46:18 -08:00
crash_dump.c
cred.c kernel/cred.c: make init_groups static 2021-05-06 19:24:11 -07:00
delayacct.c
dma.c
exec_domain.c
exit.c do_wait: make PIDTYPE_PID case O(1) instead of O(n) 2021-05-06 19:24:13 -07:00
extable.c
fail_function.c fault-injection: handle EI_ETYPE_TRUE 2020-12-15 22:46:19 -08:00
fork.c kernel/fork.c: fix typos 2021-05-06 19:24:13 -07:00
freezer.c Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing" 2021-03-27 14:09:10 -06:00
futex.c futex: Make syscall entry points less convoluted 2021-05-06 20:19:04 +02:00
gen_kheaders.sh kbuild: redo fake deps at include/config/*.h 2021-04-25 05:26:10 +09:00
groups.c groups: simplify struct group_info allocation 2021-02-26 09:41:03 -08:00
hung_task.c kernel/hung_task.c: make type annotations consistent 2020-11-02 12:14:19 -08:00
iomem.c
irq_work.c irq_work: record irq_work_queue() call stack 2021-04-30 11:20:42 -07:00
jump_label.c static_call: Fix static_call_update() sanity check 2021-03-19 13:16:44 +01:00
kallsyms.c kallsyms: strip ThinLTO hashes from static functions 2021-04-08 16:04:21 -07:00
kcmp.c Merge branch 'exec-update-lock-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-12-15 19:36:48 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt preempt: Introduce CONFIG_PREEMPT_DYNAMIC 2021-02-17 14:12:24 +01:00
kcov.c kernel: make kcov_common_handle consider the current context 2020-11-02 18:00:20 -08:00
kexec.c LSM: Introduce kernel_post_load_data() hook 2020-10-05 13:37:03 +02:00
kexec_core.c kexec: dump kmessage before machine_kexec 2021-05-07 00:26:32 -07:00
kexec_elf.c
kexec_file.c kernel: kexec_file: fix error return code of kexec_calculate_store_digests() 2021-05-07 00:26:32 -07:00
kexec_internal.h kexec: move machine_kexec_post_load() to public interface 2021-02-22 12:33:26 +00:00
kheaders.c
kmod.c modules: add CONFIG_MODPROBE_PATH 2021-05-07 00:26:33 -07:00
kprobes.c kprobes: Fix to delay the kprobes jump optimization 2021-02-19 14:57:12 -05:00
ksysfs.c
kthread.c Scheduler updates for this cycle are: 2021-04-28 13:33:57 -07:00
latencytop.c
Makefile kbuild: update config_data.gz only when the content of .config is changed 2021-05-02 00:43:35 +09:00
module-internal.h
module.c module: check for exit sections in layout_sections() instead of module_init_section() 2021-05-17 09:48:24 +02:00
module_signature.c module: harden ELF info handling 2021-01-19 10:24:45 +01:00
module_signing.c module: harden ELF info handling 2021-01-19 10:24:45 +01:00
notifier.c notifier: Fix broken error handling pattern 2020-09-01 09:58:03 +02:00
nsproxy.c fixes-v5.11 2020-12-14 16:40:27 -08:00
padata.c padata: fix possible padata_works_lock deadlock 2020-09-04 17:51:55 +10:00
panic.c panic: don't dump stack twice on warn 2020-11-14 11:26:04 -08:00
params.c Modules updates for v5.11 2020-12-17 13:01:31 -08:00
pid.c Merge branch 'exec-update-lock-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-12-15 19:36:48 -08:00
pid_namespace.c fixes-v5.11 2020-12-14 16:40:27 -08:00
profile.c kernel: Initialize cpumask before parsing 2021-04-10 13:35:54 +02:00
ptrace.c ptrace: make ptrace() fail if the tracee changed its pid unexpectedly 2021-05-12 10:45:22 -07:00
range.c kernel.h: split out min()/max() et al. helpers 2020-10-16 11:11:19 -07:00
reboot.c Revert "PM: ACPI: reboot: Use S5 for reboot" 2021-03-18 16:58:02 +01:00
regset.c regset: kill ->get() 2020-07-27 14:31:12 -04:00
relay.c relay: allow the use of const callback structs 2020-12-15 22:46:18 -08:00
resource.c kernel/resource: fix return code check in __request_free_mem_region 2021-05-14 19:41:32 -07:00
resource_kunit.c resource: provide meaningful MODULE_LICENSE() in test suite 2020-11-25 18:52:35 +01:00
rseq.c rseq: Optimise rseq_get_rseq_cs() and clear_rseq_cs() 2021-04-14 18:04:09 +02:00
scftorture.c scftorture: Add debug output for wrong-CPU warning 2021-01-04 13:53:41 -08:00
scs.c scs: switch to vmapped shadow stacks 2020-12-01 10:30:28 +00:00
seccomp.c Merge branch 'work.file' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-05-03 11:05:28 -07:00
signal.c Merge branch 'for-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-05-21 06:12:52 -10:00
smp.c smp: Fix smp_call_function_single_async prototype 2021-05-06 15:33:49 +02:00
smpboot.c kthread: Extract KTHREAD_IS_PER_CPU 2021-01-22 15:09:42 +01:00
smpboot.h
softirq.c RCU changes for this cycle were: 2021-04-28 12:00:13 -07:00
stackleak.c stackleak: let stack_erasing_sysctl take a kernel pointer buffer 2020-09-19 13:13:39 -07:00
stacktrace.c stacktrace: Remove reliable argument from arch_stack_walk() callback 2020-09-18 14:24:16 +01:00
static_call.c static_call: Fix unused variable warn w/o MODULE 2021-04-09 13:22:12 +02:00
stop_machine.c stop_machine: Add caller debug info to queue_stop_cpus_work 2021-03-23 16:01:58 +01:00
sys.c kernel/sys.c: fix typo 2021-05-07 00:26:34 -07:00
sys_ni.c Add Landlock, a new LSM from Mickaël Salaün <mic@linux.microsoft.com> 2021-05-01 18:50:44 -07:00
sysctl-test.c
sysctl.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 2021-05-11 16:05:56 -07:00
task_work.c kasan: record task_work_add() call stack 2021-04-30 11:20:42 -07:00
taskstats.c treewide: rename nla_strlcpy to nla_strscpy. 2020-11-16 08:08:54 -08:00
test_kprobes.c
torture.c torture: Replace torture_init_begin string with %s 2021-03-08 14:22:28 -08:00
tracepoint.c tracepoints: Code clean up 2021-02-09 12:27:29 -05:00
tsacct.c
ucount.c fanotify: configurable limits via sysfs 2021-03-16 16:49:31 +01:00
uid16.c
uid16.h
umh.c kernel/umh.c: fix some spelling mistakes 2021-05-07 00:26:34 -07:00
up.c A set of locking related fixes and updates: 2021-05-09 13:07:03 -07:00
user-return-notifier.c
user.c user: Use generic ns_common::count 2020-08-19 14:14:12 +02:00
user_namespace.c kernel/user_namespace.c: fix typos 2021-05-07 00:26:34 -07:00
usermode_driver.c bpf: Fix umd memory leak in copy_process() 2021-03-19 22:23:19 +01:00
utsname.c uts: Use generic ns_common::count 2020-08-19 14:13:20 +02:00
utsname_sysctl.c
watch_queue.c watch_queue: rectify kernel-doc for init_watch() 2021-01-26 11:16:34 +00:00
watchdog.c watchdog: reliable handling of timestamps 2021-05-22 15:09:07 -10:00
watchdog_hld.c
workqueue.c wq: handle VM suspension in stall detection 2021-05-20 12:58:30 -04:00
workqueue_internal.h