Commit graph

996159 commits

Author SHA1 Message Date
Rong Chen
93ca696376 scripts/recordmcount.pl: support big endian for ARCH sh
The kernel test robot reported the following issue:

    CC [M]  drivers/soc/litex/litex_soc_ctrl.o
  sh4-linux-objcopy: Unable to change endianness of input file(s)
  sh4-linux-ld: cannot find drivers/soc/litex/.tmp_gl_litex_soc_ctrl.o: No such file or directory
  sh4-linux-objcopy: 'drivers/soc/litex/.tmp_mx_litex_soc_ctrl.o': No such file

The problem is that the format of input file is elf32-shbig-linux, but
sh4-linux-objcopy wants to output a file which format is elf32-sh-linux:

  $ sh4-linux-objdump -d drivers/soc/litex/litex_soc_ctrl.o | grep format
  drivers/soc/litex/litex_soc_ctrl.o:     file format elf32-shbig-linux

Link: https://lkml.kernel.org/r/20210210150435.2171567-1-rong.a.chen@intel.com
Link: https://lore.kernel.org/linux-mm/202101261118.GbbYSlHu-lkp@intel.com
Signed-off-by: Rong Chen <rong.a.chen@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Cc: Rich Felker <dalias@libc.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-13 11:42:40 -08:00
Mike Rapoport
3c62cfdd10 m68k: make __pfn_to_phys() and __phys_to_pfn() available for !MMU
Recent changes that obsoleted DISCONTIGMEM on m68k switched the MMU
variant to use generic definitions of __pfn_to_phys() and __phys_to_pfn(),
but missed the !MMU variant which caused a build failure:

   drivers/media/common/videobuf2/videobuf2-dma-contig.c: In function 'vb2_dc_get_userptr':
   drivers/media/common/videobuf2/videobuf2-dma-contig.c:509:5: error: implicit declaration of function '__pfn_to_phys' [-Werror=implicit-function-declaration]
     509 |     __pfn_to_phys(nums[0]), size, buf->dma_dir, 0);
         |     ^~~~~~~~~~~~~
   cc1: some warnings being treated as errors

Enable __pfn_to_phys() and __phys_to_pfn() on !MMU builds.

Link: https://lkml.kernel.org/r/20210211232202.GS299309@linux.ibm.com
Fixes: 4bfc848e09 ("m68k/mm: enable use of generic memory_model.h for !DISCONTIGMEM")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-13 11:42:40 -08:00
Julian Wiedmann
2223318c28 s390/qdio: remove 'merge_pending' mechanism
For non-QEBSM devices, get_buf_states() merges PENDING and EMPTY buffers
into a single group of finished buffers. To allow the upper-layer driver
to differentiate between the two states, qdio_check_pending() looks at
each buffer's state again and sets the sbal_state flag to
QDIO_OUTBUF_STATE_FLAG_PENDING accordingly.

So effectively we're spending overhead on _every_ Output Queue
inspection, just to avoid some additional TX completion calls in case
a group of buffers has completed with mixed EMPTY / PENDING state.
Given that PENDING buffers should rarely occur, this is a bad trade-off.
In particular so as the additional checks in get_buf_states() affect
_all_ device types (even those that don't use the PENDING state).

Rip it all out, and just report the PENDING completions separately as
we already do for QEBSM devices.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:55 +01:00
Julian Wiedmann
7940eaf2e9 s390/qdio: improve handling of PENDING buffers for QEBSM devices
For QEBSM devices the 'merge_pending' mechanism in get_buf_states()
doesn't apply, and we can actually get SLSB_P_OUTPUT_PENDING returned.

So for this case propagating the PENDING state to the driver via the
queue's sbal_state doesn't make sense and creates unnecessary overhead.
Instead introduce a new QDIO_ERROR_* flag that gets passed to the
driver, and triggers the same processing as if the buffers were flagged
as QDIO_OUTBUF_STATE_FLAG_PENDING.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:55 +01:00
Julian Wiedmann
540936df44 s390/qdio: rework q->qdio_error indication
When inspecting a queue, any error is currently returned back through
the queue's qdio_error field. Turn this into a proper variable that gets
passed through the call chain, so that the lifetime is clear and the
error state can be accessed along the way.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:55 +01:00
Julian Wiedmann
3bf526e036 s390/qdio: inline qdio_kick_handler()
We don't kick the handler for Input Queues anymore. Move the remaining
code into its only caller.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:55 +01:00
Heiko Carstens
7ef37dd7bb s390/time: remove get_tod_clock_ext()
Remove get_tod_clock_ext() and the STORE_CLOCK_EXT_SIZE define. This
enforces all users of the existing low level functions to use a union
tod_clock.

This way there is now a compile time check for the correct time and
therefore also if the size of the argument matches what will be
written to by the STORE CLOCK EXTENDED instruction.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:55 +01:00
Heiko Carstens
fc4a925f77 s390/crypto: use store_tod_clock_ext()
Use store_tod_clock_ext() in order to be able to get rid
get_tod_clock_ext().

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:55 +01:00
Heiko Carstens
01f224b9d7 s390/hypfs: use store_tod_clock_ext()
Use store_tod_clock_ext() instead of get_tod_clock_ext().
Unfortunately one usage has to be converted to a cast, since
otherwise a uapi header file would have to be changed.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:55 +01:00
Heiko Carstens
d1deda6f2b s390/debug: use union tod_clock
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:54 +01:00
Heiko Carstens
2cfd7b73f5 s390/kvm: use union tod_clock
Use union tod_clock and get rid of the kvm specific struct
kvm_s390_tod_clock_ext which apparently was introduced for the same
purpose.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:54 +01:00
Heiko Carstens
169ceac429 s390/vdso: use union tod_clock
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:54 +01:00
Heiko Carstens
f8d8977a3d s390/time: convert tod_clock_base to union
Convert tod_clock_base to union tod_clock. This simplifies quite a bit
of code and also fixes a bug in read_persistent_clock64();

void read_persistent_clock64(struct timespec64 *ts)
{
        __u64 delta;

        delta = initial_leap_seconds + TOD_UNIX_EPOCH;
        get_tod_clock_ext(clk);
        *(__u64 *) &clk[1] -= delta;
        if (*(__u64 *) &clk[1] > delta)
                clk[0]--;
        ext_to_timespec64(clk, ts);
}

Assume &clk[1] == 3 and delta == 2; then after the substraction the if
condition becomes true and the epoch part of the clock is decremented
by one because of an assumed overflow, even though there is none.

Fix this by using 128 bit arithmetics and let the compiler do the
right thing:

void read_persistent_clock64(struct timespec64 *ts)
{
        union tod_clock clk;
        u64 delta;

        delta = initial_leap_seconds + TOD_UNIX_EPOCH;
        store_tod_clock_ext(&clk);
        clk.eitod -= delta;
        ext_to_timespec64(&clk, ts);
}

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:54 +01:00
Heiko Carstens
cc2c7db28f s390/time: introduce new store_tod_clock_ext()
Introduce new store_tod_clock_ext() function, which is the same like
store_tod_clock_ext_cc() except that it doesn't return a condition
code.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:54 +01:00
Heiko Carstens
530f639f1e s390/time: rename store_tod_clock_ext() and use union tod_clock
Rename store_tod_clock_ext() to store_tod_clock_ext_cc() to reflect
that it returns a condition code and also use union tod_clock as
parameter.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:54 +01:00
Heiko Carstens
e4101be56c s390/time: introduce union tod_clock
Introduce union tod_clock which is supposed to be used to decode and
access various fields of the result of STORE CLOCK EXTENDED.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:54 +01:00
Heiko Carstens
96c0a6a72d s390,alpha: switch to 64-bit ino_t
s390 and alpha are the only 64 bit architectures with a 32-bit ino_t.
Since this is quite unusual this causes bugs from time to time.

See e.g. commit ebce3eb2f7 ("ceph: fix inode number handling on
arches with 32-bit ino_t") for an example.

This (obviously) also prevents s390 and alpha to use 64-bit ino_t for
tmpfs. See commit b85a7a8bb5 ("tmpfs: disallow CONFIG_TMPFS_INODE64
on s390").

Therefore switch both s390 and alpha to 64-bit ino_t. This should only
have an effect on the ustat system call. To prevent ABI breakage
define struct ustat compatible to the old layout and change
sys_ustat() accordingly.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:53 +01:00
Sven Schnelle
efa5473590 s390: split cleanup_sie
The current code uses the address in %r11 to figure out whether
it was called from the machine check handler or from a normal
interrupt handler. Instead of doing this implicit logic (which
is mostly a leftover from the old critical cleanup approach)
just add a second label and use that.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:53 +01:00
Sven Schnelle
33ea04872d s390: use r13 in cleanup_sie as temp register
Instead of thrashing r11 which is normally our pointer to struct
pt_regs on the stack, use r13 as temporary register in the BR_EX
macro. r13 is already used in cleanup_sie, so no need to thrash
another register.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:53 +01:00
Sven Schnelle
26521412ae s390: fix kernel asce loading when sie is interrupted
If a machine check is coming in during sie, the PU saves the
control registers to the machine check save area. Afterwards
mcck_int_handler is called, which loads __LC_KERNEL_ASCE into
%cr1. Later the code restores %cr1 from the machine check area,
but that is wrong when SIE was interrupted because the machine
check area still contains the gmap asce. Instead it should return
with either __KERNEL_ASCE in %cr1 when interrupted in SIE or
the previous %cr1 content saved in the machine check save area.

Fixes: 87d5986345 ("s390/mm: remove set_fs / rework address space handling")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Cc: <stable@kernel.org> # v5.8+
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:53 +01:00
Sven Schnelle
b61b159512 s390: add stack for machine check handler
The previous code used the normal kernel stack for machine checks.
This is problematic when a machine check interrupts a system call
or interrupt handler right at the beginning where registers are set up.

Assume system_call is interrupted at the first instruction and a machine
check is triggered. The machine check handler is called, checks the PSW
to see whether it is coming from user space, notices that it is already
in kernel mode but %r15 still contains the user space stack. This would
lead to a kernel crash.

There are basically two ways of fixing that: Either using the 'critical
cleanup' approach which compares the address in the PSW to see whether
it is already at a point where the stack has been set up, or use an extra
stack for the machine check handler.

For simplicity, we will go with the second approach and allocate an extra
stack. This adds some memory overhead for large systems, but usually large
system have plenty of memory so this isn't really a concern. But it keeps
the mchk stack setup simple and less error prone.

Fixes: 0b0ed657fe ("s390: remove critical section cleanup from entry.S")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Cc: <stable@kernel.org> # v5.8+
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:53 +01:00
Sven Schnelle
64985c3a22 s390: use WRITE_ONCE when re-allocating async stack
The code does:

S390_lowcore.async_stack = new + STACK_INIT_OFFSET;

But the compiler is free to first assign one value and
add the other value later. If a IRQ would be coming in
between these two operations, it would run with an invalid
stack. Prevent this by using WRITE_ONCE.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:53 +01:00
Sven Schnelle
b0d31159a4 s390: open code SWITCH_KERNEL macro
This is a preparation patch for two later bugfixes. In the past both
int_handler and machine check handler used SWITCH_KERNEL to switch to
the kernel stack. However, SWITCH_KERNEL doesn't work properly in machine
check context. So instead of adding more complexity to this macro, just
remove it.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Cc: <stable@kernel.org> # v5.8+
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-02-13 17:17:53 +01:00
Jens Axboe
41be53e94f io_uring: kill cached requests from exiting task closing the ring
Be nice and prune these upfront, in case the ring is being shared and
one of the tasks is going away. This is a bit more important now that
we account the allocations.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-13 09:11:04 -07:00
Jens Axboe
9a4fdbd8ee io_uring: add helper to free all request caches
We have three different ones, put it in a helper for easy calling. This
is in preparation for doing it outside of ring freeing as well. With
that in mind, also ensure that we do the proper locking for safe calling
from a context where the ring it still live.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-13 09:09:44 -07:00
Jens Axboe
68e68ee6e3 io_uring: allow task match to be passed to io_req_cache_free()
No changes in this patch, just allows a caller to pass in a targeted
task that we must match for freeing requests in the cache.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-13 09:00:02 -07:00
Tiezhu Yang
8fbf1d2759 MAINTAINERS: Add git tree for KVM/mips
There is no git tree for KVM/mips in MAINTAINERS, it is not
convinent to rebase, add it.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-02-13 10:14:51 +01:00
Jinyang He
b306c5f560 MIPS: Use common way to parse elfcorehdr
"elfcorehdr" can be parsed at kernel/crash_dump.c

Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-02-13 09:57:22 +01:00
Thomas Bogendoerfer
f1b0bf577f MIPS: Simplify EVA cache handling
protected_cache_op is only used for flushing user addresses, so
we only need to define protected_cache_op different in EVA mode and
be done with it.

Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-02-13 09:53:23 +01:00
Thomas Bogendoerfer
b1468f3071 Revert "MIPS: kernel: {ftrace,kgdb}: Set correct address limit for cache flushes"
This reverts commit 6ebda44f36.

All icache flushes in this code paths are done via flush_icache_range(),
which only uses normal cache instruction. And this is the correct thing
for EVA mode, too. So no need to do set_fs(KERNEL_DS) here.

Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-02-13 09:53:08 +01:00
Christoph Hellwig
4e0664416c MIPS: remove CONFIG_DMA_PERDEV_COHERENT
Just select DMA_NONCOHERENT and ARCH_HAS_SETUP_DMA_OPS from the
MIPS_GENERIC platform instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Huacai Chen <chenhuacai@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-02-13 09:51:46 +01:00
Christoph Hellwig
a86497d66d MIPS: remove CONFIG_DMA_MAYBE_COHERENT
CONFIG_DMA_MAYBE_COHERENT just guards two early init options now.  Just
enable them unconditionally for CONFIG_DMA_NONCOHERENT.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-02-13 09:51:45 +01:00
Christoph Hellwig
6d4e9a8efe driver core: lift dma_default_coherent into common code
Lift the dma_default_coherent variable from the mips architecture code
to the driver core.  This allows an architecture to sdefault all device
to be DMA coherent at run time, even if the kernel is build with support
for DMA noncoherent device.  By allowing device_initialize to set the
->dma_coherent field to this default the amount of arch hooks required
for this behavior can be greatly reduced.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-02-13 09:51:45 +01:00
Christoph Hellwig
14ac09a65e MIPS: refactor the runtime coherent vs noncoherent DMA indicators
Replace the global coherentio enum, and the hw_coherentio (fake) boolean
variables with a single boolean dma_default_coherent flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-02-13 09:51:45 +01:00
Christoph Hellwig
3440caf5f2 MIPS/alchemy: factor out the DMA coherent setup
Factor out a alchemy_dma_coherent helper that determines if the platform
is DMA coherent.  Also stop initializing the hw_coherentio variable, given
that is only ever set to a non-zero value by the malta setup code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-02-13 09:51:45 +01:00
Christoph Hellwig
04e4783fcc MIPS/malta: simplify plat_setup_iocoherency
Given that plat_mem_setup runs before earlyparams are handled and malta
selects CONFIG_DMA_MAYBE_COHERENT, coherentio can only be set to
IO_COHERENCE_DEFAULT at this point.  So remove the checking for other
options and merge plat_enable_iocoherency into plat_setup_iocoherency
to simplify the code a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-02-13 09:51:45 +01:00
Tiezhu Yang
7c86ff9925 MIPS: Add basic support for ptrace single step
In the current code, arch_has_single_step() is not defined on MIPS,
that means MIPS does not support instruction single-step for user mode.

Delve is a debugger for the Go programming language, the ptrace syscall
PtraceSingleStep() failed [1] on MIPS and then the single step function
can not work well, we can see that PtraceSingleStep() definition returns
ptrace(PTRACE_SINGLESTEP) [2].

So it is necessary to support ptrace single step on MIPS.

At the beginning, we try to use the Debug Single Step exception on the
Loongson 3A4000 platform, but it has no effect when set CP0_DEBUG SSt
bit, this is because CP0_DEBUG NoSSt bit is 1 which indicates no
single-step feature available [3], so this way which is dependent on the
hardware is almost impossible.

With further research, we find out there exists a common way used with
break instruction in arch/alpha/kernel/ptrace.c, it is workable.

For the above analysis, define arch_has_single_step(), add the common
function user_enable_single_step() and user_disable_single_step(), set
flag TIF_SINGLESTEP for child process, use break instruction to set
breakpoint.

We can use the following testcase to test it:
tools/testing/selftests/breakpoints/step_after_suspend_test.c

 $ make -C tools/testing/selftests TARGETS=breakpoints
 $ cd tools/testing/selftests/breakpoints

Without this patch:

 $ ./step_after_suspend_test -n
 TAP version 13
 1..4
 # ptrace(PTRACE_SINGLESTEP) not supported on this architecture: Input/output error
 ok 1 # SKIP CPU 0
 # ptrace(PTRACE_SINGLESTEP) not supported on this architecture: Input/output error
 ok 2 # SKIP CPU 1
 # ptrace(PTRACE_SINGLESTEP) not supported on this architecture: Input/output error
 ok 3 # SKIP CPU 2
 # ptrace(PTRACE_SINGLESTEP) not supported on this architecture: Input/output error
 ok 4 # SKIP CPU 3
 # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:4 error:0

With this patch:

 $ ./step_after_suspend_test -n
 TAP version 13
 1..4
 ok 1 CPU 0
 ok 2 CPU 1
 ok 3 CPU 2
 ok 4 CPU 3
 # Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0

[1] https://github.com/go-delve/delve/blob/master/pkg/proc/native/threads_linux.go#L50
[2] https://github.com/go-delve/delve/blob/master/vendor/golang.org/x/sys/unix/syscall_linux.go#L1573
[3] http://www.t-es-t.hu/download/mips/md00047f.pdf

Reported-by: Guoqi Chen <chenguoqi@loongson.cn>
Signed-off-by: Xingxing Su <suxingxing@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-02-13 09:49:19 +01:00
David S. Miller
773dc50d71 Merge branch 'Xilinx-axienet-updates'
Robert Hancock says:

====================
Xilinx axienet updates

Updates to the Xilinx AXI Ethernet driver to add support for an additional
ethtool operation, and to support dynamic switching between 1000BaseX and
SGMII interface modes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 17:38:53 -08:00
Robert Hancock
6c8f06bb2e net: axienet: Support dynamic switching between 1000BaseX and SGMII
Newer versions of the Xilinx AXI Ethernet core (specifically version 7.2 or
later) allow the core to be configured with a PHY interface mode of "Both",
allowing either 1000BaseX or SGMII modes to be selected at runtime. Add
support for this in the driver to allow better support for applications
which can use both fiber and copper SFP modules.

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 17:38:53 -08:00
Robert Hancock
eceac9d259 dt-bindings: net: xilinx_axienet: add xlnx,switch-x-sgmii attribute
Document the new xlnx,switch-x-sgmii attribute which is used to indicate
that the Ethernet core supports dynamic switching between 1000BaseX and
SGMII.

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 17:38:53 -08:00
Robert Hancock
66b51663cd net: axienet: hook up nway_reset ethtool operation
Hook up the nway_reset ethtool operation to the corresponding phylink
function so that "ethtool -r" can be supported.

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 17:38:53 -08:00
Alexei Starovoitov
5e1d40b75e Merge branch 'Add support of pointer to struct in global'
Dmitrii Banshchikov says:

====================

This patchset adds support of pointers to type with known size among
global function arguments.

The motivation is to overcome the limit on the maximum number of allowed
arguments and avoid tricky and unoptimal ways of passing arguments.

A referenced type may contain pointers but access via such pointers
cannot be veirified currently.

v2 -> v3
 - Fix reg ID generation
 - Fix commit description
 - Fix typo
 - Fix tests

v1 -> v2:
 - Allow pointer to any type with known size rather than struct only
 - Allow pointer in global functions only
 - Add more tests
 - Fix wrapping and v1 comments
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-02-12 17:37:23 -08:00
Dmitrii Banshchikov
8b08807d03 selftests/bpf: Add unit tests for pointers in global functions
test_global_func9  - check valid pointer's scenarios
test_global_func10 - check that a smaller type cannot be passed as a
                     larger one
test_global_func11 - check that CTX pointer cannot be passed
test_global_func12 - check access to a null pointer
test_global_func13 - check access to an arbitrary pointer value
test_global_func14 - check that an opaque pointer cannot be passed
test_global_func15 - check that a variable has an unknown value after
		     it was passed to a global function by pointer
test_global_func16 - check access to uninitialized stack memory

test_global_func_args - check read and write operations through a pointer

Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212205642.620788-5-me@ubique.spb.ru
2021-02-12 17:37:23 -08:00
Dmitrii Banshchikov
e5069b9c23 bpf: Support pointers in global func args
Add an ability to pass a pointer to a type with known size in arguments
of a global function. Such pointers may be used to overcome the limit on
the maximum number of arguments, avoid expensive and tricky workarounds
and to have multiple output arguments.

A referenced type may contain pointers but indirect access through them
isn't supported.

The implementation consists of two parts.  If a global function has an
argument that is a pointer to a type with known size then:

  1) In btf_check_func_arg_match(): check that the corresponding
register points to NULL or to a valid memory region that is large enough
to contain the expected argument's type.

  2) In btf_prepare_func_args(): set the corresponding register type to
PTR_TO_MEM_OR_NULL and its size to the size of the expected type.

Only global functions are supported because allowance of pointers for
static functions might break validation. Consider the following
scenario. A static function has a pointer argument. A caller passes
pointer to its stack memory. Because the callee can change referenced
memory verifier cannot longer assume any particular slot type of the
caller's stack memory hence the slot type is changed to SLOT_MISC.  If
there is an operation that relies on slot type other than SLOT_MISC then
verifier won't be able to infer safety of the operation.

When verifier sees a static function that has a pointer argument
different from PTR_TO_CTX then it skips arguments check and continues
with "inline" validation with more information available. The operation
that relies on the particular slot type now succeeds.

Because global functions were not allowed to have pointer arguments
different from PTR_TO_CTX it's not possible to break existing and valid
code.

Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212205642.620788-4-me@ubique.spb.ru
2021-02-12 17:37:23 -08:00
Dmitrii Banshchikov
4ddb74165a bpf: Extract nullable reg type conversion into a helper function
Extract conversion from a register's nullable type to a type with a
value. The helper will be used in mark_ptr_not_null_reg().

Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212205642.620788-3-me@ubique.spb.ru
2021-02-12 17:37:23 -08:00
Dmitrii Banshchikov
feb4adfad5 bpf: Rename bpf_reg_state variables
Using "reg" for an array of bpf_reg_state and "reg[i + 1]" for an
individual bpf_reg_state is error-prone and verbose. Use "regs" for the
former and "reg" for the latter as other code nearby does.

Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212205642.620788-2-me@ubique.spb.ru
2021-02-12 17:37:23 -08:00
Robert Hancock
57baf8cc70 net: axienet: Handle deferred probe on clock properly
This driver is set up to use a clock mapping in the device tree if it is
present, but still work without one for backward compatibility. However,
if getting the clock returns -EPROBE_DEFER, then we need to abort and
return that error from our driver initialization so that the probe can
be retried later after the clock is set up.

Move clock initialization to earlier in the process so we do not waste as
much effort if the clock is not yet available. Switch to use
devm_clk_get_optional and abort initialization on any error reported.
Also enable the clock regardless of whether the controller is using an MDIO
bus, as the clock is required in any case.

Fixes: 09a0354cad ("net: axienet: Use clock framework to get device clock rate")
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 17:36:31 -08:00
David S. Miller
762d17b991 Merge branch 'tcp-mem-pressure-vs-SO_RCVLOWAT'
Eric Dumazet says:

====================
tcp: mem pressure vs SO_RCVLOWAT

First patch fixes an issue for applications using SO_RCVLOWAT
to reduce context switches.

Second patch is a cleanup.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 17:28:26 -08:00
Eric Dumazet
05dc72aba3 tcp: factorize logic into tcp_epollin_ready()
Both tcp_data_ready() and tcp_stream_is_readable() share the same logic.

Add tcp_epollin_ready() helper to avoid duplication.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 17:28:26 -08:00
Eric Dumazet
f969dc5a88 tcp: fix SO_RCVLOWAT related hangs under mem pressure
While commit 24adbc1676 ("tcp: fix SO_RCVLOWAT hangs with fat skbs")
fixed an issue vs too small sk_rcvbuf for given sk_rcvlowat constraint,
it missed to address issue caused by memory pressure.

1) If we are under memory pressure and socket receive queue is empty.
First incoming packet is allowed to be queued, after commit
76dfa60820 ("tcp: allow one skb to be received per socket under memory pressure")

But we do not send EPOLLIN yet, in case tcp_data_ready() sees sk_rcvlowat
is bigger than skb length.

2) Then, when next packet comes, it is dropped, and we directly
call sk->sk_data_ready().

3) If application is using poll(), tcp_poll() will then use
tcp_stream_is_readable() and decide the socket receive queue is
not yet filled, so nothing will happen.

Even when sender retransmits packets, phases 2) & 3) repeat
and flow is effectively frozen, until memory pressure is off.

Fix is to consider tcp_under_memory_pressure() to take care
of global memory pressure or memcg pressure.

Fixes: 24adbc1676 ("tcp: fix SO_RCVLOWAT hangs with fat skbs")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Arjun Roy <arjunroy@google.com>
Suggested-by: Wei Wang <weiwan@google.com>
Reviewed-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 17:28:26 -08:00