Commit graph

753663 commits

Author SHA1 Message Date
Chuck Lever
8154ef2776 NFSD: Clean up legacy NFS WRITE argument XDR decoders
Move common code in NFSD's legacy NFS WRITE decoders into a helper.
The immediate benefit is reduction of code duplication and some nice
micro-optimizations (see below).

In the long term, this helper can perform a per-transport call-out
to fill the rq_vec (say, using RDMA Reads).

The legacy WRITE decoders and procs are changed to work like NFSv4,
which constructs the rq_vec just before it is about to call
vfs_writev.

Why? Calling a transport call-out from the proc instead of the XDR
decoder means that the incoming FH can be resolved to a particular
filesystem and file. This would allow pages from the backing file to
be presented to the transport to be filled, rather than presenting
anonymous pages and copying or flipping them into the file's page
cache later.

I also prefer using the pages in rq_arg.pages, instead of pulling
the data pages directly out of the rqstp::rq_pages array. This is
currently the way the NFSv3 write decoder works, but the other two
do not seem to take this approach. Fixing this removes the only
reference to rq_pages found in NFSD, eliminating an NFSD assumption
about how transports use the pages in rq_pages.

Lastly, avoid setting up the first element of rq_vec as a zero-
length buffer. This happens with an RDMA transport when a normal
Read chunk is present because the data payload is in rq_arg's
page list (none of it is in the head buffer).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:16 -04:00
Chuck Lever
fff4080b2f nfsd: Trace NFSv4 COMPOUND execution
This helps record the identity and timing of the ops in each NFSv4
COMPOUND, replacing dprintk calls that did much the same thing.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:15 -04:00
Chuck Lever
87c5942e8f nfsd: Add I/O trace points in the NFSv4 read proc
NFSv4 read compound processing invokes nfsd_splice_read and
nfs_readv directly, so the trace points currently in nfsd_read are
not invoked for NFSv4 reads.

Move the NFSD READ trace points to common helpers so that NFSv4
reads are captured.

Also, record any local I/O error that occurs, the total count of
bytes that were actually returned, and whether splice or vectored
read was used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:15 -04:00
Chuck Lever
d890be159a nfsd: Add I/O trace points in the NFSv4 write path
NFSv4 write compound processing invokes nfsd_vfs_write directly. The
trace points currently in nfsd_write are not effective for NFSv4
writes.

Move the trace points into the shared nfsd_vfs_write() helper.

After the I/O, we also want to record any local I/O error that
might have occurred, and the total count of bytes that were actually
moved (rather than the requested number).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:15 -04:00
Chuck Lever
f394b62b7b nfsd: Add "nfsd_" to trace point names
Follow naming convention used in client and in sunrpc layers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:14 -04:00
Chuck Lever
79e0b4e247 nfsd: Record request byte count, not count of vectors
Byte count is more helpful to know than vector count.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:14 -04:00
Chuck Lever
afa720a091 nfsd: Fix NFSD trace points
nfsd-1915  [003] 77915.780959: write_opened:
	[FAILED TO PARSE] xid=3286130958 fh=0 offset=154624 len=1
nfsd-1915  [003] 77915.780960: write_io_done:
	[FAILED TO PARSE] xid=3286130958 fh=0 offset=154624 len=1
nfsd-1915  [003] 77915.780964: write_done:
	[FAILED TO PARSE] xid=3286130958 fh=0 offset=154624 len=1

Byte swapping and knfsd_fh_hash() are not available in "trace-cmd
report", where the print format string is actually used. These
data transformations have to be done during the TP_fast_assign step.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:13 -04:00
Chuck Lever
55f5088c22 svc: Report xprt dequeue latency
Record the time between when a rqstp is enqueued on a transport
and when it is dequeued. This includes how long the rqstp waits on
the queue and how long it takes the kernel scheduler to wake a
nfsd thread to service it.

The svc_xprt_dequeue trace point is altered to include the number
of microseconds between xprt_enqueue and xprt_dequeue.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:13 -04:00
Chuck Lever
aaba72cd4e sunrpc: Report per-RPC execution stats
Introduce a mechanism to report the server-side execution latency of
each RPC. The goal is to enable user space to filter the trace
record for latency outliers, build histograms, etc.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:12 -04:00
Chuck Lever
0b9547bf6b sunrpc: Re-purpose trace_svc_process
Currently, trace_svc_process has two call sites:

1. Just after a call to svc_send. svc_send already invokes
   trace_svc_send with the same arguments just before returning

2. Just before a call to svc_drop. svc_drop already invokes
   trace_svc_drop with the same arguments just after it is called

Therefore trace_svc_process does not provide any additional
information not already provided by these other trace points.

However, it would be useful to record the incoming RPC procedure.
So reuse trace_svc_process for this purpose.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:12 -04:00
Chuck Lever
ece200ddd5 sunrpc: Save remote presentation address in svc_xprt for trace events
TP_printk defines a format string that is passed to user space for
converting raw trace event records to something human-readable.

My user space's printf (Oracle Linux 7), however, does not have a
%pI format specifier. The result is that what is supposed to be an
IP address in the output of "trace-cmd report" is just a string that
says the field couldn't be displayed.

To fix this, adopt the same approach as the client: maintain a pre-
formated presentation address for occasions when %pI is not
available.

The location of the trace_svc_send trace point is adjusted so that
rqst->rq_xprt is not NULL when the trace event is recorded.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:11 -04:00
Chuck Lever
41f306d0c2 sunrpc: Simplify trace_svc_recv
There doesn't seem to be a lot of value in calling trace_svc_recv
in the failing case.

1. There are two very common cases: one is the transport is not
ready, and the other is shutdown. Neither is terribly interesting.

2. The trace record for the failing case contains nothing but
the status code.

Therefore the trace point call site in the error exit is removed.
Since the trace point is now recording a length instead of a
status, rename the status field and remove the case that records a
zero XID.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:11 -04:00
Chuck Lever
7dbb53baed sunrpc: Simplify do_enqueue tracing
There are three cases where svc_xprt_do_enqueue() returns without
waking an nfsd thread:

1. There is no work to do

2. The transport is already busy

3. There are no available nfsd threads

Only 3. is truly interesting. Move the trace point so it records
that there was work to do and either an nfsd thread was awoken, or
a free one could not found.

As an additional clean up, remove a redundant comment and a couple
of dprintk call sites.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:11 -04:00
Chuck Lever
caa3e106dc sunrpc: Move trace_svc_xprt_dequeue()
Reduce the amount of noise generated by trace_svc_xprt_dequeue by
moving it to the end of svc_get_next_xprt. This generates exactly
one trace event when a ready xprt is found, rather than spurious
events when there is no work to do. The empty events contain no
information that can't be obtained simply by tracing function calls
to svc_xprt_dequeue.

A small additional benefit is simplification of the svc_xprt_event
trace class, which no longer has to handle the case when the @xprt
parameter is NULL.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:10 -04:00
Chuck Lever
03edb90f57 sunrpc: Update show_svc_xprt_flags() to include recently added flags
XPT_KILL_TEMP was added by commit 546125d161 ("sunrpc: don't call
sleeping functions from the notifier block callbacks"), and
XPT_CONG_CTRL was added by commit 362142b258 ("sunrpc: flag
transports as having congestion control") .

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:10 -04:00
Chuck Lever
989f881ebf svc: Simplify ->xpo_secure_port
Clean up: Instead of returning a value that is used to set or clear
a bit, just make ->xpo_secure_port mangle that bit, and return void.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:09 -04:00
Chuck Lever
63a1b15693 sunrpc: Remove unneeded pointer dereference
Clean up: Noticed during code inspection that there is already a
local automatic variable "xprt" so dereferencing rqst->rq_xprt
again is unnecessary.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:09 -04:00
Stefan Agner
47299f79ea nfsd: use correct enum type in decode_cb_op_status
Use enum nfs_cb_opnum4 in decode_cb_op_status. This fixes warnings
seen with clang:
  fs/nfsd/nfs4callback.c:451:36: warning: implicit conversion from
      enumeration type 'enum nfs_cb_opnum4' to different enumeration
      type 'enum nfs_opnum4' [-Wenum-conversion]
        status = decode_cb_op_status(xdr, OP_CB_SEQUENCE, &cb->cb_seq_status);
                 ~~~~~~~~~~~~~~~~~~~      ^~~~~~~~~~~~~~

Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:08 -04:00
Fengguang Wu
51d87bc2bf nfsd: fix boolreturn.cocci warnings
fs/nfsd/nfs4state.c:926:8-9: WARNING: return of 0/1 in function 'nfs4_delegation_exists' with return type bool
fs/nfsd/nfs4state.c:2955:9-10: WARNING: return of 0/1 in function 'nfsd4_compound_in_session' with return type bool

 Return statements in functions returning bool should use
 true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci

Fixes: 68b18f5294 ("nfsd: make nfs4_get_existing_delegation less confusing")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
[bfields: also fix -EAGAIN]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:08 -04:00
Linus Torvalds
75dcc7ef95 spi: SPI updates for v4.17
A quiet release for SPI, some fixes and small updates for individual
 drivers with one bigger change from Linus Walleij which coverts the
 bitbanging SPI driver to use the GPIO descriptor API from Linus Walleij.
 Since GPIO descriptors were used by platform data this means there's a
 few changes in arch/ making relevant updates for a few platforms and one
 misc driver that are affected.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAlrCYDYTHGJyb29uaWVA
 a2VybmVsLm9yZwAKCRAk1otyXVSH0DXhB/9Hueo2D1Pku0iAwq+yIAPvk2N0fyGP
 YWJWbDQGngSZEnU3dUzdfaOHh4MMiMgs0q8hxOYnmkIhhTs7cXsPj8x6rlZP8bee
 TtSEEgHSuHlVWyADquFIel8OCq8BWd7p4aQKqP3LaojCH/wqswxtn4I+RH/xw1Rd
 LIPgU12fxGUpRMO7uGNiMhqqkQDH1ZBFzCUYRPqQk2E+5hyZraQYoROgZkEBPzb1
 EWjLxLxytj/Zj4N5eQkW4BWfX5HKJb9CpwKIIdyNuybr14yC5FR83VxOf9wW4/pG
 39DIn8LtGTkuMRF4Yw5rPlC9KqnLpUDTnQCcOW4lt+z8QtZQ+LP3boxE
 =xNln
 -----END PGP SIGNATURE-----

Merge tag 'spi-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull SPI updates from Mark Brown:
 "A quiet release for SPI, some fixes and small updates for individual
  drivers with one bigger change from Linus Walleij which coverts the
  bitbanging SPI driver to use the GPIO descriptor API from Linus
  Walleij.

  Since GPIO descriptors were used by platform data this means there's a
  few changes in arch/ making relevant updates for a few platforms and
  one misc driver that are affected"

* tag 'spi-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (24 commits)
  MAINTAINERS: update Andi's e-mail
  spi: spi-atmel: Use correct enum for DMA transfer direction
  spi: sh-msiof: Document R-Car M3-N support
  spi: sh-msiof: Use correct enum for DMA transfer direction
  spi: sprd: Add the support of restarting the system
  spi: sprd: Simplify the transfer function
  spi: Fix unregistration of controller with fixed SPI bus number
  spi: rspi: use correct enum for DMA transfer direction
  spi: jcore: disable ref_clk after getting its rate
  spi: bcm-qspi: fIX some error handling paths
  spi: pxa2xx: Disable runtime PM if controller registration fails
  spi: tegra20-slink: use true and false for boolean values
  spi: Fix scatterlist elements size in spi_map_buf
  spi: atmel: init FIFOs before spi enable
  spi: orion: Prepare space for per-child options
  spi: orion: Make the error message greppable
  spi: orion: Rework GPIO CS handling
  spi: bcm2835aux: Avoid 64-bit arithmetic in xfer len calc
  spi: spi-gpio: Augment device tree bindings
  spi: spi-gpio: Rewrite to use GPIO descriptors
  ...
2018-04-03 12:06:21 -07:00
Arnaldo Carvalho de Melo
520d3f01ea perf annotate stdio2: Print more descriptive event information header
To match the recently added event header information to --tui, e.g.:

  # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
  Samples: 128  of event 'cycles:ppp', 4000 Hz, Event count (approx.): 48617682
  _raw_spin_lock_irqsave() /proc/kcore
    0.78        nop
    7.03        push   %rbx
    3.12        pushfq
    6.25        pop    %rax
                nop
                mov    %rax,%rbx
    3.12        cli
                nop
                xor    %eax,%eax
                mov    $0x1,%edx
   79.69        lock   cmpxchg %edx,(%rdi)
                test   %eax,%eax
              ↓ jne    2b
                mov    %rbx,%rax
                pop    %rbx
              ← retq
          2b:   mov    %eax,%esi
              → callq  *ffffffffb30eaed0
                mov    %rbx,%rax
                pop    %rbx
              ← retq
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-ujy46x7cldyhyxelyf2b9quy@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-03 16:05:13 -03:00
Patrik Torstensson
843f38d382 dm verity: add 'check_at_most_once' option to only validate hashes once
This allows platforms that are CPU/memory contrained to verify data
blocks only the first time they are read from the data device, rather
than every time.  As such, it provides a reduced level of security
because only offline tampering of the data device's content will be
detected, not online tampering.

Hash blocks are still verified each time they are read from the hash
device, since verification of hash blocks is less performance critical
than data blocks, and a hash block will not be verified any more after
all the data blocks it covers have been verified anyway.

This option introduces a bitset that is used to check if a block has
been validated before or not.  A block can be validated more than once
as there is no thread protection for the bitset.

These changes were developed and tested on entry-level Android Go
devices.

Signed-off-by: Patrik Torstensson <totte@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:29 -04:00
Mikulas Patocka
45354f1eb6 dm bufio: don't embed a bio in the dm_buffer structure
The bio structure consumes a substantial part of dm_buffer.  The bio
structure is only needed when doing I/O on the buffer, thus we don't
have to embed it in the buffer.

Allocate the bio structure only when doing I/O.

We don't need to create a bio_set because, in case of allocation
failure, dm-bufio falls back to using dm-io (which keeps its own
bio_set).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:29 -04:00
Mikulas Patocka
f51f2e0a7f dm bufio: support non-power-of-two block sizes
Support block sizes that are not a power-of-two (but they must be a
multiple of 512b).  As always, a slab cache is used for allocations.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:28 -04:00
Mikulas Patocka
359dbf19ab dm bufio: use slab cache for dm_buffer structure allocations
kmalloc padded to the next power of two, using a slab cache avoids this.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:27 -04:00
Mikulas Patocka
03b0293959 dm bufio: reorder fields in dm_buffer structure
Reorder fields in dm_buffer structure to improve packing and reduce
structure size.  The compiler allocates 32-bit integer for field 'enum
data_mode', so change it to unsigned char.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:26 -04:00
Mikulas Patocka
6b5e718cc1 dm bufio: relax alignment constraint on slab cache
The I/O buffer doesn't have to be aligned on block size granularity,
relax alignment to ARCH_KMALLOC_MINALIGN (required to allow DMA from
slab cache memory on some architectures).

Also, set SLAB_RECLAIM_ACCOUNT so that the memory allocated from the
cache is accounted as reclaimable and doesn't inflate the 'used' entry
in the free command.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:25 -04:00
Mikulas Patocka
21bb132767 dm bufio: remove code that merges slab caches
All slab allocators can merge duplicate caches.  So dm-bufio doesn't
need extra slab merging logic.  Instead it can just allocate one slab
cache per client and let the allocator merge them.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:25 -04:00
Mikulas Patocka
eeb67a0ba0 dm bufio: get rid of slab cache name allocations
dm-bufio keeps the dm_bufio_cache_names array that holds names of the
slab caches.

Since the commit db265eca77 ("mm/sl[aou]b: Move duping of slab name to
slab_common.c"), the kernel automatically duplicates the slab cache name
when creating the slab cache, so we no longer have to keep the name
allocated.

Remove the code that allocates the slab names and keeps them around.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:24 -04:00
Mikulas Patocka
afa53df869 dm bufio: move dm-bufio.h to include/linux/
Move dm-bufio.h to include/linux/ so that external GPL'd DM target
modules can use it.

It is better to allow the use of dm-bufio than force external modules
to implement the equivalent buffered IO mechanism in some new way.  The
hope is this will encourage the use of dm-bufio; which will then make it
easier for a GPL'd external DM target module to be included upstream.

A couple dm-bufio EXPORT_SYMBOL exports have also been updated to use
EXPORT_SYMBOL_GPL.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:23 -04:00
Mikulas Patocka
1f013174b3 dm bufio: delete outdated comment
This comment was true when dm-bufio was written but, since 4.3, bios can
now have arbitrary size and the driver splits them.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:22 -04:00
Denis Semakin
00716545c8 dm: add support for secure erase forwarding
Set QUEUE_FLAG_SECERASE in DM device's queue_flags if a DM table's
data devices support secure erase.

Also, add support for secure erase to both the linear and striped
targets.

Signed-off-by: Denis Semakin <d.semakin@omprussia.ru>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:21 -04:00
Mike Snitzer
0519c71e8d dm: backfill abnormal IO support to non-splitting IO submission
Otherwise, these abnormal IOs would be sent to the DM target
regardless of whether the target advertised support for them.

Factor out __process_abnormal_io() from __split_and_process_non_flush()
so that discards, write same, etc may be conditionally processed.

Fixes: 978e51ba3 ("dm: optimize bio-based NVMe IO submission")
Cc: stable@vger.kernel.org # 4.16
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:20 -04:00
Heinz Mauelshagen
880bcce0dc dm raid: fix nosync status
Fix a race for "nosync" activations providing "aa.." device health
characters and "0/N" sync ratio rather than "AA..." and "N/N".  Occurs
when status for the raid set is retrieved during resume before the MD
sync thread starts and clears the MD_RECOVERY_NEEDED flag.

Cc: stable@vger.kernel.org # 4.16+
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:19 -04:00
Wang Sheng-Hui
8192a0cd76 dm mpath: use DM_MAPIO_SUBMITTED instead of magic number 0 in process_queued_bios()
Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:19 -04:00
Tycho Andersen
706dd22f12 dm stripe: get rid of a Variable Length Array (VLA)
Ideally, we'd like to get rid of all VLAs in the kernel and add -Wvla to
the build args: https://lkml.org/lkml/2018/3/7/621

This one is a simple case, since we don't actually need the VLA at all: we
can just iterate over the stripes twice, once to emit their names, and the
second time to emit status (i.e. trade memory for time). Since the number
of stripes is probably low, this is hopefully not that expensive.

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:18 -04:00
Qu Wenruo
e5c4cb9b1b dm log writes: record metadata flag for better flags record
So developer could distinguish data and metadata bios easier.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:17 -04:00
Milan Broz
e16b4f99f0 dm integrity: fail early if required HMAC key is not available
Since crypto API commit 9fa68f6200 ("crypto: hash - prevent using keyed
hashes without setting key") dm-integrity cannot use keyed algorithms
without the key being set.

The dm-integrity recognizes this too late (during use of HMAC), so it
allows creation and formatting of superblock, but the device is in fact
unusable.

Fix it by detecting the key requirement in integrity table constructor.

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:16 -04:00
Wang Sheng-Hui
2d77dafe23 dm: remove unused macro DM_MOD_NAME_SIZE
Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:15 -04:00
Heinz Mauelshagen
afac6bd6d1 dm unstripe: remove unnecessary header includes
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:15 -04:00
Heinz Mauelshagen
91e065d8f2 dm unstripe: remove superfluous module init error path message
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Reviewed-by: Scott Bauer <Scott.Bauer@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:14 -04:00
Heinz Mauelshagen
ba5dfbb712 dm unstripe: add "dm-unstriped" module alias
This target's kernel module being named dm-unstripe.ko doesn't allow
lvm2's DM module autoload capability to load the dm-unstripe.ko
because lvm2 looks for dm-unstriped.ko due to the target name being
"unstriped".

Add the "dm-unstriped" module alias to resolve this oversight.

NOTE: this isn't needed for the "striped" target, despite its source
file being named dm-stripe.c, because it is part of dm-mod.ko.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:13 -04:00
Heinz Mauelshagen
2ae600cd15 dm unstripe: support non-power-of-2 chunk size
Address "FIXME: must support non power of 2 chunk_size, dm-stripe.c does".

Bump target version to indicate change.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Tested-by: Scott Bauer <Scott.Bauer@intel.com>
Reviewed-by: Scott Bauer <Scott.Bauer@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:12 -04:00
Mikulas Patocka
5059353df8 dm crypt: limit the number of allocated pages
dm-crypt consumes an excessive amount memory when the user attempts to
zero a dm-crypt device with "blkdiscard -z". The command "blkdiscard -z"
calls the BLKZEROOUT ioctl, it goes to the function __blkdev_issue_zeroout,
__blkdev_issue_zeroout sends a large amount of write bios that contain
the zero page as their payload.

For each incoming page, dm-crypt allocates another page that holds the
encrypted data, so when processing "blkdiscard -z", dm-crypt tries to
allocate the amount of memory that is equal to the size of the device.
This can trigger OOM killer or cause system crash.

Fix this by limiting the amount of memory that dm-crypt allocates to 2%
of total system memory. This limit is system-wide and is divided by the
number of active dm-crypt devices and each device receives an equal
share.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:11 -04:00
Mike Snitzer
1eb5fa849f dm: allow targets to return output from messages they are sent
Could be useful for a target to return stats or other information.
If a target does DMEMIT() anything to @result from its .message method
then it must return 1 to the caller.

Signed-off-By: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:10 -04:00
Linus Torvalds
b51c4354df regulator: Updates for v4.17
A very small set of updates for the regulator API this time around,
 there's a few bug fixes and also:
 
  - Conversion of the regulator API to use GPIO descriptors rather than
    numbers from Linus Walleij.
  - New drivers for Marvell 88PG86x and Qualcomm PM8998 and PMI8998.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAlrCWysTHGJyb29uaWVA
 a2VybmVsLm9yZwAKCRAk1otyXVSH0GRNB/46x233PDi9o0GWsVYyCy5BLwINq9xC
 K1pPNoTWN9BS6VkDwaQIXOXS0qIjf5OCMCFLcAc1wncGVQDEUUNb8ndDYb+zLksn
 Ra5n8vX5qmWHdZHyecy8gFqHzLbsNUvR0E62vtghsGSuM2EeKNcHOCnd852TkcF6
 Bh9GKHXIDvQcos1bHBTbkRfb/CM4uF9rzN38NdTiyxYVqvhdBbo4h33tcuKHrvHn
 vO3tkrnhUvDdX9YNbhIal/qecouX3KsZZuLdHt/l8ygSx2eoVEzdoA9uyvU9rywe
 FZkc45DIYYH7Yz+faFHVRMU4oBN8jwalVjlGTWBD8Iyje+BT2w0eFUCJ
 =/rLA
 -----END PGP SIGNATURE-----

Merge tag 'regulator-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator updates from Mark Brown:
 "A very small set of updates for the regulator API this time around,
  there's a few bug fixes and also:

   - Conversion of the regulator API to use GPIO descriptors rather than
     numbers from Linus Walleij.

   - New drivers for Marvell 88PG86x and Qualcomm PM8998 and PMI8998"

* tag 'regulator-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: qcom: smd: Add pm8998 and pmi8998 regulators
  regulator: core: Add missing blank line between functions
  regulator: qcom_smd: Drop regulator/{machine,of_regulator} includes
  regulator: giving regulator controlling gpios a non-empty label when used through the devicetree.
  regulator: gpio: Fix some error handling paths in 'gpio_regulator_probe()'
  regulator: 88pg86x: new i2c dual regulator chip
  regulator: 88pg86x: add DT bindings document
  regulator: da9211: Pass descriptors instead of GPIO numbers
  regulator: da9055: Pass descriptor instead of GPIO number
  regulator: core: Support passing an initialized GPIO enable descriptor
  regulator: dt: regulator-name is required property
  regulator: of: Add a missing 'of_node_put()' in an error handling path of 'of_regulator_match()'
2018-04-03 11:52:16 -07:00
Dan Williams
243f29fe44 libnvdimm: add an api to cast a 'struct nd_region' to its 'struct device'
For debug, it is useful for bus providers to be able to retrieve the
'struct device' associated with an nd_region instance that it
registered. We already have to_nd_region() to perform the reverse cast
operation, in fact its duplicate declaration can be removed from the
private drivers/nvdimm/nd.h header.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-04-03 11:51:42 -07:00
Dan Williams
78727137fd nfit, address-range-scrub: fix scrub in-progress reporting
There is a small window whereby ARS scan requests can schedule work that
userspace will miss when polling scrub_show. Hold the init_mutex lock
over calls to report the status to close this potential escape. Also,
make sure that requests to cancel the ARS workqueue are treated as an
idle event.

Cc: <stable@vger.kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Fixes: 37b137ff8c ("nfit, libnvdimm: allow an ARS scrub...")
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-04-03 11:51:42 -07:00
Linus Torvalds
ffd776bf56 regmap: Updates for v4.17
This is a fairly large set of updates for regmap, mainly bugfixes.  The
 biggest bit of this is some fixes for the bulk operations code which
 had issues in some use cases, Charles Keepax has sorted them out.  We
 also gained the ability to use debugfs with syscon regmaps and to
 specify the clock to be used with MMIO regmaps.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAlrCVS8THGJyb29uaWVA
 a2VybmVsLm9yZwAKCRAk1otyXVSH0I8DCACDPubZk2MzaXHIa/jCNAtJty+lEYuK
 X7Q/dOqrUxOoLcpMQNH8cHc9UtQ01xrthOkcOIOwVandOf6VHea7NpNzqZIZwCh9
 K5kqpHMOs2Gmx39Kg3ng4lNmzDf/XUeheI8qPDyqAKb9fFQjD+F3bOVsiU7a3VOh
 oCGk81fdLEb5Xp2qKaIM3viZ7AemEJABRdKrIs/hghNv4DaHV2auIh8FJ9WFzGwK
 s+JAP2RIX9N6HR4mmnDUtfavEDQwrDXN3IDxDjxkgo2CT62e5D1wZ9ixUWM+q1RA
 74n3Mc5ZoMxAssNiuEOPRLWTbsPpGtJhKcN0QbUdViM8UMKwHSRvpFAS
 =CqUP
 -----END PGP SIGNATURE-----

Merge tag 'regmap-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap

Pull regmap updates from Mark Brown:
 "This is a fairly large set of updates for regmap, mainly bugfixes.

  The biggest bit of this is some fixes for the bulk operations code
  which had issues in some use cases, Charles Keepax has sorted them
  out. We also gained the ability to use debugfs with syscon regmaps and
  to specify the clock to be used with MMIO regmaps"

* tag 'regmap-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: (21 commits)
  regmap: debugfs: Improve warning message on debugfs_create_dir() failure
  regmap: debugfs: Free map->debugfs_name when debugfs_create_dir() failed
  regmap: debugfs: Don't leak dummy names
  regmap: debugfs: Disambiguate dummy debugfs file name
  regmap: mmio: Add function to attach a clock
  regmap: Merge redundant handling in regmap_bulk_write
  regmap: Tidy up regmap_raw_write chunking code
  regmap: Move the handling for max_raw_write into regmap_raw_write
  regmap: Remove unnecessary printk for failed allocation
  regmap: Format data for raw write in regmap_bulk_write
  regmap: use debugfs even when no device
  regmap: Allow missing device in regmap_name_read_file()
  regmap: Use _regmap_read in regmap_bulk_read
  regmap: Tidy up regmap_raw_read chunking code
  regmap: Move the handling for max_raw_read into regmap_raw_read
  regmap: Use helper function for register offset
  regmap: Don't use format_val in regmap_bulk_read
  regmap: Correct comparison in regmap_cached
  regmap: Correct offset handling in regmap_volatile_range
  regmap-i2c: Off by one in regmap_i2c_smbus_i2c_read/write()
  ...
2018-04-03 11:46:38 -07:00
Arnaldo Carvalho de Melo
6920e2854e perf annotate browser: Show extra title line with event information
So at the top we'll have two lines, like this, from 'perf report':

  # perf report --group --ignore-vmlinux
=====================================================================================================
Samples: 46  of events 'cycles', 4000 Hz, Event count (approx.): 5154895
_raw_spin_lock_irqsave  /proc/kcore
Percent              │      nop
                     │      push   %rbx
  0.00  14.29   0.00 │      pushfq
  9.09   0.00   0.00 │      pop    %rax
  9.09   0.00  20.00 │      nop
                     │      mov    %rax,%rbx
                     │      cli
  4.55   7.14   0.00 │      nop
                     │      xor    %eax,%eax
                     │      mov    $0x1,%edx
                     │      lock   cmpxchg %edx,(%rdi)
 77.27  78.57  70.00 │      test   %eax,%eax
                     │    ↓ jne    2b
                     │      mov    %rbx,%rax
  0.00   0.00  10.00 │      pop    %rbx
                     │    ← retq
                     │2b:   mov    %eax,%esi
                     │    → callq  queued_spin_lock_slowpath
                     │      mov    %rbx,%rax
                     │      pop    %rbx
Press 'h' for help on│key bindings
=====================================================================================================

 9.09 + 9.09 + 4.55 + 77.27 = 100
14.29 + 7.14 + 78.57 = 100
20 + 70 + 10 = 100

We can do the math by using 't' to toggle from 'percent' to nr

=====================================================================================================
Samples: 46  of events 'cycles', 4000 Hz, Event count (approx.): 5154895
_raw_spin_lock_irqsave  /proc/kcore
Period                              │      nop
                                    │      push   %rbx
          0       79273           0 │      pushfq
     190455           0           0 │      pop    %rax
     198038           0        3045 │      nop
                                    │      mov    %rax,%rbx
                                    │      cli
     217233       32562           0 │      nop
                                    │      xor    %eax,%eax
                                    │      mov    $0x1,%edx
                                    │      lock   cmpxchg %edx,(%rdi)
    3421649      979174       28273 │      test   %eax,%eax
                                    │    ↓ jne    2b
                                    │      mov    %rbx,%rax
          0           0        5193 │      pop    %rbx
                                    │    ← retq
                                    │2b:   mov    %eax,%esi
                                    │    → callq  queued_spin_lock_slowpath
                                    │      mov    %rbx,%rax
                                    │      pop    %rbx
Press 'h' for help on│key bindings
=====================================================================================================

79273 + 190455 + 198038 + 3045 + 217233 + 32562 + 3421649 + 979174 + 28273 + 5193 = 5154895

Or number of samples:

=====================================================================================================
ooSamples: 46  of events 'cycles', 4000 Hz, Event count (approx.): 5154895
_raw_spin_lock_irqsave  /proc/kcore
Samples              │      nop
                     │      push   %rbx
     0      2      0 │      pushfq
     2      0      0 │      pop    %rax
     2      0      2 │      nop
                     │      mov    %rax,%rbx
                     │      cli
     1      1      0 │      nop
                     │      xor    %eax,%eax
                     │      mov    $0x1,%edx
                     │      lock   cmpxchg %edx,(%rdi)
    17     11      7 │      test   %eax,%eax
                     │    ↓ jne    2b
                     │      mov    %rbx,%rax
     0      0      1 │      pop    %rbx
                     │    ← retq
                     │2b:   mov    %eax,%esi
                     │    → callq  queued_spin_lock_slowpath
                     │      mov    %rbx,%rax
                     │      pop    %rbx
Press 'h' for help on key bindings
=====================================================================================================

2 + 2 + 2 + 2 + 1 + 1 + 17 + 11 + 7 + 1 = 46

Suggested-by: Martin Liška <mliska@suse.cz>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196935
Link: https://lkml.kernel.org/n/tip-ezccyxld50wtwyt66np6aomo@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-03 15:23:11 -03:00