Now that the iterator can handle a concurrent writer, do not disable writing
to the ring buffer when there is an iterator present.
Link: http://lkml.kernel.org/r/20200317213416.759770696@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
With the introduction of 55c7c06210 ("ARM: dts: bcm283x: Fix vc4's
firmware bus DMA limitations") the firmware bus has to comply with
/soc's DMA limitations. Ultimately linking both buses to a same
dma-ranges property. The patch (and author) missed the fact that a bus'
#address-cells and #size-cells properties are not inherited, but set to
a fixed value which, in this case, doesn't match /soc's. This, although
not breaking Linux's DMA mapping functionality, generates ugly dtc
warnings.
Fix the issue by adding the correct address and size cells properties
under the firmware bus.
Fixes: 55c7c06210 ("ARM: dts: bcm283x: Fix vc4's firmware bus DMA limitations")
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Link: https://lore.kernel.org/r/20200326134413.12298-1-nsaenzjulienne@suse.de
Move from requesting only full file layout segments, to requesting
layout segments that match our I/O size. This means the server is
still free to return a full file layout, but we will no longer
error out if it does not.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
When starting to read or write with a layout segment, check that the
range matches our request.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Check that the number of mirrors, and the mirror information matches
before deciding to merge layout segments in pNFS/flexfiles.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Fix up pnfs_layout_mark_request_commit() to alway reschedule the write
if the layout segment is invalid. Also minor cleanup.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Move the pNFS commit related operations into a separate structure
that can be carried by the pnfs_ds_commit_info.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Lift filelayout_search_commit_reqs() into the generic pnfs/nfs code,
and add support for commit arrays.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Enable adding and lookup of per-layout segment commits in filelayout
and flexfilelayout.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Ensure that both the file and flexfiles layout types clean up when
freeing the layout segments.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Add support for scanning the full list of per-layout segment commit
arrays to nfs_clear_pnfs_ds_commit_verifiers().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Instead of trying to save the commit verifiers and checking them against
previous writes, adopt the same strategy as for buffered writes, of
just checking the verifiers at commit time.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Add a pNFS callback to allow the O_DIRECT code to release the DS
commitinfo when freeing the dreq.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Add support for scanning the full list of per-layout segment commit
arrays to pnfs_generic_commit_pagelist().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Add support for scanning the full list of per-layout segment commit
arrays to pnfs_generic_recover_commit_reqs().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Add support for scanning the full list of per-layout segment commit
arrays to pnfs_generic_scan_commit_lists()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
When we have multiple layout segments with different lists of mirrored
data, we need to track the commits on a per layout segment basis.
This patch adds a list to support this tracking in struct
pnfs_ds_commit_info.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
When the ring buffer becomes writable for even when the trace file is read,
it must still not be resized. But since tracers can be activated while the
trace file is being read, the irqsoff tracer can modify the per CPU buffers,
and this can cause the reader of the trace file to update the wrong buffer's
resize disable bit, as the irqsoff tracer swaps out cpu buffers.
By making the resize disable per cpu_buffer, it makes the update follow the
per cpu_buffer even if it's swapped out with the snapshot buffer and keeps
the release of the trace file modifying the same data as the open did.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
With the help of previously added tracepoints we can now trace
report-zones, zone-write and zone-mgmt ops in null_blk_zoned.c.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch adds two new tracpoints for null_blk_zoned.c that allows us
to trace report-zones, zone-mgmt-op and zone-write operations which has
direct effect on the zone condition state machine.
Also, we update drivers/block/Makefile so that new null_blk related
tracefiles can be compiled.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add a helper to stringify the zone conditions. We use this helper in the
next patch to track zone conditions in tracepoints.
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use a different markup for the ERR_PTR, as %FOO doesn't work
if there are parenthesis. So, use, instead:
``ERR_PTR(-EINVAL)``
This fixes the following warning:
./drivers/gpio/gpiolib.c:139: WARNING: Inline literal start-string without end-string.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/51197e3568f073e22c280f0584bfa20b44436708.1584456635.git.mchehab+huawei@kernel.org
Reviewed-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Function gfs2_recover_func grabs the sd_log_flush_lock rw_semaphore in
write mode. This is unnecessary because we only need to prevent log flush
from using sd_log_bio bio while it does. Therefore, a read lock will be
enough. This is a small step in cleaning up log flush.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Before this patch, if the ail1 flush got stuck for some reason, there
were no clues as to why. This patch introduces a check for getting
stuck for more than a minute, and if it happens, it dumps the items
still remaining on the ail1 list.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
In function try_rgrp_unlink, we added a temporary lock of the
sd_log_flush_lock while searching the bitmaps. This protected us from
problems in which dinodes being freed were still in a state of flux
because the rgrp was in an active transaction. It was a kludge.
Now that we've straightened out the code for inode eviction, deletes,
and all the recovery mess, we no longer need this kludge.
This patch removes it, and should improve performance.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
We now get the quota data structure when opening a file writable and put it
when closing that writable file descriptor, so there no longer is a need for
gfs2_qa_{get,put} while we're holding a writable file descriptor.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Keeping reservations and quotas separate helps reviewing the code.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Before this patch, multiple users called gfs2_qa_alloc which allocated
a qadata structure to the inode, if quotas are turned on. Later, in
file close or evict, the structure was deleted with gfs2_qa_delete.
But there can be several competing processes who need access to the
structure. There were races between file close (release) and the others.
Thus, a release could delete the structure out from under a process
that relied upon its existence. For example, chown.
This patch changes the management of the qadata structures to be
a get/put scheme. Function gfs2_qa_alloc has been changed to gfs2_qa_get
and if the structure is allocated, the count essentially starts out at
1. Function gfs2_qa_delete has been renamed to gfs2_qa_put, and the
last guy to decrement the count to 0 frees the memory.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Before this patch, multiple callers called gfs2_rsqa_alloc to force
the existence of a reservations structure and a quota data structure
if needed. However, now the reservations are handled separately, so
the quota data is only the quota data. So we eliminate the one in
favor of just calling gfs2_qa_alloc directly.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Replace open-coded versions of list_first_entry and list_last_entry with those
functions.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
When allocating a new inode, mark the iopen glock holder as uninitialized to
make sure gfs2_evict_inode won't fail after an incomplete create or lookup. In
gfs2_evict_inode, allow the inode glock to be NULL and remove the duplicate
iopen glock teardown code. In gfs2_inode_lookup, don't tear down things that
gfs2_evict_inode will already tear down.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
dm_clone_nr_of_hydrated_regions() returns the number of regions that
have been hydrated so far. In order to do so it employs bitmap_weight().
Until now, the return type of dm_clone_nr_of_hydrated_regions() was
unsigned long.
Because bitmap_weight() returns an int, in case BITS_PER_LONG == 64 and
the return value of bitmap_weight() is 2^31 (the maximum allowed number
of regions for a device), the result is sign extended from 32 bits to 64
bits and an incorrect value is displayed, in the status output of
dm-clone, as the number of hydrated regions.
Fix this by having dm_clone_nr_of_hydrated_regions() return an unsigned
int.
Fixes: 7431b7835f ("dm: add clone target")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Add missing casts when converting from regions to sectors.
In case BITS_PER_LONG == 32, the lack of the appropriate casts can lead
to overflows and miscalculation of the device sector.
As a result, we could end up discarding and/or copying the wrong parts
of the device, thus corrupting the device's data.
Fixes: 7431b7835f ("dm: add clone target")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Add overflow check for clone->nr_regions variable, which holds the
number of regions of the target.
The overflow can occur with sufficiently large devices, if BITS_PER_LONG
== 32. E.g., if the region size is 8 sectors (4K), the overflow would
occur for device sizes > 34359738360 sectors (~16TB).
This could result in multiple device sectors wrongly mapping to the same
region number, due to the truncation from 64 bits to 32 bits, which
would lead to data corruption.
Fixes: 7431b7835f ("dm: add clone target")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
There is a bug in the way dm-clone handles discards, which can lead to
discarding the wrong blocks or trying to discard blocks beyond the end
of the device.
This could lead to data corruption, if the destination device indeed
discards the underlying blocks, i.e., if the discard operation results
in the original contents of a block to be lost.
The root of the problem is the code that calculates the range of regions
covered by a discard request and decides which regions to discard.
Since dm-clone handles the device in units of regions, we don't discard
parts of a region, only whole regions.
The range is calculated as:
rs = dm_sector_div_up(bio->bi_iter.bi_sector, clone->region_size);
re = bio_end_sector(bio) >> clone->region_shift;
, where 'rs' is the first region to discard and (re - rs) is the number
of regions to discard.
The bug manifests when we try to discard part of a single region, i.e.,
when we try to discard a block with size < region_size, and the discard
request both starts at an offset with respect to the beginning of that
region and ends before the end of the region.
The root cause is the following comparison:
if (rs == re)
// skip discard and complete original bio immediately
, which doesn't take into account that 'rs' might be greater than 're'.
Thus, we then issue a discard request for the wrong blocks, instead of
skipping the discard all together.
Fix the check to also take into account the above case, so we don't end
up discarding the wrong blocks.
Also, add some range checks to dm_clone_set_region_hydrated() and
dm_clone_cond_set_range(), which update dm-clone's region bitmap.
Note that the aforementioned bug doesn't cause invalid memory accesses,
because dm_clone_is_range_hydrated() returns True for this case, so the
checks are just precautionary.
Fixes: 7431b7835f ("dm: add clone target")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Initializing a dm-writecache device can take a long time when the
persistent memory device is large. Add cond_resched() to a few loops
to avoid warnings that the CPU is stuck.
Cc: stable@vger.kernel.org # v4.18+
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Sorry for the last minute patches, but a few things fell through the cracks
recently. I was on the fence about sending a late PR just for the M-mode
fixes, as we don't really have any users, but the last patch fixes the build
for Fedora which I consider pretty important. Given that the M-mode fixes
should be very low risk, I figured it's worth sending them along as well.
This passes my standard "boot in QEMU" test.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAl59ZeUTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiRk7D/9lYDNBOu8SR+eWW+7RSSUHsEX3Ctqq
Q07RFSPCyjTXise/HINCuycEE10A1Ij1DxH3hRqiiMefQQ23PLNGETYsfF4/ojQj
6ngq0AlW7UBv0/iVzsUNUYeaMFG1R34ayG3+wbF0h0U3MWTBEBSTzUP+H8Dsh3QU
jPafwhPIVkweMiXWXf3M1TlpyGTWIH/VckPVB6PZEwe7UEFiVX82pz3hSBztUmwZ
sT9i/OaRtpLmNw7UHfh/mqok6ILLnfPLeF/+fbBY29yM49AAd1HL2QSiWdshYZ+s
SxwTqaKSait7WBrKlJBLTA33hNwaASMGe7Tnburmaz6Fy+DgOwskXsCBXOSTzRE6
LEd7XBc+jGygL5W6QX9Umi585Zr58qp1Ck2Wg3VDp8SjygGoDteHnpdBOOAg588t
UbGj6Qz/N5g3S3trFa8pYK9nPXNZIrqJQQUMXaRq/c/CX5qY8/i1PyBSlOytRU2o
/hAZQXN4j0/f3kermvJKX4/6RKX1zCOjclP/IOssobHTMiGwHR49YEAJZfFxHIyw
AAJ8aKcoI/QOSmugGvCQbf2/lheEOghREFqK5rWn6+IRe5temlzxx3oT09iD5FvH
Dtdm/ptjl/XTtJGE0EBr60kPLA13puXCRxjzO6BB4WIt6DAQ4HN5oo7yE8myYcBA
0pMQemyUlDyB4g==
=ech4
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
"Sorry for the last minute patches, but a few things fell through the
cracks recently. I was on the fence about sending a late pull request
just for the M-mode fixes, as we don't really have any users, but the
last patch fixes the build for Fedora which I consider pretty
important.
Given that the M-mode fixes should be very low risk, I figured it's
worth sending them along as well.
Thhis passes my standard 'boot in QEMU' test"
* tag 'riscv-for-linus-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: Move all address space definition macros to one place
RISC-V: Only select essential drivers for SOC_VIRT config
riscv: fix the IPI missing issue in nommu mode
riscv: uaccess should be used in nommu mode
The bio_map_* helpers are just the low-level helpers for the
blk_rq_map_* APIs. Move them together for better logical grouping,
as no there isn't much overlap with other code in bio.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A single fix for building dtc with GCC 10.
-----BEGIN PGP SIGNATURE-----
iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAl5+DuwQHHJvYmhAa2Vy
bmVsLm9yZwAKCRD6+121jbxhwzSBD/9Fot/YLyru3Wzl+sAYzZ8XdBaGnMdC3u7d
PJcgJ3ozCYclCUU7VATLgxW3y4vPw0zDLyBgGu2EpCL5I0cml8ONbBV6rJq/W5yi
33gjlc0LmAx7tGNcV/ENFeykV8yTOA+pB1Sxn/f36Gjj8oKLKkhRBmZUGmoFuDlr
HOIfQp0pnq5V/tc/i6wc+s5WGun1df3EzwNDX9PuLHKMPDPSzF5A4bGfnMqNge03
gEmbwCMKYsAdlKzf5R4sMXCc1mmWxZvlC7xVzBi3yDc6VZjYoQO3BwnrxKrs/9n0
IW4WisL0/ACkud8MwLx0Sw82qJqbhRQ+eiRXvDORKk3fxldHBvaGsNk8qVTSDXBI
C0X5YpMBFGNIzTnRglLqgSh4w9Q36ZSAfbcSrYLOu1FIKUi2l62eS1lUNKZi27NY
RQkbaown2Uu7U1kVrVtLsNliN6LajQy5cEycoNOntEAakr0WYgxWG7LDx/MIVrAk
d0C/cf7NfR93CuMV3epzejI2iaaxN/lR8zE/fQlqt/zMaVwVvkyK2TFw09emsT43
d7jhw57JrOyV8Ci1YWtalj86lJVAJ2ovu2S0ozg6Dma2YqxVAZXrt96MbP+os+ro
6IhOpSoydb+hrcuP0ZkUk229/EXVmnL7UFIsge/ZOQFKkBBLbQKL/C6rtlQwWBVm
7+d9WJGxXg==
=HNrU
-----END PGP SIGNATURE-----
Merge tag 'devicetree-fixes-for-5.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull Devicetree fix from Rob Herring:
"A single fix for building dtc with GCC 10"
* tag 'devicetree-fixes-for-5.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
scripts/dtc: Remove redundant YYLOC global declaration
- Fix defconfig build when using Clang's integrated assembler
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl592dUQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNKUFCACQw9B0AHizig0dzL+jAw3fZApcjTWGzmv8
Blb0BtdYcEYcMvrbNnc21nDAghUw3lt4ncqbjcd6H9+jISN+D5yMTooqBOa6N7eG
3bGEsap7QsxdqbiFdyPB7woyTa8PR+YSmg9eCtKVX82R2HuPZar69RhDFi1D/iSn
lbt2ltVTfMTuzEETHoCoixn3r+BFu5HHaFx6JuK6bOT29WpXmdseB8NhkuVE3kEu
O1nUFW69vGpQvalwYrspd5CSiyT9BuFP9vqz9xYQVCwWLN+O5VuGLoF264SHKVKM
O4qk18Vu/LG0hU/2V/xBcPQ4CMc9BsR3Z8AyqdfULgBWjqGQf396
=TROL
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Will Deacon:
"Fix defconfig build when using Clang's integrated assembler"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: alternative: fix build with clang integrated assembler
It clarifies the code slightly to use SMB2_SIGNATURE_SIZE
define rather than 16.
Suggested-by: Henning Schild <henning.schild@siemens.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
To allow offload commands to execute in parallel, create workqueue
for flow table offload, and use a work entry per offload command.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Currently flow offload threads are synchronized by the flow block mutex.
Use rw lock instead to increase flow insertion (read) concurrency.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
It is safe to traverse &net->nft.tables with &net->nft.commit_mutex
held using list_for_each_entry_rcu(). Silence the PROVE_RCU_LIST false
positive,
WARNING: suspicious RCU usage
net/netfilter/nf_tables_api.c:523 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by iptables/1384:
#0: ffffffff9745c4a8 (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid+0x25/0x60 [nf_tables]
Call Trace:
dump_stack+0xa1/0xea
lockdep_rcu_suspicious+0x103/0x10d
nft_table_lookup.part.0+0x116/0x120 [nf_tables]
nf_tables_newtable+0x12c/0x7d0 [nf_tables]
nfnetlink_rcv_batch+0x559/0x1190 [nfnetlink]
nfnetlink_rcv+0x1da/0x210 [nfnetlink]
netlink_unicast+0x306/0x460
netlink_sendmsg+0x44b/0x770
____sys_sendmsg+0x46b/0x4a0
___sys_sendmsg+0x138/0x1a0
__sys_sendmsg+0xb6/0x130
__x64_sys_sendmsg+0x48/0x50
do_syscall_64+0x69/0xf4
entry_SYSCALL_64_after_hwframe+0x49/0xb3
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The indirect block setup should use TC_SETUP_FT as the type instead of
TC_SETUP_BLOCK. Adjust existing users of the indirect flow block
infrastructure.
Fixes: b5140a36da ("netfilter: flowtable: add indr block setup support")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>