Commit graph

900375 commits

Author SHA1 Message Date
David S. Miller
16b25d1a96 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

This batch contains Netfilter updates for net-next:

1) Add nft_setelem_parse_key() helper function.

2) Add NFTA_SET_ELEM_KEY_END to specify a range with one single element.

3) Add NFTA_SET_DESC_CONCAT to describe the set element concatenation,
   from Stefano Brivio.

4) Add bitmap_cut() to copy n-bits from source to destination,
   from Stefano Brivio.

5) Add set to match on arbitrary concatenations, from Stefano Brivio.

6) Add selftest for this new set type. An extract of Stefano's
   description follows:

"Existing nftables set implementations allow matching entries with
interval expressions (rbtree), e.g. 192.0.2.1-192.0.2.4, entries
specifying field concatenation (hash, rhash), e.g. 192.0.2.1:22,
but not both.

In other words, none of the set types allows matching on range
expressions for more than one packet field at a time, such as ipset
does with types bitmap:ip,mac, and, to a more limited extent
(netmasks, not arbitrary ranges), with types hash:net,net,
hash:net,port, hash:ip,port,net, and hash:net,port,net.

As a pure hash-based approach is unsuitable for matching on ranges,
and "proxying" the existing red-black tree type looks impractical as
elements would need to be shared and managed across all employed
trees, this new set implementation intends to fill the functionality
gap by employing a relatively novel approach.

The fundamental idea, illustrated in deeper detail in patch 5/9, is to
use lookup tables classifying a small number of grouped bits from each
field, and map the lookup results in a way that yields a verdict for
the full set of specified fields.

The grouping bit aspect is loosely inspired by the Grouper algorithm,
by Jay Ligatti, Josh Kuhn, and Chris Gage (see patch 5/9 for the full
reference).

A reference, stand-alone implementation of the algorithm itself is
available at:
        https://pipapo.lameexcu.se

Some notes about possible future optimisations are also mentioned
there. This algorithm reduces the matching problem to, essentially,
a repetitive sequence of simple bitwise operations, and is
particularly suitable to be optimised by leveraging SIMD instruction
sets."
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27 10:17:15 +01:00
Mike Rapoport
8121fbc4eb parisc: map_pages(): cleanup page table initialization
The current code uses '#if PTRS_PER_PMD == 1' to distinguish 2 vs 3 levels,
setup, it casts pgd to pgd to cope with page table folding and converts
addresses of page table entries from physical to virtual and back for no
good reason.

Simplify the accesses to the page table entries using proper unfolding of
the upper layers and replacing '#if PTRS_PER_PMD' with explicit
'#if CONFIG_PGTABLE_LEVELS == 3'

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Helge Deller <deller@gmx.de>
2020-01-27 09:43:15 +01:00
Stefano Brivio
611973c1e0 selftests: netfilter: Introduce tests for sets with range concatenation
This test covers functionality and stability of the newly added
nftables set implementation supporting concatenation of ranged
fields.

For some selected set expression types, test:
- correctness, by checking that packets match or don't
- concurrency, by attempting races between insertion, deletion, lookup
- timeout feature, checking that packets don't match expired entries

and (roughly) estimate matching rates, comparing to baselines for
simple drop on netdev ingress hook and for hash and rbtrees sets.

In order to send packets, this needs one of sendip, netcat or bash.
To flood with traffic, iperf3, iperf and netperf are supported. For
performance measurements, this relies on the sample pktgen script
pktgen_bench_xmit_mode_netif_receive.sh.

If none of the tools suitable for a given test are available, specific
tests will be skipped.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-27 08:54:30 +01:00
Stefano Brivio
3c4287f620 nf_tables: Add set type for arbitrary concatenation of ranges
This new set type allows for intervals in concatenated fields,
which are expressed in the usual way, that is, simple byte
concatenation with padding to 32 bits for single fields, and
given as ranges by specifying start and end elements containing,
each, the full concatenation of start and end values for the
single fields.

Ranges are expanded to composing netmasks, for each field: these
are inserted as rules in per-field lookup tables. Bits to be
classified are divided in 4-bit groups, and for each group, the
lookup table contains 4^2 buckets, representing all the possible
values of a bit group. This approach was inspired by the Grouper
algorithm:
	http://www.cse.usf.edu/~ligatti/projects/grouper/

Matching is performed by a sequence of AND operations between
bucket values, with buckets selected according to the value of
packet bits, for each group. The result of this sequence tells
us which rules matched for a given field.

In order to concatenate several ranged fields, per-field rules
are mapped using mapping arrays, one per field, that specify
which rules should be considered while matching the next field.
The mapping array for the last field contains a reference to
the element originally inserted.

The notes in nft_set_pipapo.c cover the algorithm in deeper
detail.

A pure hash-based approach is of no use here, as ranges need
to be classified. An implementation based on "proxying" the
existing red-black tree set type, creating a tree for each
field, was considered, but deemed impractical due to the fact
that elements would need to be shared between trees, at least
as long as we want to keep UAPI changes to a minimum.

A stand-alone implementation of this algorithm is available at:
	https://pipapo.lameexcu.se
together with notes about possible future optimisations
(in pipapo.c).

This algorithm was designed with data locality in mind, and can
be highly optimised for SIMD instruction sets, as the bulk of
the matching work is done with repetitive, simple bitwise
operations.

At this point, without further optimisations, nft_concat_range.sh
reports, for one AMD Epyc 7351 thread (2.9GHz, 512 KiB L1D$, 8 MiB
L2$):

TEST: performance
  net,port                                                      [ OK ]
    baseline (drop from netdev hook):              10190076pps
    baseline hash (non-ranged entries):             6179564pps
    baseline rbtree (match on first field only):    2950341pps
    set with  1000 full, ranged entries:            2304165pps
  port,net                                                      [ OK ]
    baseline (drop from netdev hook):              10143615pps
    baseline hash (non-ranged entries):             6135776pps
    baseline rbtree (match on first field only):    4311934pps
    set with   100 full, ranged entries:            4131471pps
  net6,port                                                     [ OK ]
    baseline (drop from netdev hook):               9730404pps
    baseline hash (non-ranged entries):             4809557pps
    baseline rbtree (match on first field only):    1501699pps
    set with  1000 full, ranged entries:            1092557pps
  port,proto                                                    [ OK ]
    baseline (drop from netdev hook):              10812426pps
    baseline hash (non-ranged entries):             6929353pps
    baseline rbtree (match on first field only):    3027105pps
    set with 30000 full, ranged entries:             284147pps
  net6,port,mac                                                 [ OK ]
    baseline (drop from netdev hook):               9660114pps
    baseline hash (non-ranged entries):             3778877pps
    baseline rbtree (match on first field only):    3179379pps
    set with    10 full, ranged entries:            2082880pps
  net6,port,mac,proto                                           [ OK ]
    baseline (drop from netdev hook):               9718324pps
    baseline hash (non-ranged entries):             3799021pps
    baseline rbtree (match on first field only):    1506689pps
    set with  1000 full, ranged entries:             783810pps
  net,mac                                                       [ OK ]
    baseline (drop from netdev hook):              10190029pps
    baseline hash (non-ranged entries):             5172218pps
    baseline rbtree (match on first field only):    2946863pps
    set with  1000 full, ranged entries:            1279122pps

v4:
 - fix build for 32-bit architectures: 64-bit division needs
   div_u64() (kbuild test robot <lkp@intel.com>)
v3:
 - rework interface for field length specification,
   NFT_SET_SUBKEY disappears and information is stored in
   description
 - remove scratch area to store closing element of ranges,
   as elements now come with an actual attribute to specify
   the upper range limit (Pablo Neira Ayuso)
 - also remove pointer to 'start' element from mapping table,
   closing key is now accessible via extension data
 - use bytes right away instead of bits for field lengths,
   this way we can also double the inner loop of the lookup
   function to take care of upper and lower bits in a single
   iteration (minor performance improvement)
 - make it clearer that set operations are actually atomic
   API-wise, but we can't e.g. implement flush() as one-shot
   action
 - fix type for 'dup' in nft_pipapo_insert(), check for
   duplicates only in the next generation, and in general take
   care of differentiating generation mask cases depending on
   the operation (Pablo Neira Ayuso)
 - report C implementation matching rate in commit message, so
   that AVX2 implementation can be compared (Pablo Neira Ayuso)
v2:
 - protect access to scratch maps in nft_pipapo_lookup() with
   local_bh_disable/enable() (Florian Westphal)
 - drop rcu_read_lock/unlock() from nft_pipapo_lookup(), it's
   already implied (Florian Westphal)
 - explain why partial allocation failures don't need handling
   in pipapo_realloc_scratch(), rename 'm' to clone and update
   related kerneldoc to make it clear we're not operating on
   the live copy (Florian Westphal)
 - add expicit check for priv->start_elem in
   nft_pipapo_insert() to avoid ending up in nft_pipapo_walk()
   with a NULL start element, and also zero it out in every
   operation that might make it invalid, so that insertion
   doesn't proceed with an invalid element (Florian Westphal)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-27 08:54:30 +01:00
Stefano Brivio
2092767168 bitmap: Introduce bitmap_cut(): cut bits and shift remaining
The new bitmap function bitmap_cut() copies bits from source to
destination by removing the region specified by parameters first
and cut, and remapping the bits above the cut region by right
shifting them.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-27 08:54:30 +01:00
Stefano Brivio
f3a2181e16 netfilter: nf_tables: Support for sets with multiple ranged fields
Introduce a new nested netlink attribute, NFTA_SET_DESC_CONCAT, used
to specify the length of each field in a set concatenation.

This allows set implementations to support concatenation of multiple
ranged items, as they can divide the input key into matching data for
every single field. Such set implementations would be selected as
they specify support for NFT_SET_INTERVAL and allow desc->field_count
to be greater than one. Explicitly disallow this for nft_set_rbtree.

In order to specify the interval for a set entry, userspace would
include in NFTA_SET_DESC_CONCAT attributes field lengths, and pass
range endpoints as two separate keys, represented by attributes
NFTA_SET_ELEM_KEY and NFTA_SET_ELEM_KEY_END.

While at it, export the number of 32-bit registers available for
packet matching, as nftables will need this to know the maximum
number of field lengths that can be specified.

For example, "packets with an IPv4 address between 192.0.2.0 and
192.0.2.42, with destination port between 22 and 25", can be
expressed as two concatenated elements:

  NFTA_SET_ELEM_KEY:            192.0.2.0 . 22
  NFTA_SET_ELEM_KEY_END:        192.0.2.42 . 25

and NFTA_SET_DESC_CONCAT attribute would contain:

  NFTA_LIST_ELEM
    NFTA_SET_FIELD_LEN:		4
  NFTA_LIST_ELEM
    NFTA_SET_FIELD_LEN:		2

v4: No changes
v3: Complete rework, NFTA_SET_DESC_CONCAT instead of NFTA_SET_SUBKEY
v2: No changes

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-27 08:54:30 +01:00
Pablo Neira Ayuso
7b225d0b5c netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute
Add NFTA_SET_ELEM_KEY_END attribute to convey the closing element of the
interval between kernel and userspace.

This patch also adds the NFT_SET_EXT_KEY_END extension to store the
closing element value in this interval.

v4: No changes
v3: New patch

[sbrivio: refactor error paths and labels; add corresponding
  nft_set_ext_type for new key; rebase]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-27 08:54:30 +01:00
Pablo Neira Ayuso
20a1452c35 netfilter: nf_tables: add nft_setelem_parse_key()
Add helper function to parse the set element key netlink attribute.

v4: No changes
v3: New patch

[sbrivio: refactor error paths and labels; use NFT_DATA_VALUE_MAXLEN
  instead of sizeof(*key) in helper, value can be longer than that;
  rebase]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-27 08:54:30 +01:00
Dexuan Cui
54e19d3401 hv_utils: Add the support of hibernation
Add util_pre_suspend() and util_pre_resume() for some hv_utils devices
(e.g. kvp/vss/fcopy), because they need special handling before
util_suspend() calls vmbus_close().

For kvp, all the possible pending work items should be cancelled.

For vss and fcopy, some extra clean-up needs to be done, i.e. fake a
THAW message for hv_vss_daemon and fake a CANCEL_FCOPY message for
hv_fcopy_daemon, otherwise when the VM resums back, the daemons
can end up in an inconsistent state (i.e. the file systems are
frozen but will never be thawed; the file transmitted via fcopy
may not be complete). Note: there is an extra patch for the daemons:
"Tools: hv: Reopen the devices if read() or write() returns errors",
because the hv_utils driver can not guarantee the whole transaction
finishes completely once util_suspend() starts to run (at this time,
all the userspace processes are frozen).

util_probe() disables channel->callback_event to avoid the race with
the channel callback.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-26 22:10:17 -05:00
Dexuan Cui
ffd1d4a493 hv_utils: Support host-initiated hibernation request
Update the Shutdown IC version to 3.2, which is required for the host to
send the hibernation request.

The user is expected to create the below udev rule file, which is applied
upon the host-initiated hibernation request:

root@localhost:~# cat /usr/lib/udev/rules.d/40-vm-hibernation.rules
SUBSYSTEM=="vmbus", ACTION=="change", DRIVER=="hv_utils", ENV{EVENT}=="hibernate", RUN+="/usr/bin/systemctl hibernate"

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-26 22:10:16 -05:00
Dexuan Cui
3e9c72056e hv_utils: Support host-initiated restart request
The hv_utils driver currently supports a "shutdown" operation initiated
from the Hyper-V host. Newer versions of Hyper-V also support a "restart"
operation. So add support for the updated protocol version that has
"restart" support, and perform a clean reboot when such a message is
received from Hyper-V.

To test the restart functionality, run this PowerShell command on the
Hyper-V host:

Restart-VM  <vmname>  -Type Reboot

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-26 22:10:16 -05:00
Dexuan Cui
9fc3c01a1f Tools: hv: Reopen the devices if read() or write() returns errors
The state machine in the hv_utils driver can run out of order in some
corner cases, e.g. if the kvp daemon doesn't call write() fast enough
due to some reason, kvp_timeout_func() can run first and move the state
to HVUTIL_READY; next, when kvp_on_msg() is called it returns -EINVAL
since kvp_transaction.state is smaller than HVUTIL_USERSPACE_REQ; later,
the daemon's write() gets an error -EINVAL, and the daemon will exit().

We can reproduce the issue by sending a SIGSTOP signal to the daemon, wait
for 1 minute, and send a SIGCONT signal to the daemon: the daemon will
exit() quickly.

We can fix the issue by forcing a reset of the device (which means the
daemon can close() and open() the device again) and doing extra necessary
clean-up.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-26 22:10:10 -05:00
Wei Hu
3a6fb6c425 video: hyperv: hyperv_fb: Use physical memory for fb on HyperV Gen 1 VMs.
On Hyper-V, Generation 1 VMs can directly use VM's physical memory for
their framebuffers. This can improve the efficiency of framebuffer and
overall performence for VM. The physical memory assigned to framebuffer
must be contiguous. We use CMA allocator to get contiguouse physicial
memory when the framebuffer size is greater than 4MB. For size under
4MB, we use alloc_pages to achieve this.

To enable framebuffer memory allocation from CMA, supply a kernel
parameter to give enough space to CMA allocator at boot time. For
example:
    cma=130m
This gives 130MB memory to CAM allocator that can be allocated to
framebuffer. If this fails, we fall back to the old way of using
mmio for framebuffer.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Wei Hu <weh@microsoft.com>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-26 21:22:19 -05:00
Vincent Whitchurch
f1f27ad745 CIFS: Fix task struct use-after-free on reconnect
The task which created the MID may be gone by the time cifsd attempts to
call the callbacks on MIDs from cifs_reconnect().

This leads to a use-after-free of the task struct in cifs_wake_up_task:

 ==================================================================
 BUG: KASAN: use-after-free in __lock_acquire+0x31a0/0x3270
 Read of size 8 at addr ffff8880103e3a68 by task cifsd/630

 CPU: 0 PID: 630 Comm: cifsd Not tainted 5.5.0-rc6+ #119
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
 Call Trace:
  dump_stack+0x8e/0xcb
  print_address_description.constprop.5+0x1d3/0x3c0
  ? __lock_acquire+0x31a0/0x3270
  __kasan_report+0x152/0x1aa
  ? __lock_acquire+0x31a0/0x3270
  ? __lock_acquire+0x31a0/0x3270
  kasan_report+0xe/0x20
  __lock_acquire+0x31a0/0x3270
  ? __wake_up_common+0x1dc/0x630
  ? _raw_spin_unlock_irqrestore+0x4c/0x60
  ? mark_held_locks+0xf0/0xf0
  ? _raw_spin_unlock_irqrestore+0x39/0x60
  ? __wake_up_common_lock+0xd5/0x130
  ? __wake_up_common+0x630/0x630
  lock_acquire+0x13f/0x330
  ? try_to_wake_up+0xa3/0x19e0
  _raw_spin_lock_irqsave+0x38/0x50
  ? try_to_wake_up+0xa3/0x19e0
  try_to_wake_up+0xa3/0x19e0
  ? cifs_compound_callback+0x178/0x210
  ? set_cpus_allowed_ptr+0x10/0x10
  cifs_reconnect+0xa1c/0x15d0
  ? generic_ip_connect+0x1860/0x1860
  ? rwlock_bug.part.0+0x90/0x90
  cifs_readv_from_socket+0x479/0x690
  cifs_read_from_socket+0x9d/0xe0
  ? cifs_readv_from_socket+0x690/0x690
  ? mempool_resize+0x690/0x690
  ? rwlock_bug.part.0+0x90/0x90
  ? memset+0x1f/0x40
  ? allocate_buffers+0xff/0x340
  cifs_demultiplex_thread+0x388/0x2a50
  ? cifs_handle_standard+0x610/0x610
  ? rcu_read_lock_held_common+0x120/0x120
  ? mark_lock+0x11b/0xc00
  ? __lock_acquire+0x14ed/0x3270
  ? __kthread_parkme+0x78/0x100
  ? lockdep_hardirqs_on+0x3e8/0x560
  ? lock_downgrade+0x6a0/0x6a0
  ? lockdep_hardirqs_on+0x3e8/0x560
  ? _raw_spin_unlock_irqrestore+0x39/0x60
  ? cifs_handle_standard+0x610/0x610
  kthread+0x2bb/0x3a0
  ? kthread_create_worker_on_cpu+0xc0/0xc0
  ret_from_fork+0x3a/0x50

 Allocated by task 649:
  save_stack+0x19/0x70
  __kasan_kmalloc.constprop.5+0xa6/0xf0
  kmem_cache_alloc+0x107/0x320
  copy_process+0x17bc/0x5370
  _do_fork+0x103/0xbf0
  __x64_sys_clone+0x168/0x1e0
  do_syscall_64+0x9b/0xec0
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

 Freed by task 0:
  save_stack+0x19/0x70
  __kasan_slab_free+0x11d/0x160
  kmem_cache_free+0xb5/0x3d0
  rcu_core+0x52f/0x1230
  __do_softirq+0x24d/0x962

 The buggy address belongs to the object at ffff8880103e32c0
  which belongs to the cache task_struct of size 6016
 The buggy address is located 1960 bytes inside of
  6016-byte region [ffff8880103e32c0, ffff8880103e4a40)
 The buggy address belongs to the page:
 page:ffffea000040f800 refcount:1 mapcount:0 mapping:ffff8880108da5c0
 index:0xffff8880103e4c00 compound_mapcount: 0
 raw: 4000000000010200 ffffea00001f2208 ffffea00001e3408 ffff8880108da5c0
 raw: ffff8880103e4c00 0000000000050003 00000001ffffffff 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff8880103e3900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8880103e3980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 >ffff8880103e3a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                           ^
  ffff8880103e3a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8880103e3b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ==================================================================

This can be reliably reproduced by adding the below delay to
cifs_reconnect(), running find(1) on the mount, restarting the samba
server while find is running, and killing find during the delay:

  	spin_unlock(&GlobalMid_Lock);
  	mutex_unlock(&server->srv_mutex);

 +	msleep(10000);
 +
  	cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
  	list_for_each_safe(tmp, tmp2, &retry_list) {
  		mid_entry = list_entry(tmp, struct mid_q_entry, qhead);

Fix this by holding a reference to the task struct until the MID is
freed.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-01-26 19:24:17 -06:00
Chen Zhou
050d2a8b69 cifs: use PTR_ERR_OR_ZERO() to simplify code
PTR_ERR_OR_ZERO contains if(IS_ERR(...)) + PTR_ERR, just use
PTR_ERR_OR_ZERO directly.

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
2020-01-26 19:24:17 -06:00
Ronnie Sahlberg
8bd0d70144 cifs: add support for fallocate mode 0 for non-sparse files
RHBZ 1336264

When we extend a file we must also force the size to be updated.

This fixes an issue with holetest in xfs-tests which performs the following
sequence :
1, create a new file
2, use fallocate mode==0 to populate the file
3, mmap the file
4, touch each page by reading the mmapped region.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26 19:24:17 -06:00
Ronnie Sahlberg
fe12926863 cifs: fix NULL dereference in match_prepath
RHBZ: 1760879

Fix an oops in match_prepath() by making sure that the prepath string is not
NULL before we pass it into strcmp().

This is similar to other checks we make for example in cifs_root_iget()

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26 19:24:17 -06:00
Steve French
643fbceef4 smb3: fix default permissions on new files when mounting with modefromsid
When mounting with "modefromsid" mount parm most servers will require
that some default permissions are given to users in the ACL on newly
created files, files created with the new 'sd context' - when passing in
an sd context on create, permissions are not inherited from the parent
directory, so in addition to the ACE with the special SID which contains
the mode, we also must pass in an ACE allowing users to access the file
(GENERIC_ALL for authenticated users seemed like a reasonable default,
although later we could allow a mount option or config switch to make
it GENERIC_ALL for EVERYONE special sid).

CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-By: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-01-26 19:24:17 -06:00
Boris Protopopov
438471b679 CIFS: Add support for setting owner info, dos attributes, and create time
This is needed for backup/restore scenarios among others.

Add extended attribute "system.cifs_ntsd" (and alias "system.smb3_ntsd")
to allow for setting owner and DACL in the security descriptor. This is in
addition to the existing "system.cifs_acl" and "system.smb3_acl" attributes
that allow for setting DACL only. Add support for setting creation time and
dos attributes using set_file_info() calls to complement the existing
support for getting these attributes via query_path_info() calls.

Signed-off-by: Boris Protopopov <bprotopopov@hotmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26 19:24:17 -06:00
YueHaibing
c4985c3d99 cifs: remove set but not used variable 'server'
fs/cifs/smb2pdu.c: In function 'SMB2_query_directory':
fs/cifs/smb2pdu.c:4444:26: warning:
 variable 'server' set but not used [-Wunused-but-set-variable]
  struct TCP_Server_Info *server;

It is not used, so remove it.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26 19:24:17 -06:00
Paulo Alcantara (SUSE)
0a5a98863c cifs: Fix memory allocation in __smb2_handle_cancelled_cmd()
__smb2_handle_cancelled_cmd() is called under a spin lock held in
cifs_mid_q_entry_release(), so make its memory allocation GFP_ATOMIC.

This issue was observed when running xfstests generic/028:

[ 1722.589204] CIFS VFS: \\192.168.30.26 Cancelling wait for mid 72064 cmd: 5
[ 1722.590687] CIFS VFS: \\192.168.30.26 Cancelling wait for mid 72065 cmd: 17
[ 1722.593529] CIFS VFS: \\192.168.30.26 Cancelling wait for mid 72066 cmd: 6
[ 1723.039014] BUG: sleeping function called from invalid context at mm/slab.h:565
[ 1723.040710] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 30877, name: cifsd
[ 1723.045098] CPU: 3 PID: 30877 Comm: cifsd Not tainted 5.5.0-rc4+ #313
[ 1723.046256] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[ 1723.048221] Call Trace:
[ 1723.048689]  dump_stack+0x97/0xe0
[ 1723.049268]  ___might_sleep.cold+0xd1/0xe1
[ 1723.050069]  kmem_cache_alloc_trace+0x204/0x2b0
[ 1723.051051]  __smb2_handle_cancelled_cmd+0x40/0x140 [cifs]
[ 1723.052137]  smb2_handle_cancelled_mid+0xf6/0x120 [cifs]
[ 1723.053247]  cifs_mid_q_entry_release+0x44d/0x630 [cifs]
[ 1723.054351]  ? cifs_reconnect+0x26a/0x1620 [cifs]
[ 1723.055325]  cifs_demultiplex_thread+0xad4/0x14a0 [cifs]
[ 1723.056458]  ? cifs_handle_standard+0x2c0/0x2c0 [cifs]
[ 1723.057365]  ? kvm_sched_clock_read+0x14/0x30
[ 1723.058197]  ? sched_clock+0x5/0x10
[ 1723.058838]  ? sched_clock_cpu+0x18/0x110
[ 1723.059629]  ? lockdep_hardirqs_on+0x17d/0x250
[ 1723.060456]  kthread+0x1ab/0x200
[ 1723.061149]  ? cifs_handle_standard+0x2c0/0x2c0 [cifs]
[ 1723.062078]  ? kthread_create_on_node+0xd0/0xd0
[ 1723.062897]  ret_from_fork+0x3a/0x50

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Fixes: 9150c3adbf ("CIFS: Close open handle after interrupted close")
Cc: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-01-26 19:24:17 -06:00
Paulo Alcantara (SUSE)
5739375ee4 cifs: Fix mount options set in automount
Starting from 4a367dc044, we must set the mount options based on the
DFS full path rather than the resolved target, that is, cifs_mount()
will be responsible for resolving the DFS link (cached) as well as
performing failover to any other targets in the referral.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reported-by: Martijn de Gouw <martijn.de.gouw@prodrive-technologies.com>
Fixes: 4a367dc044 ("cifs: Add support for failover in cifs_mount()")
Link: https://lore.kernel.org/linux-cifs/39643d7d-2abb-14d3-ced6-c394fab9a777@prodrive-technologies.com
Tested-by: Martijn de Gouw <martijn.de.gouw@prodrive-technologies.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26 19:24:16 -06:00
Steve French
463a7b457c cifs: fix unitialized variable poential problem with network I/O cache lock patch
static analysis with Coverity detected an issue with the following
commit:

 Author: Paulo Alcantara (SUSE) <pc@cjr.nz>
 Date:   Wed Dec 4 17:38:03 2019 -0300

    cifs: Avoid doing network I/O while holding cache lock

Addresses-Coverity: ("Uninitialized pointer read")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26 19:24:16 -06:00
YueHaibing
eecfc57130 cifs: Fix return value in __update_cache_entry
copy_ref_data() may return error, it should be
returned to upstream caller.

Fixes: 03535b72873b ("cifs: Avoid doing network I/O while holding cache lock")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26 19:24:16 -06:00
Paulo Alcantara (SUSE)
742d8de018 cifs: Avoid doing network I/O while holding cache lock
When creating or updating a cache entry, we need to get an DFS
referral (get_dfs_referral), so avoid holding any locks during such
network operation.

To prevent that, do the following:
* change cache hashtable sync method from RCU sync to a read/write
  lock.
* use GFP_ATOMIC in memory allocations.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26 19:24:16 -06:00
Paulo Alcantara (SUSE)
06d57378bc cifs: Fix potential deadlock when updating vol in cifs_reconnect()
We can't acquire volume lock while refreshing the DFS cache because
cifs_reconnect() may call dfs_cache_update_vol() while we are walking
through the volume list.

To prevent that, make vol_info refcounted, create a temp list with all
volumes eligible for refreshing, and then use it without any locks
held.

Besides, replace vol_lock with a spinlock and protect cache_ttl from
concurrent accesses or changes.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26 19:24:16 -06:00
Paulo Alcantara (SUSE)
ff2f7fc082 cifs: Merge is_path_valid() into get_normalized_path()
Just do the trivial path validation in get_normalized_path().

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26 19:24:16 -06:00
Paulo Alcantara (SUSE)
345c1a4a9e cifs: Introduce helpers for finding TCP connection
Add helpers for finding TCP connections that are good candidates for
being used by DFS refresh worker.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26 19:24:16 -06:00
Paulo Alcantara (SUSE)
199c6bdfb0 cifs: Get rid of kstrdup_const()'d paths
The DFS cache API is mostly used with heap allocated strings.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26 19:24:16 -06:00
Paulo Alcantara (SUSE)
185352ae61 cifs: Clean up DFS referral cache
Do some renaming and code cleanup.

No functional changes.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26 19:24:16 -06:00
David Howells
6629400a22 cifs: Don't use iov_iter::type directly
Don't use iov_iter::type directly, but rather use the new accessor
functions that have been added.  This allows the .type field to be split
and rearranged without the need to update the filesystems.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26 19:24:16 -06:00
Ronnie Sahlberg
731b82bb17 cifs: set correct max-buffer-size for smb2_ioctl_init()
Fix two places where we need to adjust down the max response size for
ioctl when it is used together with compounding.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
2020-01-26 19:24:16 -06:00
Ronnie Sahlberg
37478608f0 cifs: use compounding for open and first query-dir for readdir()
Combine the initial SMB2_Open and the first SMB2_Query_Directory in a compound.
This shaves one round-trip of each directory listing, changing it from 4 to 3
for small directories.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-01-26 19:24:16 -06:00
Ronnie Sahlberg
af08f9e79c cifs: create a helper function to parse the query-directory response buffer
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-01-26 19:24:16 -06:00
Ronnie Sahlberg
0a17799cc0 cifs: prepare SMB2_query_directory to be used with compounding
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-01-26 19:24:16 -06:00
zhengbin
01d1bd76a1 fs/cifs/cifssmb.c: use true,false for bool variable
Fixes coccicheck warning:

fs/cifs/cifssmb.c:4622:3-22: WARNING: Assignment of 0/1 to bool variable
fs/cifs/cifssmb.c:4756:3-22: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26 19:24:16 -06:00
zhengbin
720aec0126 fs/cifs/smb2ops.c: use true,false for bool variable
Fixes coccicheck warning:

fs/cifs/smb2ops.c:807:2-36: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26 19:24:16 -06:00
Linus Torvalds
d5226fa6db Linux 5.5 2020-01-26 16:23:03 -08:00
Darrick J. Wong
cdbcf82b86 xfs: fix xfs_buf_ioerror_alert location reporting
Instead of passing __func__ to the error reporting function, let's use
the return address builtins so that the messages actually tell you which
higher level function called the buffer functions.  This was previously
true for the xfs_buf_read callers, but not for the xfs_trans_read_buf
callers.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-01-26 14:32:27 -08:00
Darrick J. Wong
706b8c5bc7 xfs: remove unnecessary null pointer checks from _read_agf callers
Drop the null buffer pointer checks in all code that calls
xfs_alloc_read_agf and doesn't pass XFS_ALLOC_FLAG_TRYLOCK because
they're no longer necessary.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:27 -08:00
Darrick J. Wong
f48e2df8a8 xfs: make xfs_*read_agf return EAGAIN to ALLOC_FLAG_TRYLOCK callers
Refactor xfs_read_agf and xfs_alloc_read_agf to return EAGAIN if the
caller passed TRYLOCK and we weren't able to get the lock; and change
the callers to recognize this.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:26 -08:00
Darrick J. Wong
ee647f85cb xfs: remove the xfs_btree_get_buf[ls] functions
Remove the xfs_btree_get_bufs and xfs_btree_get_bufl functions, since
they're pretty trivial oneliners.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:26 -08:00
Darrick J. Wong
ce92464c18 xfs: make xfs_trans_get_buf return an error code
Convert xfs_trans_get_buf() to return numeric error codes like most
everywhere else in xfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:26 -08:00
Darrick J. Wong
9676b54e6e xfs: make xfs_trans_get_buf_map return an error code
Convert xfs_trans_get_buf_map() to return numeric error codes like most
everywhere else in xfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:26 -08:00
Darrick J. Wong
0e3eccce5e xfs: make xfs_buf_read return an error code
Convert xfs_buf_read() to return numeric error codes like most
everywhere else in xfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:26 -08:00
Darrick J. Wong
2842b6db3d xfs: make xfs_buf_get_uncached return an error code
Convert xfs_buf_get_uncached() to return numeric error codes like most
everywhere else in xfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:26 -08:00
Darrick J. Wong
841263e933 xfs: make xfs_buf_get return an error code
Convert xfs_buf_get() to return numeric error codes like most
everywhere else in xfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:26 -08:00
Darrick J. Wong
4ed8e27b4f xfs: make xfs_buf_read_map return an error code
Convert xfs_buf_read_map() to return numeric error codes like most
everywhere else in xfs.  This involves moving the open-coded logic that
reports metadata IO read / corruption errors and stales the buffer into
xfs_buf_read_map so that the logic is all in one place.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:26 -08:00
Darrick J. Wong
3848b5f670 xfs: make xfs_buf_get_map return an error code
Convert xfs_buf_get_map() to return numeric error codes like most
everywhere else in xfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:25 -08:00
Darrick J. Wong
32dff5e5d1 xfs: make xfs_buf_alloc return an error code
Convert _xfs_buf_alloc() to return numeric error codes like most
everywhere else in xfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:25 -08:00