Commit graph

781637 commits

Author SHA1 Message Date
Jia He
d0823cb346 KVM: arm/arm64: vgic: Do not use spin_lock_irqsave/restore with irq disabled
kvm_vgic_sync_hwstate is only called with IRQ being disabled.
There is thus no need to call spin_lock_irqsave/restore in
vgic_fold_lr_state and vgic_prune_ap_list.

This patch replace them with the non irq-safe version.

Signed-off-by: Jia He <jia.he@hxt-semitech.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
[maz: commit message tidy-up]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-08-12 12:15:18 +01:00
Jia He
dc961e5395 KVM: arm/arm64: vgic: Move DEBUG_SPINLOCK_BUG_ON to vgic.h
DEBUG_SPINLOCK_BUG_ON can be used with both vgic-v2 and vgic-v3,
so let's move it to vgic.h

Signed-off-by: Jia He <jia.he@hxt-semitech.com>
[maz: commit message tidy-up]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-08-12 12:14:08 +01:00
Marc Zyngier
3e8a8a50c7 KVM: arm: vgic-v3: Add support for ICC_SGI0R and ICC_ASGI1R accesses
In order to generate Group0 SGIs, let's add some decoding logic to
access_gic_sgi(), and pass the generating group accordingly.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-08-12 12:06:35 +01:00
Marc Zyngier
03bd646d86 KVM: arm64: vgic-v3: Add support for ICC_SGI0R_EL1 and ICC_ASGI1R_EL1 accesses
In order to generate Group0 SGIs, let's add some decoding logic to
access_gic_sgi(), and pass the generating group accordingly.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-08-12 12:06:35 +01:00
Marc Zyngier
6249f2a479 KVM: arm/arm64: vgic-v3: Add core support for Group0 SGIs
Although vgic-v3 now supports Group0 interrupts, it still doesn't
deal with Group0 SGIs. As usually with the GIC, nothing is simple:

- ICC_SGI1R can signal SGIs of both groups, since GICD_CTLR.DS==1
  with KVM (as per 8.1.10, Non-secure EL1 access)

- ICC_SGI0R can only generate Group0 SGIs

- ICC_ASGI1R sees its scope refocussed to generate only Group0
  SGIs (as per the note at the bottom of Table 8-14)

We only support Group1 SGIs so far, so no material change.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-08-12 12:06:34 +01:00
Marc Zyngier
e22fa39cd0 KVM: arm64: Remove non-existent AArch32 ICC_SGI1R encoding
ICC_SGI1R is a 64bit system register, even on AArch32. It is thus
pointless to have such an encoding in the 32bit cp15 array. Let's
drop it.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-08-12 12:06:31 +01:00
Takashi Iwai
73b383141d Merge branch 'for-next' into for-linus
Preparation for 4.19 merge material.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
2018-08-12 08:55:10 +02:00
David S. Miller
78cbac647e Merge branch 'ip-faster-in-order-IP-fragments'
Peter Oskolkov says:

====================
ip: faster in-order IP fragments

Added "Signed-off-by" in v2.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 17:54:18 -07:00
Peter Oskolkov
a4fd284a1f ip: process in-order fragments efficiently
This patch changes the runtime behavior of IP defrag queue:
incoming in-order fragments are added to the end of the current
list/"run" of in-order fragments at the tail.

On some workloads, UDP stream performance is substantially improved:

RX: ./udp_stream -F 10 -T 2 -l 60
TX: ./udp_stream -c -H <host> -F 10 -T 5 -l 60

with this patchset applied on a 10Gbps receiver:

  throughput=9524.18
  throughput_units=Mbit/s

upstream (net-next):

  throughput=4608.93
  throughput_units=Mbit/s

Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 17:54:18 -07:00
Peter Oskolkov
353c9cb360 ip: add helpers to process in-order fragments faster.
This patch introduces several helper functions/macros that will be
used in the follow-up patch. No runtime changes yet.

The new logic (fully implemented in the second patch) is as follows:

* Nodes in the rb-tree will now contain not single fragments, but lists
  of consecutive fragments ("runs").

* At each point in time, the current "active" run at the tail is
  maintained/tracked. Fragments that arrive in-order, adjacent
  to the previous tail fragment, are added to this tail run without
  triggering the re-balancing of the rb-tree.

* If a fragment arrives out of order with the offset _before_ the tail run,
  it is inserted into the rb-tree as a single fragment.

* If a fragment arrives after the current tail fragment (with a gap),
  it starts a new "tail" run, as is inserted into the rb-tree
  at the end as the head of the new run.

skb->cb is used to store additional information
needed here (suggested by Eric Dumazet).

Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 17:54:18 -07:00
David S. Miller
6a92ef08a1 Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-11 17:52:00 -07:00
Coly Li
eb2b3d0345 bcache: add the missing comments for smp_mb()/smp_wmb()
Checkpatch.pl warns there are 2 locations of smp_mb() and smp_wmb()
without code comment. This patch adds the missing code comments for
these memory barrier calls.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11 15:46:42 -06:00
Coly Li
d0c1b89a40 bcache: remove unnecessary space before ioctl function pointer arguments
This is warned by checkpatch.pl, this patch removes the extra space.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11 15:46:42 -06:00
Coly Li
87418ef9f0 bcache: add missing SPDX header
The SPDX header is missing fro closure.c, super.c and util.c, this
patch adds SPDX header for GPL-2.0 into these files.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11 15:46:42 -06:00
Coly Li
b3cf37bfa1 bcache: move open brace at end of function definitions to next line
This is not a preferred style to place open brace '{' at the end of
function definition, checkpatch.pl reports error for such coding
style. This patch moves them into the start of the next new line.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11 15:46:42 -06:00
Coly Li
e1f08f1bc0 bcache: add static const prefix to char * array declarations
This patch declares char * array with const prefix in sysfs.c,
which is suggested by checkpatch.pl.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11 15:46:42 -06:00
Coly Li
3be11dbab6 bcache: fix code comments style
This patch fixes 3 style issues warned by checkpatch.pl,
- Comment lines are not aligned
- Comments use "/*" on subsequent lines
- Comment lines use a trailing "*/"

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11 15:46:42 -06:00
Coly Li
3069211be3 bcache: do not check NULL pointer before calling kmem_cache_destroy
kmem_cache_destroy() is safe for NULL pointer as input, the NULL pointer
checking is unncessary. This patch just removes the NULL pointer checking
to make code simpler.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11 15:46:42 -06:00
Coly Li
bc81b47e82 bcache: prefer 'help' in Kconfig
Current bcache Kconfig uses '---help---' as header of help information,
for now 'help' is prefered. This patch fixes this style by replacing
'---help---' by 'help' in bcache Kconfig file.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11 15:46:42 -06:00
Coly Li
2b1edd23ec bcache: fix typo 'succesfully' to 'successfully'
This patch fixes typo 'succesfully' to correct 'successfully', which is
suggested by checkpatch.pl.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11 15:46:42 -06:00
Coly Li
d9c61d30e8 bcache: replace '%pF' by '%pS' in seq_printf()
'%pF' and '%pf' are deprecated vsprintf pointer extensions, this patch
replace them by '%pS', which is suggested by checkpatch.pl.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11 15:46:41 -06:00
Coly Li
c63ca7871a bcache: fix indent by replacing blank by tabs
bch_btree_insert_check_key() has unaligned indent, or indent by blank
characters. This patch makes the indent aligned and replace blank by
tabs.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11 15:46:41 -06:00
Coly Li
6ae63e3501 bcache: replace printk() by pr_*() routines
There are still many places in bcache use printk to display kernel
message, which are suggested to be preplaced by pr_*() routines like
pr_err(), pr_info(), or pr_notice().

This patch replaces all printk() with a proper pr_*() routine for
bcache code.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11 15:46:41 -06:00
Coly Li
958bf494ec bcache: replace Symbolic permissions by octal permission numbers
Symbolic permission names are used in bcache, for now octal permission
numbers are encouraged to use for readability. This patch replaces
all symbolic permissions by octal permission numbers.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11 15:46:41 -06:00
Coly Li
b0d30981c0 bcache: style fixes for lines over 80 characters
This patch fixes the lines over 80 characters into more lines, to minimize
warnings by checkpatch.pl. There are still some lines exceed 80 characters,
but it is better to be a single line and I don't change them.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11 15:46:41 -06:00
Coly Li
fc2d5988b5 bcache: add identifier names to arguments of function definitions
There are many function definitions do not have identifier argument names,
scripts/checkpatch.pl complains warnings like this,

 WARNING: function definition argument 'struct bcache_device *' should
  also have an identifier name
  #16735: FILE: writeback.h:120:
  +void bch_sectors_dirty_init(struct bcache_device *);

This patch adds identifier argument names to all bcache function
definitions to fix such warnings.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11 15:46:41 -06:00
Coly Li
1fae7cf052 bcache: style fix to add a blank line after declarations
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11 15:46:41 -06:00
Coly Li
6f10f7d1b0 bcache: style fix to replace 'unsigned' by 'unsigned int'
This patch fixes warning reported by checkpatch.pl by replacing 'unsigned'
with 'unsigned int'.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11 15:46:41 -06:00
Bart Van Assche
b86d865cb1 blkcg: Make blkg_root_lookup() work for queues in bypass mode
For legacy queues the only call of blkg_root_lookup() happens after
bypass mode has been enabled. Since blkg_lookup() returns NULL for
queues in bypass mode, modify the blkg_root_lookup() such that it
no longer depends on bypass mode. Rename the function into
blk_queue_root_blkg() as suggested by Tejun.

Suggested-by: Tejun Heo <tj@kernel.org>
Fixes: 6bad9b210a ("blkcg: Introduce blkg_root_lookup()")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-11 15:41:25 -06:00
David S. Miller
9a95d9c642 Merge branch 'Remove-rtnl-lock-dependency-from-all-action-implementations'
Vlad Buslov says:

====================
Remove rtnl lock dependency from all action implementations

Currently, all netlink protocol handlers for updating rules, actions and
qdiscs are protected with single global rtnl lock which removes any
possibility for parallelism. This patch set is a second step to remove
rtnl lock dependency from TC rules update path.

Recently, new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was added.
Handlers registered with this flag are called without RTNL taken. End
goal is to have rule update handlers(RTM_NEWTFILTER, RTM_DELTFILTER,
etc.) to be registered with UNLOCKED flag to allow parallel execution.
However, there is no intention to completely remove or split rtnl lock
itself. This patch set addresses specific problems in implementation of
tc actions that prevent their control path from being executed
concurrently. Additional changes are required to refactor classifiers
API and individual classifiers for parallel execution. This patch set
lays groundwork to eventually register rule update handlers as
rtnl-unlocked.

Action API is already prepared for parallel execution with previous
patch set, which means that action ops that use action API for their
implementation do not require additional modifications. (delete, search,
etc.) Action API implements concurrency-safe reference counting and
guarantees that cleanup/delete is called only once, after last reference
to action is released.

The goal of this change is to update specific actions APIs that access
action private state directly, in order to be independent from external
locking. General approach is to re-use existing tcf_lock spinlock (used
by some action implementation to synchronize control path with data
path) to protect action private state from concurrent modification. If
action has rcu-protected pointer, tcf spinlock is used to protect its
update code, instead of relying on rtnl lock.

Some actions need to determine rtnl mutex status in order to release it.
For example, ife action can load additional kernel modules(meta ops) and
must make sure that no locks are held during module load. In such cases
'rtnl_held' argument is used to conditionally release rtnl mutex.

Changes from V1 to V2:
- Patch 12:
  - new patch
- Patch 14:
  - refactor gen_new_estimator() to reuse stats_lock when re-assigning
    rate estimator statistics pointer
- Remove mirred and tunnel_key helper function changes. (to be submitted
  and standalone patch)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 12:37:10 -07:00
Vlad Buslov
e329bc4273 net: sched: act_police: remove dependency on rtnl lock
Use tcf spinlock to protect police action private data from concurrent
modification during dump. (init already uses tcf spinlock when changing
police action state)

Pass tcf spinlock as estimator lock argument to gen_replace_estimator()
during action init.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 12:37:10 -07:00
Vlad Buslov
51a9f5ae65 net: core: protect rate estimator statistics pointer with lock
Extend gen_new_estimator() to also take stats_lock when re-assigning rate
estimator statistics pointer. (to be used by unlocked actions)

Rename 'stats_lock' to 'lock' and change argument description to explain
that it is now also used for control path.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 12:37:10 -07:00
Vlad Buslov
4e232818bd net: sched: act_mirred: remove dependency on rtnl lock
Re-introduce mirred list spinlock, that was removed some time ago, in order
to protect it from concurrent modifications, instead of relying on rtnl
lock.

Use tcf spinlock to protect mirred action private data from concurrent
modification in init and dump. Rearrange access to mirred data in order to
be performed only while holding the lock.

Rearrange net dev access to always hold reference while working with it,
instead of relying on rntl lock.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 12:37:10 -07:00
Vlad Buslov
84a75b329b net: sched: extend action ops with put_dev callback
As a preparation for removing dependency on rtnl lock from rules update
path, all users of shared objects must take reference while working with
them.

Extend action ops with put_dev() API to be used on net device returned by
get_dev().

Modify mirred action (only action that implements get_dev callback):
- Take reference to net device in get_dev.
- Implement put_dev API that releases reference to net device.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 12:37:10 -07:00
Vlad Buslov
764e9a2448 net: sched: act_vlan: remove dependency on rtnl lock
Use tcf spinlock to protect vlan action private data from concurrent
modification during dump and init. Use rcu swap operation to reassign
params pointer under protection of tcf lock. (old params value is not used
by init, so there is no need of standalone rcu dereference step)

Remove rtnl assertion that is no longer necessary.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 12:37:10 -07:00
Vlad Buslov
729e012609 net: sched: act_tunnel_key: remove dependency on rtnl lock
Use tcf lock to protect tunnel key action struct private data from
concurrent modification in init and dump. Use rcu swap operation to
reassign params pointer under protection of tcf lock. (old params value is
not used by init, so there is no need of standalone rcu dereference step)

Remove rtnl lock assertion that is no longer required.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 12:37:10 -07:00
Vlad Buslov
c8814552fe net: sched: act_skbmod: remove dependency on rtnl lock
Move read of skbmod_p rcu pointer to be protected by tcf spinlock. Use tcf
spinlock to protect private skbmod data from concurrent modification during
dump.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 12:37:09 -07:00
Vlad Buslov
5e48180ed8 net: sched: act_simple: remove dependency on rtnl lock
Use tcf spinlock to protect private simple action data from concurrent
modification during dump. (simple init already uses tcf spinlock when
changing action state)

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 12:37:09 -07:00
Vlad Buslov
d772849566 net: sched: act_sample: remove dependency on rtnl lock
Use tcf spinlock to protect private sample action data from concurrent
modification during dump and init.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 12:37:09 -07:00
Vlad Buslov
67b0c1a3c9 net: sched: act_pedit: remove dependency on rtnl lock
Rearrange pedit init code to only access pedit action data while holding
tcf spinlock. Change keys allocation type to atomic to allow it to execute
while holding tcf spinlock. Take tcf spinlock in dump function when
accessing pedit action data.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 12:37:09 -07:00
Vlad Buslov
ff25276de9 net: sched: act_ipt: remove dependency on rtnl lock
Use tcf spinlock to protect ipt action private data from concurrent
modification during dump. Ipt init already takes tcf spinlock when
modifying ipt state.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 12:37:09 -07:00
Vlad Buslov
54d0d423a4 net: sched: act_ife: remove dependency on rtnl lock
Use tcf spinlock and rcu to protect params pointer from concurrent
modification during dump and init. Use rcu swap operation to reassign
params pointer under protection of tcf lock. (old params value is not used
by init, so there is no need of standalone rcu dereference step)

Ife action has meta-actions that are compiled as standalone modules. Rtnl
mutex must be released while loading a kernel module. In order to support
execution without rtnl mutex, propagate 'rtnl_held' argument to meta action
loading functions. When requesting meta action module, conditionally
release rtnl lock depending on 'rtnl_held' argument.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 12:37:09 -07:00
Vlad Buslov
e8917f4370 net: sched: act_gact: remove dependency on rtnl lock
Use tcf spinlock to protect gact action private state from concurrent
modification during dump and init. Remove rtnl assertion that is no longer
necessary.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 12:37:09 -07:00
Vlad Buslov
b6a2b971c0 net: sched: act_csum: remove dependency on rtnl lock
Use tcf lock to protect csum action struct private data from concurrent
modification in init and dump. Use rcu swap operation to reassign params
pointer under protection of tcf lock. (old params value is not used by
init, so there is no need of standalone rcu dereference step)

Remove rtnl assertion that is no longer necessary.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 12:37:09 -07:00
Vlad Buslov
2142236b45 net: sched: act_bpf: remove dependency on rtnl lock
Use tcf spinlock to protect bpf action private data from concurrent
modification during dump and init. Remove rtnl lock assertion that is no
longer necessary.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 12:37:09 -07:00
David S. Miller
2b14e1ea21 Merge branch 'net-sctp-Avoid-allocating-high-order-memory-with-kmalloc'
Konstantin Khorenko says:

====================
net/sctp: Avoid allocating high order memory with kmalloc()

Each SCTP association can have up to 65535 input and output streams.
For each stream type an array of sctp_stream_in or sctp_stream_out
structures is allocated using kmalloc_array() function. This function
allocates physically contiguous memory regions, so this can lead
to allocation of memory regions of very high order, i.e.:

  sizeof(struct sctp_stream_out) == 24,
  ((65535 * 24) / 4096) == 383 memory pages (4096 byte per page),
  which means 9th memory order.

This can lead to a memory allocation failures on the systems
under a memory stress.

We actually do not need these arrays of memory to be physically
contiguous. Possible simple solution would be to use kvmalloc()
instread of kmalloc() as kvmalloc() can allocate physically scattered
pages if contiguous pages are not available. But the problem
is that the allocation can happed in a softirq context with
GFP_ATOMIC flag set, and kvmalloc() cannot be used in this scenario.

So the other possible solution is to use flexible arrays instead of
contiguios arrays of memory so that the memory would be allocated
on a per-page basis.

This patchset replaces kvmalloc() with flex_array usage.
It consists of two parts:

  * First patch is preparatory - it mechanically wraps all direct
    access to assoc->stream.out[] and assoc->stream.in[] arrays
    with SCTP_SO() and SCTP_SI() wrappers so that later a direct
    array access could be easily changed to an access to a
    flex_array (or any other possible alternative).
  * Second patch replaces kmalloc_array() with flex_array usage.

v2 changes:
 sctp_stream_in() users are updated to provide stream as an argument,
 sctp_stream_{in,out}_ptr() are now just sctp_stream_{in,out}().

v3 changes:
 Move type chages struct sctp_stream_out -> flex_array to next patch.
 Make sctp_stream_{in,out}() static incline and move them to a header.

Performance results (single stream):
====================================
  * Kernel: v4.18-rc6 - stock and with 2 patches from Oleg (earlier in this thread)
  * Node: CPU (8 cores): Intel(R) Xeon(R) CPU E31230 @ 3.20GHz
          RAM: 32 Gb

  * netperf: taken from https://github.com/HewlettPackard/netperf.git,
	     compiled from sources with sctp support
  * netperf server and client are run on the same node
  * ip link set lo mtu 1500

The script used to run tests:
 # cat run_tests.sh
 #!/bin/bash

for test in SCTP_STREAM SCTP_STREAM_MANY SCTP_RR SCTP_RR_MANY; do
  echo "TEST: $test";
  for i in `seq 1 3`; do
    echo "Iteration: $i";
    set -x
    netperf -t $test -H localhost -p 22222 -S 200000,200000 -s 200000,200000 \
            -l 60 -- -m 1452;
    set +x
  done
done
================================================

Results (a bit reformatted to be more readable):
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

				v4.18-rc7	v4.18-rc7 + fixes
TEST: SCTP_STREAM
212992 212992   1452    60.21	1125.52		1247.04
212992 212992   1452    60.20	1376.38		1149.95
212992 212992   1452    60.20	1131.40		1163.85
TEST: SCTP_STREAM_MANY
212992 212992   1452    60.00	1111.00		1310.05
212992 212992   1452    60.00	1188.55		1130.50
212992 212992   1452    60.00	1108.06		1162.50

===========
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

					v4.18-rc7	v4.18-rc7 + fixes
TEST: SCTP_RR
212992 212992 1        1       60.00	45486.98	46089.43
212992 212992 1        1       60.00	45584.18	45994.21
212992 212992 1        1       60.00	45703.86	45720.84
TEST: SCTP_RR_MANY
212992 212992 1        1       60.00	40.75		40.77
212992 212992 1        1       60.00	40.58		40.08
212992 212992 1        1       60.00	39.98		39.97

Performance results for many streams:
=====================================
   * Kernel: v4.18-rc8 - stock and with 2 patches v3
   * Node: CPU (8 cores): Intel(R) Xeon(R) CPU E31230 @ 3.20GHz
           RAM: 32 Gb

   * sctp_test: https://github.com/sctp/lksctp-tools
   * both server and client are run on the same node
   * ip link set lo mtu 1500
   * sysctl -w vm.max_map_count=65530000 (need it to make memory fragmented)

The script used to run tests:
=============================
 # cat run_sctp_test.sh
 #!/bin/bash

set -x

uname -r
ip link set lo mtu 1500
swapoff -a

free
cat /proc/buddyinfo

./src/apps/sctp_test -H 127.0.0.1 -P 22222 -l -d 0 &
sleep 3

time ./src/apps/sctp_test -H 127.0.0.1 -P 22221 -h 127.0.0.1 -p 22222 \
         -s -c 1 -M 65535 -T -t 1 -x 100000 -d 0 1>/dev/null

killall -9 lt-sctp_test
===============================

Results (a bit reformatted to be more readable):

1) ms stock kernel v4.18-rc8, no memory fragmentation
	test 1		test 2		test 3
real    0m14.715s	0m14.593s	0m15.954s
user    0m0.954s	0m0.955s	0m0.854s
sys     0m13.388s	0m12.537s	0m13.749s

2) kernel with fixes, no memory fragmentation
	test 1		test 2		test 3
real    0m14.959s	0m14.693s	0m14.762s
user    0m0.948s	0m0.921s	0m0.929s
sys     0m13.538s	0m13.225s	0m13.217s

3) kernel with fixes, memory fragmented
'free':
               total        used        free      shared  buff/cache   available
Mem:       32906008    30555200      302740         764     2048068      266452
Mem:       32906008    30379948      541436         764     1984624      442376
Mem:       32906008    30717312      262380         764     1926316      109908

/proc/buddyinfo:
Node 0, zone   Normal  40773     37     34     29      0      0      0      0      0      0      0
Node 0, zone   Normal 100332     68      8      4      2      1      1      0      0      0      0
Node 0, zone   Normal  31113      7      2      1      0      0      0      0      0      0      0

	test 1		test 2		test 3
real    0m14.159s	0m15.252s	0m15.826s
user    0m0.839s	0m1.004s	0m1.048s
sys     0m11.827s	0m14.240s	0m14.778s
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 12:26:48 -07:00
Konstantin Khorenko
0d493b4d0b net/sctp: Replace in/out stream arrays with flex_array
This path replaces physically contiguous memory arrays
allocated using kmalloc_array() with flexible arrays.
This enables to avoid memory allocation failures on the
systems under a memory stress.

Signed-off-by: Oleg Babin <obabin@virtuozzo.com>
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 12:25:15 -07:00
Konstantin Khorenko
05364ca03c net/sctp: Make wrappers for accessing in/out streams
This patch introduces wrappers for accessing in/out streams indirectly.
This will enable to replace physically contiguous memory arrays
of streams with flexible arrays (or maybe any other appropriate
mechanism) which do memory allocation on a per-page basis.

Signed-off-by: Oleg Babin <obabin@virtuozzo.com>
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 12:25:15 -07:00
Keara Leibovitz
b70f1f3af4 tc: Update README and add config
Updated README.

Added config file that contains the minimum required features enabled to
run the tests currently present in the kernel.
This must be updated when new unittests are created and require their own
modules.

Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 12:20:59 -07:00
David S. Miller
3305f9a905 Merge branch 'l2tp-rework-pppol2tp-ioctl-handling'
Guillaume Nault says:

====================
l2tp: rework pppol2tp ioctl handling

The current ioctl() handling code can be simplified. It tests for
non-relevant conditions and uselessly holds sockets. Once useless
code is removed, it becomes even simpler to let pppol2tp_ioctl() handle
commands directly, rather than dispatch them to pppol2tp_tunnel_ioctl()
or pppol2tp_session_ioctl(). That is the approach taken by this series.

Patch #1 and #2 define helper functions aimed at simplifying the rest
of the patch set.

Patch #3 drops useless tests in pppol2p_ioctl() and avoid holding a
refcount on the socket.

Patches #4, #5 and #6 are the core of the series. They let
pppol2tp_ioctl() handle all ioctls and drop the tunnel and session
specific functions.

Then patch #6 brings a little bit of consolidation.

Finally, patch #7 takes advantage of the simplified code to make
pppol2tp sockets compatible with dev_ioctl(). Certainly not a killer
feature, but it is trivial and it is always nice to see l2tp getting
better integration with the rest of the stack.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11 12:13:49 -07:00