linux-xiaomi-chiron

Author	SHA1	Message	Date
Maxime Chevallier	436d4fdb20	net: mvpp2: allow setting RSS flow hash parameters with ethtool This commit allows setting the RSS hash generation parameters from ethtool. When setting parameters for a given flow type from ethtool (e.g. tcp4), all the corresponding flows in the flow table are updated, according to the supported hash parameters. For example, when configuring TCP over IPv4 hash parameters to be src/dst IP + src/dst port ("ethtool -N eth0 rx-flow-hash tcp4 sdfn"), we only set the "src/dst port" hash parameters on the non-fragmented TCP over IPv4 flows. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:49 -07:00
Maxime Chevallier	d33ec45250	net: mvpp2: add an RSS classification step for each flow One of the classification action that can be performed is to compute a hash of the packet header based on some header fields, and lookup a RSS table based on this hash to determine the final RxQ. This is done by adding one lookup entry per flow per port, so that we can configure the hash generation parameters for each flow and each port. There are 2 possible engines that can be used for RSS hash generation : - C3HA, that generates a hash based on up to 4 header-extracted fields - C3HB, that does the same as c3HA, but also includes L4 info in the hash There are a lot of fields that can be extracted from the header. For now, we only use the ones that we can configure using ethtool : - DST MAC address - L3 info - Source IP - Destination IP - Source port - Destination port The C3HB engine is selected when we use L4 fields (src/dst port). Header parser Dec table Ingress pkt +-------------+ flow id +----------------------------+ ------------->\| TCAM + SRAM \|-------->\|TCP IPv4 w/ VLAN, not frag \| +-------------+ \|TCP IPv4 w/o VLAN, not frag \| \|TCP IPv4 w/ VLAN, frag \|--+ \|etc. \| \| +----------------------------+ \| \| Flow table \| +---------+ +------------+ +--------------------------+ \| \| RSS tbl \|<--\| Classifier \|<--------\| flow 0: C2 lookup \| \| +---------+ +------------+ \| C3 lookup port 0 \| \| \| \| \| C3 lookup port 1 \| \| +-----------+ +-------------+ \| ... \| \| \| C2 engine \| \| C3H engines \| \| flow 1: C2 lookup \|<--+ +-----------+ +-------------+ \| C3 lookup port 0 \| \| ... \| \| ... \| \| flow 51 : C2 lookup \| \| ... \| +--------------------------+ The C2 engine also gains the role of enabling and disabling the RSS table lookup for this packet. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:49 -07:00
Maxime Chevallier	f9358e12a0	net: mvpp2: split ingress traffic into multiple flows The PPv2 classifier allows to perform classification operations on each ingress packet, based on the flow the packet is assigned to. The current code uses only 1 flow per port, and the only classification action consists of assigning the rx queue to the packet, depending on the port. In preparation for adding RSS support, we have to split all incoming traffic into different flows. Since RSS assigns a rx queue depending on the hash of some header fields, we have to make sure that the hash is generated in a consistent way for all packets in the same flow. What we call a "flow" is actually a set of attributes attached to a packet that depends on various L2/L3/L4 info. This patch introduces 52 flows, wich are a combination of various L2, L3 and L4 attributes : - Whether or not the packet has a VLAN tag - Whether the packet is IPv4, IPv6 or something else - Whether the packet is TCP, UDP or something else - Whether or not the packet is fragmented at L3 level. The flow is associated to a packet by the Header Parser. Each flow corresponds to an entry in the decoding table. This entry then points to the sequence of classification lookups to be performed by the classifier, represented in the flow table. For now, the only lookup we perform is a C2 lookup to set the default rx queue. Header parser Dec table Ingress pkt +-------------+ flow id +----------------------------+ ------------->\| TCAM + SRAM \|-------->\|TCP IPv4 w/ VLAN, not frag \| +-------------+ \|TCP IPv4 w/o VLAN, not frag \| \|TCP IPv4 w/ VLAN, frag \|--+ \|etc. \| \| +----------------------------+ \| \| Flow table \| +------------+ +---------------------+ \| To RxQ <---\| Classifier \|<-------\| flow 0: C2 lookup \|<--------+ +------------+ \| flow 1: C2 lookup \| \| \| ... \| +------------+ \| flow 51 : C2 lookup \| \| C2 engine \| +---------------------+ +------------+ Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:49 -07:00
Maxime Chevallier	b1a962c62c	net: mvpp2: use classifier to assign default rx queue The PPv2 Controller has a classifier, that can perform multiple lookup operations for each packet, using different engines. One of these engines is the C2 engine, which performs TCAM based lookups on data extracted from the packet header. When a packet matches an entry, the engine sets various attributes, used to perform classification operations. One of these attributes is the rx queue in which the packet should be sent. The current code uses the lookup_id table (also called decoding table) to assign the rx queue. However, this only works if we use one entry per port in the decoding table, which won't be the case once we add RSS lookups. This patch uses the C2 engine to assign the rx queue to each packet. The C2 engine is used through the flow table, which dictates what classification operations are done for a given flow. Right now, we have one flow per port, which contains every ingress packet for this port. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:49 -07:00
Maxime Chevallier	e6e21c0242	net: mvpp2: rename per-port RSS init function mvpp22_init_rss function configures the RSS parameters for each port, so rename it accordingly. Since this function relies on classifier configuration, move its call right after the classifier config. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:48 -07:00
Maxime Chevallier	2a2f467daf	net: mvpp2: make sure we don't spread load on disabled CPUs When filling the RSS table, we have to make sure that the rx queue is attached to an online CPU. This patch is not a full support for cpu_hotplug, but rather a way to make sure that we don't break network on system booted with the maxcpus parameter. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:48 -07:00
Antoine Tenart	662ae3fe65	net: mvpp2: improve the distribution of packets on CPUs when using RSS This patch adds an extra indirection when setting the indirection table into the RSS hardware table to improve the packets distribution across CPUs. For example, if 2 queues are used on a multi-core system this new indirection will choose two queues on two different CPUs instead of the two first queues which are on the same first CPU. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:48 -07:00
Antoine Tenart	8179642b52	net: mvpp2: RSS indirection table support This patch adds the RSS indirection table support, allowing to use the ethtool -x and -X options to dump and set this table. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> [Maxime: Small warning fixes, use one table per port] Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:48 -07:00
Maxime Chevallier	a27a254c26	net: mvpp2: use one RSS table per port PPv2 Controller has 8 RSS Tables, of 32 entries each. A lookup in the RXQ2RSS_TABLE is performed for each incoming packet, and the RSS Table to be used is chosen according to the default rx queue that would be used for the packet. This default rx queue is set in the Lookup_id Table (also called Decoding Table), and is equal to the port->first_rxq. Since the Classifier itself isn't active at any time for the moment, this doesn't have a direct effect, the default rx queue at the moment is the one where all packets end-up into. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:48 -07:00
Maxime Chevallier	4b86097be7	net: mvpp2: fix RSS register definitions There is no RSS_TABLE register in PPv2 Controller. The register 0x1510 which was specified is actually named "RSS_HASH_SEL", but isn't used by this driver at all. Based on how this register was used, it should have been the RXQ2RSS_TABLE register, which allows to select the RSS table that will be used for the incoming packet. The RSS_TABLE_POINTER is actually a field of this RXQ2RSS_TABLE register. Since RSS tables are actually not used by the driver for now, this commit does not fix a runtime bug. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:48 -07:00
Antoine Tenart	132baa0378	net: mvpp2: fix a typo in the RSS code Cosmetic patch fixing a typo in one of the RSS comments. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:48 -07:00
Maxime Chevallier	f8c6ba8424	net: mvpp2: use only one rx queue per port per CPU The number of receive queue per port is : - MVPP2_DEFAULT_RXQ if in single queue mode - MVPP2_DEFAULT_RXQ * num_possible_cpus if in multi queue mode with MVPP2_DEFAULT_RXQ = 4. However, we don't use the extra rx queues at the moment, we really only need one per port per CPU, until some more advanced classification rules are implemented. Suggested-by: Stefan Chulski <stefanc@marvell.com> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:48 -07:00
Maxime Chevallier	790d32c6d3	net: mvpp2: fix hardcoded number of rx queues There's a dedicated #define that indicates the number of rx queues per port per cpu, this commit removes a harcoded use of that value This doesn't fix any runtime bugs since the harcoded value matches the expected value. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:48 -07:00
Yan Markman	4c4a5686c4	net: mvpp2: use RSS only when using multi-queue mode Since RSS only applies when we have per-cpu rx queues, it should only be enabled when the driver is configured to make use of multi-queue mode. Signed-off-by: Yan Markman <ymarkman@marvell.com> [Maxime: Commit message] Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:48 -07:00
Maxime Chevallier	3f6aaf7289	net: mvpp2: make multi queue mode the default mode The multi queue mode is needed to have RSS available, and offers some nice advantages, being able to have one rx queue vector per CPU. This mode has been usable through the use of a module parameter, this commit makes it the default value. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:48 -07:00
Maxime Chevallier	1e27a628e3	net: mvpp2: make sure we use single queue mode on PPv2.1 The PPv2 driver defines 2 "queue_modes" : - QDIST_SINGLE_MODE, where each port share one rx queue vector between all CPUs - QDIST_MULTI_MODE, where each port has one rx queue vector per CPU. Multi queue mode isn't available on PPv2.1, make sure we fallback to single mode when running on this revision. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:48 -07:00
Maxime Chevallier	0ad2f53906	net: mvpp2: define the number of RSS entries per table in mvpp2.h The size of the the RSS indirection tables should be defined in mvpp2.h, so that we can use it in all files of the PPv2 driver. This commit moves the define in mvpp2.h, and adds the missing #include in mvpp2_cls.h. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:47 -07:00
Maxime Chevallier	53a40025c0	net: mvpp2: fix include guards in mvpp2_prs.h Include guards should be put before #includes. This doesn't fix any bug, but prevent future compilation issues when adding new files in the mvpp2 driver The Header Parser init function needs the platform_device definition, and with the fixed include guards we need to add the missing include. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:47 -07:00
Prashant Bhole	68d2f84a13	net: gro: properly remove skb from list Following crash occurs in validate_xmit_skb_list() when same skb is iterated multiple times in the loop and consume_skb() is called. The root cause is calling list_del_init(&skb->list) and not clearing skb->next in `d4546c2509`. list_del_init(&skb->list) sets skb->next to point to skb itself. skb->next needs to be cleared because other parts of network stack uses another kind of SKB lists. validate_xmit_skb_list() uses such list. A similar type of bugfix was reported by Jesper Dangaard Brouer. https://patchwork.ozlabs.org/patch/942541/ This patch clears skb->next and changes list_del_init() to list_del() so that list->prev will maintain the list poison. [ 148.185511] ================================================================== [ 148.187865] BUG: KASAN: use-after-free in validate_xmit_skb_list+0x4b/0xa0 [ 148.190158] Read of size 8 at addr ffff8801e52eefc0 by task swapper/1/0 [ 148.192940] [ 148.193642] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.18.0-rc3+ #25 [ 148.195423] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014 [ 148.199129] Call Trace: [ 148.200565] <IRQ> [ 148.201911] dump_stack+0xc6/0x14c [ 148.203572] ? dump_stack_print_info.cold.1+0x2f/0x2f [ 148.205083] ? kmsg_dump_rewind_nolock+0x59/0x59 [ 148.206307] ? validate_xmit_skb+0x2c6/0x560 [ 148.207432] ? debug_show_held_locks+0x30/0x30 [ 148.208571] ? validate_xmit_skb_list+0x4b/0xa0 [ 148.211144] print_address_description+0x6c/0x23c [ 148.212601] ? validate_xmit_skb_list+0x4b/0xa0 [ 148.213782] kasan_report.cold.6+0x241/0x2fd [ 148.214958] validate_xmit_skb_list+0x4b/0xa0 [ 148.216494] sch_direct_xmit+0x1b0/0x680 [ 148.217601] ? dev_watchdog+0x4e0/0x4e0 [ 148.218675] ? do_raw_spin_trylock+0x10/0x120 [ 148.219818] ? do_raw_spin_lock+0xe0/0xe0 [ 148.221032] __dev_queue_xmit+0x1167/0x1810 [ 148.222155] ? sched_clock+0x5/0x10 [...] [ 148.474257] Allocated by task 0: [ 148.475363] kasan_kmalloc+0xbf/0xe0 [ 148.476503] kmem_cache_alloc+0xb4/0x1b0 [ 148.477654] __build_skb+0x91/0x250 [ 148.478677] build_skb+0x67/0x180 [ 148.479657] e1000_clean_rx_irq+0x542/0x8a0 [ 148.480757] e1000_clean+0x652/0xd10 [ 148.481772] net_rx_action+0x4ea/0xc20 [ 148.482808] __do_softirq+0x1f9/0x574 [ 148.483831] [ 148.484575] Freed by task 0: [ 148.485504] __kasan_slab_free+0x12e/0x180 [ 148.486589] kmem_cache_free+0xb4/0x240 [ 148.487634] kfree_skbmem+0xed/0x150 [ 148.488648] consume_skb+0x146/0x250 [ 148.489665] validate_xmit_skb+0x2b7/0x560 [ 148.490754] validate_xmit_skb_list+0x70/0xa0 [ 148.491897] sch_direct_xmit+0x1b0/0x680 [ 148.493949] __dev_queue_xmit+0x1167/0x1810 [ 148.495103] br_dev_queue_push_xmit+0xce/0x250 [ 148.496196] br_forward_finish+0x276/0x280 [ 148.497234] __br_forward+0x44f/0x520 [ 148.498260] br_forward+0x19f/0x1b0 [ 148.499264] br_handle_frame_finish+0x65e/0x980 [ 148.500398] NF_HOOK.constprop.10+0x290/0x2a0 [ 148.501522] br_handle_frame+0x417/0x640 [ 148.502582] __netif_receive_skb_core+0xaac/0x18f0 [ 148.503753] __netif_receive_skb_one_core+0x98/0x120 [ 148.504958] netif_receive_skb_internal+0xe3/0x330 [ 148.506154] napi_gro_complete+0x190/0x2a0 [ 148.507243] dev_gro_receive+0x9f7/0x1100 [ 148.508316] napi_gro_receive+0xcb/0x260 [ 148.509387] e1000_clean_rx_irq+0x2fc/0x8a0 [ 148.510501] e1000_clean+0x652/0xd10 [ 148.511523] net_rx_action+0x4ea/0xc20 [ 148.512566] __do_softirq+0x1f9/0x574 [ 148.513598] [ 148.514346] The buggy address belongs to the object at ffff8801e52eefc0 [ 148.514346] which belongs to the cache skbuff_head_cache of size 232 [ 148.517047] The buggy address is located 0 bytes inside of [ 148.517047] 232-byte region [ffff8801e52eefc0, ffff8801e52ef0a8) [ 148.519549] The buggy address belongs to the page: [ 148.520726] page:ffffea000794bb00 count:1 mapcount:0 mapping:ffff880106f4dfc0 index:0xffff8801e52ee840 compound_mapcount: 0 [ 148.524325] flags: 0x17ffffc0008100(slab\|head) [ 148.525481] raw: 0017ffffc0008100 ffff880106b938d0 ffff880106b938d0 ffff880106f4dfc0 [ 148.527503] raw: ffff8801e52ee840 0000000000190011 00000001ffffffff 0000000000000000 [ 148.529547] page dumped because: kasan: bad access detected Fixes: `d4546c2509` ("net: Convert GRO SKB handling to list_head.") Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Reported-by: Tyler Hicks <tyhicks@canonical.com> Tested-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:00:35 -07:00
Willem de Bruijn	8f19f12bdc	selftests: in udpgso_bench do not test udp zerocopy The udpgso benchmark compares various configurations of UDP and TCP. Including one that is not upstream, udp zerocopy. This is a leftover from the earlier RFC patchset. The test is part of kselftests and run in continuous spinners. Remove the failing case to make the test start passing. Fixes: `3a687bef14` ("selftests: udp gso benchmark") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 16:56:50 -07:00
Joel Fernandes (Google)	f8494fa3dd	tracing: Reorder display of TGID to be after PID Currently ftrace displays data in trace output like so: _-----=> irqs-off / _----=> need-resched \| / _---=> hardirq/softirq \|\| / _--=> preempt-depth \|\|\| / delay TASK-PID CPU TGID \|\|\|\| TIMESTAMP FUNCTION \| \| \| \| \|\|\|\| \| \| bash-1091 [000] ( 1091) d..2 28.313544: sched_switch: However Android's trace visualization tools expect a slightly different format due to an out-of-tree patch patch that was been carried for a decade, notice that the TGID and CPU fields are reversed: _-----=> irqs-off / _----=> need-resched \| / _---=> hardirq/softirq \|\| / _--=> preempt-depth \|\|\| / delay TASK-PID TGID CPU \|\|\|\| TIMESTAMP FUNCTION \| \| \| \| \|\|\|\| \| \| bash-1091 ( 1091) [002] d..2 64.965177: sched_switch: From kernel v4.13 onwards, during which TGID was introduced, tracing with systrace on all Android kernels will break (most Android kernels have been on 4.9 with Android patches, so this issues hasn't been seen yet). From v4.13 onwards things will break. The chrome browser's tracing tools also embed the systrace viewer which uses the legacy TGID format and updates to that are known to be difficult to make. Considering this, I suggest we make this change to the upstream kernel and backport it to all Android kernels. I believe this feature is merged recently enough into the upstream kernel that it shouldn't be a problem. Also logically, IMO it makes more sense to group the TGID with the TASK-PID and the CPU after these. Link: http://lkml.kernel.org/r/20180626000822.113931-1-joel@joelfernandes.org Cc: jreck@google.com Cc: tkjos@google.com Cc: stable@vger.kernel.org Fixes: `441dae8f2f` ("tracing: Add support for display of tgid in trace output") Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-07-12 19:56:25 -04:00
Willem de Bruijn	993675a310	packet: reset network header if packet shorter than ll reserved space If variable length link layer headers result in a packet shorter than dev->hard_header_len, reset the network header offset. Else skb->mac_len may exceed skb->len after skb_mac_reset_len. packet_sendmsg_spkt already has similar logic. Fixes: `b84bbaf7a6` ("packet: in packet_snd start writing at link layer allocation") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 16:55:59 -07:00
Willem de Bruijn	bab2c80e5a	nsh: set mac len based on inner packet When pulling the NSH header in nsh_gso_segment, set the mac length based on the encapsulated packet type. skb_reset_mac_len computes an offset to the network header, which here still points to the outer packet: > skb_reset_network_header(skb); > [...] > __skb_pull(skb, nsh_len); > skb_reset_mac_header(skb); // now mac hdr starts nsh_len == 8B after net hdr > skb_reset_mac_len(skb); // mac len = net hdr - mac hdr == (u16) -8 == 65528 > [..] > skb_mac_gso_segment(skb, ..) Link: http://lkml.kernel.org/r/CAF=yD-KeAcTSOn4AxirAxL8m7QAS8GBBe1w09eziYwvPbbUeYA@mail.gmail.com Reported-by: syzbot+7b9ed9872dab8c32305d@syzkaller.appspotmail.com Fixes: `c411ed8545` ("nsh: add GSO support") Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 16:55:29 -07:00
David S. Miller	c8c81de96b	Merge branch 's390-qeth-updates' Julian Wiedmann says: ==================== s390/qeth: updates 2018-07-11 please apply this first batch of qeth patches for net-next. It brings the usual cleanups, and some performance improvements to the transmit paths. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 16:42:40 -07:00
Julian Wiedmann	fb321f25e5	s390/qeth: speed-up IPv4 OSA xmit Move the xmit of offload-eligible (ie IPv4) traffic on OSA over to the new, copy-free path. As with L2, we'll need to preserve the skb_orphan() behaviour of the old code path until TX completion is sufficiently fast. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 16:42:40 -07:00
Julian Wiedmann	a647a02512	s390/qeth: speed-up L3 IQD xmit This implements a new xmit path for L3 HiperSockets, which carves the HW header from skb headroom instead of allocating it from the hdr cache. It also adds NETIF_F_SG support. The delta in qeth_l3_xmit() is all just removal of IQD-specific code and some minor consolidation. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 16:42:40 -07:00
Julian Wiedmann	ea1d4a0c7f	s390/qeth: add a L3 xmit wrapper In preparation for future work, move the high-level xmit work into a separate wrapper. This matches the L2 xmit code. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 16:42:39 -07:00
Julian Wiedmann	371a1e7a07	s390/qeth: increase GSO max size for eligible L3 devices When a L3 device doesn't offer TSO, allow the stack to build full-size GSO skbs. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 16:42:39 -07:00
Julian Wiedmann	09960b3a0a	s390/qeth: clean up exported symbols Remove some redundant EXPORTs. While at it, also move some L2-only prototypes into the proper header file. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 16:42:39 -07:00
Julian Wiedmann	6d8769abe4	s390/qeth: consolidate ccwgroup driver definition Reshuffle the code a bit so that everything is in one place. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 16:42:39 -07:00
Julian Wiedmann	86c0cdb9e0	s390/qeth: clean up Output Queue selection Consolidate duplicated code, fix the misuse of RTN_UNSPEC and simplify the handling of non-unicast traffic on IQD devices. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 16:42:39 -07:00
Julian Wiedmann	9aa17df3b8	s390/qeth: fine-tune RX modesetting Changing a device's address lists (or its promisc mode) already triggers an RX modeset, there's no need to do it manually from the L2 driver's ndo_vlan_rx_kill_vid() hook. Also when setting a device online, dev_open() already calls dev_set_rx_mode(). So a manual modeset is only necessary from the recovery path. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 16:42:39 -07:00
Julian Wiedmann	f67a43a73b	s390/qeth: remove unused buffer->aob pointer Except for tracing, the pointer is not used. At the same time, accessing it from qeth_qdio_output_handler() is racy: whenever qeth_qdio_cq_handler() gets control, its call to qeth_qdio_handle_aob() frees the AOB. So the AOB pointer that qeth_qdio_output_handler() stores into 'buffer' can go stale at any time, and trigger a use-after-free. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 16:42:39 -07:00
Julian Wiedmann	3b346c1814	s390/qeth: various buffer management cleanups Use the new qeth_scrub_qdio_buffer() helper, remove an extra parameter from qeth_clear_output_buffer(), init the bufstates.user field just once (in qeth_flush_buffers()) and remove some noisy trace messages. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 16:42:39 -07:00
Jesper Dangaard Brouer	0761680d52	net: ipv4: fix listify ip_rcv_finish in case of forwarding In commit `5fa12739a5` ("net: ipv4: listify ip_rcv_finish") calling dst_input(skb) was split-out. The ip_sublist_rcv_finish() just calls dst_input(skb) in a loop. The problem is that ip_sublist_rcv_finish() forgot to remove the SKB from the list before invoking dst_input(). Further more we need to clear skb->next as other parts of the network stack use another kind of SKB lists for xmit_more (see dev_hard_start_xmit). A crash occurs if e.g. dst_input() invoke ip_forward(), which calls dst_output()/ip_output() that eventually calls __dev_queue_xmit() + sch_direct_xmit(), and a crash occurs in validate_xmit_skb_list(). This patch only fixes the crash, but there is a huge potential for a performance boost if we can pass an SKB-list through to ip_forward. Fixes: `5fa12739a5` ("net: ipv4: listify ip_rcv_finish") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 16:40:19 -07:00
Theodore Ts'o	8d5a803c6a	ext4: check for allocation block validity with block group locked With commit `044e6e3d74`: "ext4: don't update checksum of new initialized bitmaps" the buffer valid bit will get set without actually setting up the checksum for the allocation bitmap, since the checksum will get calculated once we actually allocate an inode or block. If we are doing this, then we need to (re-)check the verified bit after we take the block group lock. Otherwise, we could race with another process reading and verifying the bitmap, which would then complain about the checksum being invalid. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1780137 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2018-07-12 19:08:05 -04:00
Paul E. McKenney	18952651da	Merge branches 'fixes1.2018.07.12b' and 'torture1.2018.07.12b' into HEAD fixes1.2018.07.12b: Post-gp_seq miscellaneous fixes torture1.2018.07.12b: Post-gp_seq torture-test updates	2018-07-12 15:42:41 -07:00
Joel Fernandes (Google)	bf5b64355a	rcutorture: Fix rcu_barrier successes counter The rcutorture test module currently increments both successes and error for the barrier test upon error, which results in misleading statistics being printed. This commit therefore changes the code to increment the success counter only when the test actually passes. This change was tested by by returning from the barrier callback without incrementing the callback counter, thus introducing what appeared to rcutorture to be rcu_barrier() failures. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:42:08 -07:00
Joel Fernandes (Google)	4babd855fd	rcutorture: Add support to detect if boost kthread prio is too low When rcutorture is built in to the kernel, an earlier patch detects that and raises the priority of RCU's kthreads to allow rcutorture's RCU priority boosting tests to succeed. However, if rcutorture is built as a module, those priorities must be raised manually via the rcutree.kthread_prio kernel boot parameter. If this manual step is not taken, rcutorture's RCU priority boosting tests will fail due to kthread starvation. One approach would be to raise the default priority, but that risks breaking existing users. Another approach would be to allow runtime adjustment of RCU's kthread priorities, but that introduces numerous "interesting" race conditions. This patch therefore instead detects too-low priorities, and prints a message and disables the RCU priority boosting tests in that case. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:42:08 -07:00
Arnd Bergmann	622be33fcb	rcutorture: Use monotonic timestamp for stall detection The get_seconds() call is deprecated because it overflows on 32-bit architectures. The algorithm in rcu_torture_stall() can deal with the overflow, but another problem here is that using a CLOCK_REALTIME stamp can lead to a false-positive stall warning when a settimeofday() happens concurrently. Using ktime_get_seconds() instead avoids those issues and will never overflow. The added cast to 'unsigned long' however is necessary to make ULONG_CMP_LT() work correctly. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:42:07 -07:00
Joel Fernandes (Google)	3b745c8969	rcutorture: Make boost test more robust Currently, with RCU_BOOST disabled, I get no failures when forcing rcutorture to test RCU boost priority inversion. The reason seems to be that we don't check for failures if the callback never ran at all for the duration of the boost-test loop. Further, the 'rtb' and 'rtbf' counters seem to be used inconsistently. 'rtb' is incremented at the start of each test and 'rtbf' is incremented per-cpu on each failure of call_rcu. So its possible 'rtbf' > 'rtb'. To test the boost with rcutorture, I did following on a 4-CPU x86 machine: modprobe rcutorture test_boost=2 sleep 20 rmmod rcutorture With patch: rtbf: 8 rtb: 12 Without patch: rtbf: 0 rtb: 2 In summary this patch: - Increments failed and total test counters once per boost-test. - Checks for failure cases correctly. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:42:06 -07:00
Joel Fernandes (Google)	450efca718	rcutorture: Disable RT throttling for boost tests Currently rcutorture is not able to torture RCU boosting properly. This is because the rcutorture's boost threads which are doing the torturing may be throttled due to RT throttling. This patch makes rcutorture use the right torture technique (unthrottled rcutorture boost tasks) for torturing RCU so that the test fails correctly when no boost is available. Currently this requires accessing sysctl_sched_rt_runtime directly, but that should be Ok since rcutorture is test code. Such direct access is also only possible if rcutorture is used as a built-in so make it conditional on that. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:42:06 -07:00
Paul E. McKenney	bf1bef50be	rcutorture: Emphasize testing of single reader protection type For RCU implementations supporting multiple types of reader protection, rcutorture currently randomly selects the combinations of types of protection for each phase of each reader. The problem with this, for example, given the four kinds of protection for RCU-sched (local_irq_disable(), local_bh_disable(), preempt_disable(), and rcu_read_lock_sched()), the reader will be protected by a single mechanism only 25% of the time. We really heavier testing of single read-side mechanisms. This commit therefore uses only a single mechanism about 60% of the time, half of the time explicitly and one-eighth of the time by chance. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:42:05 -07:00
Paul E. McKenney	2397d072f7	rcutorture: Handle extended read-side critical sections This commit enables rcutorture to test whether RCU properly aggregates different types of read-side critical sections into a larger section covering the set. It does this by extending an initial read-side critical section randomly for a random number of extensions. There is a new rcu_torture_ops field ->extendable that specifies what extensions are permitted for a given flavor of RCU (for example, SRCU does not permit any extensions, while RCU-sched permits all types). Note that if a given operation (for example, local_bh_disable()) extends an RCU read-side critical section, then rcutorture feels free to also start and end the critical section with that operation's type of disabling. Disabling operations include local_bh_disable(), local_irq_disable(), and preempt_disable(). This commit also adds a new "busted_srcud" torture type, which verifies rcutorture's ability to detect extensions of RCU read-side critical sections that are not handled. Gotta test the test, after all! Note that it is not legal to invoke local_bh_disable() with interrupts disabled, and this transition is avoided by overriding the random-number generator when it wants to call local_bh_disable() while interrupts are disabled. The code instead leaves both interrupts and bh/softirq disabled in this case. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:42:05 -07:00
Paul E. McKenney	241b42522a	rcutorture: Make rcu_torture_timer() use rcu_torture_one_read() This commit saves a few lines of code by making rcu_torture_timer() invoke rcu_torture_one_read(), thus completing the consolidation of code between rcu_torture_timer() and rcu_torture_reader(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:42:04 -07:00
Paul E. McKenney	3025520ec4	rcutorture: Use per-CPU random state for rcu_torture_timer() Currently, the rcu_torture_timer() function uses a single global torture_random_state structure protected by a single global lock. This conflicts to some extent with performance and scalability, but even more with the goal of consolidating read-side testing with rcu_torture_reader(). This commit therefore creates a per-CPU torture_random_state structure for use by rcu_torture_timer() and eliminates the lock. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Make rcu_torture_timer_rand static, per 0day Test Robot report. ]	2018-07-12 15:42:04 -07:00
Paul E. McKenney	8da9a59523	rcutorture: Use atomic increment for n_rcu_torture_timers Currently, rcu_torture_timer() relies on a lock to guard updates to n_rcu_torture_timers. Unfortunately, consolidating code with rcu_torture_reader() will dispense with this lock. This commit therefore makes n_rcu_torture_timers be an atomic_long_t and uses atomic_long_inc() to carry out the update. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:42:03 -07:00
Paul E. McKenney	6b06aa723e	rcutorture: Extract common code from rcu_torture_reader() This commit extracts the code executed on each pass through the loop in rcu_torture_reader() into a new rcu_torture_one_read() function. This new function will also be used by rcu_torture_timer(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:42:02 -07:00
Paul E. McKenney	2d3625841d	rcuperf: Remove unused torturing_tasks() function The torturing_tasks() function in rcuperf.c is not used, so this commit removes it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:42:02 -07:00
Paul E. McKenney	6bea2cc5a9	rcu: Remove rcutorture test version and sequence number Back when RCU had a debugfs interface, there was a test version and sequence number that allowed associating debugfs data with a particular test run, where the test run started with modprobe and ended with rmmod, which was how tests were run back on the old ABAT system within IBM. But rcutorture testing no longer runs on ABAT, and there is no longer an RCU debugfs interface, so there is no longer any need for test versions and sequence numbers. This commit therefore removes the rcutorture_record_test_transition() and rcutorture_record_progress() functions, and along with them the rcutorture_testseq and rcutorture_vernum variables that they update. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:42:01 -07:00

... 177 178 179 180 181 ...

781771 commits