Commit graph

546427 commits

Author SHA1 Message Date
Dave Chinner
74278da9f7 inode: convert inode_sb_list_lock to per-sb
The process of reducing contention on per-superblock inode lists
starts with moving the locking to match the per-superblock inode
list. This takes the global lock out of the picture and reduces the
contention problems to within a single filesystem. This doesn't get
rid of contention as the locks still have global CPU scope, but it
does isolate operations on different superblocks form each other.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Dave Chinner <dchinner@redhat.com>
2015-08-17 18:39:46 -04:00
Josef Bacik
cbedaac634 inode: add hlist_fake to avoid the inode hash lock in evict
Some filesystems don't use the VFS inode hash and fake the fact they
are hashed so that all the writeback code works correctly. However,
this means the evict() path still tries to remove the inode from the
hash, meaning that the inode_hash_lock() needs to be taken
unnecessarily. Hence under certain workloads the inode_hash_lock can
be contended even if the inode is never actually hashed.

To avoid this add hlist_fake to test if the inode isn't actually
hashed to avoid taking the hash lock on inodes that have never been
hashed.  Based on Dave Chinner's

inode: add IOP_NOTHASHED to avoid inode hash lock in evict

basd on Al's suggestions.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Dave Chinner <dchinner@redhat.com>
2015-08-17 18:39:45 -04:00
Dave Chinner
d353d7587d writeback: plug writeback at a high level
Doing writeback on lots of little files causes terrible IOPS storms
because of the per-mapping writeback plugging we do. This
essentially causes imeediate dispatch of IO for each mapping,
regardless of the context in which writeback is occurring.

IOWs, running a concurrent write-lots-of-small 4k files using fsmark
on XFS results in a huge number of IOPS being issued for data
writes.  Metadata writes are sorted and plugged at a high level by
XFS, so aggregate nicely into large IOs. However, data writeback IOs
are dispatched in individual 4k IOs, even when the blocks of two
consecutively written files are adjacent.

Test VM: 8p, 8GB RAM, 4xSSD in RAID0, 100TB sparse XFS filesystem,
metadata CRCs enabled.

Kernel: 3.10-rc5 + xfsdev + my 3.11 xfs queue (~70 patches)

Test:

$ ./fs_mark  -D  10000  -S0  -n  10000  -s  4096  -L  120  -d
/mnt/scratch/0  -d  /mnt/scratch/1  -d  /mnt/scratch/2  -d
/mnt/scratch/3  -d  /mnt/scratch/4  -d  /mnt/scratch/5  -d
/mnt/scratch/6  -d  /mnt/scratch/7

Result:

		wall	sys	create rate	Physical write IO
		time	CPU	(avg files/s)	 IOPS	Bandwidth
		-----	-----	------------	------	---------
unpatched	6m56s	15m47s	24,000+/-500	26,000	130MB/s
patched		5m06s	13m28s	32,800+/-600	 1,500	180MB/s
improvement	-26.44%	-14.68%	  +36.67%	-94.23%	+38.46%

If I use zero length files, this workload at about 500 IOPS, so
plugging drops the data IOs from roughly 25,500/s to 1000/s.
3 lines of code, 35% better throughput for 15% less CPU.

The benefits of plugging at this layer are likely to be higher for
spinning media as the IO patterns for this workload are going make a
much bigger difference on high IO latency devices.....

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2015-08-17 18:39:45 -04:00
David S. Miller
863960b4c5 Merge branch 'enic-devcmd2'
Govindarajulu Varadarajan says:

====================
enic: add devcmd2

This series adds new devcmd2 support. The first two patches are code
refactoring.

devcmd is an interface for driver to communicate with fw/adaptor. It
involves writing data to hardware registers and waiting for the result.
This mechanism does not scale well. The queuing of "no wait" devcmds is
done in firmware memory rather than on the host. Firmware memory is a
rather more scarce and valuable resource than host memory. A devcmd storm
from one vf can disrupt the service on other pf/vf. The lack of flow
control allows for possible denial of server from one VM to another.
Devcmd2 uses work queue to post the devcmds, just like tx work queue. This
allows better flow control.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 15:25:30 -07:00
Govindarajulu Varadarajan
373fb0873d enic: add devcmd2
devcmd is an interface for driver to communicate with fw/adaptor. It
involves writing data to hardware registers and waiting for the result.
This mechanism does not scale well. The queuing of "no wait" devcmds is
done in firmware memory rather than on the host. Firmware memory is a
rather more scarce and valuable resource than host memory. A devcmd storm
from one vf can disrupt the service on other pf/vf. The lack of flow
control allows for possible denial of server from one VM to another.

Devcmd2 uses work queue to post the devcmds, just like tx work queue. This
allows better flow control.

Initialize devcmd2, if fails we fall back to devcmd1.

Also change the driver version.

Signed-off-by: N V V Satyanarayana Reddy <nalreddy@cisco.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 15:25:29 -07:00
Govindarajulu Varadarajan
fda3f52bdb enic: add devcmd2 resources
Add devcmd resources to vnic_res_type. Add data types used by devcmd.

Signed-off-by: N V V Satyanarayana Reddy <nalreddy@cisco.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 15:25:29 -07:00
Govindarajulu Varadarajan
6a3c2f838c enic: use netdev_<foo> or dev_<foo> instead of pr_<foo>
pr_info does not give any details about the interface involved. This patch
uses netdev_info for printing the message. Use dev_info where netdev is not
ready.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 15:25:29 -07:00
Govindarajulu Varadarajan
8b89f3a19d enic: move struct definition from .c to .h file
Some of the structure definitions are in .c file to make them private to
that file. This patch moves the struct definition to .h file, So that their
definitions are accessible from other files.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 15:25:29 -07:00
Trond Myklebust
7e94d6c4ab NFS: Don't fsync twice for O_SYNC/IS_SYNC files
generic_file_write_iter() will already do an fsync on our behalf
if the file descriptor is O_SYNC or the file is marked as IS_SYNC.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17 16:55:18 -05:00
Keith Busch
e824410ffc NVMe: Set queue max segments
This sets the queue's max segment size to match the device's
capabilities. The default of 128 is usable until a device's transfer
capability exceeds 512k, assuming a device page size of 4k. Many nvme
devices exceed that transfer limit, so this lets the block layer know what
kind of commands it to allow to form rather than unnecessarily split them.

One additional segment is added to account for a transfer that may start
in the middle of a page.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-17 15:52:23 -06:00
David S. Miller
2ea273d76a net: Export bpf_prog_create_from_user().
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:37:06 -07:00
Ian Morris
ec120da6f0 ipv6: trivial whitespace fix
Change brace placement to be in line with coding standards

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:34:48 -07:00
Phil Sutter
f4a3e90ba5 rhashtable-test: extend to test concurrency
After having tested insertion, lookup, table walk and removal, spawn a
number of threads running operations on the same rhashtable. Each of
them will:

1) insert it's own set of objects,
2) lookup every successfully inserted object and finally
3) remove objects in several rounds until all of them have been removed,
   making sure the remaining ones are still found after each round.

This should put a good amount of load onto the system and due to
synchronising thread startup via two semaphores also extensive
concurrent table access.

The default number of ten threads returned within half a second on my
local VM with two cores. Running 200 threads took about four seconds. If
slow systems suffer too much from this though, the default could be
lowered or even set to zero so this extended test does not run at all by
default.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:33:47 -07:00
David S. Miller
c1f066d4ee Included changes:
- avoid integer overflow in GW selection routine
 - prevent race condition by making capability bit changes atomic (use
   clear/set/test_bit)
 - fix synchronization issue in mcast tvlv handler
 - fix crash on double list removal of TT Request objects
 - fix leak by puring packets enqueued for sending upon iface removal
 - ensure network header pointer is set in skb
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJVzlUGAAoJEOb/4TMchkvfy9EQAJ0DBKdTqiuWwNQzZ9ghCeLi
 Njgwyq4WdvZXp8vIj9cq0AR1JsFvH9fGp4BlLE8pGsQyxURiWQmR/Jurvb0UGMhM
 SPVvl/b0nnCrRu937UcVEiq6wRrFwEr1fYfL45fY0c1eTkXSu/54iEURDvWlkpyX
 COB4cuydoPY6O2LGiPRsGlOGTYbQ/u89G1HyxFr9Ch2IVRMPVXB6zuI7puuEg2ee
 EwgI2IGO/GHHaScbXQpVy+bL6XL/OcQ4nCrYeaNpwbOKAOfGabNzfVncSCLRKMSW
 jZh5D3cG3AKGUPeooEtd8yGgNG3SSVNTXt7CL+WOM7/2EKvYeNdJFaQzbdZQEtT0
 begOcUJ2K0yas/iExqYXtEPFCWXhjuzrb4u6wu5psGgu3zt1nltpCkplDEMcSEia
 C+dYE5r/JM2S1lsFclnjz+rBXu4ZfCUvV0q+50+lGiWPhfemM2M2XqJNcay0ioSm
 radSwMe549sh6r3ajwdxF5z66tU94PERMMxVxJ9Z3gmHUZ56phxEe6X0xBO91bhW
 aSCcFr6PQMxmfwanLj5Ydxm5MPHdDXF84YPMTFcGb4ya9ZWWURGNq7Sr85Ek7N6W
 5DIA8F8/vsgaXYUn011aBu4q1mm5ZsTs9nR3MAxchAuuu3sTBOkd/BecS4T7CEOE
 IHnbxsJxXReJ/aCQFoks
 =ZEuN
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Antonio Quartulli says:

====================
Included changes:
- avoid integer overflow in GW selection routine
- prevent race condition by making capability bit changes atomic (use
  clear/set/test_bit)
- fix synchronization issue in mcast tvlv handler
- fix crash on double list removal of TT Request objects
- fix leak by puring packets enqueued for sending upon iface removal
- ensure network header pointer is set in skb
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:31:42 -07:00
Ivan Vecera
af19e68683 be2net: avoid vxlan offloading on multichannel configs
VxLAN offloading is not functional if the NIC is running in multichannel
mode (UMC, FLEX-10, VNIC...). Enabling this additionally kills whole
connectivity through the NIC and the device needs to be down and up to
restore it. The firmware should take care about it and does not allow
the conversion of interface to tunnel type (be_cmd_manage_iface) or should
support VxLAN offloading if multichannel config is enabled.
I have tested this on the latest available firmware (10.6.144.21).

Result:
[root@sm-04 ~]# ip link set enp5s0f0 up[root@sm-04 ~]# ip addr add 172.30.10.50/24 dev enp5s0f0
[root@sm-04 ~]# ping -c 3 172.30.10.254PING 172.30.10.254 (172.30.10.254) 56(84) bytes of data.
64 bytes from 172.30.10.254: icmp_seq=1 ttl=64 time=0.317 ms
64 bytes from 172.30.10.254: icmp_seq=2 ttl=64 time=0.187 ms
64 bytes from 172.30.10.254: icmp_seq=3 ttl=64 time=0.188 ms

 --- 172.30.10.254 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.187/0.230/0.317/0.063 ms
[root@sm-04 ~]# ip link add link enp5s0f0 vxlan10 type vxlan id 10 remote 172.30.10.60 dstport 4789
[root@sm-04 ~]# ip link set vxlan10 up
[ 7900.442811] be2net 0000:05:00.0: Enabled VxLAN offloads for UDP port 4789
[ 7900.455722] be2net 0000:05:00.1: Enabled VxLAN offloads for UDP port 4789
[ 7900.468635] be2net 0000:05:00.2: Enabled VxLAN offloads for UDP port 4789
[ 7900.481553] be2net 0000:05:00.3: Enabled VxLAN offloads for UDP port 4789
[root@sm-04 ~]# ping -c 3 172.30.10.254
PING 172.30.10.254 (172.30.10.254) 56(84) bytes of data.

 --- 172.30.10.254 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 1999ms

[root@sm-04 ~]# ip link set vxlan10 down
[ 7959.434093] be2net 0000:05:00.0: Disabled VxLAN offloads for UDP port 4789
[ 7959.444792] be2net 0000:05:00.1: Disabled VxLAN offloads for UDP port 4789
[ 7959.455592] be2net 0000:05:00.2: Disabled VxLAN offloads for UDP port 4789
[ 7959.466416] be2net 0000:05:00.3: Disabled VxLAN offloads for UDP port 4789
[root@sm-04 ~]# ip link del vxlan10
[root@sm-04 ~]# ping -c 3 172.30.10.254
PING 172.30.10.254 (172.30.10.254) 56(84) bytes of data.

 --- 172.30.10.254 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 1999ms

[root@sm-04 ~]# ip link set enp5s0f0 down
[root@sm-04 ~]# ip link set enp5s0f0 up
[ 8071.019003] be2net 0000:05:00.0 enp5s0f0: Link is Up
[root@sm-04 ~]# ping -c 3 172.30.10.254
PING 172.30.10.254 (172.30.10.254) 56(84) bytes of data.
64 bytes from 172.30.10.254: icmp_seq=1 ttl=64 time=0.318 ms
64 bytes from 172.30.10.254: icmp_seq=2 ttl=64 time=0.196 ms
64 bytes from 172.30.10.254: icmp_seq=3 ttl=64 time=0.194 ms

 --- 172.30.10.254 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.194/0.236/0.318/0.057 ms

Cc: Sathya Perla <sathya.perla@avagotech.com>
Cc: Ajit Khaparde <ajit.khaparde@avagotech.com>
Cc: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@avagotech.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Ajit Khaparde <ajit.khaparde@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:29:58 -07:00
David S. Miller
1f979b117b Merge branch 'ipv6_percpu_rt_deadlock'
Martin KaFai Lau says:

====================
ipv6: Fix a potential deadlock when creating pcpu rt

v1 -> v2:
A minor change in the commit message of patch 2.

This patch series fixes a potential deadlock when creating a pcpu rt.
It happens when dst_alloc() decided to run gc. Something like this:

read_lock(&table->tb6_lock);
ip6_rt_pcpu_alloc()
=> dst_alloc()
=> ip6_dst_gc()
=> write_lock(&table->tb6_lock); /* oops */

Patch 1 and 2 are some prep works.
Patch 3 is the fix.

Original report: https://bugzilla.kernel.org/show_bug.cgi?id=102291
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:28:04 -07:00
Martin KaFai Lau
9c7370a166 ipv6: Fix a potential deadlock when creating pcpu rt
rt6_make_pcpu_route() is called under read_lock(&table->tb6_lock).
rt6_make_pcpu_route() calls ip6_rt_pcpu_alloc(rt) which then
calls dst_alloc().  dst_alloc() _may_ call ip6_dst_gc() which takes
the write_lock(&tabl->tb6_lock).  A visualized version:

read_lock(&table->tb6_lock);
rt6_make_pcpu_route();
=> ip6_rt_pcpu_alloc();
=> dst_alloc();
=> ip6_dst_gc();
=> write_lock(&table->tb6_lock); /* oops */

The fix is to do a read_unlock first before calling ip6_rt_pcpu_alloc().

A reported stack:

[141625.537638] INFO: rcu_sched self-detected stall on CPU { 27}  (t=60000 jiffies g=4159086 c=4159085 q=2139)
[141625.547469] Task dump for CPU 27:
[141625.550881] mtr             R  running task        0 22121  22081 0x00000008
[141625.558069]  0000000000000000 ffff88103f363d98 ffffffff8106e488 000000000000001b
[141625.565641]  ffffffff81684900 ffff88103f363db8 ffffffff810702b0 0000000008000000
[141625.573220]  ffffffff81684900 ffff88103f363de8 ffffffff8108df9f ffff88103f375a00
[141625.580803] Call Trace:
[141625.583345]  <IRQ>  [<ffffffff8106e488>] sched_show_task+0xc1/0xc6
[141625.589650]  [<ffffffff810702b0>] dump_cpu_task+0x35/0x39
[141625.595144]  [<ffffffff8108df9f>] rcu_dump_cpu_stacks+0x6a/0x8c
[141625.601320]  [<ffffffff81090606>] rcu_check_callbacks+0x1f6/0x5d4
[141625.607669]  [<ffffffff810940c8>] update_process_times+0x2a/0x4f
[141625.613925]  [<ffffffff8109fbee>] tick_sched_handle+0x32/0x3e
[141625.619923]  [<ffffffff8109fc2f>] tick_sched_timer+0x35/0x5c
[141625.625830]  [<ffffffff81094a1f>] __hrtimer_run_queues+0x8f/0x18d
[141625.632171]  [<ffffffff81094c9e>] hrtimer_interrupt+0xa0/0x166
[141625.638258]  [<ffffffff8102bf2a>] local_apic_timer_interrupt+0x4e/0x52
[141625.645036]  [<ffffffff8102c36f>] smp_apic_timer_interrupt+0x39/0x4a
[141625.651643]  [<ffffffff8140b9e8>] apic_timer_interrupt+0x68/0x70
[141625.657895]  <EOI>  [<ffffffff81346ee8>] ? dst_destroy+0x7c/0xb5
[141625.664188]  [<ffffffff813d45b5>] ? fib6_flush_trees+0x20/0x20
[141625.670272]  [<ffffffff81082b45>] ? queue_write_lock_slowpath+0x60/0x6f
[141625.677140]  [<ffffffff8140aa33>] _raw_write_lock_bh+0x23/0x25
[141625.683218]  [<ffffffff813d4553>] __fib6_clean_all+0x40/0x82
[141625.689124]  [<ffffffff813d45b5>] ? fib6_flush_trees+0x20/0x20
[141625.695207]  [<ffffffff813d6058>] fib6_clean_all+0xe/0x10
[141625.700854]  [<ffffffff813d60d3>] fib6_run_gc+0x79/0xc8
[141625.706329]  [<ffffffff813d0510>] ip6_dst_gc+0x85/0xf9
[141625.711718]  [<ffffffff81346d68>] dst_alloc+0x55/0x159
[141625.717105]  [<ffffffff813d09b5>] __ip6_dst_alloc.isra.32+0x19/0x63
[141625.723620]  [<ffffffff813d1830>] ip6_pol_route+0x36a/0x3e8
[141625.729441]  [<ffffffff813d18d6>] ip6_pol_route_output+0x11/0x13
[141625.735700]  [<ffffffff813f02c8>] fib6_rule_action+0xa7/0x1bf
[141625.741698]  [<ffffffff813d18c5>] ? ip6_pol_route_input+0x17/0x17
[141625.748043]  [<ffffffff81357c48>] fib_rules_lookup+0xb5/0x12a
[141625.754050]  [<ffffffff81141628>] ? poll_select_copy_remaining+0xf9/0xf9
[141625.761002]  [<ffffffff813f0535>] fib6_rule_lookup+0x37/0x5c
[141625.766914]  [<ffffffff813d18c5>] ? ip6_pol_route_input+0x17/0x17
[141625.773260]  [<ffffffff813d008c>] ip6_route_output+0x7a/0x82
[141625.779177]  [<ffffffff813c44c8>] ip6_dst_lookup_tail+0x53/0x112
[141625.785437]  [<ffffffff813c45c3>] ip6_dst_lookup_flow+0x2a/0x6b
[141625.791604]  [<ffffffff813ddaab>] rawv6_sendmsg+0x407/0x9b6
[141625.797423]  [<ffffffff813d7914>] ? do_ipv6_setsockopt.isra.8+0xd87/0xde2
[141625.804464]  [<ffffffff8139d4b4>] inet_sendmsg+0x57/0x8e
[141625.810028]  [<ffffffff81329ba3>] sock_sendmsg+0x2e/0x3c
[141625.815588]  [<ffffffff8132be57>] SyS_sendto+0xfe/0x143
[141625.821063]  [<ffffffff813dd551>] ? rawv6_setsockopt+0x5e/0x67
[141625.827146]  [<ffffffff8132c9f8>] ? sock_common_setsockopt+0xf/0x11
[141625.833660]  [<ffffffff8132c08c>] ? SyS_setsockopt+0x81/0xa2
[141625.839565]  [<ffffffff8140ac17>] entry_SYSCALL_64_fastpath+0x12/0x6a

Fixes: d52d3997f8 ("pv6: Create percpu rt6_info")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:28:03 -07:00
Martin KaFai Lau
a73e419563 ipv6: Add rt6_make_pcpu_route()
It is a prep work for fixing a potential deadlock when creating
a pcpu rt.

The current rt6_get_pcpu_route() will also create a pcpu rt if one does not
exist.  This patch moves the pcpu rt creation logic into another function,
rt6_make_pcpu_route().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:28:03 -07:00
Martin KaFai Lau
ad70686289 ipv6: Remove un-used argument from ip6_dst_alloc()
After 4b32b5ad31 ("ipv6: Stop rt6_info from using inet_peer's metrics"),
ip6_dst_alloc() does not need the 'table' argument.  This patch
cleans it up.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:28:03 -07:00
Igor Plyatov
776829de90 net: phy: workaround for buggy cable detection by LAN8700 after cable plugging
* Due to HW bug, LAN8700 sometimes does not detect presence of energy in the
  Ethernet cable in Energy Detect Power-Down mode (e.g while EDPWRDOWN bit is
  set, the ENERGYON bit does not asserted sometimes). This is a common bug of
  LAN87xx family of PHY chips.
* The lan87xx_read_status() was improved to acquire ENERGYON bit. Its previous
  algorythm still not reliable on 100 % and sometimes skip cable plugging.

Signed-off-by: Igor Plyatov <plyatov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:27:23 -07:00
David S. Miller
2bd736fa0d Another pull request for the next cycle, this time with quite
a bit of content:
  * mesh fixes/improvements from Alexis, Bob, Chun-Yeow and Jesse
  * TDLS higher bandwidth support (Arik)
  * OCB fixes from Bertold Van den Bergh
  * suspend/resume fixes from Eliad
  * dynamic SMPS support for minstrel-HT (Krishna Chaitanya)
  * VHT bitrate mask support (Lorenzo Bianconi)
  * better regulatory support for 5/10 MHz channels (Matthias May)
  * basic support for MU-MIMO to avoid the multi-vif issue (Sara Sharon)
 along with a number of other cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJVzg5bAAoJEDBSmw7B7bqr3PAP/1r8wyZXxtySzz6P5Z9k0+2I
 52NiSUISgmtnaQUyahf4n90eMU+gGJWQwPwIZFvMKg6bD4RW2XI4MdKmviKx8skU
 4sDlDxMFrVMfV/ySwiPDAONWPtwwgKllIt0IDDnKs6kPdDlUcbKOTEFYhzZ1HhTZ
 7Og4rJm7M90QpdMU7hmxmE5KRkp1hW0Yce1KPTW5U0j9yl9zbi4eLVWT+ac1WnZs
 GpItajd0BFtBy7DRHzX8RiRJ4pi+aWxhuYNqiSxUm0BqPWCzT7PP15M1kCGwrXtm
 /TTSVJl7WkLbOYI0PE0Y0XcJfZUg1c9aecCR3ubmRrQrGfOBFpN01jUANIRwqvZ3
 3QRq1RZNLac0+zlBPjoFdOHmoaVX6UcJQKSgOhcfuM1BcNFnXZEcHFN4/SaEUfvJ
 1ltybEeOEAckCMqqfHb1g/nVfJnlBjy811GzIrsHXqKqb7rRfGkfxmBxLrRzVknS
 PC970pbuhxICeeryKdVgK5BClWeT3TB1srt6OZ0QR1zlcfZbLZ8jqJlHJcy3szFi
 P43X9w8I6ZNTzkBU+lsCt9gbveYS+rSaJ+zm/SaF21ro33+FEdZ+p1ujjzp729Tz
 PnKobaOrku38Be7CSwJ760WvngC7gbZqGybGknBsws4dqDXJste0UjxulZeyaOkN
 nVmHDL45jc5rd8qjoPQV
 =kV1a
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2015-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Another pull request for the next cycle, this time with quite
a bit of content:
 * mesh fixes/improvements from Alexis, Bob, Chun-Yeow and Jesse
 * TDLS higher bandwidth support (Arik)
 * OCB fixes from Bertold Van den Bergh
 * suspend/resume fixes from Eliad
 * dynamic SMPS support for minstrel-HT (Krishna Chaitanya)
 * VHT bitrate mask support (Lorenzo Bianconi)
 * better regulatory support for 5/10 MHz channels (Matthias May)
 * basic support for MU-MIMO to avoid the multi-vif issue (Sara Sharon)
along with a number of other cleanups.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:25:04 -07:00
kbuild test robot
18df8a87ba dlm: sctp_accept_from_sock() can be static
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2015-08-17 16:23:09 -05:00
David S. Miller
90eb7fa51c Merge branch 'bpf_fanout'
Willem de Bruijn says:

====================
packet: add cBPF and eBPF fanout modes

Allow programmable fanout modes. Support both classical BPF programs
passed directly and extended BPF programs passed by file descriptor.

One use case is packet steering by deep packet inspection, for
instance for packet steering by application layer header fields.

Separate the configuration of the fanout mode and the configuration
of the program, to allow dynamic updates to the latter at runtime.

Changes
  v1 -> v2:
    - follow SO_LOCK_FILTER semantics on filter updates
    - only accept eBPF programs of type BPF_PROG_TYPE_SOCKET_FILTER
    - rename PACKET_FANOUT_BPF to PACKET_FANOUT_CBPF to match
      man 2 bpf usage: "classic" vs. "extended" BPF.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:22:48 -07:00
Willem de Bruijn
30da679e67 selftests/net: test extended BPF fanout mode
Test PACKET_FANOUT_EBPF by inserting a program into the the kernel
with bpf(), then attaching it to the fanout group. Observe the same
payload-based distribution as in the PACKET_FANOUT_CBPF test.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:22:48 -07:00
Willem de Bruijn
95e22792fa selftests/net: test classic bpf fanout mode
Test PACKET_FANOUT_CBPF by inserting a cBPF program that selects a
socket by payload. Requires modifying the test program to send
packets with multiple payloads.

Also fix a bug in testing the return value of mmap()

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:22:48 -07:00
Willem de Bruijn
f2e520956a packet: add extended BPF fanout mode
Add fanout mode PACKET_FANOUT_EBPF that accepts an en extended BPF
program to select a socket.

Update the internal eBPF program by passing to socket option
SOL_PACKET/PACKET_FANOUT_DATA a file descriptor returned by bpf().

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:22:48 -07:00
Willem de Bruijn
47dceb8ecd packet: add classic BPF fanout mode
Add fanout mode PACKET_FANOUT_CBPF that accepts a classic BPF program
to select a socket.

This avoids having to keep adding special case fanout modes. One
example use case is application layer load balancing. The QUIC
protocol, for instance, encodes a connection ID in UDP payload.

Also add socket option SOL_PACKET/PACKET_FANOUT_DATA that updates data
associated with the socket group. Fanout mode PACKET_FANOUT_CBPF is the
only user so far.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:22:47 -07:00
Marcelo Ricardo Leitner
00dcffaebf dlm: fix reconnecting but not sending data
There are cases on which lowcomms_connect_sock() is called directly,
which caused the CF_WRITE_PENDING flag to not bet set upon reconnect,
specially on send_to_sock() error handling. On this last, the flag was
already cleared and no further attempt on transmitting would be done.

As dlm tends to connect when it needs to transmit something, it makes
sense to always mark this flag right after the connect.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2015-08-17 16:22:21 -05:00
Marcelo Ricardo Leitner
acee4e527d dlm: replace BUG_ON with a less severe handling
BUG_ON() is a severe action for this case, specially now that DLM with
SCTP will use 1 socket per association. Instead, we can just close the
socket on this error condition and return from the function.

Also move the check to an earlier stage as it won't change and thus we
can abort as soon as possible.

Although this issue was reported when still using SCTP with 1-to-many
API, this cleanup wouldn't be that simple back then because we couldn't
close the socket and making sure such event would cease would be hard.
And actually, previous code was closing the association, yet SCTP layer
is still raising the new data event. Probably a bug to be fixed in SCTP.

Reported-by: <tan.hu@zte.com.cn>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2015-08-17 16:22:21 -05:00
Marcelo Ricardo Leitner
ee44b4bc05 dlm: use sctp 1-to-1 API
DLM is using 1-to-many API but in a 1-to-1 fashion. That is, it's not
needed but this causes it to use sctp_do_peeloff() to mimic an
kernel_accept() and this causes a symbol dependency on sctp module.

By switching it to 1-to-1 API we can avoid this dependency and also
reduce quite a lot of SCTP-specific code in lowcomms.c.

The caveat is that now DLM won't always use the same src port. It will
choose a random one, just like TCP code. This allows the peers to
attempt simultaneous connections, which now are handled just like for
TCP.

Even more sharing between TCP and SCTP code on DLM is possible, but it
is intentionally left for a later commit.

Note that for using nodes with this commit, you have to have at least
the early fixes on this patchset otherwise it will trigger some issues
on old nodes.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2015-08-17 16:22:20 -05:00
Marcelo Ricardo Leitner
356344c4c3 dlm: fix not reconnecting on connecting error handling
If we don't clear that bit, lowcomms_connect_sock() will not schedule
another attempt, and no further attempt will be done.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2015-08-17 16:22:19 -05:00
Marcelo Ricardo Leitner
0d737a8cfd dlm: fix race while closing connections
When a connection have issues DLM may need to close it.  Therefore we
should also cancel pending workqueues for such connection at that time,
and not just when dlm is not willing to use this connection anymore.

Also, if we don't clear CF_CONNECT_PENDING flag, the error handling
routines won't be able to re-connect as lowcomms_connect_sock() will
check for it.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2015-08-17 16:22:19 -05:00
Marcelo Ricardo Leitner
28926a0965 dlm: fix connection stealing if using SCTP
When using SCTP and accepting a new connection, DLM currently validates
if the peer trying to connect to it is one of the cluster nodes, but it
doesn't check if it already has a connection to it or not.

If it already had a connection, it will be overwritten, and the new one
will be used for writes, possibly causing the node to leave the cluster
due to communication breakage.

Still, one could DoS the node by attempting N connections and keeping
them open.

As said, but being explicit, both situations are only triggerable from
other cluster nodes, but are doable with only user-level perms.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2015-08-17 16:22:15 -05:00
Luiz Capitulino
75e3b37d05 hrtimer: Drop return code of hrtimer_switch_to_hres()
It's not checked by the caller.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Link: http://lkml.kernel.org/r/20150811164043.538241ef@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-17 23:19:03 +02:00
Jiri Benc
a1c234f95c lwtunnel: rename ip lwtunnel attributes
We already have IFLA_IPTUN_ netlink attributes. The IP_TUN_ attributes look
very similar, yet they serve very different purpose. This is confusing for
anyone trying to implement a user space tool supporting lwt.

As the IP_TUN_ attributes are used only for the lightweight tunnels, prefix
them with LWTUNNEL_IP_ instead to make their purpose clear. Also, it's more
logical to have them in lwtunnel.h together with the encap enum.

Fixes: 3093fbe7ff ("route: Per route IP tunnel metadata via lightweight tunnel")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:07:15 -07:00
Guenter Roeck
62ee783bf1 smsc911x: Fix crash seen if neither ACPI nor OF is configured or used
Commit 0b50dc4fc9 ("Convert smsc911x to use ACPI as well as DT") makes
the call to smsc911x_probe_config() unconditional, and no longer fails if
there is no device node. device_get_phy_mode() is called unconditionally,
and if there is no phy node configured returns an error code. This error
code is assigned to phy_interface, and interpreted elsewhere in the code
as valid phy mode. This in turn causes qemu to crash when running a
variant of realview_pb_defconfig.

	qemu: hardware error: lan9118_read: Bad reg 0x86

Fixes: 0b50dc4fc9 ("Convert smsc911x to use ACPI as well as DT")
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc Graeme Gregory <graeme.gregory@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:06:16 -07:00
David S. Miller
c87acb2558 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2015-08-17

1) Fix IPv6 ECN decapsulation for IPsec interfamily tunnels.
   From Thomas Egerer.

2) Use kmemdup instead of duplicating it in xfrm_dump_sa().
   From Andrzej Hajda.

3) Pass oif to the xfrm lookups so that it gets set on the flow
   and the resolver routines can match based on oif.
   From David Ahern.

4) Add documentation for the new xfrm garbage collector threshold.
   From Alexander Duyck.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:05:14 -07:00
David S. Miller
a27cc68b9b We have a single bugfix for an invalid memory read.
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJVzc58AAoJEDBSmw7B7bqrhkQP/1bgnJHOtAthUygVCUDN3p6J
 rbmyfmszcWqPgYWMh4sfBRJQPWBpq4Q4ZPlVNCU8VkFoJRXemqWsOjnpMro0Vi8Y
 omTMPf1ph5WAUhCTjR6d+J7M5uHVzp1ougeaDEHY6QPTQ0rNTQ4W9gR+TFRSSTG2
 gVIuledUXQgkBW6nm3mT3bxSwMkK6PnPy6Dpkzm4VFt3JtO0ZpSV+RzMLifmNf1r
 25lYiUUyimWbJGvZCMV9O2qKWwb+9AIYrItW74yUV2dRqqBZmP+IqJQ2p2mTWcjE
 YBToZA5Yrd6iSfuKC/wd2kt3NkmsGdNGwFSy4tI6WjBkKKLygOKBvcpBzcNDgv06
 ZB4mcyfac+KPiDA1QFvwV+OrU1u63wWDyfjkGCVkoq6WJkUxUZ4JVOq/7LolrxK8
 d0mJojOMjTlr2Q+7KMINl9HDw0MhkYqY/fJv+AHOlcIGc2bVrJF0n9fWsc9gKdPu
 c+2lIJLxzuojapXJXj+XpF6iPjh92F83W6iPbXZfu6AzvYQwGwve87CckurwElYY
 bCDV0nXzmn5DYwEOuKtiflbmLW0qlXbjhX27B4yHmXsS3/Q7M/pMUy3mj4pchmxn
 Iwr8pVoCgn8Z1hDJvyYPb4l0ERjY/MZghBJg+Yp4Q7CEWGOmyIPdi0visA9FN/xk
 ekOYrjW9Jz3AlgMDPVCC
 =uehl
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2015-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
We have a single bugfix for an invalid memory read.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:03:10 -07:00
Bas Nieuwenhuizen
05906dec7d drm/amdgpu: wait on page directory changes. v2
Pagetables can be moved and therefore the page directory update can be necessary
for the current cs even if none of the the bo's are moved. In that scenario
there is no fence between the sdma0 and gfx ring, so we add one.

v2 (chk): rebased

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-17 16:51:23 -04:00
Thierry Reding
b325a789c7 drm/amdgpu: Select BACKLIGHT_LCD_SUPPORT
Explicitly select BACKLIGHT_LCD_SUPPORT to satisfy the direct dependency
of BACKLIGHT_CLASS_DEVICE.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-17 16:51:23 -04:00
Thierry Reding
3361052790 drm/radeon: Select BACKLIGHT_LCD_SUPPORT
Explicitly select BACKLIGHT_LCD_SUPPORT to satisfy the direct dependency
of BACKLIGHT_CLASS_DEVICE.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-17 16:51:22 -04:00
Christian König
432a4ff8b7 drm/amdgpu: cleanup sheduler rq handling v2
Rework run queue implementation, especially remove the odd list handling.

v2: cleanup the code only, no algorithem change.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
2015-08-17 16:51:22 -04:00
Chunming Zhou
c3b95d4f9e drm/amdgpu: move prepare work out of scheduler to cs_ioctl
Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Christian K?nig <christian.koenig@amd.com>
2015-08-17 16:51:21 -04:00
Chunming Zhou
1c8f805af9 drm/amdgpu: fix unnecessary wake up
decrease CPU extra overhead.

Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Jammy Zhou <Jammy.Zhou@amd.com>
Reviewed-by: Christian K?nig <christian.koenig@amd.com>
2015-08-17 16:51:20 -04:00
monk.liu
6d1d0ef743 drm/amdgpu: fix duplicated mapping invoke bug
fix the bug that there is duplicated bo_update_mapping issued

Signed-off-by: monk.liu <monk.liu@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-08-17 16:51:20 -04:00
monk.liu
1939e3e265 drm/amdgpu: drop bo_list_clone when no scheduler
bo_list_clone() will take a lot of time when bo_list hold too much
elements, like above 7000

Signed-off-by: Monk.Liu <monk.liu@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Jammy Zhou <jammy.zhou@amd.com>
2015-08-17 16:51:19 -04:00
Alex Deucher
a895c222e7 drm/amdgpu: disable GPU reset by default
It's not validated yet and causes more harm than good.
Avoids spurious resets.

Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-17 16:51:19 -04:00
monk.liu
a8f5bf0b22 drm/amdgpu: fix type mismatch error
remaining timeout returned by amdgpu_fence_wait_any can be larger than
max int value, thus the truncated 32 bit value in r ends up being
negative while its original long value is positive.

Signed-off-by: monk.liu <monk.liu@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Jammy Zhou <jammy.zhou@amd.com>
2015-08-17 16:51:18 -04:00
Chunming Zhou
281b422301 drm/amdgpu: add reference for **fence
fix fence is released when pass to **fence sometimes.
add reference for it.

Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Christian K?nig <christian.koenig@amd.com>
2015-08-17 16:51:17 -04:00
Christian König
1ffd265243 drm/amdgpu: fix waiting for all fences before flipping
Otherwise we might see corruption.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-08-17 16:51:17 -04:00