Commit graph

427804 commits

Author SHA1 Message Date
Yuan Zhong
5514f0aadd f2fs: remove the needless parameter of f2fs_wait_on_page_writeback
"boo sync" parameter is never referenced in f2fs_wait_on_page_writeback.
We should remove this parameter.

Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-14 17:45:54 +09:00
Roland Dreier
c30392ab5b IB/usnic: Fix typo "Ignorning" -> "Ignoring"
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:46 -08:00
Upinder Malhi
9f637f7936 IB/usnic: Expose flows via debugfs
Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:46 -08:00
Upinder Malhi
c5f855e08a IB/usnic: Use for_each_sg instead of a for-loop
Use for_each_sg() instead of an explicit for-loop to iterate over
scatter-gather list.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:46 -08:00
Upinder Malhi
6a54d9f9a0 IB/usnic: Remove superflous parentheses
Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:45 -08:00
Upinder Malhi
248567f793 IB/core: Add RDMA_TRANSPORT_USNIC_UDP
Add RDMA_TRANSPORT_USNIC_UDP which will be used by usNIC.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:45 -08:00
Upinder Malhi
e45e614e40 IB/usnic: Add UDP support in usnic_ib_qp_grp.[hc]
UDP support for qp_grps/qps.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:44 -08:00
Upinder Malhi
c7845bcafe IB/usnic: Add UDP support in u*verbs.c, u*main.c and u*util.h
Add supports for:
	1) Parsing the socket file descriptor pass down from userspace.
	2) IP notifiers
	3) Encoding the IP in the GID
	4) Other aux. changes to support UDP

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:44 -08:00
Upinder Malhi
6214105460 IB:usnic: Add UDP support to usnic_transport.[hc]
This patch provides API for rest of usNIC code to increment or decrement
socket's reference count. Auxiliary socket APIs are also provided.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:44 -08:00
Upinder Malhi
3f92bed3d6 IB/usnic: Add UDP support to usnic_fwd.[hc]
Add *ip field* to *struct usnic_fwd_dev* as well as new *functions* to
manipulate the *ip field.*  Furthermore, add new functions for
programming UDP flows in the forwarding device.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:43 -08:00
Upinder Malhi
b85caf479b IB/usnic: Update ABI and Version file for UDP support
Expand the kernel/userspace interface so userspace may push down
a socket file descriptor to usNIC.  Also, bump up the abi and version
numbers.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:43 -08:00
Upinder Malhi
60b215e8b2 IB/usnic: Port over sysfs to new usnic_fwd.h
This patch ports usnic_ib_sysfs.c to the new interface of
usnic_fwd.h.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:42 -08:00
Upinder Malhi
256d6a6ac5 IB/usnic: Port over usnic_ib_qp_grp.[hc] to new usnic_fwd.h
This patch ports usnic_ib_qp_grp.[hc] to the new interface
of usnic_fwd.h.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:42 -08:00
Upinder Malhi
8af94ac66a IB/usnic: Port over main.c and verbs.c to the usnic_fwd.h
This patch ports usnic_ib_main.c, usnic_ib_verbs.c and usnic_ib.h
to the new interface of usnic_fwd.h.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:42 -08:00
Upinder Malhi
2183b990b6 IB/usnic: Push all forwarding state to usnic_fwd.[hc]
Push all of the usnic device forwarding state - such as mtu, mac - to
usnic_fwd_dev.  Furthermore, usnic_fwd.h exposes a improved interface
for rest of the usnic code.  The primary improvement is that
usnic_fwd.h's flow management interface takes in high-level *filter*
and *action* structures now, instead of low-level paramaters such as
vnic_idx, rq_idx.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:41 -08:00
Upinder Malhi
301a0dd68e IB/usnic: Add struct usnic_transport_spec
Add *struct usnic_transport_spec* for passing around transport
specifications.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:41 -08:00
Upinder Malhi
8192d4acb5 IB/usnic: Change WARN_ON to lockdep_assert_held
usNIC calls WARN_ON(spin_is_locked..) at few places.  In some of these
instances, the call is made while holding a spinlock.  Change
all WARN_ON(spin_is_locked...) calls in usNIC to
lockdep_assert_held to make it fool-proof bc the latter can be
called while holding a spinlock and unlike spin_is_locked,
lockdep_assert_held also works correctly on UP.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:40 -08:00
Upinder Malhi
e3cf00d0a8 IB/usnic: Add Cisco VIC low-level hardware driver
This adds a driver that allows userspace to use UD-like QPs over a
proprietary Cisco transport with Cisco's Virtual Interface Cards (VICs),
including VIC 1240 and 1280 cards.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:28 -08:00
Ivaylo Dimitrov
e4998634dd OMAPDSS: DISPC: Fix 34xx overlay scaling calculation
commit 7faa92339b OMAPDSS: DISPC: Handle
synclost errors in OMAP3 introduces limits check to prevent SYNCLOST errors
on OMAP3. However, it misses the logic found in Nokia kernels that is
needed to correctly calculate whether 3 tap or 5 tap rescaler to be used as
well as the logic to fallback to 3 taps if 5 taps clock results in too
tight horizontal timings. Without that patch "horizontal timing too tight"
errors are seen when a video with resolution above 640x350 is tried to be
played. The patch is a forward-ported logic found in Nokia N900 and N9/50
kernels.

Signed-off-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
2014-01-14 10:06:45 +02:00
WANG Cong
b86f81cca9 bridge: move br_net_exit() to br.c
And it can become static.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 23:42:39 -08:00
Bjørn Mork
fdc3452cd2 net: usbnet: fix SG initialisation
Commit 60e453a940 ("USBNET: fix handling padding packet")
added an extra SG entry in case padding is necessary, but
failed to update the initialisation of the list. This can
cause list traversal to fall off the end of the list,
resulting in an oops.

Fixes: 60e453a940 ("USBNET: fix handling padding packet")
Reported-by: Thomas Kear <thomas@kear.co.nz>
Cc: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Tested-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 23:32:04 -08:00
dingtianhong
ae237b3ede net: 3com: fix warning for incorrect type in argument
The commit c466a9b2b3
(net: 3com: slight optimization of addr compare)
cause a warning: "passing argument 1 of 'ether_addr_equal'
from incompatible pointer type", so fix it.

I think julia will convert ether_addr_equal to ether_addr_equal_64bits later.

Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 23:30:35 -08:00
dingtianhong
e2072cdfb5 net: qlcnic: fix warning for incorrect type in argument
The commit 6878f79a8b
(net: qlcnic: slight optimization of addr compare)
cause a warning "sparse: incorrect type in argument 2
(different type sizes)", so fix it.

I think julia will convert ether_addr_equal to ether_addr_equal_64bits later.

Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
Cc: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 23:30:35 -08:00
Sergei Shtylyov
090d560fc4 sh_eth: fix garbled TX error message
sh_eth_error() in case of a TX error tries to print a message using 2 dev_err()
calls with the first string not finished by '\n', so that the resulting message
would inevitably come out garbled, with something like "3net eth0: " inserted
in the middle.  Avoid that by merging 2 calls into one.

While at it, insert an empty line after the nearby declaration.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 23:28:48 -08:00
David S. Miller
aef2b45fe4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Conflicts:
	net/xfrm/xfrm_policy.c

Steffen Klassert says:

====================
This pull request has a merge conflict between commits be7928d20b
("net: xfrm: xfrm_policy: fix inline not at beginning of declaration") and
da7c224b1b ("net: xfrm: xfrm_policy: silence compiler warning") from
the net-next tree and commit 2f3ea9a95c ("xfrm: checkpatch erros with
inline keyword position") from the ipsec-next tree.

The version from net-next can be used, like it is done in linux-next.

1) Checkpatch cleanups, from Weilong Chen.

2) Fix lockdep complaints when pktgen is used with IPsec,
   from Fan Du.

3) Update pktgen to allow any combination of IPsec transport/tunnel mode
   and AH/ESP/IPcomp type, from Fan Du.

4) Make pktgen_dst_metrics static, Fengguang Wu.

5) Compile fix for pktgen when CONFIG_XFRM is not set,
   from Fan Du.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 23:14:25 -08:00
Brian Norris
e2e6b7b7d6 mtd: nand: use __packed shorthand
To be consistent with the rest of include/linux/mtd/nand.h, we should
use the __packed shorthand instead of __attribute__((packed)).

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Acked-by: Huang Shijie <b32955@freescale.com>
2014-01-13 23:13:31 -08:00
Brian Norris
8429bb3975 mtd: nand: support Micron READ RETRY
Micron provides READ RETRY support via the ONFI vendor-specific
parameter block (to indicate how many read-retry modes are available)
and the ONFI {GET,SET}_FEATURES commands with a vendor-specific feature
address (to support reading/switching the current read-retry mode).

The recommended sequence is as follows:

  1. Perform PAGE_READ operation
  2. If no ECC error, we are done
  3. Run SET_FEATURES with feature address 89h, mode 1
  4. Retry PAGE_READ operation
  5. If ECC error and there are remaining supported modes, increment the
     mode and return to step 3. Otherwise, this is a true ECC error.
  6. Run SET_FEATURES with feature address 89h, mode 0, to return to the
     default state.

This patch implements the chip->setup_read_retry() callback for
Micron and fills in the chip->read_retries.

Tested on Micron MT29F32G08CBADA, which supports 8 read-retry modes.

The Micron vendor-specific table was checked against the datasheets for
the following Micron NAND:

Needs retry   Cell-type    Part number          Vendor revision    Byte 180
-----------   ---------    ----------------     ---------------    ------------
No            SLC          MT29F16G08ABABA      1                  Reserved (0)
No            MLC          MT29F32G08CBABA      1                  Reserved (0)
No            SLC          MT29F1G08AACWP       1                  0
Yes           MLC          MT29F32G08CBADA      1                  08h
Yes           MLC          MT29F64G08CBABA      2                  08h

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Acked-by: Huang Shijie <b32955@freescale.com>
2014-01-13 23:13:05 -08:00
Brian Norris
ba84fb5952 mtd: nand: add generic READ RETRY support
Modern MLC (and even SLC?) NAND can experience a large number of
bitflips (beyond the recommended correctability capacity) due to drifts
in the voltage threshold (Vt). These bitflips can cause ECC errors to
occur well within the expected lifetime of the flash. To account for
this, some manufacturers provide a mechanism for shifting the Vt
threshold after a corrupted read.

The generic pattern seems to be that a particular flash has N read retry
modes (where N = 0, traditionally), and after an ECC failure, the host
should reconfigure the flash to use the next available mode, then retry
the read operation. This process repeats until all bitfips can be
corrected or until the host has tried all available retry modes.

This patch adds the infrastructure support for a
vendor-specific/flash-specific callback, used for setting the read-retry
mode (i.e., voltage threshold).

For now, this patch always returns the flash to mode 0 (the default
mode) after a successful read-retry, according to the flowchart found in
Micron's datasheets. This may need to change in the future if it is
determined that eventually, mode 0 is insufficient for the majority of
the flash cells (and so for performance reasons, we should leave the
flash in mode 1, 2, etc.).

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Acked-by: Huang Shijie <b32955@freescale.com>
2014-01-13 23:12:58 -08:00
Brian Norris
6f0065b012 mtd: nand: add ONFI vendor block for Micron
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Acked-by: Huang Shijie <b32955@freescale.com>
2014-01-13 23:12:53 -08:00
Brian Norris
b72f3dfb8c mtd: nand: localize ECC failures per page
ECC failures can be tracked at the page level, not the do_read_ops level
(i.e., a potentially multi-page transaction).

This helps prepare for READ RETRY support.

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Acked-by: Huang Shijie <b32955@freescale.com>
2014-01-13 23:12:38 -08:00
Neal Cardwell
70315d22d3 inet_diag: fix inet_diag_dump_icsk() to use correct state for timewait sockets
Fix inet_diag_dump_icsk() to reflect the fact that both TCP_TIME_WAIT
and TCP_FIN_WAIT2 connections are represented by inet_timewait_sock
(not just TIME_WAIT), and for such sockets the tw_substate field holds
the real state, which can be either TCP_TIME_WAIT or TCP_FIN_WAIT2.

This brings the inet_diag state-matching code in line with the field
it uses to populate idiag_state. This is also analogous to the info
exported in /proc/net/tcp, where get_tcp4_sock() exports sk->sk_state
and get_timewait4_sock() exports tw->tw_substate.

Before fixing this, (a) neither "ss -nemoi" nor "ss -nemoi state
fin-wait-2" would return a socket in TCP_FIN_WAIT2; and (b) "ss -nemoi
state time-wait" would also return sockets in state TCP_FIN_WAIT2.

This is an old bug that predates 05dbc7b ("tcp/dccp: remove twchain").

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 22:35:46 -08:00
David S. Miller
2afe02eed1 Merge branch 'bonding_rcu'
Veaceslav Falico says:

====================
bonding: fix bond_3ad RCU usage

While digging through bond_3ad.c I've found that the RCU usage there is
just wrong - it's used as a kind of mutex/spinlock instead of RCU.

v3->v4: remove useless goto and wrap __get_first_agg() in proper RCU.

v2->v3: make bond_3ad_set_carrier() use RCU read lock for the whole
function, so that all other functions will be protected by RCU as well.
This way we can use _rcu variants everywhere.

v1->v2: use generic primitives instead of _rcu ones cause we can hold RTNL
lock without RCU one, which is still safe.

This patchset is on top of bond_3ad.c cleanup:
http://www.spinics.net/lists/netdev/msg265447.html
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 22:22:35 -08:00
Veaceslav Falico
49b7624eda bonding: fix __get_active_agg() RCU logic
Currently, the implementation is meaningless - once again, we take the
slave structure and use it after we've exited RCU critical section.

Fix this by removing the rcu_read_lock() from __get_active_agg(), and
ensuring that all its callers are holding RCU.

Fixes: be79bd048 ("bonding: add RCU for bond_3ad_state_machine_handler()")
CC: dingtianhong@huawei.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 22:22:27 -08:00
Veaceslav Falico
768b954922 bonding: fix __get_first_agg RCU usage
Currently, the RCU read lock usage is just wrong - it gets the slave struct
under RCU and continues to use it when RCU lock is released.

However, it's still safe to do this cause we didn't need the
rcu_read_lock() initially - all of the __get_first_agg() callers are either
holding RCU read lock or the RTNL lock, so that we can't sync while in it.

Fixes: be79bd048 ("bonding: add RCU for bond_3ad_state_machine_handler()")
CC: dingtianhong@huawei.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 22:22:27 -08:00
Veaceslav Falico
c1bc9644ec bonding: fix bond_3ad_set_carrier() RCU usage
Currently, its usage is just plainly wrong. It first gets a slave under
RCU, and, after releasing the RCU lock, continues to use it - whilst it can
be freed.

Fix this by ensuring that bond_3ad_set_carrier() holds RCU till it uses its
slave (or its agg).

Fixes: be79bd048a ("bonding: add RCU for bond_3ad_state_machine_handler()")
CC: dingtianhong@huawei.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 22:22:27 -08:00
Stephen Rothwell
f549ed1abc arch: Re-sort some Kbuild files to hopefully help avoid some conflicts
Checkin:

    93ea02bb84 arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h

... unfortunately left some Kbuild files out of order, which caused
unnecessary merge conflicts, in particular with checkin:

    e3fec2f74f lib: Add missing arch generic-y entries for asm-generic/hash.h

Put them back in order to make the upcoming merges cleaner.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: http://lkml.kernel.org/r/20140114164420.d296fbcc4be3a5f126c86069@canb.auug.org.au
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Miller <davem@davemloft.net>
2014-01-13 21:56:54 -08:00
David S. Miller
853dc21bfe Included changes:
- drop dependency against CRC16
 - move to new release version
 - add size check at compile time for packet structs
 - update copyright years in every file
 - implement new bonding/interface alternation feature
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJS0p53AAoJEEKTMo6mOh1VZ7sP/RDgh5PMXC5l1LNTG9gtsV5Z
 zzqs+kEfeTCivcMtONXhBhuli7wlW3eZehscB/Hn6VOKf40Ktwb0tNjrJZ+OMEHC
 hwR7i56ucNOKA5L1cswWwprIY3tV3dK+tI/Y7Oc0d/HAkhU2j3wHWdzCdUMa2yBp
 PurZZrRFXrqcKtIKP+AK1DkkQ2TyUZVNB6dZDf1AifMxhcfFf6Vxg1JGj6AKgvKF
 zf9q8SC5x33qGfvx4VML1b0JNChAwt01PecY/Eo154W3eOWHNXh9UxQEQBRoChbJ
 A/hCoo/LWGFeyqZwEeWBIjR+nGi/5zbCg310FGTgjWZaJ+BBD/mjKAjDhtszX2sI
 Lf0TJ7ytfCn/qij0Xmqg3R5RnmstJpb+weZ7gMqk63o9I06RtvV6/x58tPB+7+Y1
 AJ5egjL0yUypn7LtFUL3z1S7Np6m+FC9KuH47Yc1FyXR8tSwDWYszdlAloqPeYVI
 CDMw73T+6vsWr2UPBnXqudy0BtG3XT8LFXAUC9GMkz0k8GY8RZK7MeUun9Sax+ra
 c7MC+yZYDhwKgNSiEw3J86n0ozsIGVCheAQGuenmdicQdirJa8KImj1CZCoGqWBk
 kWsyNhw68SVLZFSfCiKPjJjbWijpVfQKM/TSB30Cmj2L1hyWeEtBn0McHig6srdX
 hf1Xh4tFh1yGwMYdCcAq
 =BKos
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Included changes:
- drop dependency against CRC16
- move to new release version
- add size check at compile time for packet structs
- update copyright years in every file
- implement new bonding/interface alternation feature

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 21:50:27 -08:00
Stephen Rothwell
6806afc9aa net: resort some Kbuild files to hopefully help avoid some conflicts
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 21:48:16 -08:00
NeilBrown
830778a180 md: ensure metadata is writen after raid level change.
level_store() currently does not make sure the metadata is
updates to reflect the new raid level.  It simply sets MD_CHANGE_DEVS.

Any level with a ->thread will quickly notice this and update the
metadata.  However RAID0 and Linear do not have a thread so no
metadata update happens until the array is stopped.  At that point the
metadata is written.

This is later that we would like.  While the delay doesn't risk any
data it can cause confusion.  So if there is no md thread, immediately
update the metadata after a level change.

Reported-by: Richard Michael <rmichael@edgeofthenet.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-01-14 16:44:21 +11:00
NeilBrown
0b59bb6422 md/raid10: avoid fullsync when not necessary.
This is the raid10 equivalent of

commit 4f0a5e012c
    MD RAID1: Further conditionalize 'fullsync'

If a device in a newly assembled array is not fully recovered we
currently do a fully resync by don't need to.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-01-14 16:44:21 +11:00
NeilBrown
7eb418851f md: allow a partially recovered device to be hot-added to an array.
When adding a new device into an array it is normally important to
clear any stale data from ->recovery_offset else the new device may
not be recovered properly.

However when re-adding a device which is known to be nearly in-sync,
this is not needed and can be detrimental.  The (bitmap-based)
resync will still happen, and further recovery is only needed from
where-ever it was already up to.

So if save_raid_disk is set, signifying a re-add, don't clear
->recovery_offset.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-01-14 16:44:21 +11:00
NeilBrown
f466722ca6 md: Change handling of save_raid_disk and metadata update during recovery.
Since commit d70ed2e4fa
   MD: Allow restarting an interrupted incremental recovery.

we don't write out the metadata to devices while they are recovering.
This had a good reason, but has unfortunate consequences.  This patch
changes things to make them work better.

At issue is what happens if the array is shut down while a recovery is
happening, particularly a bitmap-guided recovery.
Ideally the recovery should pick up where it left off.
However the metadata cannot represent the state "A recovery is in
process which is guided by the bitmap".

Before the above mentioned commit, we wrote metadata to the device
which said "this is being recovered and it is up to <here>".  So after
a restart, a full recovery (not bitmap-guided) would happen from
where-ever it was up to.

After the commit the metadata wasn't updated so it still said "This
device is fully in sync with <this> event count".  That leads to a
bitmap-based recovery following the whole bitmap, which should be a
lot less work than a full recovery from some starting point.  So this
was an improvement.

However updates some metadata but not all leads to other problems.
In particular, the metadata written to the fully-up-to-date device
record that the array has all devices present (even though some are
recovering).  So on restart, mdadm wants to find all devices and
expects them to have current event counts.
Obviously it doesn't (some have old event counts) so (when assembling
with --incremental) it waits indefinitely for the rest of the expected
devices.

It really is wrong to not update all the metadata together.  Do that
is bound to cause confusion.
Instead, we should make it possible to record the truth in the
metadata.  i.e. we need to be able to record that a device is being
recovered based on the bitmap.
We already have a Feature flag to say that recovery is happening.  We
now add another one to say that it is a bitmap-based recovery.

With this we can remove the code that disables the write-out of
metadata on some devices.

So this patch:
 - moves the setting of 'saved_raid_disk' from add_new_disk to
   the validate_super methods.  This makes sure it is always set
   properly, both when adding a new device to an array, and when
   assembling an array from a collection of devices.
 - Adds a metadata flag MD_FEATURE_RECOVERY_BITMAP which is only
   used if MD_FEATURE_RECOVERY_OFFSET is set, and record that a
   bitmap-based recovery is allowed.
   This is only present in v1.x metadata. v0.90 doesn't support
   devices which are in the middle of recovery at all.
 - Only skips writing metadata to Faulty devices.

 - Also allows rdev state to be set to "-insync" via sysfs.
   This can be used for external-metadata arrays.  When the
   'role' is set the device is assumed to be in-sync.  If, after
   setting the role, we set the state to "-insync", the role is
   moved to saved_raid_disk which effectively says the device is
   partly in-sync with that slot and needs a bitmap recovery.

Cc: Andrei Warkentin <andreiw@vmware.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-01-14 16:44:21 +11:00
NeilBrown
8313b8e57f md: fix problem when adding device to read-only array with bitmap.
If an array is started degraded, and then the missing device
is found it can be re-added and a minimal bitmap-based recovery
will bring it fully up-to-date.

If the array is read-only a recovery would not be allowed.
But also if the array is read-only and the missing device was
present very recently, then there could be no need for any
recovery at all, so we simply include the device in the read-only
array without any recovery.

However... if the missing device was removed a little longer ago
it could be missing some updates, but if a bitmap is present it will
be conditionally accepted pending a bitmap-based update.  We don't
currently detect this case properly and will include that old
device into the read-only array with no recovery even though it really
needs a recovery.

This patch keeps track of whether a bitmap-based-recovery is really
needed or not in the new Bitmap_sync rdev flag.  If that is set,
then the device will not be added to a read-only array.

Cc: Andrei Warkentin <andreiw@vmware.com>
Fixes: d70ed2e4fa
Cc: stable@vger.kernel.org (3.2+)
Signed-off-by: NeilBrown <neilb@suse.de>
2014-01-14 16:44:08 +11:00
NeilBrown
e8b8491585 md/raid10: fix bug when raid10 recovery fails to recover a block.
commit e875ecea26
    md/raid10 record bad blocks as needed during recovery.

added code to the "cannot recover this block" path to record a bad
block rather than fail the whole recovery.
Unfortunately this new case was placed *after* r10bio was freed rather
than *before*, yet it still uses r10bio.
This is will crash with a null dereference.

So move the freeing of r10bio down where it is safe.

Cc: stable@vger.kernel.org (v3.1+)
Fixes: e875ecea26
Reported-by: Damian Nowak <spam@nowaker.net>
URL: https://bugzilla.kernel.org/show_bug.cgi?id=68181
Signed-off-by: NeilBrown <neilb@suse.de>
2014-01-14 16:44:08 +11:00
NeilBrown
5af9bef72c md/raid5: fix a recently broken BUG_ON().
commit 6d183de407
    md/raid5: fix newly-broken locking in get_active_stripe.

simplified a BUG_ON, but removed too much so now it sometimes fires
when it shouldn't.

When the STRIPE_EXPANDING flag is set, the stripe_head might be on a
special list while multiple stripe_heads are collected, or it might
not be on any list, even a 'free' list when the refcount is zero.  As
long as STRIPE_EXPANDING is set, it will be found and added back to a
list eventually.

So both of the BUG_ONs which test for the ->lru being empty or not
need to avoid the case where STRIPE_EXPANDING is set.

The patch which broke this was marked for -stable, so this patch needs
to be applied to any branch that received 6d183de4

Fixes: 6d183de407
Cc: stable@vger.kernel.org (any release to which above was applied)
Signed-off-by: NeilBrown <neilb@suse.de>
2014-01-14 16:44:07 +11:00
NeilBrown
41a336e011 md/raid1: fix request counting bug in new 'barrier' code.
The new iobarrier implementation in raid1 (which keeps normal writes
and resync activity separate) counts every request what is not before
the current resync point in either next_window_requests or
current_window_requests.
It flags that the request is counted by setting ->start_next_window.

allow_barrier follows this model exactly and decrements one of the
*_window_requests if and only if ->start_next_window is set.

However wait_barrier(), which increments *_window_requests uses a
slightly different test for setting -.start_next_window (which is set
from the return value of this function).
So there is a possibility of the counts getting out of sync, and this
leads to the resync hanging.

So change wait_barrier() to return a non-zero value in exactly the
same cases that it increments *_window_requests.

But was introduced in 3.13-rc1.

Reported-by: Bruno Wolff III <bruno@wolff.to>
URL: https://bugzilla.kernel.org/show_bug.cgi?id=68061
Fixes: 79ef3a8aa1
Cc: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-01-14 16:44:07 +11:00
NeilBrown
b50c259e25 md/raid10: fix two bugs in handling of known-bad-blocks.
If we discover a bad block when reading we split the request and
potentially read some of it from a different device.

The code path of this has two bugs in RAID10.
1/ we get a spin_lock with _irq, but unlock without _irq!!
2/ The calculation of 'sectors_handled' is wrong, as can be clearly
   seen by comparison with raid1.c

This leads to at least 2 warnings and a probable crash is a RAID10
ever had known bad blocks.

Cc: stable@vger.kernel.org (v3.1+)
Fixes: 856e08e237
Reported-by: Damian Nowak <spam@nowaker.net>
URL: https://bugzilla.kernel.org/show_bug.cgi?id=68181
Signed-off-by: NeilBrown <neilb@suse.de>
2014-01-14 16:44:07 +11:00
NeilBrown
1cc03eb932 md/raid5: Fix possible confusion when multiple write errors occur.
commit 5d8c71f9e5
    md: raid5 crash during degradation

Fixed a crash in an overly simplistic way which could leave
R5_WriteError or R5_MadeGood set in the stripe cache for devices
for which it is no longer relevant.
When those devices are removed and spares added the flags are still
set and can cause incorrect behaviour.

commit 14a75d3e07
    md/raid5: preferentially read from replacement device if possible.

Fixed the same bug if a more effective way, so we can now revert
the original commit.

Reported-and-tested-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Cc: stable@vger.kernel.org (3.2+ - 3.2 will need a different fix though)
Fixes: 5d8c71f9e5
Signed-off-by: NeilBrown <neilb@suse.de>
2014-01-14 16:44:07 +11:00
AKASHI Takahiro
06bdadd763 audit: correct a type mismatch in audit_syscall_exit()
audit_syscall_exit() saves a result of regs_return_value() in intermediate
"int" variable and passes it to __audit_syscall_exit(), which expects its
second argument as a "long" value.  This will result in truncating the
value returned by a system call and making a wrong audit record.

I don't know why gcc compiler doesn't complain about this, but anyway it
causes a problem at runtime on arm64 (and probably most 64-bit archs).

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13 22:38:19 -05:00
Eric Paris
1ce319f11c audit: reorder AUDIT_TTY_SET arguments
An admin is likely to want to see old and new values next to each other.
Putting all of the old values followed by all of the new values is just
hard to read as a human.

Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13 22:33:41 -05:00