Commit graph

89223 commits

Author SHA1 Message Date
David S. Miller
7c92d61eca Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next
tree, most relevantly they are:

1) Extend nft_exthdr to allow to match TCP options bitfields, from
   Manuel Messner.

2) Allow to check if IPv6 extension header is present in nf_tables,
   from Phil Sutter.

3) Allow to set and match conntrack zone in nf_tables, patches from
   Florian Westphal.

4) Several patches for the nf_tables set infrastructure, this includes
   cleanup and preparatory patches to add the new bitmap set type.

5) Add optional ruleset generation ID check to nf_tables and allow to
   delete rules that got no public handle yet via NFTA_RULE_ID. These
   patches add the missing kernel infrastructure to support rule
   deletion by description from userspace.

6) Missing NFT_SET_OBJECT flag to select the right backend when sets
   stores an object map.

7) A couple of cleanups for the expectation and SIP helper, from Gao
   feng.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-12 22:11:43 -05:00
Pablo Neira Ayuso
1a94e38d25 netfilter: nf_tables: add NFTA_RULE_ID attribute
This new attribute allows us to uniquely identify a rule in transaction.
Robots may trigger an insertion followed by deletion in a batch, in that
scenario we still don't have a public rule handle that we can use to
delete the rule. This is similar to the NFTA_SET_ID attribute that
allows us to refer to an anonymous set from a batch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-12 14:45:13 +01:00
Pablo Neira Ayuso
8c4d4e8b56 netfilter: nfnetlink: allow to check for generation ID
This patch allows userspace to specify the generation ID that has been
used to build an incremental batch update.

If userspace specifies the generation ID in the batch message as
attribute, then nfnetlink compares it to the current generation ID so
you make sure that you work against the right baseline. Otherwise, bail
out with ERESTART so userspace knows that its changeset is stale and
needs to respin. Userspace can do this transparently at the cost of
taking slightly more time to refresh caches and rework the changeset.

This check is optional, if there is no NFNL_BATCH_GENID attribute in the
batch begin message, then no check is performed.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-12 14:45:11 +01:00
Julian Anastasov
c16ec18599 net: rename dst_neigh_output back to neigh_output
After the dst->pending_confirm flag was removed, we do not
need anymore to provide dst arg to dst_neigh_output.
So, rename it to neigh_output as before commit 5110effee8
("net: Do delayed neigh confirmation.").

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-11 21:25:18 -05:00
Sainath Grandhi
9a393b5d59 tap: tap as an independent module
This patch makes tap a separate module for other types of virtual interfaces, for example,
ipvlan to use.

Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-11 20:59:41 -05:00
Sainath Grandhi
d9f1f61c08 tap: Extending tap device create/destroy APIs
Extending tap APIs get/free_minor and create/destroy_cdev to handle more than one
type of virtual interface.

Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-11 20:59:41 -05:00
Sainath Grandhi
6fe3faf867 tap: Abstract type of virtual interface from tap implementation
macvlan object is re-structured to hold tap related elements in a separate
entity, tap_dev. Upon NETDEV_REGISTER device_event, tap_dev is registered with
idr and fetched again on tap_open. Few of the tap functions are modified to
accepted tap_dev as argument. tap_dev object includes callbacks to be used by
underlying virtual interface to take care of tx and rx accounting.

Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-11 20:59:41 -05:00
Sainath Grandhi
ebc05ba7e8 tap: Tap character device creation/destroy API
This patch provides tap device create/destroy APIs in tap.c.

Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-11 20:59:41 -05:00
Sainath Grandhi
635b8c8ecd tap: Renaming tap related APIs, data structures, macros
Renaming tap related APIs, data structures and macros in tap.c from macvtap_.* to tap_.*

Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-11 20:59:41 -05:00
Sainath Grandhi
a8e0469873 tap: Refactoring macvtap.c
macvtap module has code for tap/queue management and link management. This patch splits
the code into macvtap_main.c for link management and tap.c for tap/queue management.
Functionality in tap.c can be re-used for implementing tap on other virtual interfaces.

Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-11 20:59:41 -05:00
David S. Miller
35eeacf182 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-11 02:31:11 -05:00
Linus Torvalds
1ee18329fa Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) If the timing is wrong we can indefinitely stop generating new ipv6
    temporary addresses, from Marcus Huewe.

 2) Don't double free per-cpu stats in ipv6 SIT tunnel driver, from Cong
    Wang.

 3) Put protections in place so that AF_PACKET is not able to submit
    packets which don't even have a link level header to drivers. From
    Willem de Bruijn.

 4) Fix memory leaks in ipv4 and ipv6 multicast code, from Hangbin Liu.

 5) Don't use udp_ioctl() in l2tp code, UDP version expects a UDP socket
    and that doesn't go over very well when it is passed an L2TP one.
    Fix from Eric Dumazet.

 6) Don't crash on NULL pointer in phy_attach_direct(), from Florian
    Fainelli.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  l2tp: do not use udp_ioctl()
  xen-netfront: Delete rx_refill_timer in xennet_disconnect_backend()
  NET: mkiss: Fix panic
  net: hns: Fix the device being used for dma mapping during TX
  net: phy: Initialize mdio clock at probe function
  igmp, mld: Fix memory leak in igmpv3/mld_del_delrec()
  xen-netfront: Improve error handling during initialization
  sierra_net: Skip validating irrelevant fields for IDLE LSIs
  sierra_net: Add support for IPv6 and Dual-Stack Link Sense Indications
  kcm: fix 0-length case for kcm_sendmsg()
  xen-netfront: Rework the fix for Rx stall during OOM and network stress
  net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()
  net: thunderx: Fix PHY autoneg for SGMII QLM mode
  net: dsa: Do not destroy invalid network devices
  ping: fix a null pointer dereference
  packet: round up linear to header len
  net: introduce device min_header_len
  sit: fix a double free on error path
  lwtunnel: valid encap attr check should return 0 when lwtunnel is disabled
  ipv6: addrconf: fix generation of new temporary addresses
2017-02-10 14:44:49 -08:00
Linus Torvalds
a9dbf5c8d4 Third round of -rc fixes for 4.10 kernel
- Two security related issues in the rxe driver
 - One compile issue in the RDMA uapi header
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYm1pNAAoJELgmozMOVy/dszkQAKJN8JedNNuKLyF2BbjXR7Dv
 KsxVuXmdVpF8FPxrPlVUA76o26yEWfxVtJwEHi4hEVLmTXKHcd8EkL2o7geR7hTB
 +j2J6HuH7e4y6ATX9H2o78fg1SRIWZJij4JdXilZRQ3pKj5DcmYynklBqqpMJ8Su
 c2ceE5nbJdpbIhPD3yulmGVF/89zntaBVbh07D1O6rdbaSNwuc+wv0XdfJ+KUXju
 ZfylACJBnMsIjxyuqZPV4djs91CI/tbgArZmh5tvgF+V9Gx6Vocbv90kS+BtbKH0
 srX9MyBrSycY8eELhbFAg7XBJXsNk4Rk0yMuMEhF2IWjwzaa7plIgeCv7NA1NWqq
 EKW0lzC0e7VV5ttjKHqe6iO+8JIC9QEi/36IqTgBBNqw0Cphocazq/mVY5fAu1uo
 qWdxbeYz3owWu47NNJ15TvaEkMbMX8ACEu4KhaT0FA+jit4czJ3PeyCLqe8aD5Pa
 AK4e2Lnj+CZb2aJN2Knh4Wu6tK9M2P0vuzHElrf0D3qe37HxaRQuZWLC9kOKplWZ
 DGrYoM94EaeTZScZ4Lo7BSol7yuYXXFkE42/TarIZuT67GNM+qss9HRDtWzDnSuD
 zX4EdJ/0kjX3SU2Em4g+7MelA4TMX74sLlEmU6iSUEWm1/pX+X9SYwT0iEmy2tR+
 SXEti5uW/K/P7e2RmzQO
 =zGBQ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma fixes from Doug Ledford:
 "Third round of -rc fixes for 4.10 kernel:

   - two security related issues in the rxe driver

   - one compile issue in the RDMA uapi header"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  RDMA: Don't reference kernel private header from UAPI header
  IB/rxe: Fix mem_check_range integer overflow
  IB/rxe: Fix resid update
2017-02-10 14:41:16 -08:00
Jakub Kicinski
1697599ee3 bitfield.h: add FIELD_FIT() helper
Add a helper for checking at runtime that a value will fit inside
a specified field/mask.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10 15:52:24 -05:00
Jiri Pirko
adf200f31c devlink: fix the name of eswitch commands
The eswitch_[gs]et command is supposed to be similar to port_[gs]et
command - for multiple eswitch attributes. However, when it was introduced
by 08f4b5918b ("net/devlink: Add E-Switch mode control") it was wrongly
named with the word "mode" in it. So fix this now, make the oririnal
enum value existing but obsolete.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10 14:43:00 -05:00
David S. Miller
0d2164af26 Some more updates:
* use shash in mac80211 crypto code where applicable
  * some documentation fixes
  * pass RSSI levels up in change notifications
  * remove unused rfkill-regulator
  * various other cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJYnHuZAAoJEGt7eEactAAdorYP/iIfEZjLblLZNui+93OHpmsq
 QgXeQ9Vl/Xgq8g6vEb6jpHE6pvT/xoW/jt0s5rpYpPDzP7WRgkFQI144Jflx9gE+
 7KRHYc7k9XqugbFGFakjT+DyduxrGkEhZ7Vpd39D+zlPaf+p/31Jk6C4MNwk2oG3
 AA7ARUJfOKGpct3+l+cpnWQPUYQQKrSnjnMIwHL3Tu5pvGPDapdDWXbfn6T75Aof
 3oVQSNtbEtdzW/Ty+HgDn/boOCRXzXlOCxFlE7OiH2AXfe2mqGU/hkcm/wZ0QxOf
 9m3CxVjKH10tY6nsjDiXTOzFn+Zkzuum9/gcKtsgx+6BZ4qod2XuhtzmTeXggI8F
 b7nGeadSfSS6/unUu5Fibdn+X4Cw8+Yu5Qyiwo+jLL6yyTLUg6aKN6N9HWddhBfa
 pIjWAZVA1iB2m2XqUqRB/asEhslndSSfDmZK8nruYJSZWtQBNkNUdHXqqbqbBSHv
 KtKbHMOKoLU4zKmV2vMWGy0qypZCxtZkNF6GURhMh2m89qBcIAApFwQMzK09MwmP
 d5TcMwfi8YcKRb1Gw6n7gnJsC8e+tFDGXMi9w6z0FDGZvMbOlnO3ctSd/BY3B01H
 DWimkX8Ev3kt6KKTgbJ/n0lR/vEmDGdKo2ahH1uHOTPudrjpOHUg0cdDzS/VPZqi
 Qw4+FbVArISNrI2skTrA
 =sqkq
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2017-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Some more updates:
 * use shash in mac80211 crypto code where applicable
 * some documentation fixes
 * pass RSSI levels up in change notifications
 * remove unused rfkill-regulator
 * various other cleanups
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10 14:31:51 -05:00
Russell King
4d56a29f17 net: dsa: remove unnecessary phy*.h includes
Including phy.h and phy_fixed.h into net/dsa.h causes phy*.h to be an
unnecessary dependency for quite a large amount of the kernel.  There's
very little which actually requires definitions from phy.h in net/dsa.h
- the include itself only wants the declaration of a couple of
structures and IFNAMSIZ.

Add linux/if.h for IFNAMSIZ, declarations for the structures, phy.h to
mv88e6xxx.h as it needs it for phy_interface_t, and remove both phy.h
and phy_fixed.h from net/dsa.h.

This patch reduces from around 800 files rebuilt to around 40 - even
with ccache, the time difference is noticable.

Tested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10 13:51:04 -05:00
Amir Vadai
853a14ba46 net/act_pedit: Introduce 'add' operation
This command could be useful to inc/dec fields.

For example, to forward any TCP packet and decrease its TTL:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower ip_proto tcp \
    action pedit munge ip ttl add 0xff pipe \
    action mirred egress redirect dev veth0

In the example above, adding 0xff to this u8 field is actually
decreasing it by one, since the operation is masked.

Signed-off-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10 13:18:33 -05:00
Amir Vadai
71d0ed7079 net/act_pedit: Support using offset relative to the conventional network headers
Extend pedit to enable the user setting offset relative to network
headers. This change would enable to work with more complex header
schemes (vs the simple IPv4 case) where setting a fixed offset relative
to the network header is not enough.

After this patch, the action has information about the exact header type
and field inside this header. This information could be used later on
for hardware offloading of pedit.

Backward compatibility was being kept:
1. Old kernel <-> new userspace
2. New kernel <-> old userspace
3. add rule using new userspace <-> dump using old userspace
4. add rule using old userspace <-> dump using new userspace

When using the extended api, new netlink attributes are being used. This
way, operation will fail in (1) and (3) - and no malformed rule be added
or dumped. Of course, new user space that doesn't need the new
functionality can use the old netlink attributes and operation will
succeed.
Since action can support both api's, (2) should work, and it is easy to
write the new user space to have (4) work.

The action is having a strict check that only header types and commands
it can handle are accepted. This way future additions will be much
easier.

Usage example:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
  flower \
    ip_proto tcp \
    dst_port 80 \
  action pedit munge tcp dport set 8080 pipe \
  action mirred egress redirect dev veth0

Will forward tcp port whose original dest port is 80, while modifying
the destination port to 8080.

Signed-off-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10 13:18:33 -05:00
Amir Vadai
ea6da4fd38 net/skbuff: Introduce skb_mac_offset()
Introduce skb_mac_offset() that could be used to get mac header offset.

Signed-off-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10 13:18:33 -05:00
Nogah Frankel
6d5496483f switchdev: bridge: Offload mc router ports
Offload the mc router ports list, whenever it is being changed.
It is done because in some cases mc packets needs to be flooded to all
the ports in this list.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10 11:46:39 -05:00
Nogah Frankel
147c1e9b90 switchdev: bridge: Offload multicast disabled
Offload multicast disabled flag, for more accurate mc flood behavior:
When it is on, the mdb should be ignored.
When it is off, unregistered mc packets should be flooded to mc router
ports.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10 11:46:38 -05:00
Jiri Pirko
cf1facda2f sched: move tcf_proto_destroy and tcf_destroy_chain helpers into cls_api
Creation is done in this file, move destruction to be at the same place.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10 11:38:08 -05:00
Jiri Pirko
79112c26f1 sched: rename tcf_destroy to tcf_destroy_proto
This function destroys TC filter protocol, not TC filter. So name it
accordingly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10 11:38:08 -05:00
Ido Schimmel
2f3a5272e5 ipv4: fib: Add events for FIB replace and append
The FIB notification chain currently uses the NLM_F_{REPLACE,APPEND}
flags to signal routes being replaced or appended.

Instead of using netlink flags for in-kernel notifications we can simply
introduce two new events in the FIB notification chain. This has the
added advantage of making the API cleaner, thereby making it clear that
these events should be supported by listeners of the notification chain.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10 11:32:13 -05:00
Jarno Rajahalme
dd41d33f0b openvswitch: Add force commit.
Stateful network admission policy may allow connections to one
direction and reject connections initiated in the other direction.
After policy change it is possible that for a new connection an
overlapping conntrack entry already exists, where the original
direction of the existing connection is opposed to the new
connection's initial packet.

Most importantly, conntrack state relating to the current packet gets
the "reply" designation based on whether the original direction tuple
or the reply direction tuple matched.  If this "directionality" is
wrong w.r.t. to the stateful network admission policy it may happen
that packets in neither direction are correctly admitted.

This patch adds a new "force commit" option to the OVS conntrack
action that checks the original direction of an existing conntrack
entry.  If that direction is opposed to the current packet, the
existing conntrack entry is deleted and a new one is subsequently
created in the correct direction.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 22:59:34 -05:00
Jarno Rajahalme
9dd7f8907c openvswitch: Add original direction conntrack tuple to sw_flow_key.
Add the fields of the conntrack original direction 5-tuple to struct
sw_flow_key.  The new fields are initially marked as non-existent, and
are populated whenever a conntrack action is executed and either finds
or generates a conntrack entry.  This means that these fields exist
for all packets that were not rejected by conntrack as untrackable.

The original tuple fields in the sw_flow_key are filled from the
original direction tuple of the conntrack entry relating to the
current packet, or from the original direction tuple of the master
conntrack entry, if the current conntrack entry has a master.
Generally, expected connections of connections having an assigned
helper (e.g., FTP), have a master conntrack entry.

The main purpose of the new conntrack original tuple fields is to
allow matching on them for policy decision purposes, with the premise
that the admissibility of tracked connections reply packets (as well
as original direction packets), and both direction packets of any
related connections may be based on ACL rules applying to the master
connection's original direction 5-tuple.  This also makes it easier to
make policy decisions when the actual packet headers might have been
transformed by NAT, as the original direction 5-tuple represents the
packet headers before any such transformation.

When using the original direction 5-tuple the admissibility of return
and/or related packets need not be based on the mere existence of a
conntrack entry, allowing separation of admission policy from the
established conntrack state.  While existence of a conntrack entry is
required for admission of the return or related packets, policy
changes can render connections that were initially admitted to be
rejected or dropped afterwards.  If the admission of the return and
related packets was based on mere conntrack state (e.g., connection
being in an established state), a policy change that would make the
connection rejected or dropped would need to find and delete all
conntrack entries affected by such a change.  When using the original
direction 5-tuple matching the affected conntrack entries can be
allowed to time out instead, as the established state of the
connection would not need to be the basis for packet admission any
more.

It should be noted that the directionality of related connections may
be the same or different than that of the master connection, and
neither the original direction 5-tuple nor the conntrack state bits
carry this information.  If needed, the directionality of the master
connection can be stored in master's conntrack mark or labels, which
are automatically inherited by the expected related connections.

The fact that neither ARP nor ND packets are trackable by conntrack
allows mutual exclusion between ARP/ND and the new conntrack original
tuple fields.  Hence, the IP addresses are overlaid in union with ARP
and ND fields.  This allows the sw_flow_key to not grow much due to
this patch, but it also means that we must be careful to never use the
new key fields with ARP or ND packets.  ARP is easy to distinguish and
keep mutually exclusive based on the ethernet type, but ND being an
ICMPv6 protocol requires a bit more attention.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 22:59:34 -05:00
Jarno Rajahalme
cb80d58fae openvswitch: Unionize ovs_key_ct_label with a u32 array.
Make the array of labels in struct ovs_key_ct_label an union, adding a
u32 array of the same byte size as the existing u8 array.  It is
faster to loop through the labels 32 bits at the time, which is also
the alignment of netlink attributes.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 22:59:34 -05:00
Xin Long
242bd2d519 sctp: implement sender-side procedures for Add Incoming/Outgoing Streams Request Parameter
This patch is to implement Sender-Side Procedures for the Add
Outgoing and Incoming Streams Request Parameter described in
rfc6525 section 5.1.5-5.1.6.

It is also to add sockopt SCTP_ADD_STREAMS in rfc6525 section
6.3.4 for users.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 16:57:38 -05:00
Xin Long
78098117f8 sctp: add support for generating stream reconf add incoming/outgoing streams request chunk
This patch is to define Add Incoming/Outgoing Streams Request
Parameter described in rfc6525 section 4.5 and 4.6. They can
be in one same chunk trunk as rfc6525 section 3.1-7 describes,
so make them in one function.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 16:57:38 -05:00
Xin Long
a92ce1a42d sctp: implement sender-side procedures for SSN/TSN Reset Request Parameter
This patch is to implement Sender-Side Procedures for the SSN/TSN
Reset Request Parameter descibed in rfc6525 section 5.1.4.

It is also to add sockopt SCTP_RESET_ASSOC in rfc6525 section 6.3.3
for users.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 16:57:38 -05:00
Xin Long
c56480a1e9 sctp: add support for generating stream reconf ssn/tsn reset request chunk
This patch is to define SSN/TSN Reset Request Parameter described
in rfc6525 section 4.3.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 16:57:38 -05:00
Xin Long
9faf1c0fd5 sctp: drop unnecessary __packed from some stream reconf structures
commit 85c727b594 ("sctp: drop __packed from almost all SCTP structures")
has removed __packed from almost all SCTP structures. But there still are
three structures where it should be dropped.

This patch is to remove it from some stream reconf structures.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 16:57:38 -05:00
Linus Torvalds
55aac6ef53 Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "This target series for v4.10 contains fixes which address a few
  long-standing bugs that DATERA's QA + automation teams have uncovered
  while putting v4.1.y target code into production usage.

  We've been running the top three in our nightly automated regression
  runs for the last two months, and the COMPARE_AND_WRITE fix Mr. Gary
  Guo has been manually verifying against a four node ESX cluster this
  past week.

  Note all of them have CC' stable tags.

  Summary:

   - Fix a bug with ESX EXTENDED_COPY + SAM_STAT_RESERVATION_CONFLICT
     status, where target_core_xcopy.c logic was incorrectly returning
     SAM_STAT_CHECK_CONDITION for all non SAM_STAT_GOOD cases (Nixon
     Vincent)

   - Fix a TMR LUN_RESET hung task bug while other in-flight TMRs are
     being aborted, before the new one had been dispatched into tmr_wq
     (Rob Millner)

   - Fix a long standing double free OOPs, where a dynamically generated
     'demo-mode' NodeACL has multiple sessions associated with it, and
     the /sys/kernel/config/target/$FABRIC/$WWN/ subsequently disables
     demo-mode, but never converts the dynamic ACL into a explicit ACL
     (Rob Millner)

   - Fix a long standing reference leak with ESX VAAI COMPARE_AND_WRITE
     when the second phase WRITE COMMIT command fails, resulting in
     CHECK_CONDITION response never being sent and se_cmd->cmd_kref
     never reaching zero (Gary Guo)

  Beyond these items on v4.1.y we've reproduced, fixed, and run through
  our regression test suite using iscsi-target exports, there are two
  additional outstanding list items:

   - Remove a >= v4.2 RCU conversion BUG_ON that would trigger when
     dynamic node NodeACLs where being converted to explicit NodeACLs.
     The patch drops the BUG_ON to follow how pre RCU conversion worked
     for this special case (Benjamin Estrabaud)

   - Add ibmvscsis target_core_fabric_ops->max_data_sg_nent assignment
     to match what IBM's Virtual SCSI hypervisor is already enforcing at
     transport layer. (Bryant Ly + Steven Royer)"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  ibmvscsis: Add SGL limit
  target: Fix COMPARE_AND_WRITE ref leak for non GOOD status
  target: Fix multi-session dynamic se_node_acl double free OOPs
  target: Fix early transport_generic_handle_tmr abort scenario
  target: Use correct SCSI status during EXTENDED_COPY exception
  target: Don't BUG_ON during NodeACL dynamic -> explicit conversion
2017-02-09 13:22:54 -08:00
Luca Coelho
8585989d14 cfg80211: fix NAN bands definition
The nl80211_nan_dual_band_conf enumeration doesn't make much sense.
The default value is assigned to a bit, which makes it weird if the
default bit and other bits are set at the same time.

To improve this, get rid of NL80211_NAN_BAND_DEFAULT and add a wiphy
configuration to let the drivers define which bands are supported.
This is exposed to the userspace, which then can make a decision on
which band(s) to use.  Additionally, rename all "dual_band" elements
to "bands", to make things clearer.

Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-02-09 15:17:30 +01:00
Tejun Heo
4d59b6ccf0 cpumask: use nr_cpumask_bits for parsing functions
Commit 513e3d2d11 ("cpumask: always use nr_cpu_ids in formatting and
parsing functions") converted both cpumask printing and parsing
functions to use nr_cpu_ids instead of nr_cpumask_bits.  While this was
okay for the printing functions as it just picked one of the two output
formats that we were alternating between depending on a kernel config,
doing the same for parsing wasn't okay.

nr_cpumask_bits can be either nr_cpu_ids or NR_CPUS.  We can always use
nr_cpu_ids but that is a variable while NR_CPUS is a constant, so it can
be more efficient to use NR_CPUS when we can get away with it.
Converting the printing functions to nr_cpu_ids makes sense because it
affects how the masks get presented to userspace and doesn't break
anything; however, using nr_cpu_ids for parsing functions can
incorrectly leave the higher bits uninitialized while reading in these
masks from userland.  As all testing and comparison functions use
nr_cpumask_bits which can be larger than nr_cpu_ids, the parsed cpumasks
can erroneously yield false negative results.

This made the taskstats interface incorrectly return -EINVAL even when
the inputs were correct.

Fix it by restoring the parse functions to use nr_cpumask_bits instead
of nr_cpu_ids.

Link: http://lkml.kernel.org/r/20170206182442.GB31078@htj.duckdns.org
Fixes: 513e3d2d11 ("cpumask: always use nr_cpu_ids in formatting and parsing functions")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Martin Steigerwald <martin.steigerwald@teamix.de>
Debugged-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Cc: <stable@vger.kernel.org>	[4.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-08 15:41:43 -08:00
Jan Kara
0911d0041c mm: avoid returning VM_FAULT_RETRY from ->page_mkwrite handlers
Some ->page_mkwrite handlers may return VM_FAULT_RETRY as its return
code (GFS2 or Lustre can definitely do this).  However VM_FAULT_RETRY
from ->page_mkwrite is completely unhandled by the mm code and results
in locking and writeably mapping the page which definitely is not what
the caller wanted.

Fix Lustre and block_page_mkwrite_ret() used by other filesystems
(notably GFS2) to return VM_FAULT_NOPAGE instead which results in
bailing out from the fault code, the CPU then retries the access, and we
fault again effectively doing what the handler wanted.

Link: http://lkml.kernel.org/r/20170203150729.15863-1-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-08 15:41:43 -08:00
Ido Schimmel
982acb9756 ipv4: fib: Notify about nexthop status changes
When a multipath route is hit the kernel doesn't consider nexthops that
are DEAD or LINKDOWN when IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN is set.
Devices that offload multipath routes need to be made aware of nexthop
status changes. Otherwise, the device will keep forwarding packets to
non-functional nexthops.

Add the FIB_EVENT_NH_{ADD,DEL} events to the fib notification chain,
which notify capable devices when they should add or delete a nexthop
from their tables.

Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-08 15:25:18 -05:00
LABBE Corentin
6a2cac549b net: stmmac: Remove the bus_setup function pointer
The bus_setup function pointer is not used at all, this patch remove it.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-08 15:11:22 -05:00
Eric Dumazet
97e219b7c1 gro_cells: move to net/core/gro_cells.c
We have many gro cells users, so lets move the code to avoid
duplication.

This creates a CONFIG_GRO_CELLS option.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-08 14:38:18 -05:00
Willem de Bruijn
217e6fa24c net: introduce device min_header_len
The stack must not pass packets to device drivers that are shorter
than the minimum link layer header length.

Previously, packet sockets would drop packets smaller than or equal
to dev->hard_header_len, but this has false positives. Zero length
payload is used over Ethernet. Other link layer protocols support
variable length headers. Support for validation of these protocols
removed the min length check for all protocols.

Introduce an explicit dev->min_header_len parameter and drop all
packets below this value. Initially, set it to non-zero only for
Ethernet and loopback. Other protocols can follow in a patch to
net-next.

Fixes: 9ed988cd59 ("packet: validate variable length ll headers")
Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-08 13:56:37 -05:00
Raju Lakkaraju
04d8a0a5f3 net: phy: Add LED mode driver for Microsemi PHYs.
LED Mode:
Microsemi PHY support 2 LEDs (LED[0] and LED[1]) to display different
status information that can be selected by setting LED mode.

LED Mode parameter (vsc8531, led-0-mode) and (vsc8531, led-1-mode) get
from Device Tree.

Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-08 13:29:04 -05:00
David Ahern
2bd137de53 lwtunnel: valid encap attr check should return 0 when lwtunnel is disabled
An error was reported upgrading to 4.9.8:
    root@Typhoon:~# ip route add default table 210 nexthop dev eth0 via 10.68.64.1
    weight 1 nexthop dev eth0 via 10.68.64.2 weight 1
    RTNETLINK answers: Operation not supported

The problem occurs when CONFIG_LWTUNNEL is not enabled and a multipath
route is submitted.

The point of lwtunnel_valid_encap_type_attr is catch modules that
need to be loaded before any references are taken with rntl held. With
CONFIG_LWTUNNEL disabled, there will be no modules to load so the
lwtunnel_valid_encap_type_attr stub should just return 0.

Fixes: 9ed59592e3 ("lwtunnel: fix autoload of lwt modules")
Reported-by: pupilla@libero.it
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-08 12:52:11 -05:00
Leon Romanovsky
646ebd4166 RDMA: Don't reference kernel private header from UAPI header
Remove references to private kernel header and defines from exported
ib_user_verb.h file.

The code snippet below is used to reproduce the issue:

 #include <stdio.h>
 #include <rdma/ib_user_verb.h>

 int main(void)
 {
	printf("IB_USER_VERBS_ABI_VERSION = %d\n", IB_USER_VERBS_ABI_VERSION);
	return 0;
 }

It fails during compilation phase with an error:
➜  /tmp gcc main.c
main.c:2:31: fatal error: rdma/ib_user_verb.h: No such file or directory
 #include <rdma/ib_user_verb.h>
                               ^
compilation terminated.

Fixes: 189aba99e7 ("IB/uverbs: Extend modify_qp and support packet pacing")
CC: Bodong Wang <bodong@mellanox.com>
CC: Matan Barak <matanb@mellanox.com>
CC: Christoph Hellwig <hch@infradead.org>
Tested-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-08 12:28:49 -05:00
Nicholas Bellinger
01d4d67355 target: Fix multi-session dynamic se_node_acl double free OOPs
This patch addresses a long-standing bug with multi-session
(eg: iscsi-target + iser-target) se_node_acl dynamic free
withini transport_deregister_session().

This bug is caused when a storage endpoint is configured with
demo-mode (generate_node_acls = 1 + cache_dynamic_acls = 1)
initiators, and initiator login creates a new dynamic node acl
and attaches two sessions to it.

After that, demo-mode for the storage instance is disabled via
configfs (generate_node_acls = 0 + cache_dynamic_acls = 0) and
the existing dynamic acl is never converted to an explicit ACL.

The end result is dynamic acl resources are released twice when
the sessions are shutdown in transport_deregister_session().

If the storage instance is not changed to disable demo-mode,
or the dynamic acl is converted to an explict ACL, or there
is only a single session associated with the dynamic ACL,
the bug is not triggered.

To address this big, move the release of dynamic se_node_acl
memory into target_complete_nacl() so it's only freed once
when se_node_acl->acl_kref reaches zero.

(Drop unnecessary list_del_init usage - HCH)

Reported-by: Rob Millner <rlm@daterainc.com>
Tested-by: Rob Millner <rlm@daterainc.com>
Cc: Rob Millner <rlm@daterainc.com>
Cc: stable@vger.kernel.org # 4.1+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-02-08 08:25:23 -08:00
Manuel Messner
935b7f6430 netfilter: nft_exthdr: add TCP option matching
This patch implements the kernel side of the TCP option patch.

Signed-off-by: Manuel Messner <mm@skelett.io>
Reviewed-by: Florian Westphal <fw@strlen.de>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-08 14:17:09 +01:00
Florian Westphal
ab23821f7e netfilter: nft_ct: add zone id get support
Just like with counters the direction attribute is optional.
We set priv->dir to MAX unconditionally to avoid duplicating the assignment
for all keys with optional direction.

For keys where direction is mandatory, existing code already returns
an error.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-08 14:16:22 +01:00
Pablo Neira Ayuso
0b5a787492 netfilter: nf_tables: add space notation to sets
The space notation allows us to classify the set backend implementation
based on the amount of required memory. This provides an order of the
set representation scalability in terms of memory. The size field is
still left in place so use this if the userspace provides no explicit
number of elements, so we cannot calculate the real memory that this set
needs. This also helps us break ties in the set backend selection
routine, eg. two backend implementations provide the same performance.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-08 14:16:21 +01:00
Pablo Neira Ayuso
55af753cd9 netfilter: nf_tables: rename struct nft_set_estimate class field
Use lookup as field name instead, to prepare the introduction of the
memory class in a follow up patch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-08 14:16:20 +01:00
Pablo Neira Ayuso
1f48ff6c53 netfilter: nf_tables: add flush field to struct nft_set_iter
This provides context to walk callback iterator, thus, we know if the
walk happens from the set flush path. This is required by the new bitmap
set type coming in a follow up patch which has no real struct
nft_set_ext, so it has to allocate it based on the two bit compact
element representation.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-08 14:16:20 +01:00