Commit graph

52811 commits

Author SHA1 Message Date
Daniele Palmas
7b8076ce8a NET: usb: cdc_mbim: add quirk for supporting Telit LE922A
Telit LE922A MBIM based composition does not work properly
with altsetting toggle done in cdc_ncm_bind_common.

This patch adds CDC_MBIM_FLAG_AVOID_ALTSETTING_TOGGLE quirk
to avoid this procedure that, instead, is mandatory for
other modems.

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Reviewed-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08 13:02:25 -05:00
Oleg Nesterov
1da5c46fa9 kthread: Make struct kthread kmalloc'ed
commit 23196f2e5f "kthread: Pin the stack via try_get_task_stack() /
put_task_stack() in to_live_kthread() function" is a workaround for the
fragile design of struct kthread being allocated on the task stack.

struct kthread in its current form should be removed, but this needs
cleanups outside of kthread.c.

As a first step move struct kthread away from the task stack by making it
kmalloc'ed. This allows to access kthread.exited without the magic of
trying to pin task stack and the try logic in to_live_kthread().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chunming Zhou <David1.Zhou@amd.com>
Cc: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20161129175057.GA5330@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-08 14:36:18 +01:00
Michal Hocko
777c6e0dae hotplug: Make register and unregister notifier API symmetric
Yu Zhao has noticed that __unregister_cpu_notifier only unregisters its
notifiers when HOTPLUG_CPU=y while the registration might succeed even
when HOTPLUG_CPU=n if MODULE is enabled. This means that e.g. zswap
might keep a stale notifier on the list on the manual clean up during
the pool tear down and thus corrupt the list. Resulting in the following

[  144.964346] BUG: unable to handle kernel paging request at ffff880658a2be78
[  144.971337] IP: [<ffffffffa290b00b>] raw_notifier_chain_register+0x1b/0x40
<snipped>
[  145.122628] Call Trace:
[  145.125086]  [<ffffffffa28e5cf8>] __register_cpu_notifier+0x18/0x20
[  145.131350]  [<ffffffffa2a5dd73>] zswap_pool_create+0x273/0x400
[  145.137268]  [<ffffffffa2a5e0fc>] __zswap_param_set+0x1fc/0x300
[  145.143188]  [<ffffffffa2944c1d>] ? trace_hardirqs_on+0xd/0x10
[  145.149018]  [<ffffffffa2908798>] ? kernel_param_lock+0x28/0x30
[  145.154940]  [<ffffffffa2a3e8cf>] ? __might_fault+0x4f/0xa0
[  145.160511]  [<ffffffffa2a5e237>] zswap_compressor_param_set+0x17/0x20
[  145.167035]  [<ffffffffa2908d3c>] param_attr_store+0x5c/0xb0
[  145.172694]  [<ffffffffa290848d>] module_attr_store+0x1d/0x30
[  145.178443]  [<ffffffffa2b2b41f>] sysfs_kf_write+0x4f/0x70
[  145.183925]  [<ffffffffa2b2a5b9>] kernfs_fop_write+0x149/0x180
[  145.189761]  [<ffffffffa2a99248>] __vfs_write+0x18/0x40
[  145.194982]  [<ffffffffa2a9a412>] vfs_write+0xb2/0x1a0
[  145.200122]  [<ffffffffa2a9a732>] SyS_write+0x52/0xa0
[  145.205177]  [<ffffffffa2ff4d97>] entry_SYSCALL_64_fastpath+0x12/0x17

This can be even triggered manually by changing
/sys/module/zswap/parameters/compressor multiple times.

Fix this issue by making unregister APIs symmetric to the register so
there are no surprises.

Fixes: 47e627bc8c ("[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU")
Reported-and-tested-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Link: http://lkml.kernel.org/r/20161207135438.4310-1-mhocko@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-08 10:08:41 +01:00
David S. Miller
5fccd64aa4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains a large Netfilter update for net-next,
to summarise:

1) Add support for stateful objects. This series provides a nf_tables
   native alternative to the extended accounting infrastructure for
   nf_tables. Two initial stateful objects are supported: counters and
   quotas. Objects are identified by a user-defined name, you can fetch
   and reset them anytime. You can also use a maps to allow fast lookups
   using any arbitrary key combination. More info at:

   http://marc.info/?l=netfilter-devel&m=148029128323837&w=2

2) On-demand registration of nf_conntrack and defrag hooks per netns.
   Register nf_conntrack hooks if we have a stateful ruleset, ie.
   state-based filtering or NAT. The new nf_conntrack_default_on sysctl
   enables this from newly created netnamespaces. Default behaviour is not
   modified. Patches from Florian Westphal.

3) Allocate 4k chunks and then use these for x_tables counter allocation
   requests, this improves ruleset load time and also datapath ruleset
   evaluation, patches from Florian Westphal.

4) Add support for ebpf to the existing x_tables bpf extension.
   From Willem de Bruijn.

5) Update layer 4 checksum if any of the pseudoheader fields is updated.
   This provides a limited form of 1:1 stateless NAT that make sense in
   specific scenario, eg. load balancing.

6) Add support to flush sets in nf_tables. This series comes with a new
   set->ops->deactivate_one() indirection given that we have to walk
   over the list of set elements, then deactivate them one by one.
   The existing set->ops->deactivate() performs an element lookup that
   we don't need.

7) Two patches to avoid cloning packets, thus speed up packet forwarding
   via nft_fwd from ingress. From Florian Westphal.

8) Two IPVS patches via Simon Horman: Decrement ttl in all modes to
   prevent infinite loops, patch from Dwip Banerjee. And one minor
   refactoring from Gao feng.

9) Revisit recent log support for nf_tables netdev families: One patch
   to ensure that we correctly handle non-ethernet packets. Another
   patch to add missing logger definition for netdev. Patches from
   Liping Zhang.

10) Three patches for nft_fib, one to address insufficient register
    initialization and another to solve incorrect (although harmless)
    byteswap operation. Moreover update xt_rpfilter and nft_fib to match
    lbcast packets with zeronet as source, eg. DHCP Discover packets
    (0.0.0.0 -> 255.255.255.255). Also from Liping Zhang.

11) Built-in DCCP, SCTP and UDPlite conntrack and NAT support, from
    Davide Caratti. While DCCP is rather hopeless lately, and UDPlite has
    been broken in many-cast mode for some little time, let's give them a
    chance by placing them at the same level as other existing protocols.
    Thus, users don't explicitly have to modprobe support for this and
    NAT rules work for them. Some people point to the lack of support in
    SOHO Linux-based routers that make deployment of new protocols harder.
    I guess other middleboxes outthere on the Internet are also to blame.
    Anyway, let's see if this has any impact in the midrun.

12) Skip software SCTP software checksum calculation if the NIC comes
    with SCTP checksum offload support. From Davide Caratti.

13) Initial core factoring to prepare conversion to hook array. Three
    patches from Aaron Conole.

14) Gao Feng made a wrong conversion to switch in the xt_multiport
    extension in a patch coming in the previous batch. Fix it in this
    batch.

15) Get vmalloc call in sync with kmalloc flags to avoid a warning
    and likely OOM killer intervention from x_tables. From Marcelo
    Ricardo Leitner.

16) Update Arturo Borrero's email address in all source code headers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 19:16:46 -05:00
Linus Walleij
acf1fcf772 Merge branch 'thread-irq-simpler' into devel 2016-12-07 15:24:12 +01:00
Linus Walleij
eb485c7d9e Merge branch 'pl061' into devel 2016-12-07 15:23:45 +01:00
Linus Walleij
562b488443 gpio: pl061: move platform data into driver
No boardfile defines any PL061 platform data anymore: the
Integrator IM/PD-1 includes the file but is not making use
of the struct. Let's delete the include and all references,
then move the platform data into the driver for later
consolidation into the driver state container.

The only resource defined by the IM/PD-1 is the IRQ which
is passed through the AMBA PrimeCell bus abstraction
struct amba_device.

Cc: arm@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: Russell King <linux@armlinux.org.uk>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2016-12-07 15:22:50 +01:00
Dan Williams
efda1b5d87 acpi, nfit, libnvdimm: fix / harden ars_status output length handling
Given ambiguities in the ACPI 6.1 definition of the "Output (Size)"
field of the ARS (Address Range Scrub) Status command, a firmware
implementation may in practice return 0, 4, or 8 to indicate that there
is no output payload to process.

The specification states "Size of Output Buffer in bytes, including this
field.". However, 'Output Buffer' is also the name of the entire
payload, and earlier in the specification it states "Max Query ARS
Status Output Buffer Size: Maximum size of buffer (including the Status
and Extended Status fields)".

Without this fix if the BIOS happens to return 0 it causes memory
corruption as evidenced by this result from the acpi_nfit_ctl() unit
test.

 ars_status00000000: 00020000 00000000                    ........
 BUG: stack guard page was hit at ffffc90001750000 (stack is ffffc9000174c000..ffffc9000174ffff)
 kernel stack overflow (page fault): 0000 [#1] SMP DEBUG_PAGEALLOC
 task: ffff8803332d2ec0 task.stack: ffffc9000174c000
 RIP: 0010:[<ffffffff814cfe72>]  [<ffffffff814cfe72>] __memcpy+0x12/0x20
 RSP: 0018:ffffc9000174f9a8  EFLAGS: 00010246
 RAX: ffffc9000174fab8 RBX: 0000000000000000 RCX: 000000001fffff56
 RDX: 0000000000000000 RSI: ffff8803231f5a08 RDI: ffffc90001750000
 RBP: ffffc9000174fa88 R08: ffffc9000174fab0 R09: ffff8803231f54b8
 R10: 0000000000000008 R11: 0000000000000001 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000003 R15: ffff8803231f54a0
 FS:  00007f3a611af640(0000) GS:ffff88033ed00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffc90001750000 CR3: 0000000325b20000 CR4: 00000000000406e0
 Stack:
  ffffffffa00bc60d 0000000000000008 ffffc90000000001 ffffc9000174faac
  0000000000000292 ffffffffa00c24e4 ffffffffa00c2914 0000000000000000
  0000000000000000 ffffffff00000003 ffff880331ae8ad0 0000000800000246
 Call Trace:
  [<ffffffffa00bc60d>] ? acpi_nfit_ctl+0x49d/0x750 [nfit]
  [<ffffffffa01f4fe0>] nfit_test_probe+0x670/0xb1b [nfit_test]

Cc: <stable@vger.kernel.org>
Fixes: 747ffe11b4 ("libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-06 16:08:10 -08:00
Florian Westphal
df122f58b8 netfilter: ingress: translate 0 nf_hook_slow retval to -1
The caller assumes that < 0 means that skb was stolen (or free'd).

All other return values continue skb processing.

nf_hook_slow returns 3 different return value types:

A) a (negative) errno value: the skb was dropped (NF_DROP, e.g.
by iptables '-j DROP' rule).

B) 0. The skb was stolen by the hook or queued to userspace.

C) 1. all hooks returned NF_ACCEPT so the caller should invoke
   the okfn so packet processing can continue.

nft ingress facility currently doesn't have the 'okfn' that
the NF_HOOK() macros use; there is no nfqueue support either.

So 1 means that nf_hook_ingress() caller should go on processing the skb.

In order to allow use of NF_STOLEN from ingress we need to translate
this to an errno number, else we'd crash because we continue with
already-free'd (or about to be free-d) skb.

The errno value isn't checked, its just important that its less than 0,
so return -1.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06 21:48:21 +01:00
Florian Westphal
ae0ac0ed6f netfilter: x_tables: pack percpu counter allocations
instead of allocating each xt_counter individually, allocate 4k chunks
and then use these for counter allocation requests.

This should speed up rule evaluation by increasing data locality,
also speeds up ruleset loading because we reduce calls to the percpu
allocator.

As Eric points out we can't use PAGE_SIZE, page_allocator would fail on
arches with 64k page size.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06 21:42:19 +01:00
Florian Westphal
f28e15bace netfilter: x_tables: pass xt_counters struct to counter allocator
Keeps some noise away from a followup patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06 21:42:18 +01:00
Florian Westphal
4d31eef517 netfilter: x_tables: pass xt_counters struct instead of packet counter
On SMP we overload the packet counter (unsigned long) to contain
percpu offset.  Hide this from callers and pass xt_counters address
instead.

Preparation patch to allocate the percpu counters in page-sized batch
chunks.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06 21:42:17 +01:00
Aaron Conole
d415b9eb76 netfilter: decouple nf_hook_entry and nf_hook_ops
During nfhook traversal we only need a very small subset of
nf_hook_ops members.

We need:
- next element
- hook function to call
- hook function priv argument

Bridge netfilter also needs 'thresh'; can be obtained via ->orig_ops.

nf_hook_entry struct is now 32 bytes on x86_64.

A followup patch will turn the run-time list into an array that only
stores hook functions plus their priv arguments, eliminating the ->next
element.

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06 21:42:16 +01:00
Aaron Conole
0aa8c57a04 netfilter: introduce accessor functions for hook entries
This allows easier future refactoring.

Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06 21:42:15 +01:00
Peter Zijlstra
f4ec57b632 locking/ww_mutex: Use relaxed atomics
The stamp is a sequence number, we don't care about memory ordering.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-06 10:42:03 +01:00
Peter Zijlstra
7c4788950b x86/uaccess, sched/preempt: Verify access_ok() context
I recently encountered wreckage because access_ok() was used where it
should not be, add an explicit WARN when access_ok() is used wrongly.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-06 10:32:40 +01:00
Daniel Borkmann
7bd509e311 bpf: add prog_digest and expose it via fdinfo/netlink
When loading a BPF program via bpf(2), calculate the digest over
the program's instruction stream and store it in struct bpf_prog's
digest member. This is done at a point in time before any instructions
are rewritten by the verifier. Any unstable map file descriptor
number part of the imm field will be zeroed for the hash.

fdinfo example output for progs:

  # cat /proc/1590/fdinfo/5
  pos:          0
  flags:        02000002
  mnt_id:       11
  prog_type:    1
  prog_jited:   1
  prog_digest:  b27e8b06da22707513aa97363dfb11c7c3675d28
  memlock:      4096

When programs are pinned and retrieved by an ELF loader, the loader
can check the program's digest through fdinfo and compare it against
one that was generated over the ELF file's program section to see
if the program needs to be reloaded. Furthermore, this can also be
exposed through other means such as netlink in case of a tc cls/act
dump (or xdp in future), but also through tracepoints or other
facilities to identify the program. Other than that, the digest can
also serve as a base name for the work in progress kallsyms support
of programs. The digest doesn't depend/select the crypto layer, since
we need to keep dependencies to a minimum. iproute2 will get support
for this facility.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05 15:33:11 -05:00
Eric Dumazet
7aa5470c2c tcp: tsq: move tsq_flags close to sk_wmem_alloc
tsq_flags being in the same cache line than sk_wmem_alloc
makes a lot of sense. Both fields are changed from tcp_wfree()
and more generally by various TSQ related functions.

Prior patch made room in struct sock and added sk_tsq_flags,
this patch deletes tsq_flags from struct tcp_sock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05 13:32:24 -05:00
Eric Dumazet
40fc3423b9 tcp: tsq: add tsq_flags / tsq_enum
This is a cleanup, to ease code review of following patches.

Old 'enum tsq_flags' is renamed, and a new enumeration is added
with the flags used in cmpxchg() operations as opposed to
single bit operations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05 13:32:22 -05:00
Adrian Hunter
e711f03091 mmc: mmc: Introduce mmc_abort_tuning()
If a tuning command times out, the card could still be processing it, which
will cause problems for recovery. The eMMC specification says that CMD12
can be used to stop CMD21, so add a function that does that.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-12-05 14:16:22 +01:00
David Lechner
30103b5b64 regulator: Fix regulator_get_error_flags() signature mismatch
The function signature of does not match regulator_get_error_flags()
when CONFIG_REGULATOR is not defined vs. when it is not defined.
This makes both declarations match to prevent compiler errors.

Signed-off-by: David Lechner <david@lechnology.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2016-12-05 11:38:30 +00:00
Adrian Hunter
925ff3a7a3 mmc: mmc: Add Command Queue definitions
Add definitions relating to Command Queuing.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-12-05 10:31:07 +01:00
Adrian Hunter
e0097cf5f2 mmc: queue: Fix queue thread wake-up
The only time the driver sleeps expecting to be woken upon the arrival of
a new request, is when the dispatch queue is empty. The only time that it
is known whether the dispatch queue is empty is after NULL is returned
from blk_fetch_request() while under the queue lock.

Recognizing those facts, simplify the synchronization between the queue
thread and the request function. A couple of flags tell the request
function what to do, and the queue lock and barriers associated with
wake-ups ensure synchronization.

The result is simpler and allows the removal of the context_info lock.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Harjani Ritesh <riteshh@codeaurora.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-12-05 10:31:03 +01:00
Davide Caratti
c51d39010a netfilter: conntrack: built-in support for DCCP
CONFIG_NF_CT_PROTO_DCCP is no more a tristate. When set to y, connection
tracking support for DCCP protocol is built-in into nf_conntrack.ko.

footprint test:
$ ls -l net/netfilter/nf_conntrack{_proto_dccp,}.ko \
        net/ipv4/netfilter/nf_conntrack_ipv4.ko \
        net/ipv6/netfilter/nf_conntrack_ipv6.ko

(builtin)||  dccp  |  ipv4  |  ipv6  | nf_conntrack
---------++--------+--------+--------+--------------
none     || 469140 | 828755 | 828676 | 6141434
DCCP     ||   -    | 830566 | 829935 | 6533526

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04 20:53:15 +01:00
Erik Nordmark
adc176c547 ipv6 addrconf: Implemented enhanced DAD (RFC7527)
Implemented RFC7527 Enhanced DAD.
IPv6 duplicate address detection can fail if there is some temporary
loopback of Ethernet frames. RFC7527 solves this by including a random
nonce in the NS messages used for DAD, and if an NS is received with the
same nonce it is assumed to be a looped back DAD probe and is ignored.
RFC7527 is enabled by default. Can be disabled by setting both of
conf/{all,interface}/enhanced_dad to zero.

Signed-off-by: Erik Nordmark <nordmark@arista.com>
Signed-off-by: Bob Gilligan <gilligan@arista.com>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 23:21:37 -05:00
David S. Miller
2745529ac7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Couple conflicts resolved here:

1) In the MACB driver, a bug fix to properly initialize the
   RX tail pointer properly overlapped with some changes
   to support variable sized rings.

2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix
   overlapping with a reorganization of the driver to support
   ACPI, OF, as well as PCI variants of the chip.

3) In 'net' we had several probe error path bug fixes to the
   stmmac driver, meanwhile a lot of this code was cleaned up
   and reorganized in 'net-next'.

4) The cls_flower classifier obtained a helper function in
   'net-next' called __fl_delete() and this overlapped with
   Daniel Borkamann's bug fix to use RCU for object destruction
   in 'net'.  It also overlapped with Jiri's change to guard
   the rhashtable_remove_fast() call with a check against
   tc_skip_sw().

5) In mlx4, a revert bug fix in 'net' overlapped with some
   unrelated changes in 'net-next'.

6) In geneve, a stale header pointer after pskb_expand_head()
   bug fix in 'net' overlapped with a large reorganization of
   the same code in 'net-next'.  Since the 'net-next' code no
   longer had the bug in question, there was nothing to do
   other than to simply take the 'net-next' hunks.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 12:29:53 -05:00
Linus Torvalds
8bca927f13 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Lots more phydev and probe error path leaks in various drivers by
    Johan Hovold.

 2) Fix race in packet_set_ring(), from Philip Pettersson.

 3) Use after free in dccp_invalid_packet(), from Eric Dumazet.

 4) Signnedness overflow in SO_{SND,RCV}BUFFORCE, also from Eric
    Dumazet.

 5) When tunneling between ipv4 and ipv6 we can be left with the wrong
    skb->protocol value as we enter the IPSEC engine and this causes all
    kinds of problems. Set it before the output path does any
    dst_output() calls, from Eli Cooper.

 6) bcmgenet uses wrong device struct pointer in DMA API calls, fix from
    Florian Fainelli.

 7) Various netfilter nat bug fixes from FLorian Westphal.

 8) Fix memory leak in ipvlan_link_new(), from Gao Feng.

 9) Locking fixes, particularly wrt. socket lookups, in l2tp from
    Guillaume Nault.

10) Avoid invoking rhash teardowns in atomic context by moving netlink
    cb->done() dump completion from a worker thread. Fix from Herbert
    Xu.

11) Buffer refcount problems in tun and macvtap on errors, from Jason
    Wang.

12) We don't set Kconfig symbol DEFAULT_TCP_CONG properly when the user
    selects BBR. Fix from Julian Wollrath.

13) Fix deadlock in transmit path on altera TSE driver, from Lino
    Sanfilippo.

14) Fix unbalanced reference counting in dsa_switch_tree, from Nikita
    Yushchenko.

15) tc_tunnel_key needs to be properly exported to userspace via uapi,
    fix from Roi Dayan.

16) rds_tcp_init_net() doesn't unregister notifier in error path, fix
    from Sowmini Varadhan.

17) Stale packet header pointer access after pskb_expand_head() in
    genenve driver, fix from Sabrina Dubroca.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits)
  net: avoid signed overflows for SO_{SND|RCV}BUFFORCE
  geneve: avoid use-after-free of skb->data
  tipc: check minimum bearer MTU
  net: renesas: ravb: unintialized return value
  sh_eth: remove unchecked interrupts for RZ/A1
  net: bcmgenet: Utilize correct struct device for all DMA operations
  NET: usb: qmi_wwan: add support for Telit LE922A PID 0x1040
  cdc_ether: Fix handling connection notification
  ip6_offload: check segs for NULL in ipv6_gso_segment.
  RDS: TCP: unregister_netdevice_notifier() in error path of rds_tcp_init_net
  Revert: "ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"
  ipv6: Set skb->protocol properly for local output
  ipv4: Set skb->protocol properly for local output
  packet: fix race condition in packet_set_ring
  net: ethernet: altera: TSE: do not use tx queue lock in tx completion handler
  net: ethernet: altera: TSE: Remove unneeded dma sync for tx buffers
  net: ethernet: stmmac: fix of-node and fixed-link-phydev leaks
  net: ethernet: stmmac: platform: fix outdated function header
  net: ethernet: stmmac: dwmac-meson8b: fix probe error path
  net: ethernet: stmmac: dwmac-generic: fix probe error path
  ...
2016-12-02 11:45:27 -08:00
David Ahern
6102365876 bpf: Add new cgroup attach type to enable sock modifications
Add new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to
BPF_PROG_TYPE_CGROUP_SKB programs can be attached to a cgroup and run
any time a process in the cgroup opens an AF_INET or AF_INET6 socket.
Currently only sk_bound_dev_if is exported to userspace for modification
by a bpf program.

This allows a cgroup to be configured such that AF_INET{6} sockets opened
by processes are automatically bound to a specific device. In turn, this
enables the running of programs that do not support SO_BINDTODEVICE in a
specific VRF context / L3 domain.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:46:08 -05:00
David Ahern
b2cd12574a bpf: Refactor cgroups code in prep for new type
Code move and rename only; no functional change intended.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:44:56 -05:00
Hadar Hen Zion
7091d8c705 net/sched: cls_flower: Add offload support using egress Hardware device
In order to support hardware offloading when the device given by the tc
rule is different from the Hardware underline device, extract the mirred
(egress) device from the tc action when a filter is added, using the new
tc_action_ops, get_dev().

Flower caches the information about the mirred device and use it for
calling ndo_setup_tc in filter change, update stats and delete.

Calling ndo_setup_tc of the mirred (egress) device instead of the
ingress device will allow a resolution between the software ingress
device and the underline hardware device.

The resolution will take place inside the offloading driver using
'egress_device' flag added to tc_to_netdev struct which is provided to
the offloading driver.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:28:37 -05:00
Florian Westphal
95a22caee3 tcp: randomize tcp timestamp offsets for each connection
jiffies based timestamps allow for easy inference of number of devices
behind NAT translators and also makes tracking of hosts simpler.

commit ceaa1fef65 ("tcp: adding a per-socket timestamp offset")
added the main infrastructure that is needed for per-connection ts
randomization, in particular writing/reading the on-wire tcp header
format takes the offset into account so rest of stack can use normal
tcp_time_stamp (jiffies).

So only two items are left:
 - add a tsoffset for request sockets
 - extend the tcp isn generator to also return another 32bit number
   in addition to the ISN.

Re-use of ISN generator also means timestamps are still monotonically
increasing for same connection quadruple, i.e. PAWS will still work.

Includes fixes from Eric Dumazet.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 12:49:59 -05:00
Yuval Mintz
fc831825f9 qed: Add support for hardware offloaded iSCSI.
This adds the backbone required for the various HW initalizations
which are necessary for the iSCSI driver (qedi) for QLogic FastLinQ
4xxxx line of adapters - FW notification, resource initializations, etc.

Signed-off-by: Arun Easi <arun.easi@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 12:44:37 -05:00
Daniel Borkmann
366cbf2f46 bpf, xdp: drop rcu_read_lock from bpf_prog_run_xdp and move to caller
After 326fe02d1e ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock"),
the rcu_read_lock() in bpf_prog_run_xdp() is superfluous, since callers
need to hold rcu_read_lock() already to make sure BPF program doesn't
get released in the background.

Thus, drop it from bpf_prog_run_xdp(), as it can otherwise be misleading.
Still keeping the bpf_prog_run_xdp() is useful as it allows for grepping
in XDP supported drivers and to keep the typecheck on the context intact.
For mlx4, this means we don't have a double rcu_read_lock() anymore. nfp can
just make use of bpf_prog_run_xdp(), too. For qede, just move rcu_read_lock()
out of the helper. When the driver gets atomic replace support, this will
move to call-sites eventually.

mlx5 needs actual fixing as it has the same issue as described already in
326fe02d1e ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock"),
that is, we're under RCU bh at this time, BPF programs are released via
call_rcu(), and call_rcu() != call_rcu_bh(), so we need to properly mark
read side as programs can get xchg()'ed in mlx5e_xdp_set() without queue
reset.

Fixes: 86994156c7 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 11:06:24 -05:00
Thomas Graf
3a0af8fd61 bpf: BPF for lightweight tunnel infrastructure
Registers new BPF program types which correspond to the LWT hooks:
  - BPF_PROG_TYPE_LWT_IN   => dst_input()
  - BPF_PROG_TYPE_LWT_OUT  => dst_output()
  - BPF_PROG_TYPE_LWT_XMIT => lwtunnel_xmit()

The separate program types are required to differentiate between the
capabilities each LWT hook allows:

 * Programs attached to dst_input() or dst_output() are restricted and
   may only read the data of an skb. This prevent modification and
   possible invalidation of already validated packet headers on receive
   and the construction of illegal headers while the IP headers are
   still being assembled.

 * Programs attached to lwtunnel_xmit() are allowed to modify packet
   content as well as prepending an L2 header via a newly introduced
   helper bpf_skb_change_head(). This is safe as lwtunnel_xmit() is
   invoked after the IP header has been assembled completely.

All BPF programs receive an skb with L3 headers attached and may return
one of the following error codes:

 BPF_OK - Continue routing as per nexthop
 BPF_DROP - Drop skb and return EPERM
 BPF_REDIRECT - Redirect skb to device as per redirect() helper.
                (Only valid in lwtunnel_xmit() context)

The return codes are binary compatible with their TC_ACT_
relatives to ease compatibility.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 10:51:49 -05:00
Tariq Toukan
1c1b522808 net/mlx5e: Implement Fragmented Work Queue (WQ)
Add new type of struct mlx5_frag_buf which is used to allocate fragmented
buffers rather than contiguous, and make the Completion Queues (CQs) use
it as they are big (default of 2MB per CQ in Striding RQ).

This fixes the failures of type:
"mlx5e_open_locked: mlx5e_open_channels failed, -12"
due to dma_zalloc_coherent insufficient contiguous coherent memory to
satisfy the driver's request when the user tries to setup more or larger
rings.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 10:47:01 -05:00
Ingo Molnar
1b95b1a06c Merge branch 'locking/urgent' into locking/core, to pick up dependent fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-02 11:13:44 +01:00
Linus Torvalds
4db5e636dd pci-v4.9-fixes-4
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJYQKe0AAoJEFmIoMA60/r8dmsQAJ1BjfcgWunT8cyBjh9DW8MT
 mFj4w4qEtN8JthecXYKDHYY1zTRocuuKYQTCdX6qKnnx37amJwfiEtPsLqzoio3U
 HqIx0Nyereh6ir3VHJgITa2C0317pw6ti2rEZS+oMfQyWUDWVXMKOo3nsCKYtqLJ
 fO0K1ubYSUwNr1ph3rxTbJaycRUZsXK1PAdaROVeDjiw6IPgSNd9eboQCQAg3WQm
 JFsENhhCDM7qlFpwgbjtjv2IkzK0zpxs6vkVKRUJ1x8D2OAfg0j+rxYEVaOU23bO
 isj7rnbM1fFuC3WrAB1uexPfISLuzqUSIceB46EItoTJ7x3wmQGs4BIIt9LlmUte
 Z6RNAMbUx+K/5p2+xCVJAnbhfnCQv/vLkYEKpr2uPx43PywALYJq/8I4p/qh0zIW
 562ulb7HUqh8jNMvFj/7kqCijnkFHw0iddL0zwC6VD5/lYiTeYN19/T00gUGLtB6
 YWunN1G/fl/SdtI29oo8e+xVKuWraAsyKVX7LZIl2XaZhVBTy9vTC2wC/hdZqiMg
 yXK4/lE+Fr0tnHt8vVRgEicTHTmlQYQnRKNcy9PyDQWyYndg4ExacmsafQ61u0EE
 bUKoPPT7zJT/TVDp54cWk4t/AHc4TONNONNUH2xZKAMElsAiQrHd4GwFHUAQgz/C
 MiwbEXvfYTBcPCRP4cqD
 =DhJD
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.9-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
 "PCI fixes:

   - Fix Read Completion Boundary setting, which fixes a boot failure on
     IBM x3850 with Mellanox MT27500 ConnectX-3

   - Update some MAINTAINERS entries and email addresses"

* tag 'pci-v4.9-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)
  PCI: Export pcie_find_root_port
  PCI: designware-plat: Update author email
  PCI: designware: Change maintainer to Joao Pinto
  MAINTAINERS: Add devicetree binding to PCI i.MX6 entry
  MAINTAINERS: Update Richard Zhu's email address
2016-12-01 16:44:42 -08:00
Anna-Maria Gleixner
1dd6c834fa zram: Convert to hotplug state machine
Install the callbacks via the state machine with multi instance support and let
the core invoke the callbacks on the already online CPUs.

[bigeasy: wire up the multi instance stuff]
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: rt@linutronix.de
Cc: Nitin Gupta <ngupta@vflare.org>
Link: http://lkml.kernel.org/r/20161126231350.10321-19-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-02 00:52:39 +01:00
Anna-Maria Gleixner
3f7cd919f3 KVM/PPC/Book3S HV: Convert to hotplug state machine
Install the callbacks via the state machine.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: kvm-ppc@vger.kernel.org
Cc: Paul Mackerras <paulus@samba.org>
Cc: rt@linutronix.de
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alexander Graf <agraf@suse.com>
Link: http://lkml.kernel.org/r/20161126231350.10321-18-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-02 00:52:38 +01:00
Anna-Maria Gleixner
21647615db iommu/vt-d: Convert to hotplug state machine
Install the callbacks via the state machine.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: iommu@lists.linux-foundation.org
Cc: rt@linutronix.de
Cc: David Woodhouse <dwmw2@infradead.org>
Link: http://lkml.kernel.org/r/20161126231350.10321-14-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-02 00:52:37 +01:00
Sebastian Andrzej Siewior
cab7a7e5b6 mm/zswap: Convert pool to hotplug state machine
Install the callbacks via the state machine. Multi state is used to address the
per-pool notifier. Uppon adding of the intance the callback is invoked for all
online CPUs so the manual init can go.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-mm@kvack.org
Cc: Seth Jennings <sjenning@redhat.com>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161126231350.10321-13-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-02 00:52:36 +01:00
Sebastian Andrzej Siewior
ad7ed7708d mm/zswap: Convert dst-mem to hotplug state machine
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-mm@kvack.org
Cc: Seth Jennings <sjenning@redhat.com>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161126231350.10321-12-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-02 00:52:36 +01:00
Sebastian Andrzej Siewior
215c89d055 mm/zsmalloc: Convert to hotplug state machine
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: linux-mm@kvack.org
Cc: Minchan Kim <minchan@kernel.org>
Cc: rt@linutronix.de
Cc: Nitin Gupta <ngupta@vflare.org>
Link: http://lkml.kernel.org/r/20161126231350.10321-11-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-02 00:52:36 +01:00
Sebastian Andrzej Siewior
5438da977f mm/vmstat: Convert to hotplug state machine
Install the callbacks via the state machine, but do not invoke them as we
can initialize the node state without calling the callbacks on all online
CPUs.

start_shepherd_timer() is now called outside the get_online_cpus() block
which is safe as it only operates on cpu possible mask.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Cc: rt@linutronix.de
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/r/20161129145221.ffc3kg3hd7lxiwj6@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-02 00:52:35 +01:00
Sebastian Andrzej Siewior
b32614c034 tracing/rb: Convert to hotplug state machine
Install the callbacks via the state machine. The notifier in struct
ring_buffer is replaced by the multi instance interface.  Upon
__ring_buffer_alloc() invocation, cpuhp_state_add_instance() will invoke
the trace_rb_cpu_prepare() on each CPU.

This callback may now fail. This means __ring_buffer_alloc() will fail and
cleanup (like previously) and during a CPU up event this failure will not
allow the CPU to come up.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161126231350.10321-7-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-02 00:52:34 +01:00
Raju Lakkaraju
f4ed2fe34f net: phy: add mdix_ctrl to hold the user configuration.
Add new parameter mdix_ctrl to hold the user configuration.
Existing mdix maintain the current status of MDI(X) crossover performed or
not.
mdix_ctrl can configure either ETH_TP_MDI or ETH_TP_MDI_X orETH_TP_MDI_AUTO.

Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-01 11:26:47 -05:00
Baolin Wang
4a057549d6 alarmtimer: Add tracepoints for alarm timers
Alarm timers are one of the mechanisms to wake up a system from suspend,
but there exist no tracepoints to analyse which process/thread armed an
alarmtimer.

Add tracepoints for start/cancel/expire of individual alarm timers and one
for tracing the suspend time decision when to resume the system.

The following trace excerpt illustrates the new mechanism:

Binder:3292_2-3304  [000] d..2   149.981123: alarmtimer_cancel:
alarmtimer:ffffffc1319a7800 type:REALTIME
expires:1325463120000000000 now:1325376810370370245

Binder:3292_2-3304  [000] d..2   149.981136: alarmtimer_start:
alarmtimer:ffffffc1319a7800 type:REALTIME
expires:1325376840000000000 now:1325376810370384591

Binder:3292_9-3953  [000] d..2   150.212991: alarmtimer_cancel:
alarmtimer:ffffffc1319a5a00 type:BOOTTIME
expires:179552000000 now:150154008122

Binder:3292_9-3953  [000] d..2   150.213006: alarmtimer_start:
alarmtimer:ffffffc1319a5a00 type:BOOTTIME
expires:179551000000 now:150154025622

system_server-3000  [002] ...1  162.701940: alarmtimer_suspend:
alarmtimer type:REALTIME expires:1325376840000000000

The wakeup time which is selected at suspend time allows to map it back to
the task arming the timer: Binder:3292_2.

[ tglx: Store alarm timer expiry time instead of some useless RTC relative
  	information, add proper type information for wakeups which are
  	handled via the clock_nanosleep/freezer and massage the changelog. ]

Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Link: http://lkml.kernel.org/r/1480372524-15181-5-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-01 14:45:08 +01:00
Kirill A. Shutemov
5cbc198ae0 mm: fix false-positive WARN_ON() in truncate/invalidate for hugetlb
Hugetlb pages have ->index in size of the huge pages (PMD_SIZE or
PUD_SIZE), not in PAGE_SIZE as other types of pages.  This means we
cannot user page_to_pgoff() to check whether we've got the right page
for the radix-tree index.

Let's introduce page_to_index() which would return radix-tree index for
given page.

We will be able to get rid of this once hugetlb will be switched to
multi-order entries.

Fixes: fc127da085 ("truncate: handle file thp")
Link: http://lkml.kernel.org/r/20161123093053.mjbnvn5zwxw5e6lk@black.fi.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Doug Nelson <doug.nelson@intel.com>
Tested-by: Doug Nelson <doug.nelson@intel.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-30 16:32:52 -08:00
Dmitry Vyukov
045d599a28 kasan: update kasan_global for gcc 7
kasan_global struct is part of compiler/runtime ABI.  gcc revision
241983 has added a new field to kasan_global struct.  Update kernel
definition of kasan_global struct to include the new field.

Without this patch KASAN is broken with gcc 7.

Link: http://lkml.kernel.org/r/1479219743-28682-1-git-send-email-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: <stable@vger.kernel.org>	[4.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-30 16:32:52 -08:00
Mintz, Yuval
3da7a37ae6 qed*: Handle-based L2-queues.
The driver needs to maintain several FW/HW-indices for each one of
its queues. Currently, that mapping is done by the QED where it uses
an rx/tx array of so-called hw-cids, populating them whenever a new
queue is opened and clearing them upon destruction of said queues.

This maintenance is far from ideal - there's no real reason why
QED needs to maintain such a data-structure. It becomes even worse
when considering the fact that the PF's queues and its child VFs' queues
are all mapped into the same data-structure.
As a by-product, the set of parameters an interface needs to supply for
queue APIs is non-trivial, and some of the variables in the API
structures have different meaning depending on their exact place
in the configuration flow.

This patch re-organizes the way L2 queues are configured and maintained.
In short:
  - Required parameters for queue init are now well-defined.
  - Qed would allocate a queue-cid based on parameters.
    Upon initialization success, it would return a handle to caller.
  - Queue-handle would be maintained by entity requesting queue-init,
    not necessarily qed.
  - All further queue-APIs [update, destroy] would use the opaque
    handle as reference for the queue instead of various indices.

The possible owners of such handles:
  - PF queues [qede] - complete handles based on provided configuration.
  - VF queues [qede] - fw-context-less handles, containing only relative
    information; Only the PF-side would need the absolute indices
    for configuration, so they're omitted here.
  - VF queues [qed, PF-side] - complete handles based on VF initialization.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30 14:32:04 -05:00