Commit graph

42289 commits

Author SHA1 Message Date
Daniel Borkmann
35b2108550 ktime: Add __must_check prefix to ktime_to_timespec_cond
The function is currently mainly used in the networking code and
if others start using it, they must check the result, otherwise
it cannot be determined if the timespec conversion suceeded.
Currently no user lacks this check, but make future users aware of
a possible misusage.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-05-28 14:00:58 -07:00
Liu Ying
a44b8bd607 ktime: Use macro NSEC_PER_USEC where appropriate
We've got the macro NSEC_PER_USEC defined in header file
include/linux/time.h. To make the code decent, this patch
replaces the immediate number 1000 to convert bewteen a
time value in microseconds and one in nanoseconds with the
macro NSEC_PER_USEC.

Signed-off-by: Liu Ying <Ying.Liu@freescale.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-05-28 14:00:57 -07:00
Jiri Pirko
be9efd3653 net: pass changed flags along with NETDEV_CHANGE event
Use new netdevice notifier infrastructure to pass along changed flags.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>

v2->v3: shortened notifier_info struct name
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-28 13:11:01 -07:00
Jiri Pirko
351638e7de net: pass info struct via netdevice notifier
So far, only net_device * could be passed along with netdevice notifier
event. This patch provides a possibility to pass custom structure
able to provide info that event listener needs to know.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>

v2->v3: fix typo on simeth
	shortened dev_getter
	shortened notifier_info struct name
v1->v2: fix notifier_call parameter in call_netdevice_notifier()
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-28 13:11:01 -07:00
Alexander Gordeev
65f6ae66a6 PCI: Allocate only as many MSI vectors as requested by driver
Because of the encoding of the "Multiple Message Capable" and "Multiple
Message Enable" fields, a device can only advertise that it's capable of a
power-of-two number of vectors, and the OS can only enable a power-of-two
number.

For example, a device that's limited internally to using 18 vectors would
have to advertise that it's capable of 32.  The 14 extra vectors consume
vector numbers and IRQ descriptors even though the device can't actually
use them.

This fix introduces a 'msi_desc::nvec_used' field to address this issue.
When non-zero, it is the actual number of MSIs the device will send, as
requested by the device driver.  This value should be used by architectures
to set up and tear down only as many interrupt resources as the device will
actually use.

Note, although the existing 'msi_desc::multiple' field might seem
redundant, in fact it is not.  The number of MSIs advertised need not be
the smallest power-of-two larger than the number of MSIs the device will
send.  Thus, it is not always possible to derive the former from the
latter, so we need to keep them both to handle this case.

[bhelgaas: changelog, rename to "nvec_used"]
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-05-28 11:31:16 -06:00
Robert P. J. Day
611a75e187 include/linux/cpu.h: Update comments to reflect reality
Two minor changes to comments:

* Remove reference to drivers/base/sys.c, removed in 0a962657.
* CPUs are now exported by sysfs via devices/system/cpu.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-05-28 12:02:11 +02:00
Steven Rostedt
bcf312cf76 tracing: Put trace_puts() comment above trace_puts() macro for kernel doc
Kernel-doc gives the following warning:

  DOCPROC Documentation/DocBook/kernel-api.xml
Warning(/include/linux/kernel.h:590): No description found for parameter 'ip'
Warning(/include/linux/kernel.h:590): No description found for parameter 'ip'

Due to the externs between the the comment and the trace_puts() macro.
This is fixed by moving the externs below the macro and keeping the
macro and comment directly together.

Reported-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-05-28 12:02:11 +02:00
Peter Zijlstra
26cb63ad11 perf: Fix perf mmap bugs
Vince reported a problem found by his perf specific trinity
fuzzer.

Al noticed 2 problems with perf's mmap():

 - it has issues against fork() since we use vma->vm_mm for accounting.
 - it has an rb refcount leak on double mmap().

We fix the issues against fork() by using VM_DONTCOPY; I don't
think there's code out there that uses this; we didn't hear
about weird accounting problems/crashes. If we do need this to
work, the previously proposed VM_PINNED could make this work.

Aside from the rb reference leak spotted by Al, Vince's example
prog was indeed doing a double mmap() through the use of
perf_event_set_output().

This exposes another problem, since we now have 2 events with
one buffer, the accounting gets screwy because we account per
event. Fix this by making the buffer responsible for its own
accounting.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/20130528085548.GA12193@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 11:05:08 +02:00
Michael S. Tsirkin
662bbcb274 mm, sched: Allow uaccess in atomic with pagefault_disable()
This changes might_fault() so that it does not
trigger a false positive diagnostic for e.g. the following
sequence:

	spin_lock_irqsave()
	pagefault_disable()
	copy_to_user()
	pagefault_enable()
	spin_unlock_irqrestore()

In particular vhost wants to do this, to call
socket ops from under a lock.

There are 3 cases to consider:

 - CONFIG_PROVE_LOCKING - might_fault is non-inline
   so it's easy to move the in_atomic test to fix
   up the false positive warning.

 - CONFIG_DEBUG_ATOMIC_SLEEP - might_fault
   is currently inline, but we are calling a
   non-inline __might_sleep anyway,
   so let's use the non-line version of might_fault
   that does the right thing.

 - !CONFIG_DEBUG_ATOMIC_SLEEP && !CONFIG_PROVE_LOCKING
   __might_sleep is a nop so might_fault is a nop.

Make this explicit.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1369577426-26721-11-git-send-email-mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 09:41:11 +02:00
Michael S. Tsirkin
114276ac0a mm, sched: Drop voluntary schedule from might_fault()
might_fault() is called from functions like copy_to_user()
which most callers expect to be very fast, like a couple of
instructions.

So functions like memcpy_toiovec() call them many times in a loop.

But might_fault() calls might_sleep() and with CONFIG_PREEMPT_VOLUNTARY
this results in a function call.

Let's not do this - just call __might_sleep() that produces
a diagnostic for sleep within atomic, but drop
might_preempt().

Here's a test sending traffic between the VM and the host,
host is built with CONFIG_PREEMPT_VOLUNTARY:

 before:
	incoming: 7122.77   Mb/s
	outgoing: 8480.37   Mb/s

 after:
	incoming: 8619.24   Mb/s
	outgoing: 9455.42   Mb/s

As a side effect, this fixes an issue pointed
out by Ingo: might_fault might schedule differently
depending on PROVE_LOCKING. Now there's no
preemption point in both cases, so it's consistent.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1369577426-26721-10-git-send-email-mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 09:41:11 +02:00
Stephane Eranian
62b8563979 perf: Add sysfs entry to adjust multiplexing interval per PMU
This patch adds /sys/device/xxx/perf_event_mux_interval_ms to ajust
the multiplexing interval per PMU. The unit is milliseconds. Value has
to be >= 1.

In the 4th version, we renamed the sysfs file to be more consistent
with the other /proc/sys/kernel entries for perf_events.

In the 5th version, we handle the reprogramming of the hrtimer using
hrtimer_forward_now(). That way, we sync up to new timer value quickly
(suggested by Jiri Olsa).

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1364991694-5876-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 09:13:51 +02:00
Stephane Eranian
9e6302056f perf: Use hrtimers for event multiplexing
The current scheme of using the timer tick was fine for per-thread
events. However, it was causing bias issues in system-wide mode
(including for uncore PMUs). Event groups would not get their fair
share of runtime on the PMU. With tickless kernels, if a core is idle
there is no timer tick, and thus no event rotation (multiplexing).
However, there are events (especially uncore events) which do count
even though cores are asleep.

This patch changes the timer source for multiplexing.  It introduces a
per-PMU per-cpu hrtimer. The advantage is that even when a core goes
idle, it will come back to service the hrtimer, thus multiplexing on
system-wide events works much better.

The per-PMU implementation (suggested by PeterZ) enables adjusting the
multiplexing interval per PMU. The preferred interval is stashed into
the struct pmu. If not set, it will be forced to the default interval
value.

In order to minimize the impact of the hrtimer, it is turned on and
off on demand. When the PMU on a CPU is overcommited, the hrtimer is
activated.  It is stopped when the PMU is not overcommitted.

In order for this to work properly, we had to change the order of
initialization in start_kernel() such that hrtimer_init() is run
before perf_event_init().

The default interval in milliseconds is set to a timer tick just like
with the old code. We will provide a sysctl to tune this in another
patch.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1364991694-5876-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 09:07:10 +02:00
Jiri Olsa
ab573844e3 perf: Fix hw breakpoints overflow period sampling
The hw breakpoint pmu 'add' function is missing the
period_left update needed for SW events.

The perf HW breakpoint events use the SW events framework
to process the overflow, so it needs to be properly initialized
in the PMU 'add' method.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1367421944-19082-5-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 08:59:54 +02:00
dingtianhong
da6e378ba9 netpoll: remove return value from netpoll_rx_disable()
The netpoll_rx_disable() will always return 0, it is no use and looks wordy,
so remove the unnecessary code and get rid of it in _dev_open and _dev_close.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-27 23:18:50 -07:00
Jaegeuk Kim
a9841c4dbb f2fs: align data types between on-disk and in-memory block addresses
The on-disk block address is defined as __le32, but in-memory block address,
block_t, does as u64.

Let's synchronize them to 32 bits.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:04 +09:00
Simon Horman
0d89d2035f MPLS: Add limited GSO support
In the case where a non-MPLS packet is received and an MPLS stack is
added it may well be the case that the original skb is GSO but the
NIC used for transmit does not support GSO of MPLS packets.

The aim of this code is to provide GSO in software for MPLS packets
whose skbs are GSO.

SKB Usage:

When an implementation adds an MPLS stack to a non-MPLS packet it should do
the following to skb metadata:

* Set skb->inner_protocol to the old non-MPLS ethertype of the packet.
  skb->inner_protocol is added by this patch.

* Set skb->protocol to the new MPLS ethertype of the packet.

* Set skb->network_header to correspond to the
  end of the L3 header, including the MPLS label stack.

I have posted a patch, "[PATCH v3.29] datapath: Add basic MPLS support to
kernel" which adds MPLS support to the kernel datapath of Open vSwtich.
That patch sets the above requirements in datapath/actions.c:push_mpls()
and was used to exercise this code.  The datapath patch is against the Open
vSwtich tree but it is intended that it be added to the Open vSwtich code
present in the mainline Linux kernel at some point.

Features:

I believe that the approach that I have taken is at least partially
consistent with the handling of other protocols.  Jesse, I understand that
you have some ideas here.  I am more than happy to change my implementation.

This patch adds dev->mpls_features which may be used by devices
to advertise features supported for MPLS packets.

A new NETIF_F_MPLS_GSO feature is added for devices which support
hardware MPLS GSO offload.  Currently no devices support this
and MPLS GSO always falls back to software.

Alternate Implementation:

One possible alternate implementation is to teach netif_skb_features()
and skb_network_protocol() about MPLS, in a similar way to their
understanding of VLANs. I believe this would avoid the need
for net/mpls/mpls_gso.c and in particular the calls to
__skb_push() and __skb_push() in mpls_gso_segment().

I have decided on the implementation in this patch as it should
not introduce any overhead in the case where mpls_gso is not compiled
into the kernel or inserted as a module.

MPLS GSO suggested by Jesse Gross.
Based in part on "v4 GRE: Add TCP segmentation offload for GRE"
by Pravin B Shelar.

Cc: Jesse Gross <jesse@nicira.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-27 22:50:59 -07:00
Simon Horman
1a37e412a0 net: Use 16bits for *_headers fields of struct skbuff
In order to mitigate ongoing incresase in the size of struct skbuff
use 16 bit integer offsets rather than pointers for inner_*_headers.

This appears to reduce the size of struct skbuff from 0xd0 to 0xc0
bytes on x86_64 with the following all unset.

	CONFIG_XFRM
	CONFIG_NF_CONNTRACK
	CONFIG_NF_CONNTRACK_MODULE
	NET_SKBUFF_NF_DEFRAG_NEEDED
	CONFIG_BRIDGE_NETFILTER
	CONFIG_NET_SCHED
	CONFIG_IPV6_NDISC_NODETYPE
	CONFIG_NET_DMA
	CONFIG_NETWORK_SECMARK

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-27 22:50:59 -07:00
Florian Fainelli
4284b6a535 phy: allow drivers to flag a PHY device as internal
libphy currently always reports a PHY as an external transceiver from
the ethtool output. This is inaccurate, because some drivers should be
able to tell that a PHY device is an internal transceiver of an Ethernet
MAC. Add a new flag (PHY_IS_INTERNAL) which can be set by PHY drivers
just like other flags, and a corresponding helper: phy_is_internal()
which can be used by networking drivers to query if a given
PHY device is internal.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-27 22:42:50 -07:00
Olof Johansson
6f39ef575d This is a set of patches from Lee Jones to start converting
the ux500 to fetch DMA channels from the device tree:
 - Full DT support and channel mapping in the DMA40 driver
 - Dropping of platform data for migrated devices on the DT
   boot path.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJRnms9AAoJEEEQszewGV1zvAMP/R96pExEBjlf+o61FvPafFpw
 myZtprUXJvkHphOY+UVHk/gv+1EUFbNwcQzu5k3Ah14DOmv7qtjusylMDbJoFPar
 dDVU8PtsT+4qmCzgLRd+ATkZEBMOAaJFscD7UV1qtY0AIpz8jS1gJpOO08qqQBg1
 unEj7szVEfs1eCrcDnvfMmeBM7gqZMTFkWmGkpuNqH5MdN3HLMKgAFJEc+GafP4d
 saaX3fCdiif5OYkD9DyQO7I4zsSaAs8OcpYbQKyRmWhm/wuLGPDoFsfBjOXGgFRr
 V/XwJD74lNgGduC8bCRsgj4LDEdRDO0s1NlqRWpiX/W1GX2lo4wraLdPmd16AaTQ
 LUb34o10sN4/0mJnbweWxmFPyzGNfw/Kl+ea4gA0vOQ9QDtTIrtnjHOtowGAdhBJ
 lDob3vAd5huCU287p8me+nc6Y4eJaJS0LjcsU1yy7U4O9mLsZnubPaVPiQu2gUGe
 GUKYd5J40aslxRuyP/MCA0bnCBhjyH62YfN9omw0Cmt/VQvyKXFF4JWdQuS/Dn3s
 29JQfOWCf8xGYwZaQYSzluczP+wokfWnx8wrQ7c1Kb9G2NcDlv6f2/mbDRG9pmXV
 2xs8v/fxONDfSCrPVCiBiHH4sYtcbD6UMjiTB76etygOmb3oeSK/Aqx0Sm90MG6Q
 xL9gkP14kNTaZBGCDNLN
 =RTiI
 -----END PGP SIGNATURE-----

Merge tag 'ux500-dma40-for-arm-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson into next/drivers

From Linus Walleij:
This is a set of patches from Lee Jones to start converting
the ux500 to fetch DMA channels from the device tree:
- Full DT support and channel mapping in the DMA40 driver
- Dropping of platform data for migrated devices on the DT
  boot path.

* tag 'ux500-dma40-for-arm-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson: (36 commits)
  ARM: ux500: Register Cryp and Hash platform drivers on Snowball
  crypto: ux500/[cryp|hash] - Show successful start-up in the bootlog
  ARM: ux500: Stop passing Cryp DMA channel config information though pdata
  crypto: ux500/cryp - Set DMA configuration though dma_slave_config()
  crypto: ux500/cryp - Prepare clock before enabling it
  ARM: ux500: Stop passing Hash DMA channel config information though pdata
  crypto: ux500/hash - Set DMA configuration though dma_slave_config()
  crypto: ux500/hash - Prepare clock before enabling it
  ARM: ux500: Remove unnecessary attributes from DMA channel request pdata
  dmaengine: ste_dma40: Correct copy/paste error
  ARM: ux500: Remove DMA address look-up table
  dmaengine: ste_dma40: Remove redundant address fetching function
  dmaengine: ste_dma40: Only use addresses passed as configuration information
  ARM: ux500: Stop passing UART's platform data for Device Tree boots
  dmaengine: ste_dma40: Don't configure runtime configurable setup during allocate
  dmaengine: ste_dma40: Remove unnecessary call to d40_phy_cfg()
  dmaengine: ste_dma40: Separate Logical Global Interrupt Mask (GIM) unmasking
  ARM: ux500: Pass remnant platform data though to DMA40 driver
  dmaengine: ste_dma40: Supply full Device Tree parsing support
  dmaengine: ste_dma40: Allow driver to be probe()able when DT is enabled
  ...

Signed-off-by: Olof Johansson <olof@lixom.net>
2013-05-27 20:10:04 -07:00
Gu Zheng
3c6e6ae770 PCI: Introduce pci_alloc_dev(struct pci_bus*) to replace alloc_pci_dev()
Here we introduce a new interface to replace alloc_pci_dev():

    struct pci_dev *pci_alloc_dev(struct pci_bus *bus)

It takes a "struct pci_bus *" argument, so we can alloc a PCI device
on a target PCI bus, and it acquires a reference on the pci_bus.
We use pci_alloc_dev(NULL) to simplify the old alloc_pci_dev(),
and keep it for a while but mark it as __deprecated.

Holding a reference to the pci_bus ensures that referencing
pci_dev->bus is valid as long as the pci_dev is valid.

[bhelgaas: keep existing "return error early" structure in pci_alloc_dev()]
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-05-27 16:23:28 -06:00
Jiang Liu
fe830ef62a PCI: Introduce pci_bus_{get|put}() to manage PCI bus reference count
Introduce helper functions pci_bus_{get|put}() to manage PCI bus
reference count.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-05-27 16:22:09 -06:00
Viresh Kumar
2361be2366 cpufreq: Don't create empty /sys/devices/system/cpu/cpufreq directory
When we don't have any file in cpu/cpufreq directory we shouldn't
create it. Specially with the introduction of per-policy governor
instance patchset, even governors are moved to
cpu/cpu*/cpufreq/governor-name directory and so this directory is
just not required.

Lets have it only when required.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-05-27 13:24:02 +02:00
Viresh Kumar
72a4ce340a cpufreq: Move get_cpu_idle_time() to cpufreq.c
Governors other than ondemand and conservative can also use
get_cpu_idle_time() and they aren't required to compile
cpufreq_governor.c. So, move these independent routines to
cpufreq.c instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-05-27 13:20:56 +02:00
Viresh Kumar
944e9a0316 cpufreq: governors: Move get_governor_parent_kobj() to cpufreq.c
get_governor_parent_kobj() can be used by any governor, generic
cpufreq governors or platform specific ones and so must be present in
cpufreq.c instead of cpufreq_governor.c.

This patch moves it to cpufreq.c. This also adds
EXPORT_SYMBOL_GPL(get_governor_parent_kobj) so that modules can use
this function too.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-05-27 13:20:56 +02:00
Soren Brinkmann
30e1e28598 arm: zynq: Migrate platform to clock controller
Migrate the Zynq platform and its drivers to use the new clock
controller driver.

Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: linux-serial@vger.kernel.org
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Acked-by: Mike Turquette <mturquette@linaro.org>
2013-05-27 09:21:22 +02:00
Greg Kroah-Hartman
45f6bc5ff9 Merge 3.10-rc3 into usb-next
We want these fixes.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-27 11:00:52 +09:00
Greg Kroah-Hartman
8095e4e81b Merge 3.10-rc3 into tty-next
We want these fixes.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-27 10:57:53 +09:00
Adrian Hunter
f0710a557c mmc: sdhci: add ability to stay runtime-resumed if the card is powered up
If card power is dependent on SD bus power then the host controller
must not be runtime suspended while the card is powered up.  Add
the ability to stay runtime-resumed in that case and enable it with a new
quirk SDHCI_QUIRK2_CARD_ON_NEEDS_BUS_ON.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
2013-05-26 14:23:23 -04:00
Guennadi Liakhovetski
eec95ee226 mmc: sdhi/tmio: switch to using dmaengine_slave_config()
This removes the deprecated use of the .private member of struct dma_chan
and switches the sdhi / tmio mmc driver to using the
dmaengine_slave_config() channel configuration method.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski+renesas@gmail.com>
Acked-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
2013-05-26 14:23:20 -04:00
Guennadi Liakhovetski
03a0675b2a mmc: sdhi/tmio: make DMA filter implementation specific
So far only the SDHI implementation uses TMIO MMC with DMA. That way a DMA
channel filter function, defined in the TMIO driver wasn't a problem.
However, such a filter function is DMA controller specific. Since the SDHI
glue is only running on systems with the SHDMA DMA controller, the filter
function can safely be provided by it. Move it into SDHI.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski+renesas@gmail.com>
Acked-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
2013-05-26 14:23:19 -04:00
Fredrik Soderstedt
6044371219 mmc: core: Fix select power class after resume
Use the saved values in card->ext_csd when selecting power class.
By doing this the power class will be selected even if mmc_init_card
is called with oldcard != NULL, which is the case after a suspend/resume.

Today ext_csd is NULL if mmc_init_card is called with oldcard != NULL
and power class will not be selected.

According to the eMMC specification the POWER_CLASS value is reset after
power failure, H/W reset assertion and any CMD0 reset.

Signed-off-by: Fredrik Soderstedt <fredrik.soderstedt@stericsson.com>
Reviewed-by: Johan Rudholm <jrudholm@gmail.com>
Acked By: Girish K S <girish.shivananjappa@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
2013-05-26 14:23:18 -04:00
Ulf Hansson
07a6821608 mmc: core: Restructure and simplify code for mmc sleep|awake
The mmc_card_sleep|awake APIs are not being used since the support is
already properly encapsulated within the suspend sequence. Sleep|awake
command is also specific for eMMC.

We remove the sleep|awake bus_ops, the mmc_card_sleep|awake APIs and
move the code into the mmc specific core instead. This also includes
the mmc ops function, mmc_sleepawake. All releated functions have then
become static and we have got far less code to maintain.

Additionally this patch also simplifies the code from mmc_sleepawake,
since it is only used to put the card to sleep and not awake.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
2013-05-26 14:23:17 -04:00
Ulf Hansson
c4d770d724 mmc: core: Support aggressive power management for (e)MMC/SD
Aggressive power management is suitable when saving power is
essential. At request inactivity timeout, aka pm runtime
autosuspend timeout, the card will be suspended.

Once a new request arrives, the card will be re-initalized and
thus the first request will suffer from a latency. This latency
is card-specific, experiments has shown in general that SD-cards
has quite poor initialization time, around 300ms-1100ms. eMMC is
not surprisingly far better but still a couple of hundreds of ms
has been observed.

Except for the request latency, it is important to know that
suspending the card will also prevent the card from executing
internal house-keeping operations in idle mode. This could mean
degradation in performance.

To use this feature make sure the request inactivity timeout is
chosen carefully. This has not been done as a part of this patch.

Enable this feature by using host cap MMC_CAP_AGGRESSIVE_PM and
by setting CONFIG_MMC_UNSAFE_RESUME.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
2013-05-26 14:23:17 -04:00
Ulf Hansson
e94cfef698 mmc: block: Enable runtime pm for mmc blkdevice
Once the mmc blkdevice is being probed, runtime pm will be enabled.
By using runtime autosuspend, the power save operations can be done
when request inactivity occurs for a certain time. Right now the
selected timeout value is set to 3 s. Obviously this value will likely
need to be configurable somehow since it needs to be trimmed depending
on the power save algorithm.

For SD-combo cards, we are still leaving the enablement of runtime PM
to the SDIO init sequence since it depends on the capabilities of the
SDIO func driver.

Moreover, when the blk device is being suspended, we make sure the device
will be runtime resumed. The reason for doing this is that we want the
host suspend sequence to be unaware of any runtime power save operations
done for the card in this phase. Thus it can just handle the suspend as
the card is fully powered from a runtime perspective.

Finally, this patch prepares to make it possible to move BKOPS handling
into the runtime callbacks for the mmc bus_ops. Thus IDLE BKOPS can be
accomplished.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
2013-05-26 14:23:16 -04:00
Maya Erez
775a9362b5 mmc: card: Adding support for sanitize in eMMC 4.5
The sanitize support is added as a user-app ioctl call, and
was removed from the block-device request, since its purpose is
to be invoked not via File-System but by a user.

This feature deletes the unmap memory region of the eMMC card,
by writing to a specific register in the EXT_CSD.

unmap region is the memory region that was previously deleted
(by erase, trim or discard operation).

In order to avoid timeout when sanitizing large-scale cards,
the timeout for sanitize operation is 240 seconds.

Signed-off-by: Yaniv Gardi <ygardi@codeaurora.org>
Signed-off-by: Maya Erez <merez@codeaurora.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
2013-05-26 14:23:13 -04:00
Ulf Hansson
b689167984 mmc: core: Re-use code for MMC_CAP2_DETECT_ON_ERR in polling mode
Previously the MMC_CAP2_DETECT_ON_ERR was invented for detecting
slow card removal. In was never a realy good solution and a proper
fix has been merged using gpio debouncing instead. We remove this
cap in this patch.

Although when using polling card detect mode, the code invented for
MMC_CAP2_DETECT_ON_ERR is re-used to complete card removal in an
earlier phase. There are no need waiting for the polling timeout to
elapse in this case.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Kevin Liu <kliu5@marvell.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
2013-05-26 14:23:12 -04:00
Jiri Pirko
42e52bf9e3 net: add netnotifier event for upper device change
Now when upper device is changed, event is not propagated via RT Netlink
to userspace. Userspace might never now about the change. Fix this by
adding upper-device-change notifier event.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-25 23:12:19 -07:00
Linus Torvalds
27a24cfa04 Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma
Pull slave-dma fixes from Vinod Koul:
 "We have two patches from Andy & Rafael fixing the Lynxpoint dma"

* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
  ACPI / LPSS: register clock device for Lynxpoint DMA properly
  dma: acpi-dma: parse CSRT to extract additional resources
2013-05-25 20:30:31 -07:00
Lars-Peter Clausen
b6b5e76bb8 ASoC: Add ssm2518 support
This patch adds a ASoC CODEC driver for the SSM2516. The SSM2516 is a stereo
Class-D audio amplifier with an I2S interface for audio in and a built-in
dynamic range control processor.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2013-05-25 10:33:30 -04:00
Linus Torvalds
9cf1848278 Merge branch 'akpm' (incoming from Andrew Morton)
Merge fixes from Andrew Morton:
 "A bunch of fixes and one simple fbdev driver which missed the merge
  window because people will still talking about it (to no great
  effect)."

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (30 commits)
  aio: fix kioctx not being freed after cancellation at exit time
  mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas
  drivers/rtc/rtc-max8998.c: check for pdata presence before dereferencing
  ocfs2: goto out_unlock if ocfs2_get_clusters_nocache() failed in ocfs2_fiemap()
  random: fix accounting race condition with lockless irq entropy_count update
  drivers/char/random.c: fix priming of last_data
  mm/memory_hotplug.c: fix printk format warnings
  nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary
  drivers/block/brd.c: fix brd_lookup_page() race
  fbdev: FB_GOLDFISH should depend on HAS_DMA
  drivers/rtc/rtc-pl031.c: pass correct pointer to free_irq()
  auditfilter.c: fix kernel-doc warnings
  aio: fix io_getevents documentation
  revert "selftest: add simple test for soft-dirty bit"
  drivers/leds/leds-ot200.c: fix error caused by shifted mask
  mm/THP: use pmd_populate() to update the pmd with pgtable_t pointer
  linux/kernel.h: fix kernel-doc warning
  mm compaction: fix of improper cache flush in migration code
  rapidio/tsi721: fix bug in MSI interrupt handling
  hfs: avoid crash in hfs_bnode_create
  ...
2013-05-24 18:12:15 -07:00
David S. Miller
e6ff4c75f9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Merge net into net-next because some upcoming net-next changes
build on top of bug fixes that went into net.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-24 16:48:28 -07:00
Linus Torvalds
00cec111ac ARM: SoC fixes for 3.10-rc
We didn't have any fixes sent up for -rc2, so this is a slightly larger
 batch. A bit all over the place platform-wise; OMAP, at91, marvell,
 renesas, sunxi, ux500, etc.
 
 I tried to summarize highlights but there isn't a whole lot to point
 out. Lots of little things fixed all over. A couple of defconfig updates
 due to new/changing options.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRn+66AAoJEIwa5zzehBx3f/4P/3sqK2z7u5SSa+tpkKYkxezO
 MykUOpUc4tuwrKiuUEeXPh89pjIrclQVzKYDqdaXIcezKB7IXFfQSyLNxDzGM7Y3
 NrqrURvNpDmUi6F/xP89gXWkbvg2zr563mxSMpkF4G8HTYxvCv7sY0W/PDzb48Qg
 q3Efc/AfQsOGM0Zl8WXX9jZBdCNTquZYWd7YEFxCe9oVRGfmNlIYKirPOLnk9MAZ
 iIrbfFQG+cOos0NKrjM+tbtQNnPUBQeZdy3MvR7DtXCpKNdxs5EJL9EyHHqGGzPk
 Jd1CG3hNrPiuKVhmVSDkELNDYNT7J9+/rya1Mc33pwMrReAIKWUU/sitvsOkKypi
 PnTvlJkUv3UM2fOtoF8mOhQLTGdKSWfaF0BfHNmnCqJFDPs8vuQjx7O1WGKKUdC4
 SbYZetUnL3AoMrEbza5EoMyqQwpTXiALWp4/6MunLhNyXo0OQvsqexvT6QIrhKyQ
 0iExuSSbXDZAw2GXsqzOlCJecxHYG4qkGbf1DmeTW31GRQQ3nhpiu0GVAvHIEAhT
 SJGD57P0gbEXMpnSIKLNKIAYlLdFkGx6AhX1/Xb+AL7Jsod+UEwgz2ya4GMI4JI/
 0lUe8fPglU3ws8Up1y5FcIq4gjXTsvsEy317zeW/oq2vac0ACKBika2EVukdMD/5
 SYr+m40EzQtVdTKuaZ+e
 =8tMi
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "We didn't have any fixes sent up for -rc2, so this is a slightly
  larger batch.  A bit all over the place platform-wise; OMAP, at91,
  marvell, renesas, sunxi, ux500, etc.

  I tried to summarize highlights but there isn't a whole lot to point
  out.  Lots of little things fixed all over.  A couple of defconfig
  updates due to new/changing options."

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (44 commits)
  ARM: at91/sama5: fix incorrect PMC pcr div definition
  ARM: at91/dt: fix macb pinctrl_macb_rmii_mii_alt definition
  ARM: at91: at91sam9n12: move external irq declatation to DT
  ARM: shmobile: marzen: Use error values in usb_power_*
  ARM: tegra: defconfig fixes
  ARM: nomadik: fix IRQ assignment for SMC ethernet
  ARM: vt8500: Add missing NULL terminator in dt_compat
  clk: tegra: add ac97 controller clock
  clk: tegra: remove USB from clk init table
  ARM: dts: mvebu: Fix wrong the address reg value for the L2-cache node
  ARM: plat-orion: Fix num_resources and id for ge10 and ge11
  ARM: OMAP2+: hwmod: Remove sysc slave idle and auto idle apis
  SERIAL: OMAP: Remove the slave idle handling from the driver
  ARM: OMAP2+: serial: Remove the un-used slave idle hooks
  ARM: OMAP2+: hwmod-data: UART IP needs software control to manage sidle modes
  ARM: OMAP2+: hwmod: Add a new flag to handle SIDLE in SWSUP only in active
  ARM: OMAP2+: hwmod: Fix sidle programming in _enable_sysc()/_idle_sysc()
  arm: mvebu: fix the 'ranges' property to handle PCIe
  ARM: mvebu: select ARCH_REQUIRE_GPIOLIB for mvebu platform
  ARM: AM33XX: Add missing .clkdm_name to clkdiv32k_ick clock
  ...
2013-05-24 16:27:37 -07:00
Randy Dunlap
7450231fb3 linux/kernel.h: fix kernel-doc warning
Fix kernel-doc warning in <linux/kernel.h>:

  Warning(include/linux/kernel.h:590): No description found for parameter 'ip'

scripts/kernel-doc cannot handle macros, functions, or function
prototypes between the function or macro that is being documented and
its definition, so move these prototypes above the function that is
being documented.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24 16:22:51 -07:00
Imre Deak
4c663cfc52 wait: fix false timeouts when using wait_event_timeout()
Many callers of the wait_event_timeout() and
wait_event_interruptible_timeout() expect that the return value will be
positive if the specified condition becomes true before the timeout
elapses.  However, at the moment this isn't guaranteed.  If the wake-up
handler is delayed enough, the time remaining until timeout will be
calculated as 0 - and passed back as a return value - even if the
condition became true before the timeout has passed.

Fix this by returning at least 1 if the condition becomes true.  This
semantic is in line with what wait_for_condition_timeout() does; see
commit bb10ed09 ("sched: fix wait_for_completion_timeout() spurious
failure under heavy load").

Daniel said "We have 3 instances of this bug in drm/i915.  One case even
where we switch between the interruptible and not interruptible
wait_event_timeout variants, foolishly presuming they have the same
semantics.  I very much like this."

One such bug is reported at
  https://bugs.freedesktop.org/show_bug.cgi?id=64133

Signed-off-by: Imre Deak <imre.deak@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Cc: "Paul E.  McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24 16:22:50 -07:00
Alexandre Bounine
bc8fcfea18 rapidio: add enumeration/discovery start from user space
Add RapidIO enumeration/discovery start from user space.  User space
start allows to defer RapidIO fabric scan until the moment when all
participating endpoints are initialized avoiding mandatory synchronized
start of all endpoints (which may be challenging in systems with large
number of RapidIO endpoints).

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@Prodrive.nl>
Cc: Micha Nelissen <micha.nelissen@Prodrive.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24 16:22:50 -07:00
Alexandre Bounine
a11650e110 rapidio: make enumeration/discovery configurable
Systems that use RapidIO fabric may need to implement their own
enumeration and discovery methods which are better suitable for needs of
a target application.

The following set of patches is intended to simplify process of
introduction of new RapidIO fabric enumeration/discovery methods.

The first patch offers ability to add new RapidIO enumeration/discovery
methods using kernel configuration options.  This new configuration
option mechanism allows to select statically linked or modular
enumeration/discovery method(s) from the list of existing methods or use
external module(s).

This patch also updates the currently existing enumeration/discovery
code to be used as a statically linked or modular method.

The corresponding configuration option is named "Basic
enumeration/discovery" method.  This is the only one configuration
option available today but new methods are expected to be introduced
after adoption of provided patches.

The second patch address a long time complaint of RapidIO subsystem
users regarding fabric enumeration/discovery start sequence.  Existing
implementation offers only a boot-time enumeration/discovery start which
requires synchronized boot of all endpoints in RapidIO network.  While
it works for small closed configurations with limited number of
endpoints, using this approach in systems with large number of endpoints
is quite challenging.

To eliminate requirement for synchronized start the second patch
introduces RapidIO enumeration/discovery start from user space.

For compatibility with the existing RapidIO subsystem implementation,
automatic boot time enumeration/discovery start can be configured in by
specifying "rio-scan.scan=1" command line parameter if statically linked
basic enumeration method is selected.

This patch:

Rework to implement RapidIO enumeration/discovery method selection
combined with ability to use enumeration/discovery as a kernel module.

This patch adds ability to introduce new RapidIO enumeration/discovery
methods using kernel configuration options.  Configuration option
mechanism allows to select statically linked or modular
enumeration/discovery method from the list of existing methods or use
external modules.  If a modular enumeration/discovery is selected each
RapidIO mport device can have its own method attached to it.

The existing enumeration/discovery code was updated to be used as
statically linked or modular method.  This configuration option is named
"Basic enumeration/discovery" method.

Several common routines have been moved from rio-scan.c to make them
available to other enumeration methods and reduce number of exported
symbols.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@Prodrive.nl>
Cc: Micha Nelissen <micha.nelissen@Prodrive.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24 16:22:50 -07:00
Linus Torvalds
eb3d33900a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "It's been a while since my last pull request so quite a few fixes have
  piled up."

Indeed.

 1) Fix nf_{log,queue} compilation with PROC_FS disabled, from Pablo
    Neira Ayuso.

 2) Fix data corruption on some tg3 chips with TSO enabled, from Michael
    Chan.

 3) Fix double insertion of VLAN tags in be2net driver, from Sarveshwar
    Bandi.

 4) Don't have TCP's MD5 support pass > PAGE_SIZE page offsets in
    scatter-gather entries into the crypto layer, the crypto layer can't
    handle that.  From Eric Dumazet.

 5) Fix lockdep splat in 802.1Q MRP code, also from Eric Dumazet.

 6) Fix OOPS in netfilter log module when called from conntrack, from
    Hans Schillstrom.

 7) FEC driver needs to use netif_tx_{lock,unlock}_bh() rather than the
    non-BH disabling variants.  From Fabio Estevam.

 8) TCP GSO can generate out-of-order packets, fix from Eric Dumazet.

 9) vxlan driver doesn't update 'used' field of fdb entries when it
    should, from Sridhar Samudrala.

10) ipv6 should use kzalloc() to allocate inet6 socket cork options,
    otherwise we can OOPS in ip6_cork_release().  From Eric Dumazet.

11) Fix races in bonding set mode, from Nikolay Aleksandrov.

12) Fix checksum generation regression added by "r8169: fix 8168evl
    frame padding.", from Francois Romieu.

13) ip_gre can look at stale SKB data pointer, fix from Eric Dumazet.

14) Fix checksum handling when GSO is enabled in bnx2x driver with
    certain chips, from Yuval Mintz.

15) Fix double free in batman-adv, from Martin Hundebøll.

16) Fix device startup synchronization with firmware in tg3 driver, from
    Nithin Sujit.

17) perf networking dropmonitor doesn't work at all due to mixed up
    trace parameter ordering, from Ben Hutchings.

18) Fix proportional rate reduction handling in tcp_ack(), from Nandita
    Dukkipati.

19) IPSEC layer doesn't return an error when a valid state is detected,
    causing an OOPS.  Fix from Timo Teräs.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (85 commits)
  be2net: bug fix on returning an invalid nic descriptor
  tcp: xps: fix reordering issues
  net: Revert unused variable changes.
  xfrm: properly handle invalid states as an error
  virtio_net: enable napi for all possible queues during open
  tcp: bug fix in proportional rate reduction.
  net: ethernet: sun: drop unused variable
  net: ethernet: korina: drop unused variable
  net: ethernet: apple: drop unused variable
  qmi_wwan: Added support for Cinterion's PLxx WWAN Interface
  perf: net_dropmonitor: Remove progress indicator
  perf: net_dropmonitor: Use bisection in symbol lookup
  perf: net_dropmonitor: Do not assume ordering of dictionaries
  perf: net_dropmonitor: Fix symbol-relative addresses
  perf: net_dropmonitor: Fix trace parameter order
  net: fec: use a more proper compatible string for MVF type device
  qlcnic: Fix updating netdev->features
  qlcnic: remove netdev->trans_start updates within the driver
  qlcnic: Return proper error codes from probe failure paths
  tg3: Update version to 3.132
  ...
2013-05-24 08:27:32 -07:00
Tejun Heo
75501a6d59 cgroup: update iterators to use cgroup_next_sibling()
This patch converts cgroup_for_each_child(),
cgroup_next_descendant_pre/post() and thus
cgroup_for_each_descendant_pre/post() to use cgroup_next_sibling()
instead of manually dereferencing ->sibling.next.

The only reason the iterators couldn't allow dropping RCU read lock
while iteration is in progress was because they couldn't determine the
next sibling safely once RCU read lock is dropped.  Using
cgroup_next_sibling() removes that problem and enables all iterators
to allow dropping RCU read lock in the middle.  Comments are updated
accordingly.

This makes the iterators easier to use and will simplify controllers.

Note that @cgroup argument is renamed to @cgrp in
cgroup_for_each_child() because it conflicts with "struct cgroup" used
in the new macro body.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
2013-05-24 10:55:38 +09:00
Tejun Heo
53fa526174 cgroup: add cgroup->serial_nr and implement cgroup_next_sibling()
Currently, there's no easy way to find out the next sibling cgroup
unless it's known that the current cgroup is accessed from the
parent's children list in a single RCU critical section.  This in turn
forces all iterators to require whole iteration to be enclosed in a
single RCU critical section, which sometimes is too restrictive.  This
patch implements cgroup_next_sibling() which can reliably determine
the next sibling regardless of the state of the current cgroup as long
as it's accessible.

It currently is impossible to determine the next sibling after
dropping RCU read lock because the cgroup being iterated could be
removed anytime and if RCU read lock is dropped, nothing guarantess
its ->sibling.next pointer is accessible.  A removed cgroup would
continue to point to its next sibling for RCU accesses but stop
receiving updates from the sibling.  IOW, the next sibling could be
removed and then complete its grace period while RCU read lock is
dropped, making it unsafe to dereference ->sibling.next after dropping
and re-acquiring RCU read lock.

This can be solved by adding a way to traverse to the next sibling
without dereferencing ->sibling.next.  This patch adds a monotonically
increasing cgroup serial number, cgroup->serial_nr, which guarantees
that all cgroup->children lists are kept in increasing serial_nr
order.  A new function, cgroup_next_sibling(), is implemented, which,
if CGRP_REMOVED is not set on the current cgroup, follows
->sibling.next; otherwise, traverses the parent's ->children list
until it sees a sibling with higher ->serial_nr.

This allows the function to always return the next sibling regardless
of the state of the current cgroup without adding overhead in the fast
path.

Further patches will update the iterators to use cgroup_next_sibling()
so that they allow dropping RCU read lock and blocking while iteration
is in progress which in turn will be used to simplify controllers.

v2: Typo fix as per Serge.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2013-05-24 10:55:38 +09:00
Tejun Heo
bdc7119f1b cgroup: make cgroup_is_removed() static
cgroup_is_removed() no longer has external users and it shouldn't grow
any - controllers should deal with cgroup_subsys_state on/offline
state instead of cgroup removal state.  Make it static.

While at it, make it return bool.

Signed-off-by: Tejun Heo <tj@kernel.org>
2013-05-24 10:55:38 +09:00