Commit graph

47621 commits

Author SHA1 Message Date
Christoph Hellwig
f705f837c5 nvme: consolidate synchronous command submission helpers
Note that we keep the unused timeout argument, but allow callers to
pass 0 instead of a timeout if they want the default.  This will allow
adding a timeout to the pass through path later on.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22 08:36:31 -06:00
Arnd Bergmann
a4526915b6 Merge tag 'arm-soc/for-4.2/soc-take2' of http://github.com/broadcom/stblinux into next/soc
Merge mach-bcm changes from Florian Fainelli:

This pull request contains the following changes:

- Rafal adds an additional fault code to be ignored by the kernel on BCM5301X SoC

- BCM63138 SMP support which:
	* common code to control the PMB bus, to be shared with a reset
	  controller driver in drivers/reset
	* secondary CPU initialization sequence using PMB helpers
	* small changes suggested by Russell King to allow platforms to disable VFP

* tag 'arm-soc/for-4.2/soc-take2' of http://github.com/broadcom/stblinux:
  ARM: BCM63xx: Add SMP support for BCM63138
  ARM: vfp: Add vfp_disable for problematic platforms
  ARM: vfp: Add include guards
  ARM: BCM63xx: Add secondary CPU PMB initialization sequence
  ARM: BCM63xx: Add Broadcom BCM63xx PMB controller helpers
  ARM: BCM5301X: Ignore another (BCM4709 specific) fault code
2015-05-22 16:32:02 +02:00
Laxman Dewangan
69eb0980ab regulator: max8973: add mechanism to enable/disable through GPIO
MAX8973 supports the voltage output enable/disable through its EN
pin. This EN pin can be connected through GPIO from host processor.
Add support to provide GPIO number from platform/DT and if it is
valid GPIO then enable external control default.

Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2015-05-22 13:47:33 +01:00
Srinivas Kandagatla
a2f776cbb8 regmap: Introduce regmap_get_reg_stride
This patch introduces regmap_get_reg_stride() function which would
be used by the infrastructures like nvmem framework built on top of
regmap. Mostly this function would be used for sanity checks on inputs
within such infrastructure.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
2015-05-22 12:19:21 +01:00
Srinivas Kandagatla
668abc729f regmap: Introduce regmap_get_max_register
This patch introduces regmap_get_max_register() function which would be
used by the infrastructures like nvmem framework built on top of
regmap.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
2015-05-22 12:19:18 +01:00
Chanwoo Choi
046050f6e6 extcon: Update the prototype of extcon_register_notifier() with enum extcon
Previously, extcon consumer driver used the extcon_register_interest()
to register the notifier chain and then to receive the notifier event
when external connector's state is changed. When registering the notifier chain
for specific external connector with extcon_register_interest(), it used the
the string name of external connector directly. There are potential problem
because of unclear, non-standard and inconsequent cable name. Namely,
it is not appropriate method to identify each external connector.

So, this patch modify the prototype of extcon_register_notifier() by using
the 'enum extcon' which are the unique id for each external connector
instead of unclear string method.

- Previously, the extcon consumer driver used the extcon_register_interest()
with 'cable_name' to point out the specific external connector. Also. it used
the un-needed structure (struct extcon_specific_cable_nb).
: int extcon_register_interest(struct extcon_specific_cable_nb *obj,
			     const char *extcon_name, const char *cable_name,
			     struct notifier_block *nb)

- Newly, the updated extcon_register_notifier() would definitely support
the same feature to detech the changed state of external connector without
any specific structure (struct extcon_specific_cable_nb).
: int extcon_register_notifier(struct extcon_dev *edev, enum extcon id,
			     struct notifier_block *nb)

This patch support the both extcon_register_interest() and new extcon_register_
notifier(). But the extcon_{register|unregister}_interest() will be deprecated
because extcon core would support the notifier event for extcon consumer driver
with only updated extcon_register_notifier() and 'extcon_specific_cable_nb'
will be removed if there are no extcon consumer driver with legacy
extcon_{register|unregister}_interest().

Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
2015-05-22 18:58:55 +09:00
Chanwoo Choi
2a9de9c0f0 extcon: Use the unique id for external connector instead of string
This patch uses the unique id to identify the type of external connector instead
of string name. The string name have the many potential issues. So, this patch
defines the 'extcon' enumeration which includes all supported external connector
on EXTCON subsystem. If new external connector is necessary, the unique id of
new connector have to be added in 'extcon' enumeration. There are current
supported external connector in 'enum extcon' as following:

enum extcon {
	EXTCON_NONE		= 0x0,

	/* USB external connector */
	EXTCON_USB		= 0x1,
	EXTCON_USB_HOST		= 0x2,

	/* Charger external connector */
	EXTCON_TA		= 0x10,
	EXTCON_FAST_CHARGER	= 0x11,
	EXTCON_SLOW_CHARGER	= 0x12,
	EXTCON_CHARGE_DOWNSTREAM = 0x13,

	/* Audio and video external connector */
	EXTCON_LINE_IN		= 0x20,
	EXTCON_LINE_OUT		= 0x21,
	EXTCON_MICROPHONE	= 0x22,
	EXTCON_HEADPHONE	= 0x23,

	EXTCON_HDMI		= 0x30,
	EXTCON_MHL		= 0x31,
	EXTCON_DVI		= 0x32,
	EXTCON_VGA		= 0x33,
	EXTCON_SPDIF_IN		= 0x34,
	EXTCON_SPDIF_OUT	= 0x35,
	EXTCON_VIDEO_IN		= 0x36,
	EXTCON_VIDEO_OUT	= 0x37,

	/* Miscellaneous external connector */
	EXTCON_DOCK		= 0x50,
	EXTCON_JIG		= 0x51,
	EXTCON_MECHANICAL	= 0x52,

	EXTCON_END,
};

For example in extcon-arizona.c:
To use unique id removes the potential issue about handling
the inconsistent name of external connector with string.
- Previously, use the string to register the type of arizona jack connector
static const char *arizona_cable[] = {
	"Mechanical",
	"Microphone",
	"Headphone",
	"Line-out",
};
- Newly, use the unique id to register the type of arizona jack connector
static const enum extcon arizona_cable[] = {
	EXTCON_MECHANICAL,
	EXTCON_MICROPHONE,
	EXTCON_HEADPHONE,
	EXTCON_LINE_OUT,

	EXTCON_NONE,
};

And this patch modify the prototype of extcon_{get|set}_cable_state_() which
uses the 'enum extcon id' instead of 'cable_index'. Because although one more
extcon drivers support USB cable, each extcon driver might has the differnt
'cable_index' for USB cable. All extcon drivers can use the unique id number
for same external connector with modified extcon_{get|set}_cable_state_().

- Previously, use 'cable_index' on these functions:
extcon_get_cable_state_(struct extcon_dev*, int cable_index)
extcon_set_cable_state_(struct extcon_dev*, int cable_index, bool state)

-Newly, use 'enum extcon id' on these functions:
extcon_get_cable_state_(struct extcon_dev*, enum extcon id)
extcon_set_cable_state_(struct extcon_dev*, enum extcon id, bool state)

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Felipe Balbi <balbi@ti.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Acked-by: Roger Quadros <rogerq@ti.com>
Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Acked-by: Ramakrishna Pallala <ramakrishna.pallala@intel.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
[arnd: Report the build break about drivers/usb/phy/phy-tahvo.c after using the
unique id for external connector insteadf of string]
Reported-by: Arnd Bergmann <arnd@arndb.de>
[dan.carpenter: Report the build warning of extcon_{set|get}_cable_state_()]
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
2015-05-22 18:58:44 +09:00
Herbert Xu
2d0f230fe0 crypto: aead - Rename aead_alg to old_aead_alg
This patch is the first step in the introduction of a new AEAD
alg type.  Unlike normal conversions this patch only renames the
existing aead_alg structure because there are external references
to it.

Those references will be removed after this patch.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-05-22 11:25:51 +08:00
Marcelo Ricardo Leitner
2efd055c53 tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info
This patch tracks the total number of inbound and outbound segments on a
TCP socket. One may use this number to have an idea on connection
quality when compared against the retransmissions.

RFC4898 named these : tcpEStatsPerfSegsIn and tcpEStatsPerfSegsOut

These are a 32bit field each and can be fetched both from TCP_INFO
getsockopt() if one has a handle on a TCP socket, or from inet_diag
netlink facility (iproute2/ss patch will follow)

Note that tp->segs_out was placed near tp->snd_nxt for good data
locality and minimal performance impact, while tp->segs_in was placed
near tp->bytes_received for the same reason.

Join work with Eric Dumazet.

Note that received SYN are accounted on the listener, but sent SYNACK
are not accounted.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21 23:25:21 -04:00
Linus Torvalds
6efdb114b4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID fixes from Jiri Kosina:
 "Bugfixes for HID subsystem that should go in 4.1.  Important
  highlights:

   - the patch that extended support for HID++ protocol for TK820
     touchpad turns out to be causing regressions due to firmware
     issues; patch reverting back to basic support from Benjamin
     Tissoires

   - Wacom driver can oops for devices that report non-touch data on
     touch interfaces.  Fix from Ping Cheng

   - gpiolib is not mandatory for i2c-hid, so the driver shouldn't fail
     if gpiolib is not enabled.  Fix from Mika Westerberg"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: wacom: fix an Oops caused by wacom_wac_finger_count_touches
  HID: usbhid: Add HID_QUIRK_NOGET for Aten DVI KVM switch
  HID: hid-sensor-hub: Fix debug lock warning
  Revert "HID: logitech-hidpp: support combo keyboard touchpad TK820"
  HID: i2c-hid: Do not fail probing if gpiolib is not enabled
2015-05-21 17:23:11 -07:00
Linus Torvalds
3d854120e9 The first set of clk fixes for 4.1 are all driver bugs, with the
exception of a single locking fix in the core code. All driver fixes are
 for code that was merged recently. The Samsung stuff is mostly fixes
 around suspend/resume, the Qualcomm fixes are for invalid hardware
 configuration data and the Silicon Labs patches are fixes following
 their move away from platform_data to Device Tree.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVW/1LAAoJEKI6nJvDJaTUuTEP/i0mykL1zAXOoUDAVOsULkLq
 55MUnknjfoh4CHeLRru+XeBkGNDjLKPNg4y47NadrMvRsav3sqY9WR6HZasB3roa
 uGanJZRr1nVqaD3l2ki8twc4FgNfeEaFfY018Z3dDLSYbnsq990hWQChgTGLxuDN
 iz9fh8DdOALa9dp2qo604oUjVN9QU+rqRClA1d8JhtiEfeCkTjzUBO8pkJ5ZO8ve
 sRgQ6TkY0u69rVTYspot1cwkPLL8gCiX+nTazakhKNxEg6W1q7PrrHtZsWms47P5
 SpemNaSnwwRBy1wYnbZAxgsLOmg/g7seICN/uz/OyR9ZgzRJz66HeI9yPceaCbnz
 9eNxGtsDWvE5uejoqIBqivFULTtuUdo5ZOe5Xwz/b35Hy0l79T5Ag7NmPNwkAqOd
 WOSMuv581tnRpgHz3bG+PFknsjqzCKCoMfW3bjHIH56lZA9AMlJhKuV6g85XY3WA
 WAF+QtBWqD76TMzcf3DMg9egGn2321jvs5/I8jRMa9xCgV0Ycam0DwzDZ6G8KAhK
 nWWoUyQSJU0tElphfylqSFR+00anovbB/BORGwMhZ9gohDR4UCdtCK2FwOxRY7kg
 SWU1ruDpFSfLMstpwU9Rb+9F0TtSkxWWPtk23bto83erYXzx0ezwhLIA/pEPi+mO
 chOqdtgPvtfS3NeBwB/5
 =yjsw
 -----END PGP SIGNATURE-----

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Michael Turquette:
 "The first set of clk fixes for 4.1 are all driver bugs, with the
  exception of a single locking fix in the core code.

  All driver fixes are for code that was merged recently.  The Samsung
  stuff is mostly fixes around suspend/resume, the Qualcomm fixes are
  for invalid hardware configuration data and the Silicon Labs patches
  are fixes following their move away from platform_data to Device Tree"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: si5351: Do not pass struct clk in platform_data
  clk: si5351: Mention clock-names in the binding documentation
  clk: add missing lock when call clk_core_enable in clk_set_parent
  clk: exynos5420: Restore GATE_BUS_TOP on suspend
  clk: qcom: Fix MSM8916 gfx3d_clk_src configuration
  clk: qcom: Fix MSM8916 venus divider value
  clk: exynos5433: Fix wrong PMS value of exynos5433_pll_rates
  clk: exynos5433: Fix wrong parent clock of sclk_apollo clock
  clk: exynos5433: Fix CLK_PCLK_MONOTONIC_CNT clk register assignment
  clk: exynos5433: Fix wrong offset of PCLK_MSCL_SECURE_SMMU_JPEG
  clk: Use CONFIG_ARCH_EXYNOS instead of CONFIG_ARCH_EXYNOS5433
2015-05-21 16:57:50 -07:00
Lai Jiangshan
37b1ef31a5 workqueue: move flush_scheduled_work() to workqueue.h
flush_scheduled_work() is just a simple call to flush_work().

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2015-05-21 17:26:22 -04:00
Alexei Starovoitov
04fd61ab36 bpf: allow bpf programs to tail-call other bpf programs
introduce bpf_tail_call(ctx, &jmp_table, index) helper function
which can be used from BPF programs like:
int bpf_prog(struct pt_regs *ctx)
{
  ...
  bpf_tail_call(ctx, &jmp_table, index);
  ...
}
that is roughly equivalent to:
int bpf_prog(struct pt_regs *ctx)
{
  ...
  if (jmp_table[index])
    return (*jmp_table[index])(ctx);
  ...
}
The important detail that it's not a normal call, but a tail call.
The kernel stack is precious, so this helper reuses the current
stack frame and jumps into another BPF program without adding
extra call frame.
It's trivially done in interpreter and a bit trickier in JITs.
In case of x64 JIT the bigger part of generated assembler prologue
is common for all programs, so it is simply skipped while jumping.
Other JITs can do similar prologue-skipping optimization or
do stack unwind before jumping into the next program.

bpf_tail_call() arguments:
ctx - context pointer
jmp_table - one of BPF_MAP_TYPE_PROG_ARRAY maps used as the jump table
index - index in the jump table

Since all BPF programs are idenitified by file descriptor, user space
need to populate the jmp_table with FDs of other BPF programs.
If jmp_table[index] is empty the bpf_tail_call() doesn't jump anywhere
and program execution continues as normal.

New BPF_MAP_TYPE_PROG_ARRAY map type is introduced so that user space can
populate this jmp_table array with FDs of other bpf programs.
Programs can share the same jmp_table array or use multiple jmp_tables.

The chain of tail calls can form unpredictable dynamic loops therefore
tail_call_cnt is used to limit the number of calls and currently is set to 32.

Use cases:
Acked-by: Daniel Borkmann <daniel@iogearbox.net>

==========
- simplify complex programs by splitting them into a sequence of small programs

- dispatch routine
  For tracing and future seccomp the program may be triggered on all system
  calls, but processing of syscall arguments will be different. It's more
  efficient to implement them as:
  int syscall_entry(struct seccomp_data *ctx)
  {
     bpf_tail_call(ctx, &syscall_jmp_table, ctx->nr /* syscall number */);
     ... default: process unknown syscall ...
  }
  int sys_write_event(struct seccomp_data *ctx) {...}
  int sys_read_event(struct seccomp_data *ctx) {...}
  syscall_jmp_table[__NR_write] = sys_write_event;
  syscall_jmp_table[__NR_read] = sys_read_event;

  For networking the program may call into different parsers depending on
  packet format, like:
  int packet_parser(struct __sk_buff *skb)
  {
     ... parse L2, L3 here ...
     __u8 ipproto = load_byte(skb, ... offsetof(struct iphdr, protocol));
     bpf_tail_call(skb, &ipproto_jmp_table, ipproto);
     ... default: process unknown protocol ...
  }
  int parse_tcp(struct __sk_buff *skb) {...}
  int parse_udp(struct __sk_buff *skb) {...}
  ipproto_jmp_table[IPPROTO_TCP] = parse_tcp;
  ipproto_jmp_table[IPPROTO_UDP] = parse_udp;

- for TC use case, bpf_tail_call() allows to implement reclassify-like logic

- bpf_map_update_elem/delete calls into BPF_MAP_TYPE_PROG_ARRAY jump table
  are atomic, so user space can build chains of BPF programs on the fly

Implementation details:
=======================
- high performance of bpf_tail_call() is the goal.
  It could have been implemented without JIT changes as a wrapper on top of
  BPF_PROG_RUN() macro, but with two downsides:
  . all programs would have to pay performance penalty for this feature and
    tail call itself would be slower, since mandatory stack unwind, return,
    stack allocate would be done for every tailcall.
  . tailcall would be limited to programs running preempt_disabled, since
    generic 'void *ctx' doesn't have room for 'tail_call_cnt' and it would
    need to be either global per_cpu variable accessed by helper and by wrapper
    or global variable protected by locks.

  In this implementation x64 JIT bypasses stack unwind and jumps into the
  callee program after prologue.

- bpf_prog_array_compatible() ensures that prog_type of callee and caller
  are the same and JITed/non-JITed flag is the same, since calling JITed
  program from non-JITed is invalid, since stack frames are different.
  Similarly calling kprobe type program from socket type program is invalid.

- jump table is implemented as BPF_MAP_TYPE_PROG_ARRAY to reuse 'map'
  abstraction, its user space API and all of verifier logic.
  It's in the existing arraymap.c file, since several functions are
  shared with regular array map.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21 17:07:59 -04:00
Krzysztof Kozlowski
7f1a57fdd6 power_supply: Fix possible NULL pointer dereference on early uevent
Don't call the power_supply_changed() from power_supply_register() when
parent is still probing because it may lead to accessing parent too
early.

In bq27x00_battery this caused NULL pointer exception because uevent of
power_supply_changed called back the the get_property() method provided
by the driver. The get_property() method accessed pointer which should
be returned by power_supply_register().

Starting from bq27x00_battery_probe():
  di->bat = power_supply_register()
    power_supply_changed()
      kobject_uevent()
        power_supply_uevent()
          power_supply_show_property()
            power_supply_get_property()
              bq27x00_battery_get_property()
                dereference of di->bat which is NULL here

The dereference of di->bat (value returned by power_supply_register())
is the currently visible problem. However calling back the methods
provided by driver before ending the probe may lead to accessing other
driver-related data which is not yet initialized.

The call to power_supply_changed() is postponed till probing ends -
mutex of parent device is released.

Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Fixes: 297d716f62 ("power_supply: Change ownership from driver to core")
Tested-By: Dr. H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
2015-05-21 15:41:09 +02:00
Florian Fainelli
6e844b0353 ARM: BCM63xx: Add Broadcom BCM63xx PMB controller helpers
This patch adds both common register definitions and helper functions
used to issue read/write commands to the Broadcom BCM63xx PMB controller
which is used to power on and release from reset internal on-chip
peripherals such as the integrated Ethernet switch, AHCI, USB, as well
as the secondary CPU core.

This is going to be utilized by the BCM63138 SMP code, as well as by the
BCM63138 reset controller later.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
2015-05-20 15:05:39 -07:00
Arnd Bergmann
1b0c509733 Merge branch 'irq/for-arm' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next/soc
* 'irq/for-arm' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip: vf610-mscm: Support NVIC parent chip
  irqchip: nvic: Support hierarchy irq domain
  genirq: generic chip: Support hierarchy domain
  genirq: Add irq_chip_(enable/disable)_parent
  irqdomain: Add non-hierarchy helper irq_domain_set_info
2015-05-20 23:09:12 +02:00
Jarod Wilson
be32417796 block: export blkdev_reread_part() and __blkdev_reread_part()
This patch exports blkdev_reread_part() for block drivers, also
introduce __blkdev_reread_part().

For some drivers, such as loop, reread of partitions can be run
from the release path, and bd_mutex may already be held prior to
calling ioctl_by_bdev(bdev, BLKRRPART, 0), so introduce
__blkdev_reread_part for use in such cases.

CC: Christoph Hellwig <hch@lst.de>
CC: Jens Axboe <axboe@kernel.dk>
CC: Tejun Heo <tj@kernel.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Markus Pargmann <mpa@pengutronix.de>
CC: Stefan Weinhuber <wein@de.ibm.com>
CC: Stefan Haberland <stefan.haberland@de.ibm.com>
CC: Sebastian Ott <sebott@linux.vnet.ibm.com>
CC: Fabian Frederick <fabf@skynet.be>
CC: Ming Lei <ming.lei@canonical.com>
CC: David Herrmann <dh.herrmann@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: nbd-general@lists.sourceforge.net
CC: linux-s390@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-20 09:05:42 -06:00
Boris Brezillon
985eba3aba mfd: syscon: Add Atmel MC (Memory Controller) registers definition
The at91rm9200 SoC embeds a Memory Controller block which is used to
configure several aspects of the platform:
- AHB/APB Bus behavior
- SDRAM Controller
- EBI (External Bus Interface) and SMC (Static Memory Controller) config

Those registers might be accessed by different drivers, hence we need to
define it as a syscon device.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Acked-by: Lee Jones <lee.jones@linaro.org>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2015-05-20 16:37:30 +02:00
Florian Westphal
faecbb45eb Revert "netfilter: bridge: query conntrack about skb dnat"
This reverts commit c055d5b03b.

There are two issues:
'dnat_took_place' made me think that this is related to
-j DNAT/MASQUERADE.

But thats only one part of the story.  This is also relevant for SNAT
when we undo snat translation in reverse/reply direction.

Furthermore, I originally wanted to do this mainly to avoid
storing ipv6 addresses once we make DNAT/REDIRECT work
for ipv6 on bridges.

However, I forgot about SNPT/DNPT which is stateless.

So we can't escape storing address for ipv6 anyway. Might as
well do it for ipv4 too.

Reported-and-tested-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-05-20 13:51:25 +02:00
Dmitry Torokhov
ec0ccc16a0 module: add core_param_unsafe
Similarly to module_param_unsafe(), add the helper to be used by core
code wishing to expose unsafe debugging or testing parameters that taint
the kernel when set.

Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-20 00:25:25 -07:00
Luis R. Rodriguez
d173a137c5 driver-core: enable drivers to opt-out of async probe
There are drivers that can not be probed asynchronously. One such group
is platform drivers registered with platform_driver_probe(), which
expects driver's probe routine be discarded after the driver has been
registered and initial binding attempt executed. Also
platform_driver_probe() an error when no devices were bound to the
driver, allowing failing to load such driver module altogether.

Other drivers do not work well with asynchronous probing because of
driver bug or not optimal driver organization.

To allow using such drivers even when user requests asynchronous probing
as default boot strategy, let's allow them to opt out.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-20 00:25:25 -07:00
Luis R. Rodriguez
f2411da746 driver-core: add driver module asynchronous probe support
Some init systems may wish to express the desire to have device drivers
run their probe() code asynchronously. This implements support for this
and allows userspace to request async probe as a preference through a
generic shared device driver module parameter, async_probe.

Implementation for async probe is supported through a module parameter
given that since synchronous probe has been prevalent for years some
userspace might exist which relies on the fact that the device driver
will probe synchronously and the assumption that devices it provides
will be immediately available after this.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-20 00:25:24 -07:00
Dmitry Torokhov
765230b5f0 driver-core: add asynchronous probing support for drivers
Some devices take a long time when initializing, and not all drivers are
suited to initialize their devices when they are open. For example,
input drivers need to interrogate their devices in order to publish
device's capabilities before userspace will open them. When such drivers
are compiled into kernel they may stall entire kernel initialization.

This change allows drivers request for their probe functions to be
called asynchronously during driver and device registration (manual
binding is still synchronous). Because async_schedule is used to perform
asynchronous calls module loading will still wait for the probing to
complete.

Note that the end goal is to make the probing asynchronous by default,
so annotating drivers with PROBE_PREFER_ASYNCHRONOUS is a temporary
measure that allows us to speed up boot process while we validating and
fixing the rest of the drivers and preparing userspace.

This change is based on earlier patch by "Luis R. Rodriguez"
<mcgrof@suse.com>

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-20 00:25:24 -07:00
Luis R. Rodriguez
ecc8617053 module: add extra argument for parse_params() callback
This adds an extra argument onto parse_params() to be used
as a way to make the unused callback a bit more useful and
generic by allowing the caller to pass on a data structure
of its choice. An example use case is to allow us to easily
make module parameters for every module which we will do
next.

@ parse @
identifier name, args, params, num, level_min, level_max;
identifier unknown, param, val, doing;
type s16;
@@
 extern char *parse_args(const char *name,
 			 char *args,
 			 const struct kernel_param *params,
 			 unsigned num,
 			 s16 level_min,
 			 s16 level_max,
+			 void *arg,
 			 int (*unknown)(char *param, char *val,
					const char *doing
+					, void *arg
					));

@ parse_mod @
identifier name, args, params, num, level_min, level_max;
identifier unknown, param, val, doing;
type s16;
@@
 char *parse_args(const char *name,
 			 char *args,
 			 const struct kernel_param *params,
 			 unsigned num,
 			 s16 level_min,
 			 s16 level_max,
+			 void *arg,
 			 int (*unknown)(char *param, char *val,
					const char *doing
+					, void *arg
					))
{
	...
}

@ parse_args_found @
expression R, E1, E2, E3, E4, E5, E6;
identifier func;
@@

(
	R =
	parse_args(E1, E2, E3, E4, E5, E6,
+		   NULL,
		   func);
|
	R =
	parse_args(E1, E2, E3, E4, E5, E6,
+		   NULL,
		   &func);
|
	R =
	parse_args(E1, E2, E3, E4, E5, E6,
+		   NULL,
		   NULL);
|
	parse_args(E1, E2, E3, E4, E5, E6,
+		   NULL,
		   func);
|
	parse_args(E1, E2, E3, E4, E5, E6,
+		   NULL,
		   &func);
|
	parse_args(E1, E2, E3, E4, E5, E6,
+		   NULL,
		   NULL);
)

@ parse_args_unused depends on parse_args_found @
identifier parse_args_found.func;
@@

int func(char *param, char *val, const char *unused
+		 , void *arg
		 )
{
	...
}

@ mod_unused depends on parse_args_found @
identifier parse_args_found.func;
expression A1, A2, A3;
@@

-	func(A1, A2, A3);
+	func(A1, A2, A3, NULL);

Generated-by: Coccinelle SmPL
Cc: cocci@systeme.lip6.fr
Cc: Tejun Heo <tj@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Felipe Contreras <felipe.contreras@gmail.com>
Cc: Ewan Milne <emilne@redhat.com>
Cc: Jean Delvare <jdelvare@suse.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Tejun Heo <tj@kernel.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-20 00:25:24 -07:00
Tony Lindgren
4990d4fe32 PM / Wakeirq: Add automated device wake IRQ handling
Turns out we can automate the handling for the device_may_wakeup()
quite a bit by using the kernel wakeup source list as suggested
by Rafael J. Wysocki <rjw@rjwysocki.net>.

And as some hardware has separate dedicated wake-up interrupt
in addition to the IO interrupt, we can automate the handling by
adding a generic threaded interrupt handler that just calls the
device PM runtime to wake up the device.

This allows dropping code from device drivers as we currently
are doing it in multiple ways, and often wrong.

For most drivers, we should be able to drop the following
boilerplate code from runtime_suspend and runtime_resume
functions:

	...
	device_init_wakeup(dev, true);
	...
	if (device_may_wakeup(dev))
		enable_irq_wake(irq);
	...
	if (device_may_wakeup(dev))
		disable_irq_wake(irq);
	...
	device_init_wakeup(dev, false);
	...

We can replace it with just the following init and exit
time code:

	...
	device_init_wakeup(dev, true);
	dev_pm_set_wake_irq(dev, irq);
	...
	dev_pm_clear_wake_irq(dev);
	device_init_wakeup(dev, false);
	...

And for hardware with dedicated wake-up interrupts:

	...
	device_init_wakeup(dev, true);
	dev_pm_set_dedicated_wake_irq(dev, irq);
	...
	dev_pm_clear_wake_irq(dev);
	device_init_wakeup(dev, false);
	...

Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-20 01:56:31 +02:00
Jiri Slaby
8cdd043ab3 livepatch: introduce patch/func-walking helpers
klp_for_each_object and klp_for_each_func are now used all over the
code. One need not think what is the proper condition to check in the
for loop now.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-05-19 23:58:43 +02:00
Miroslav Benes
cad706df7e livepatch: make kobject in klp_object statically allocated
Make kobj variable (of type struct kobject) statically allocated in
klp_object structure. It will allow us to move in the func-object-patch
hierarchy through kobject links.

The only reason to have it dynamic was to not have empty release
callback in the code. However we have empty callbacks for function and
patch in the code now, so it is no longer valid and the advantage of
static allocation is clear.

Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-05-19 23:56:41 +02:00
Paolo Bonzini
3520469d65 KVM: export __gfn_to_pfn_memslot, drop gfn_to_pfn_async
gfn_to_pfn_async is used in just one place, and because of x86-specific
treatment that place will need to look at the memory slot.  Hence inline
it into try_async_pf and export __gfn_to_pfn_memslot.

The patch also switches the subsequent call to gfn_to_pfn_prot to use
__gfn_to_pfn_memslot.  This is a small optimization.  Finally, remove
the now-unused async argument of __gfn_to_pfn.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19 20:52:45 +02:00
Christoph Hellwig
343df3c79c suspend: simplify block I/O handling
Stop abusing struct page functionality and the swap end_io handler, and
instead add a modified version of the blk-lib.c bio_batch helpers.

Also move the block I/O code into swap.c as they are directly tied into
each other.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Tested-by: Ming Lin <mlin@kernel.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-19 09:19:59 -06:00
Jens Axboe
b2dbe0a60f block: collapse bio bit space
Various previous patches removed bits and left holes, collapse them
all. Leave the reset start bit where it is, we don't need to change
that.

Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-19 09:18:28 -06:00
Thomas Gleixner
4e3d9cb013 jiffies: Remove the extra indentation level
Somehow I missed to clean that up when applying the patches. Fix it up
now.

Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Nicholas Mc Guire <der.herr@hofr.at>
2015-05-19 17:17:34 +02:00
Christoph Hellwig
97ca223c3b block: remove unused BIO_RW_BLOCK and BIO_EOF flags
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-19 09:17:05 -06:00
Christoph Hellwig
b25de9d6da block: remove BIO_EOPNOTSUPP
Since the big barrier rewrite/removal in 2007 we never fail FLUSH or
FUA requests, which means we can remove the magic BIO_EOPNOTSUPP flag
to help propagating those to the buffer_head layer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-19 09:17:03 -06:00
Christoph Hellwig
4ecd4fef3a block: use an atomic_t for mq_freeze_depth
lockdep gets unhappy about the not disabling irqs when using the queue_lock
around it.  Instead of trying to fix that up just switch to an atomic_t
and get rid of the lock.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-19 09:12:59 -06:00
Viresh Kumar
8fff52fd50 clockevents: Introduce CLOCK_EVT_STATE_ONESHOT_STOPPED state
When no timers/hrtimers are pending, the expiry time is set to a
special value: 'KTIME_MAX'. This normally happens with
NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes.

When 'expiry == KTIME_MAX', we either cancel the 'tick-sched' hrtimer
(NOHZ_MODE_HIGHRES) or skip reprogramming clockevent device
(NOHZ_MODE_LOWRES).  But, the clockevent device is already
reprogrammed from the tick-handler for next tick.

As the clock event device is programmed in ONESHOT mode it will at
least fire one more time (unnecessarily). Timers on few
implementations (like arm_arch_timer, etc.) only support PERIODIC mode
and their drivers emulate ONESHOT over that. Which means that on these
platforms we will get spurious interrupts periodically (at last
programmed interval rate, normally tick rate).

In order to avoid spurious interrupts, the clockevent device should be
stopped or its interrupts should be masked.

A simple (yet hacky) solution to get this fixed could be: update
hrtimer_force_reprogram() to always reprogram clockevent device and
update clockevent drivers to STOP generating events (or delay it to
max time) when 'expires' is set to KTIME_MAX. But the drawback here is
that every clockevent driver has to be hacked for this particular case
and its very easy for new ones to miss this.

However, Thomas suggested to add an optional state ONESHOT_STOPPED to
solve this problem: lkml.org/lkml/2014/5/9/508.

This patch adds support for ONESHOT_STOPPED state in clockevents
core. It will only be available to drivers that implement the
state-specific callbacks instead of the legacy ->set_mode() callback.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Cc: linaro-kernel@lists.linaro.org
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/b8b383a03ac07b13312c16850b5106b82e4245b5.1428031396.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-19 16:18:02 +02:00
Thomas Gleixner
c3b5d3cea5 Merge branch 'linus' into timers/core
Make sure the upstream fixes are applied before adding further
modifications.
2015-05-19 16:12:32 +02:00
Thomas Gleixner
a9a3f5c26a Merge branch 'irq/for-x86' into x86/apic
Pull the irq core change which is required to merge the preparatory
patches for posted interrupts.
2015-05-19 15:43:01 +02:00
Thomas Gleixner
a6c761e44c Merge branch 'irq/for-x86' into irq/core
Pull in the branch which can be consumed by x86 to build their changes
on top.
2015-05-19 15:41:30 +02:00
Jiang Liu
0a4377de30 genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a VCPU
With Posted-Interrupts support in Intel CPU and IOMMU, an external
interrupt from assigned-devices could be directly delivered to a
virtual CPU in a virtual machine. Instead of hacking KVM and Intel
IOMMU drivers, we propose a platform independent interface to target
an interrupt to a specific virtual CPU in a virtual machine, or set
virtual CPU affinity for an interrupt.

By adopting this new interface and the hierarchy irqdomain, we could
easily support posted-interrupts on Intel platforms, and also provide
flexible enough interfaces for other platforms to support similar
features.

Here is the usage scenario for this interface:
Guest update MSI/MSI-X interrupt configuration
        -->QEMU and KVM handle this
        -->KVM call this interface (passing posted interrupts descriptor
           and guest vector)
        -->irq core will transfer the control to IOMMU
        -->IOMMU will do the real work of updating IRTE (IRTE has new
           format for VT-d Posted-Interrupts)

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Link: http://lkml.kernel.org/r/1432026437-16560-2-git-send-email-feng.wu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-19 15:41:19 +02:00
Nicholas Mc Guire
daa67b4b70 time: Allow gcc to fold constants when possible
To allow constant folding in msecs_to_jiffies() conditionally calls
the HZ dependent _msecs_to_jiffies() helpers or, when gcc can not
figure out constant folding, __msecs_to_jiffies which is the renamed
original msecs_to_jiffies() function.

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Cc: Masahiro Yamada <yamada.m@jp.panasonic.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Joe Perches <joe@perches.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Andrew Hunter <ahh@google.com>
Cc: Paul Turner <pjt@google.com>
Cc: Michal Marek <mmarek@suse.cz>
Link: http://lkml.kernel.org/r/1431951554-5563-3-git-send-email-hofrat@osadl.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-19 15:14:16 +02:00
Nicholas Mc Guire
ca42aaf0c8 time: Refactor msecs_to_jiffies
Refactor the msecs_to_jiffies conditional code part in time.c and 
jiffies.h putting it into conditional functions rather than #ifdefs
to improve readability.

[ tglx: Verified that there is no binary code change ]

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Cc: Masahiro Yamada <yamada.m@jp.panasonic.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Joe Perches <joe@perches.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Andrew Hunter <ahh@google.com>
Cc: Paul Turner <pjt@google.com>
Cc: Michal Marek <mmarek@suse.cz>
Link: http://lkml.kernel.org/r/1431951554-5563-2-git-send-email-hofrat@osadl.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-19 15:13:46 +02:00
Inha Song
9e86b2ad4c extcon: arizona: Add support for select accessory detect mode when headphone detection
This patch add support for select accessory detect mode to HPDETL or HPDETR.
Arizona provides a headphone detection circuit on the HPDETL and HPDETR pins
to measure the impedance of an external load connected to the headphone.

Depending on board design, headphone detect pins can change to HPDETR or HPDETL.

Signed-off-by: Inha Song <ideal.song@samsung.com>
Acked-by: Lee Jones <lee@kernel.org>
Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2015-05-19 16:39:05 +09:00
Chanwoo Choi
b9ec23c08a extcon: Fix the checkpatch warning and minor coding style issue
This patch clean up the extcon core driver by fixing the checkpatch warning
and minor coding style issue.

Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2015-05-19 16:39:04 +09:00
Chanwoo Choi
707d755087 extcon: Add extcon_get_edev_name() API to get the extcon device name
This patch adds the extcon_get_edev_name() API to get the name of extcon device
because all information inclued in the structure extcon_dev should be accessed
by extcon core API instead of directly accessing the data.

Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2015-05-19 16:39:04 +09:00
Ramakrishna Pallala
f03123783d extcon: axp288: Add axp288 extcon driver support
This patch adds the extcon support for AXP288 PMIC which
has the BC1.2 charger detection capability. Additionally
it also adds the USB mux switching support b/w SOC and PMIC
based on GPIO control.

Signed-off-by: Ramakrishna Pallala <ramakrishna.pallala@intel.com>
Acked-by: Lee Jones <lee.jones@linaro.org>
[cw00.choi: Modify the log message to keep the consistent log message pattern]
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2015-05-19 16:39:04 +09:00
Peter Zijlstra
80ed87c8a9 sched/wait: Introduce TASK_NOLOAD and TASK_IDLE
Currently people use TASK_INTERRUPTIBLE to idle kthreads and wait for
'work' because TASK_UNINTERRUPTIBLE contributes to the loadavg. Having
all idle kthreads contribute to the loadavg is somewhat silly.

Now mostly this works OK, because kthreads have all their signals
masked. However there's a few sites where this is causing problems and
TASK_UNINTERRUPTIBLE should be used, except for that loadavg issue.

This patch adds TASK_NOLOAD which, when combined with
TASK_UNINTERRUPTIBLE avoids the loadavg accounting.

As most of imagined usage sites are loops where a thread wants to
idle, waiting for work, a helper TASK_IDLE is introduced.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: NeilBrown <neilb@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 08:39:18 +02:00
David Hildenbrand
8222dbe21e sched/preempt, mm/fault: Decouple preemption from the page fault logic
As the fault handlers now all rely on the pagefault_disabled() checks
and implicit preempt_disable() calls by pagefault_disable() have been
made explicit, we can completely rely on the pagefault_disableD counter.

So let's no longer touch the preempt count when disabling/enabling
pagefaults. After a call to pagefault_disable(), pagefault_disabled()
will return true, but in_atomic() won't.

Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-16-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 08:39:18 +02:00
David Hildenbrand
70ffdb9393 mm/fault, arch: Use pagefault_disable() to check for disabled pagefaults in the handler
Introduce faulthandler_disabled() and use it to check for irq context and
disabled pagefaults (via pagefault_disable()) in the pagefault handlers.

Please note that we keep the in_atomic() checks in place - to detect
whether in irq context (in which case preemption is always properly
disabled).

In contrast, preempt_disable() should never be used to disable pagefaults.
With !CONFIG_PREEMPT_COUNT, preempt_disable() doesn't modify the preempt
counter, and therefore the result of in_atomic() differs.
We validate that condition by using might_fault() checks when calling
might_sleep().

Therefore, add a comment to faulthandler_disabled(), describing why this
is needed.

faulthandler_disabled() and pagefault_disable() are defined in
linux/uaccess.h, so let's properly add that include to all relevant files.

This patch is based on a patch from Thomas Gleixner.

Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-7-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 08:39:15 +02:00
David Hildenbrand
2cb7c9cb42 sched/preempt, mm/kmap: Explicitly disable/enable preemption in kmap_atomic_*
The existing code relies on pagefault_disable() implicitly disabling
preemption, so that no schedule will happen between kmap_atomic() and
kunmap_atomic().

Let's make this explicit, to prepare for pagefault_disable() not
touching preemption anymore.

Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-5-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 08:39:14 +02:00
David Hildenbrand
9ec23531fd sched/preempt, mm/fault: Trigger might_sleep() in might_fault() with disabled pagefaults
Commit 662bbcb274 ("mm, sched: Allow uaccess in atomic with
pagefault_disable()") removed might_sleep() checks for all user access
code (that uses might_fault()).

The reason was to disable wrong "sleep in atomic" warnings in the
following scenario:

    pagefault_disable()
    rc = copy_to_user(...)
    pagefault_enable()

Which is valid, as pagefault_disable() increments the preempt counter
and therefore disables the pagefault handler. copy_to_user() will not
sleep and return an error code if a page is not available.

However, as all might_sleep() checks are removed,
CONFIG_DEBUG_ATOMIC_SLEEP would no longer detect the following scenario:

    spin_lock(&lock);
    rc = copy_to_user(...)
    spin_unlock(&lock)

If the kernel is compiled with preemption turned on, preempt_disable()
will make in_atomic() detect disabled preemption. The fault handler would
correctly never sleep on user access.
However, with preemption turned off, preempt_disable() is usually a NOP
(with !CONFIG_PREEMPT_COUNT), therefore in_atomic() will not be able to
detect disabled preemption nor disabled pagefaults. The fault handler
could sleep.
We really want to enable CONFIG_DEBUG_ATOMIC_SLEEP checks for user access
functions again, otherwise we can end up with horrible deadlocks.

Root of all evil is that pagefault_disable() acts almost as
preempt_disable(), depending on preemption being turned on/off.

As we now have pagefault_disabled(), we can use it to distinguish
whether user acces functions might sleep.

Convert might_fault() into a makro that calls __might_fault(), to
allow proper file + line messages in case of a might_sleep() warning.

Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-3-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 08:39:14 +02:00