Commit graph

47621 commits

Author SHA1 Message Date
Tejun Heo
4452226ea2 writeback: move backing_dev_info->state into bdi_writeback
Currently, a bdi (backing_dev_info) embeds single wb (bdi_writeback)
and the role of the separation is unclear.  For cgroup support for
writeback IOs, a bdi will be updated to host multiple wb's where each
wb serves writeback IOs of a different cgroup on the bdi.  To achieve
that, a wb should carry all states necessary for servicing writeback
IOs for a cgroup independently.

This patch moves bdi->state into wb.

* enum bdi_state is renamed to wb_state and the prefix of all enums is
  changed from BDI_ to WB_.

* Explicit zeroing of bdi->state is removed without adding zeoring of
  wb->state as the whole data structure is zeroed on init anyway.

* As there's still only one bdi_writeback per backing_dev_info, all
  uses of bdi->state are mechanically replaced with bdi->wb.state
  introducing no behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: drbd-dev@lists.linbit.com
Cc: Neil Brown <neilb@suse.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02 08:33:34 -06:00
Tejun Heo
ad7fa852d3 memcg: implement mem_cgroup_css_from_page()
Implement mem_cgroup_css_from_page() which returns the
cgroup_subsys_state of the memcg associated with a given page on the
default hierarchy.  This will be used by cgroup writeback support.

This function assumes that page->mem_cgroup association doesn't change
until the page is released, which is true on the default hierarchy as
long as replace_page_cache_page() is not used.  As the only user of
replace_page_cache_page() is FUSE which won't support cgroup writeback
for the time being, this works for now, and replace_page_cache_page()
will soon be updated so that the invariant actually holds.

Note that the RCU protected page->mem_cgroup access is consistent with
other usages across memcg but ultimately incorrect.  These unlocked
accesses are missing required barriers.  page->mem_cgroup should be
made an RCU pointer and updated and accessed using RCU operations.

v4: Instead of triggering WARN, return the root css on the traditional
    hierarchies.  This makes the function a lot easier to deal with
    especially as there's no light way to synchronize against
    hierarchy rebinding.

v3: s/mem_cgroup_migrate()/mem_cgroup_css_from_page()/

v2: Trigger WARN if the function is used on the traditional
    hierarchies and add comment about the assumed invariant.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02 08:33:34 -06:00
Tejun Heo
1d933cf096 blkcg: implement bio_associate_blkcg()
Currently, a bio can only be associated with the io_context and blkcg
of %current using bio_associate_current().  This is too restrictive
for cgroup writeback support.  Implement bio_associate_blkcg() which
associates a bio with the specified blkcg.

bio_associate_blkcg() leaves the io_context unassociated.
bio_associate_current() is updated so that it considers a bio as
already associated if it has a blkcg_css, instead of an io_context,
associated with it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02 08:33:34 -06:00
Tejun Heo
fd383c2d3c blkcg: implement task_get_blkcg_css()
Implement a wrapper around task_get_css() to acquire the blkcg css for
a given task.  The wrapper is necessary for cgroup writeback support
as there will be places outside blkcg proper trying to acquire
blkcg_css and blkio_cgrp_id will be undefined when !CONFIG_BLK_CGROUP.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02 08:33:34 -06:00
Tejun Heo
ec438699a9 cgroup, block: implement task_get_css() and use it in bio_associate_current()
bio_associate_current() currently open codes task_css() and
css_tryget_online() to find and pin $current's blkcg css.  Abstract it
into task_get_css() which is implemented from cgroup side.  As a task
is always associated with an online css for every subsystem except
while the css_set update is propagating, task_get_css() retries till
css_tryget_online() succeeds.

This is a cleanup and shouldn't lead to noticeable behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02 08:33:34 -06:00
Tejun Heo
496d5e7560 blkcg: add blkcg_root_css
Add global constant blkcg_root_css which points to &blkcg_root.css.
This will be used by cgroup writeback support.  If blkcg is disabled,
it's defined as ERR_PTR(-EINVAL).

v2: The declarations moved to include/linux/blk-cgroup.h as suggested
    by Vivek.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02 08:33:33 -06:00
Tejun Heo
56161634e4 memcg: add mem_cgroup_root_css
Add global mem_cgroup_root_css which points to the root memcg css.
This will be used by cgroup writeback support.  If memcg is disabled,
it's defined as ERR_PTR(-EINVAL).

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
aCc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02 08:33:33 -06:00
Tejun Heo
efa7d1c733 update !CONFIG_BLK_CGROUP dummies in include/linux/blk-cgroup.h
The header file will be used more widely with the pending cgroup
writeback support and the current set of dummy declarations aren't
enough to handle different config combinations.  Update as follows.

* Drop the struct cgroup declaration.  None of the dummy defs need it.

* Define blkcg as an empty struct instead of just declaring it.

* Wrap dummy function defs in CONFIG_BLOCK.  Some functions use block
  data types and none of them are to be used w/o block enabled.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02 08:33:33 -06:00
Tejun Heo
eea8f41cc5 blkcg: move block/blk-cgroup.h to include/linux/blk-cgroup.h
cgroup aware writeback support will require exposing some of blkcg
details.  In preprataion, move block/blk-cgroup.h to
include/linux/blk-cgroup.h.  This patch is pure file move.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02 08:33:33 -06:00
Greg Thelen
c4843a7593 memcg: add per cgroup dirty page accounting
When modifying PG_Dirty on cached file pages, update the new
MEM_CGROUP_STAT_DIRTY counter.  This is done in the same places where
global NR_FILE_DIRTY is managed.  The new memcg stat is visible in the
per memcg memory.stat cgroupfs file.  The most recent past attempt at
this was http://thread.gmane.org/gmane.linux.kernel.cgroups/8632

The new accounting supports future efforts to add per cgroup dirty
page throttling and writeback.  It also helps an administrator break
down a container's memory usage and provides evidence to understand
memcg oom kills (the new dirty count is included in memcg oom kill
messages).

The ability to move page accounting between memcg
(memory.move_charge_at_immigrate) makes this accounting more
complicated than the global counter.  The existing
mem_cgroup_{begin,end}_page_stat() lock is used to serialize move
accounting with stat updates.
Typical update operation:
	memcg = mem_cgroup_begin_page_stat(page)
	if (TestSetPageDirty()) {
		[...]
		mem_cgroup_update_page_stat(memcg)
	}
	mem_cgroup_end_page_stat(memcg)

Summary of mem_cgroup_end_page_stat() overhead:
- Without CONFIG_MEMCG it's a no-op
- With CONFIG_MEMCG and no inter memcg task movement, it's just
  rcu_read_lock()
- With CONFIG_MEMCG and inter memcg  task movement, it's
  rcu_read_lock() + spin_lock_irqsave()

A memcg parameter is added to several routines because their callers
now grab mem_cgroup_begin_page_stat() which returns the memcg later
needed by for mem_cgroup_update_page_stat().

Because mem_cgroup_begin_page_stat() may disable interrupts, some
adjustments are needed:
- move __mark_inode_dirty() from __set_page_dirty() to its caller.
  __mark_inode_dirty() locking does not want interrupts disabled.
- use spin_lock_irqsave(tree_lock) rather than spin_lock_irq() in
  __delete_from_page_cache(), replace_page_cache_page(),
  invalidate_complete_page2(), and __remove_mapping().

   text    data     bss      dec    hex filename
8925147 1774832 1785856 12485835 be84cb vmlinux-!CONFIG_MEMCG-before
8925339 1774832 1785856 12486027 be858b vmlinux-!CONFIG_MEMCG-after
                            +192 text bytes
8965977 1784992 1785856 12536825 bf4bf9 vmlinux-CONFIG_MEMCG-before
8966750 1784992 1785856 12537598 bf4efe vmlinux-CONFIG_MEMCG-after
                            +773 text bytes

Performance tests run on v4.0-rc1-36-g4f671fe2f952.  Lower is better for
all metrics, they're all wall clock or cycle counts.  The read and write
fault benchmarks just measure fault time, they do not include I/O time.

* CONFIG_MEMCG not set:
                            baseline                              patched
  kbuild                 1m25.030000(+-0.088% 3 samples)       1m25.426667(+-0.120% 3 samples)
  dd write 100 MiB          0.859211561 +-15.10%                  0.874162885 +-15.03%
  dd write 200 MiB          1.670653105 +-17.87%                  1.669384764 +-11.99%
  dd write 1000 MiB         8.434691190 +-14.15%                  8.474733215 +-14.77%
  read fault cycles       254.0(+-0.000% 10 samples)            253.0(+-0.000% 10 samples)
  write fault cycles     2021.2(+-3.070% 10 samples)           1984.5(+-1.036% 10 samples)

* CONFIG_MEMCG=y root_memcg:
                            baseline                              patched
  kbuild                 1m25.716667(+-0.105% 3 samples)       1m25.686667(+-0.153% 3 samples)
  dd write 100 MiB          0.855650830 +-14.90%                  0.887557919 +-14.90%
  dd write 200 MiB          1.688322953 +-12.72%                  1.667682724 +-13.33%
  dd write 1000 MiB         8.418601605 +-14.30%                  8.673532299 +-15.00%
  read fault cycles       266.0(+-0.000% 10 samples)            266.0(+-0.000% 10 samples)
  write fault cycles     2051.7(+-1.349% 10 samples)           2049.6(+-1.686% 10 samples)

* CONFIG_MEMCG=y non-root_memcg:
                            baseline                              patched
  kbuild                 1m26.120000(+-0.273% 3 samples)       1m25.763333(+-0.127% 3 samples)
  dd write 100 MiB          0.861723964 +-15.25%                  0.818129350 +-14.82%
  dd write 200 MiB          1.669887569 +-13.30%                  1.698645885 +-13.27%
  dd write 1000 MiB         8.383191730 +-14.65%                  8.351742280 +-14.52%
  read fault cycles       265.7(+-0.172% 10 samples)            267.0(+-0.000% 10 samples)
  write fault cycles     2070.6(+-1.512% 10 samples)           2084.4(+-2.148% 10 samples)

As expected anon page faults are not affected by this patch.

tj: Updated to apply on top of the recent cancel_dirty_page() changes.

Signed-off-by: Sha Zhengju <handai.szj@gmail.com>
Signed-off-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02 08:33:33 -06:00
Tejun Heo
11f81becca page_writeback: revive cancel_dirty_page() in a restricted form
cancel_dirty_page() had some issues and b9ea25152e ("page_writeback:
clean up mess around cancel_dirty_page()") replaced it with
account_page_cleaned() which makes the caller responsible for clearing
the dirty bit; unfortunately, the planned changes for cgroup writeback
support requires synchronization between dirty bit manipulation and
stat updates.  While we can open-code such synchronization in each
account_page_cleaned() callsite, that's gonna be unnecessarily awkward
and verbose.

This patch revives cancel_dirty_page() but in a more restricted form.
All it does is TestClearPageDirty() followed by account_page_cleaned()
invocation if the page was dirty.  This helper covers all
account_page_cleaned() usages except for __delete_from_page_cache()
which is a special case anyway and left alone.  As this leaves no
module user for account_page_cleaned(), EXPORT_SYMBOL() is dropped
from it.

This patch just revives cancel_dirty_page() as a trivial wrapper to
replace equivalent usages and doesn't introduce any functional
changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02 08:33:33 -06:00
Ira Weiny
a97e2d86a9 IB/core cleanup: Add const on args - device->process_mad
The process_mad device function declares some parameters as "in".  Make those
parameters const and adjust the call tree under process_mad in the various
drivers accordingly.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-02 09:33:13 -04:00
Viresh Kumar
3434d23b69 clockevents: Add helpers to check the state of a clockevent device
Some clockevent drivers, once migrated to use per-state callbacks,
need to check the state of the clockevent device in their callbacks or
interrupt handler.

Add accessor functions clockevent_state_*() to get this information.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/04a717d490335c688dd7af899fbcede97e1bb8ee.1432192527.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-02 14:40:47 +02:00
Tero Kristo
989feafb84 clk: ti: move low-level access and init code under clock driver
With most of the clock code under clock driver already, the low-level
register access code, and the init code for the same, is no longer
needed outside the clock driver. Thus, these can be moved under clock
driver also.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
2015-06-02 12:31:46 +03:00
Tero Kristo
e9e63088e4 clk: ti: remove exported ll_ops struct, instead add an API for registration
We should avoid exporting data from drivers, instead use an API for
registering the clock low level operations.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
2015-06-02 12:31:46 +03:00
Tero Kristo
a3314e9cf6 clk: ti: move some public definitions to private header
Several exported TI clock driver features are no longer needed outside
the clock driver itself, thus move all of these to the driver private
header file. Also, update some of the driver files to actually include
this header.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
2015-06-02 12:31:45 +03:00
Tero Kristo
c9a58b0a84 clk: ti: am3517: move remaining am3517 clock support code to clock driver
With legacy clock support gone, this is no longer needed under platform,
so move it under the clock driver itself. Make some exports be driver
internal definitions at the same time.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
2015-06-02 12:31:45 +03:00
Tero Kristo
f2671d5c6c clk: ti: omap34xx: move omap34xx clock type support code to clock driver
With the legacy clock data gone, this is no longer needed under platform,
so move it under the clock driver itself. Remove unnecessary declarations
from the TI clock header also.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
2015-06-02 12:31:37 +03:00
Tero Kristo
bd86cfdcbd clk: ti: clkdm: move clkdm gate clock support code to clock driver
With the legacy clock data gone, this is no longer needed under platform,
so move it under the clock driver itself. Remove the exported clock driver
APIs as well, as these are not needed outside clock driver anymore.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
2015-06-02 12:31:30 +03:00
Tero Kristo
d5a04dddf5 clk: ti: omap2430: move clock support code under clock driver
With the legacy clock support gone, this is no longer needed under
platform code-base. Thus, move this under the TI clock driver, and
remove the exported API from the public header.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
2015-06-02 12:31:29 +03:00
Tero Kristo
9f37e90efa clk: ti: dflt: move support for default gate clock to clock driver
With the legacy support gone, OMAP2+ default gate clock can be moved
under clock driver. Create a new file for the purpose, and clean-up
the header exports a bit as some clock APIs are no longer needed
outside clock driver itself.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
2015-06-02 12:31:22 +03:00
Tero Kristo
046b7c3166 ARM: OMAP2+: clock: remove clkdm_control static boolean from code
clkdm_control is used to determine, whether clocks should trigger a
clockdomain transition when they are enabled/disabled. Keep this
functionality intact, but replace this with a clk_features flag
which can be initialized during boot if needed.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
2015-06-02 12:31:21 +03:00
Tero Kristo
0565fb168d clk: ti: dpll: move omap3 DPLL functionality to clock driver
With the legacy clock support gone, OMAP3 generic DPLL code can now be
moved over to the clock driver also. A few un-unused clkoutx2 functions
are also removed at the same time.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
2015-06-02 12:31:14 +03:00
Tero Kristo
192383d87b ARM: OMAP2+: clock: add support for specific CM ops to ti_clk_ll_ops
Clock driver requires access to some CM API functions once the code
is being moved under the clock driver from the platform directory.
Gate type clock requires access to cm_wait_module_ready and
cm_split_idlest_reg functions, which are both used for waiting until
the module being clocked has been successfully activated. These CM
APIs are now exported through the ti_clk_ll_ops struct.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
2015-06-02 12:31:13 +03:00
Tero Kristo
9a356d622e ARM: OMAP2+: clock: add support for clkdm ops to the low level clk ops
Clock driver requires access to certain clockdomain handling ops once
the code is being moved over under clock driver. Example of this is
clk_enable / clk_disable under omap3 DPLL code. The required clkdm
APIs are now exported through the ti_clk_ll_ops struct.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
2015-06-02 12:31:13 +03:00
Tero Kristo
a5aa8a603e clk: ti: move omap2_clk_enable_init_clocks under clock driver
This is no longer used outside clock driver, so move it under the driver
and remove the export for it from the global header file.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
2015-06-02 12:31:13 +03:00
Tero Kristo
bf22bae794 clk: ti: autoidle: move generic autoidle handling code to clock driver
This is no longer needed in platform directory, as the legacy clock data
is gone, so move it under TI clock driver. Some static functions are
renamed also.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
2015-06-02 12:31:13 +03:00
Tero Kristo
ef14db0977 clk: ti: move interface clock implementation under drivers/clk
With the legacy clock support gone, the OMAP interface clock implementation
can be moved under the clock driver. Some temporary header file tweaks are
also needed to make this change work properly.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
2015-06-02 12:30:59 +03:00
Tero Kristo
59245ce01a clk: ti: move OMAP4+ DPLL implementation under drivers/clk
With the legacy clock support gone, the OMAP4 specific DPLL implementations
can be moved under the clock driver. Change some of the function prototypes
to be static at the same time, and remove some exports from the global TI
clock driver header.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
2015-06-02 12:30:58 +03:00
Tero Kristo
b138b0283d clk: ti: move generic OMAP DPLL implementation under drivers/clk
With the legacy clock data now gone, we can start moving OMAP clock
type implementations under clock driver. Start this with moving the
generic OMAP DPLL clock type under TI clock driver.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
2015-06-02 12:30:58 +03:00
Tero Kristo
f3b19aa5ca ARM: OMAP2+: clock: export driver API to setup/get clock features
As most of the clock driver support code is going to be moved under
drivers/clk/ti, an API for setting / getting the SoC specific clock
features is needed. This patch provides this API and changes the
existing code to use it.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
2015-06-02 12:30:58 +03:00
Ingo Molnar
6b33033c24 * Use idiomatic negative error values in efivar_create_sysfs_entry()
instead of returning '1' to indicate error - Dan Carpenter
 
  * New support to expose the EFI System Resource Tables in sysfs, which
    provides information for performing firmware updates - Peter Jones
 
  * Documentation cleanup in the EFI handover protocol section which
    falsely claimed that 'cmdline_size' needed to be filled out by the
    boot loader - Alex Smith
 
  * Align the order of SMBIOS tables in /sys/firmware/efi/systab to match
    the way that we do things for ACPI and add documentation to
    Documentation/ABI - Jean Delvare
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVazVXAAoJEC84WcCNIz1VtVIP/1bwaRIw4eHBuunTY5ONZ9FP
 +uP0hyvUyGajES91PArqWpCeubn6hOAENT98+Tp+w81n3BPL3ZKKZB5jIbIpVqiF
 IOlpUud+MlpoHbyBleCVQHBG6+8pfE8ty3sC+gljjDhjaXnT1QJt9IdoEMpLnx7P
 pS0b9RzBVHJX1Y0ILMXstJKtNjyZfsxZ031XbjEuRfw7V2DtptkjRivR8EKDBKsG
 kNYcHxJJX/+DE9+pNPc3wrByBasQlBmrnZpwP3LIG12GRtoEZzbogHmFExeQZ+9k
 Gp3xuyOFx2Texl7bXM0artWbtTdzQj1ai8MoT5fQexy0UzO1TtlkdfaBkYKd3mtY
 AxvLPxCQpmGMV16T3QNaHEocFDAHSUvc2o85sQj+EdHhUcSkFybi4rSpDFf7HzO6
 x6xkt2Fu9d7GEpZG1O7V/v1uMNsp3tOBRMiMdruRq2Ui2UV8s616DqfjtoX/pkS3
 clNGrGZlUfDegKhkCuQqfUZY4jz/gioCEciY1S4auz/OX5jK0NTWUmAWzBnnWjsC
 M/RHbTbRbYGh1lTUSZQIdGSe5ejW/kBGMCeNh5ZmaxsZx057TYywSqLvo4PVoxON
 DTJUMwP2X/rzS2L3o3KVdjDTf3PTw7tQbieAjr5M4N7cd0I+BjRWBcQaCOnA0qN0
 SQwqdWeY/ZHcZftbgCAw
 =Twjn
 -----END PGP SIGNATURE-----

Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/efi

Pull EFI changes from Matt Fleming:

  - Use idiomatic negative error values in efivar_create_sysfs_entry()
    instead of returning '1' to indicate error. (Dan Carpenter)

  - Implement new support to expose the EFI System Resource Tables in sysfs,
    which provides information for performing firmware updates. (Peter Jones)

  - Documentation cleanup in the EFI handover protocol section which
    falsely claimed that 'cmdline_size' needed to be filled out by the
    boot loader. (Alex Smith)

  - Align the order of SMBIOS tables in /sys/firmware/efi/systab to match
    the way that we do things for ACPI and add documentation to
    Documentation/ABI. (Jean Delvare)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02 08:38:11 +02:00
Ingo Molnar
085c789783 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU changes from Paul E. McKenney:

  - Initialization/Kconfig updates: hide most Kconfig options from unsuspecting users.
    There's now a single high level configuration option:

      *
      * RCU Subsystem
      *
      Make expert-level adjustments to RCU configuration (RCU_EXPERT) [N/y/?] (NEW)

    Which if answered in the negative, leaves us with a single interactive
    configuration option:

      Offload RCU callback processing from boot-selected CPUs (RCU_NOCB_CPU) [N/y/?] (NEW)

    All the rest of the RCU options are configured automatically.

  - Remove all uses of RCU-protected array indexes: replace the
    rcu_[access|dereference]_index_check() APIs with READ_ONCE() and rcu_lockdep_assert().

  - RCU CPU-hotplug cleanups.

  - Updates to Tiny RCU: a race fix and further code shrinkage.

  - RCU torture-testing updates: fixes, speedups, cleanups and
    documentation updates.

  - Miscellaneous fixes.

  - Documentation updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02 08:18:34 +02:00
Ingo Molnar
f407a82586 Merge branch 'linus' into sched/core, to resolve conflict
Conflicts:
	arch/sparc/include/asm/topology_64.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02 08:05:42 +02:00
David S. Miller
dda922c831 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/phy/amd-xgbe-phy.c
	drivers/net/wireless/iwlwifi/Kconfig
	include/net/mac80211.h

iwlwifi/Kconfig and mac80211.h were both trivial overlapping
changes.

The drivers/net/phy/amd-xgbe-phy.c file got removed in 'net-next' and
the bug fix that happened on the 'net' side is already integrated
into the rest of the amd-xgbe driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 22:51:30 -07:00
Greg Kroah-Hartman
e152813ff5 usb: patches for v4.2 merge window
- dwc2 adds hibernation support
 - preparation for sunxi glue to musb driver
 - new ULPI bus
 - new ULPI PHY driver for TUSB1210
 - musb patches to support multiple DMA engines on same binary
 - support for R-Car E2 on renesas_usbhs
 
 Signed-off-by: Felipe Balbi <balbi@ti.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVaIsMAAoJEIaOsuA1yqREQXQP/2tYG5H2f71mpYAuOKu1KcYi
 DnFH5bKgVQZwl+Dafa41B7P40qub7goCULQSmAgErKboHQvzf7GhzbH22jG3cUg+
 icowa+pEstXxzcwQGOMVIOZ3jAIQCZFFOizolfAaCHrnYNGwaCKq+5809EdMyXHc
 IdYnkYFYdOVGjTPypzoHf8fjpMoCfMx+TLh2ZV4ICj4o5cVXkr9oXNAA8/B/dn+T
 MLK02IPmMOVNJs2gYFQSsGYy03YVR+y62jrYXa+bIa1lzYr4v2yx5SymYTcCWMsH
 c6szhMxCOoL3CGzONCT52b+ogfpCzefl0HV6yvpAQmRcgbAiRuVt3caq1TBpvMq0
 NJVChEfFtOaUR+UGNkBUzh7/CjcVlH8516NQlCIdPemB8HzqBfRMv9MjHsGt1rXC
 SHQMQSTOBw2fbdG5zvi06BOO2hCStqXLNCHARJ2Z5YjP7mLrwteO+wYg2FG9zhTp
 ShGZlHPqP/U+50WuzorHEpr/pRTJh3LhDlYJ6CBwyd4+vg17PrYYbkf82d9WDEDL
 BFH9qe7jfOwZiWHJvNFeHLriKzj/k3i9/DoRaqtm7QwJgafzBz7l+319jvruupnG
 8oxhr+gt74uDPIzxWWv+eFOtO+Uu/q8C3dSp0H/0sY4soBELIgf+QtOkwn596rFp
 mBLSkru+oEN62IQbTKDK
 =Jqzf
 -----END PGP SIGNATURE-----

Merge tag 'usb-for-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-next

Felipe writes:

usb: patches for v4.2 merge window

- dwc2 adds hibernation support
- preparation for sunxi glue to musb driver
- new ULPI bus
- new ULPI PHY driver for TUSB1210
- musb patches to support multiple DMA engines on same binary
- support for R-Car E2 on renesas_usbhs

Signed-off-by: Felipe Balbi <balbi@ti.com>
2015-06-02 10:47:03 +09:00
Toshiaki Makita
66e5133f19 vlan: Add GRO support for non hardware accelerated vlan
Currently packets with non-hardware-accelerated vlan cannot be handled
by GRO. This causes low performance for 802.1ad and stacked vlan, as their
vlan tags are currently not stripped by hardware.

This patch adds GRO support for non-hardware-accelerated vlan and
improves receive performance of them.

Test Environment:
 vlan device (.1Q) on vlan device (.1ad) on ixgbe (82599)

Result:

- Before

$ netperf -t TCP_STREAM -H 192.168.20.2 -l 60
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.00    5233.17

Rx side CPU usage:
  %usr      %sys      %irq     %soft     %idle
  0.27     58.03      0.00     41.70      0.00

- After

$ netperf -t TCP_STREAM -H 192.168.20.2 -l 60
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.00    7586.85

Rx side CPU usage:
  %usr      %sys      %irq     %soft     %idle
  0.50     25.83      0.00     59.53     14.14

[ Register VLAN offloads with priority 10 -DaveM ]

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 16:50:52 -07:00
David S. Miller
bdef7de4b8 net: Add priority to packet_offload objects.
When we scan a packet for GRO processing, we want to see the most
common packet types in the front of the offload_base list.

So add a priority field so we can handle this properly.

IPv4/IPv6 get the highest priority with the implicit zero priority
field.

Next comes ethernet with a priority of 10, and then we have the MPLS
types with a priority of 15.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Suggested-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 14:56:09 -07:00
Jens Axboe
843e8ddb25 Merge branch 'for-4.2/core' into for-4.2/drivers 2015-06-01 14:36:02 -06:00
Keith Busch
f26cdc8536 blk-mq: Shared tag enhancements
Storage controllers may expose multiple block devices that share hardware
resources managed by blk-mq. This patch enhances the shared tags so a
low-level driver can access the shared resources not tied to the unshared
h/w contexts. This way the LLD can dynamically add and delete disks and
request queues without having to track all the request_queue hctx's to
iterate outstanding tags.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-01 14:35:56 -06:00
Maxime Ripard
4af34b572a drivers: soc: sunxi: Introduce SoC driver to map SRAMs
The Allwinner SoCs have a handful of SRAM that can be either mapped to be
accessible by devices or the CPU.

That mapping is controlled by an SRAM controller, and that mapping might
not be set by the bootloader, for example if the device wasn't used at all,
or if we're using solutions like the U-Boot's Falcon Boot.

We could also imagine changing this at runtime for example to change the
mapping of these SRAMs to use them for suspend/resume or runtime memory
rate change, if that ever happens.

These use cases require some API in the kernel to control that mapping,
exported through a drivers/soc driver.

This driver also implement a debugfs file that shows the SRAM found in the
system, the current mapping and the SRAM that have been claimed by some
drivers in the kernel.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2015-06-01 17:57:34 +02:00
Arnd Bergmann
72275b4c08 mvebu drivers change for 4.2
mvebu-mbus: add mv_mbus_dram_info_nooverlap() needed for the new
 Marvell crypto driver
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iEYEABECAAYFAlVo26sACgkQCwYYjhRyO9UQ4gCeLb6E7RWXGXtLBXRvRi+1a5Vm
 GckAn2a/4vAQExwlDsiqHMsM2Iw9Phc7
 =1es8
 -----END PGP SIGNATURE-----

Merge tag 'mvebu-drivers-4.2' of git://git.infradead.org/linux-mvebu into next/drivers

Merge "mvebu drivers change for 4.2" from Gregory CLEMENT:

mvebu-mbus: add mv_mbus_dram_info_nooverlap() needed for the new
Marvell crypto driver

* tag 'mvebu-drivers-4.2' of git://git.infradead.org/linux-mvebu:
  bus: mvebu-mbus: add mv_mbus_dram_info_nooverlap()

Based on the earlier bug fixes branch, which contains six other
patches already merged into 4.1.
2015-06-01 17:34:57 +02:00
Greg Kurz
7d82410950 virtio: add explicit big-endian support to memory accessors
The current memory accessors logic is:
- little endian if little_endian
- native endian (i.e. no byteswap) if !little_endian

If we want to fully support cross-endian vhost, we also need to be
able to convert to big endian.

Instead of changing the little_endian argument to some 3-value enum, this
patch changes the logic to:
- little endian if little_endian
- big endian if !little_endian

The native endian case is handled by all users with a trivial helper. This
patch doesn't change any functionality, nor it does add overhead.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2015-06-01 15:48:54 +02:00
Greg Kurz
5da7b160b3 vringh: introduce vringh_is_little_endian() helper
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2015-06-01 15:48:52 +02:00
Greg Kurz
cf561f0d2e virtio: introduce virtio_is_little_endian() helper
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2015-06-01 15:48:49 +02:00
Masahiro Yamada
b3da97ee58 pinctrl: use "const struct ..." rather than "struct ... const"
Only this member, pins, is defined as "struct ... const *", but the
others in this struct, pinlops, pmxops, confops, etc. are defined as
"const struct ... *".

Swap the "struct pinctrl_pin_desc" and "const" for consistency.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2015-06-01 15:48:12 +02:00
Masahiro Yamada
5fbf65d5c9 pinctrl: remove useless const qualifier
This "const" claims the get_function_groups callback never
changes the given num_groups pointer.  It is always true
in C language, so not worth mentioning.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2015-06-01 15:47:20 +02:00
Rojhalat Ibrahim
3fff99bc4e gpiolib: rename gpiod_set_array to gpiod_set_array_value
There have been concerns that the function names gpiod_set_array() and
gpiod_get_array() might be confusing to users. One might expect
gpiod_get_array() to return array values, while it is actually the array
counterpart of gpiod_get(). To be consistent with the single descriptor API
we could rename gpiod_set_array() to gpiod_set_array_value(). This makes
some function names a bit lengthy: gpiod_set_raw_array_value_cansleep().

Signed-off-by: Rojhalat Ibrahim <imr@rtschenk.de>
Acked-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2015-06-01 15:10:09 +02:00
Goffredo Baroncelli
04fba7864f HID: Export hid_field_extract()
Rename the function extract() to hid_field_extract(), make it external linkage
to allow the use from other modules.

Suggested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-06-01 14:33:35 +02:00
Lars-Peter Clausen
225d59adf1 iio: Specify supported modes for buffers
For each buffer type specify the supported device modes for this buffer.
This allows us for devices which support multiple different operating modes
to pick the correct operating mode based on the modes supported by the
attached buffers.

It also prevents that buffers with conflicting modes are attached
to a device at the same time or that a buffer with a non-supported mode is
attached to a device (e.g. in-kernel callback buffer to a device only
supporting hardware mode).

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
2015-06-01 11:31:12 +01:00