Patch series "rbtree: Cache leftmost node internally", v4.
A series to extending rbtrees to internally cache the leftmost node such
that we can have fast overlap check optimization for all interval tree
users[1]. The benefits of this series are that:
(i) Unify users that do internal leftmost node caching.
(ii) Optimize all interval tree users.
(iii) Convert at least two new users (epoll and procfs) to the new interface.
This patch (of 16):
Red-black tree semantics imply that nodes with smaller or greater (or
equal for duplicates) keys always be to the left and right,
respectively. For the kernel this is extremely evident when considering
our rb_first() semantics. Enabling lookups for the smallest node in the
tree in O(1) can save a good chunk of cycles in not having to walk down
the tree each time. To this end there are a few core users that
explicitly do this, such as the scheduler and rtmutexes. There is also
the desire for interval trees to have this optimization allowing faster
overlap checking.
This patch introduces a new 'struct rb_root_cached' which is just the
root with a cached pointer to the leftmost node. The reason why the
regular rb_root was not extended instead of adding a new structure was
that this allows the user to have the choice between memory footprint
and actual tree performance. The new wrappers on top of the regular
rb_root calls are:
- rb_first_cached(cached_root) -- which is a fast replacement
for rb_first.
- rb_insert_color_cached(node, cached_root, new)
- rb_erase_cached(node, cached_root)
In addition, augmented cached interfaces are also added for basic
insertion and deletion operations; which becomes important for the
interval tree changes.
With the exception of the inserts, which adds a bool for updating the
new leftmost, the interfaces are kept the same. To this end, porting rb
users to the cached version becomes really trivial, and keeping current
rbtree semantics for users that don't care about the optimization
requires zero overhead.
Link: http://lkml.kernel.org/r/20170719014603.19029-2-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
GENMASK(_ULL) performs a left-shift of ~0UL(L), which technically
results in an integer overflow. clang raises a warning if the overflow
occurs in a preprocessor expression. Clear the low-order bits through a
substraction instead of the left-shift to avoid the overflow.
(akpm: no change in .text size in my testing)
Link: http://lkml.kernel.org/r/20170803212020.24939-1-mka@chromium.org
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have seen some generic code use config parameter CONFIG_CPU_BIG_ENDIAN
to decide the endianness.
Here are the few examples.
include/asm-generic/qrwlock.h
drivers/of/base.c
drivers/of/fdt.c
drivers/tty/serial/earlycon.c
drivers/tty/serial/serial_core.c
Display warning if CPU_BIG_ENDIAN is not defined on big endian
architecture and also warn if it defined on little endian architectures.
Here is our original discussion
https://lkml.org/lkml/2017/5/24/620
Link: http://lkml.kernel.org/r/1499358861-179979-4-git-send-email-babu.moger@oracle.com
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: David S. Miller <davem@davemloft.net>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Michal Simek <monstr@monstr.eu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
First, number of CPUs can't be negative number.
Second, different signnnedness leads to suboptimal code in the following
cases:
1)
kmalloc(nr_cpu_ids * sizeof(X));
"int" has to be sign extended to size_t.
2)
while (loff_t *pos < nr_cpu_ids)
MOVSXD is 1 byte longed than the same MOV.
Other cases exist as well. Basically compiler is told that nr_cpu_ids
can't be negative which can't be deduced if it is "int".
Code savings on allyesconfig kernel: -3KB
add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370)
function old new delta
coretemp_cpu_online 450 512 +62
rcu_init_one 1234 1272 +38
pci_device_probe 374 399 +25
...
pgdat_reclaimable_pages 628 556 -72
select_fallback_rq 446 369 -77
task_numa_find_cpu 1923 1807 -116
Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Where possible, call memset16(), memmove() or memcpy() instead of using
open-coded loops. I don't like the calling convention that uses a byte
count instead of a count of u16s, but it's a little late to change that.
Reduces code size of fbcon.o by almost 400 bytes on my laptop build.
[akpm@linux-foundation.org: fix build]
Link: http://lkml.kernel.org/r/20170720184539.31609-9-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "Multibyte memset variations", v4.
A relatively common idiom we're missing is a function to fill an area of
memory with a pattern which is larger than a single byte. I first
noticed this with a zram patch which wanted to fill a page with an
'unsigned long' value. There turn out to be quite a few places in the
kernel which can benefit from using an optimised function rather than a
loop; sometimes text size, sometimes speed, and sometimes both. The
optimised PowerPC version (not included here) improves performance by
about 30% on POWER8 on just the raw memset_l().
Most of the extra lines of code come from the three testcases I added.
This patch (of 8):
memset16(), memset32() and memset64() are like memset(), but allow the
caller to fill the destination with a value larger than a single byte.
memset_l() and memset_p() allow the caller to use unsigned long and
pointer values respectively.
Link: http://lkml.kernel.org/r/20170720184539.31609-2-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: David Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This macro is useful to avoid link error on 32-bit systems.
We have the same definition in two drivers, so move it to
include/linux/kernel.h
While we are here, refactor DIV_ROUND_UP_ULL() by using
DIV_ROUND_DOWN_ULL().
Link: http://lkml.kernel.org/r/1500945156-12907-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Mark Brown <broonie@kernel.org>
Cc: Cyrille Pitchen <cyrille.pitchen@wedev4u.fr>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
Cc: Marek Vasut <marek.vasut@gmail.com>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To avoid deviation, the per cpu number of NUMA stats in
vm_numa_stat_diff[] is included when a user *reads* the NUMA stats.
Since NUMA stats does not be read by users frequently, and kernel does not
need it to make a decision, it will not be a problem to make the readers
more expensive.
Link: http://lkml.kernel.org/r/1503568801-21305-4-git-send-email-kemi.wang@intel.com
Signed-off-by: Kemi Wang <kemi.wang@intel.com>
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Ying Huang <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is significant overhead in cache bouncing caused by zone counters
(NUMA associated counters) update in parallel in multi-threaded page
allocation (suggested by Dave Hansen).
This patch updates NUMA counter threshold to a fixed size of MAX_U16 - 2,
as a small threshold greatly increases the update frequency of the global
counter from local per cpu counter(suggested by Ying Huang).
The rationality is that these statistics counters don't affect the
kernel's decision, unlike other VM counters, so it's not a problem to use
a large threshold.
With this patchset, we see 31.3% drop of CPU cycles(537-->369) for per
single page allocation and reclaim on Jesper's page_bench03 benchmark.
Benchmark provided by Jesper D Brouer(increase loop times to 10000000):
https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/
bench
Threshold CPU cycles Throughput(88 threads)
32 799 241760478
64 640 301628829
125 537 358906028 <==> system by default (base)
256 468 412397590
512 428 450550704
4096 399 482520943
20000 394 489009617
30000 395 488017817
65533 369(-31.3%) 521661345(+45.3%) <==> with this patchset
N/A 342(-36.3%) 562900157(+56.8%) <==> disable zone_statistics
Link: http://lkml.kernel.org/r/1503568801-21305-3-git-send-email-kemi.wang@intel.com
Signed-off-by: Kemi Wang <kemi.wang@intel.com>
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Suggested-by: Ying Huang <ying.huang@intel.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "Separate NUMA statistics from zone statistics", v2.
Each page allocation updates a set of per-zone statistics with a call to
zone_statistics(). As discussed in 2017 MM summit, these are a
substantial source of overhead in the page allocator and are very rarely
consumed. This significant overhead in cache bouncing caused by zone
counters (NUMA associated counters) update in parallel in multi-threaded
page allocation (pointed out by Dave Hansen).
A link to the MM summit slides:
http://people.netfilter.org/hawk/presentations/MM-summit2017/MM-summit2017-JesperBrouer.pdf
To mitigate this overhead, this patchset separates NUMA statistics from
zone statistics framework, and update NUMA counter threshold to a fixed
size of MAX_U16 - 2, as a small threshold greatly increases the update
frequency of the global counter from local per cpu counter (suggested by
Ying Huang). The rationality is that these statistics counters don't
need to be read often, unlike other VM counters, so it's not a problem
to use a large threshold and make readers more expensive.
With this patchset, we see 31.3% drop of CPU cycles(537-->369, see
below) for per single page allocation and reclaim on Jesper's
page_bench03 benchmark. Meanwhile, this patchset keeps the same style
of virtual memory statistics with little end-user-visible effects (only
move the numa stats to show behind zone page stats, see the first patch
for details).
I did an experiment of single page allocation and reclaim concurrently
using Jesper's page_bench03 benchmark on a 2-Socket Broadwell-based
server (88 processors with 126G memory) with different size of threshold
of pcp counter.
Benchmark provided by Jesper D Brouer(increase loop times to 10000000):
https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench
Threshold CPU cycles Throughput(88 threads)
32 799 241760478
64 640 301628829
125 537 358906028 <==> system by default
256 468 412397590
512 428 450550704
4096 399 482520943
20000 394 489009617
30000 395 488017817
65533 369(-31.3%) 521661345(+45.3%) <==> with this patchset
N/A 342(-36.3%) 562900157(+56.8%) <==> disable zone_statistics
This patch (of 3):
In this patch, NUMA statistics is separated from zone statistics
framework, all the call sites of NUMA stats are changed to use
numa-stats-specific functions, it does not have any functionality change
except that the number of NUMA stats is shown behind zone page stats
when users *read* the zone info.
E.g. cat /proc/zoneinfo
***Base*** ***With this patch***
nr_free_pages 3976 nr_free_pages 3976
nr_zone_inactive_anon 0 nr_zone_inactive_anon 0
nr_zone_active_anon 0 nr_zone_active_anon 0
nr_zone_inactive_file 0 nr_zone_inactive_file 0
nr_zone_active_file 0 nr_zone_active_file 0
nr_zone_unevictable 0 nr_zone_unevictable 0
nr_zone_write_pending 0 nr_zone_write_pending 0
nr_mlock 0 nr_mlock 0
nr_page_table_pages 0 nr_page_table_pages 0
nr_kernel_stack 0 nr_kernel_stack 0
nr_bounce 0 nr_bounce 0
nr_zspages 0 nr_zspages 0
numa_hit 0 *nr_free_cma 0*
numa_miss 0 numa_hit 0
numa_foreign 0 numa_miss 0
numa_interleave 0 numa_foreign 0
numa_local 0 numa_interleave 0
numa_other 0 numa_local 0
*nr_free_cma 0* numa_other 0
... ...
vm stats threshold: 10 vm stats threshold: 10
... ...
The next patch updates the numa stats counter size and threshold.
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1503568801-21305-2-git-send-email-kemi.wang@intel.com
Signed-off-by: Kemi Wang <kemi.wang@intel.com>
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Ying Huang <ying.huang@intel.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This moves all new code including new page migration helper behind kernel
Kconfig option so that there is no codee bloat for arch or user that do
not want to use HMM or any of its associated features.
arm allyesconfig (without all the patchset, then with and this patch):
text data bss dec hex filename
83721896 46511131 27582964 157815991 96814b7 ../without/vmlinux
83722364 46511131 27582964 157816459 968168b vmlinux
[jglisse@redhat.com: struct hmm is only use by HMM mirror functionality]
Link: http://lkml.kernel.org/r/20170825213133.27286-1-jglisse@redhat.com
[sfr@canb.auug.org.au: fix build (arm multi_v7_defconfig)]
Link: http://lkml.kernel.org/r/20170828181849.323ab81b@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170818032858.7447-1-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Unlike unaddressable memory, coherent device memory has a real resource
associated with it on the system (as CPU can address it). Add a new
helper to hotplug such memory within the HMM framework.
Link: http://lkml.kernel.org/r/20170817000548.32038-20-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Platform with advance system bus (like CAPI or CCIX) allow device memory
to be accessible from CPU in a cache coherent fashion. Add a new type of
ZONE_DEVICE to represent such memory. The use case are the same as for
the un-addressable device memory but without all the corners cases.
Link: http://lkml.kernel.org/r/20170817000548.32038-19-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This allows callers of migrate_vma() to allocate new page for empty CPU
page table entry (pte_none or back by zero page). This is only for
anonymous memory and it won't allow new page to be instanced if the
userfaultfd is armed.
This is useful to device driver that want to migrate a range of virtual
address and would rather allocate new memory than having to fault later
on.
Link: http://lkml.kernel.org/r/20170817000548.32038-18-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allow to unmap and restore special swap entry of un-addressable
ZONE_DEVICE memory.
Link: http://lkml.kernel.org/r/20170817000548.32038-17-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch add a new memory migration helpers, which migrate memory
backing a range of virtual address of a process to different memory (which
can be allocated through special allocator). It differs from numa
migration by working on a range of virtual address and thus by doing
migration in chunk that can be large enough to use DMA engine or special
copy offloading engine.
Expected users are any one with heterogeneous memory where different
memory have different characteristics (latency, bandwidth, ...). As an
example IBM platform with CAPI bus can make use of this feature to migrate
between regular memory and CAPI device memory. New CPU architecture with
a pool of high performance memory not manage as cache but presented as
regular memory (while being faster and with lower latency than DDR) will
also be prime user of this patch.
Migration to private device memory will be useful for device that have
large pool of such like GPU, NVidia plans to use HMM for that.
Link: http://lkml.kernel.org/r/20170817000548.32038-15-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Evgeny Baskakov <ebaskakov@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce a new migration mode that allow to offload the copy to a device
DMA engine. This changes the workflow of migration and not all
address_space migratepage callback can support this.
This is intended to be use by migrate_vma() which itself is use for thing
like HMM (see include/linux/hmm.h).
No additional per-filesystem migratepage testing is needed. I disables
MIGRATE_SYNC_NO_COPY in all problematic migratepage() callback and i
added comment in those to explain why (part of this patch). The commit
message is unclear it should say that any callback that wish to support
this new mode need to be aware of the difference in the migration flow
from other mode.
Some of these callbacks do extra locking while copying (aio, zsmalloc,
balloon, ...) and for DMA to be effective you want to copy multiple
pages in one DMA operations. But in the problematic case you can not
easily hold the extra lock accross multiple call to this callback.
Usual flow is:
For each page {
1 - lock page
2 - call migratepage() callback
3 - (extra locking in some migratepage() callback)
4 - migrate page state (freeze refcount, update page cache, buffer
head, ...)
5 - copy page
6 - (unlock any extra lock of migratepage() callback)
7 - return from migratepage() callback
8 - unlock page
}
The new mode MIGRATE_SYNC_NO_COPY:
1 - lock multiple pages
For each page {
2 - call migratepage() callback
3 - abort in all problematic migratepage() callback
4 - migrate page state (freeze refcount, update page cache, buffer
head, ...)
} // finished all calls to migratepage() callback
5 - DMA copy multiple pages
6 - unlock all the pages
To support MIGRATE_SYNC_NO_COPY in the problematic case we would need a
new callback migratepages() (for instance) that deals with multiple
pages in one transaction.
Because the problematic cases are not important for current usage I did
not wanted to complexify this patchset even more for no good reason.
Link: http://lkml.kernel.org/r/20170817000548.32038-14-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This introduce a dummy HMM device class so device driver can use it to
create hmm_device for the sole purpose of registering device memory. It
is useful to device driver that want to manage multiple physical device
memory under same struct device umbrella.
Link: http://lkml.kernel.org/r/20170817000548.32038-13-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Evgeny Baskakov <ebaskakov@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This introduce a simple struct and associated helpers for device driver to
use when hotpluging un-addressable device memory as ZONE_DEVICE. It will
find a unuse physical address range and trigger memory hotplug for it
which allocates and initialize struct page for the device memory.
Device driver should use this helper during device initialization to
hotplug the device memory. It should only need to remove the memory once
the device is going offline (shutdown or hotremove). There should not be
any userspace API to hotplug memory expect maybe for host device driver to
allow to add more memory to a guest device driver.
Device's memory is manage by the device driver and HMM only provides
helpers to that effect.
Link: http://lkml.kernel.org/r/20170817000548.32038-12-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Evgeny Baskakov <ebaskakov@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A ZONE_DEVICE page that reach a refcount of 1 is free ie no longer have
any user. For device private pages this is important to catch and thus we
need to special case put_page() for this.
Link: http://lkml.kernel.org/r/20170817000548.32038-9-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
HMM (heterogeneous memory management) need struct page to support
migration from system main memory to device memory. Reasons for HMM and
migration to device memory is explained with HMM core patch.
This patch deals with device memory that is un-addressable memory (ie CPU
can not access it). Hence we do not want those struct page to be manage
like regular memory. That is why we extend ZONE_DEVICE to support
different types of memory.
A persistent memory type is define for existing user of ZONE_DEVICE and a
new device un-addressable type is added for the un-addressable memory
type. There is a clear separation between what is expected from each
memory type and existing user of ZONE_DEVICE are un-affected by new
requirement and new use of the un-addressable type. All specific code
path are protect with test against the memory type.
Because memory is un-addressable we use a new special swap type for when a
page is migrated to device memory (this reduces the number of maximum swap
file).
The main two additions beside memory type to ZONE_DEVICE is two callbacks.
First one, page_free() is call whenever page refcount reach 1 (which
means the page is free as ZONE_DEVICE page never reach a refcount of 0).
This allow device driver to manage its memory and associated struct page.
The second callback page_fault() happens when there is a CPU access to an
address that is back by a device page (which are un-addressable by the
CPU). This callback is responsible to migrate the page back to system
main memory. Device driver can not block migration back to system memory,
HMM make sure that such page can not be pin into device memory.
If device is in some error condition and can not migrate memory back then
a CPU page fault to device memory should end with SIGBUS.
[arnd@arndb.de: fix warning]
Link: http://lkml.kernel.org/r/20170823133213.712917-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20170817000548.32038-8-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are new users of memory hotplug emerging. Some of them require
different subset of arch_add_memory. There are some which only require
allocation of struct pages without mapping those pages to the kernel
address space. We currently have __add_pages for that purpose. But this
is rather lowlevel and not very suitable for the code outside of the
memory hotplug. E.g. x86_64 wants to update max_pfn which should be done
by the caller. Introduce add_pages() which should care about those
details if they are needed. Each architecture should define its
implementation and select CONFIG_ARCH_HAS_ADD_PAGES. All others use the
currently existing __add_pages.
Link: http://lkml.kernel.org/r/20170817000548.32038-7-jglisse@redhat.com
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This handles page fault on behalf of device driver, unlike
handle_mm_fault() it does not trigger migration back to system memory for
device memory.
Link: http://lkml.kernel.org/r/20170817000548.32038-6-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Evgeny Baskakov <ebaskakov@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This does not use existing page table walker because we want to share
same code for our page fault handler.
Link: http://lkml.kernel.org/r/20170817000548.32038-5-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Evgeny Baskakov <ebaskakov@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a heterogeneous memory management (HMM) process address space
mirroring. In a nutshell this provide an API to mirror process address
space on a device. This boils down to keeping CPU and device page table
synchronize (we assume that both device and CPU are cache coherent like
PCIe device can be).
This patch provide a simple API for device driver to achieve address space
mirroring thus avoiding each device driver to grow its own CPU page table
walker and its own CPU page table synchronization mechanism.
This is useful for NVidia GPU >= Pascal, Mellanox IB >= mlx5 and more
hardware in the future.
[jglisse@redhat.com: fix hmm for "mmu_notifier kill invalidate_page callback"]
Link: http://lkml.kernel.org/r/20170830231955.GD9445@redhat.com
Link: http://lkml.kernel.org/r/20170817000548.32038-4-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Evgeny Baskakov <ebaskakov@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
HMM provides 3 separate types of functionality:
- Mirroring: synchronize CPU page table and device page table
- Device memory: allocating struct page for device memory
- Migration: migrating regular memory to device memory
This patch introduces some common helpers and definitions to all of
those 3 functionality.
Link: http://lkml.kernel.org/r/20170817000548.32038-3-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Evgeny Baskakov <ebaskakov@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Soft dirty bit is designed to keep tracked over page migration. This
patch makes it work in the same manner for thp migration too.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When THP migration is being used, memory management code needs to handle
pmd migration entries properly. This patch uses !pmd_present() or
is_swap_pmd() (depending on whether pmd_none() needs separate code or
not) to check pmd migration entries at the places where a pmd entry is
present.
Since pmd-related code uses split_huge_page(), split_huge_pmd(),
pmd_trans_huge(), pmd_trans_unstable(), or
pmd_none_or_trans_huge_or_clear_bad(), this patch:
1. adds pmd migration entry split code in split_huge_pmd(),
2. takes care of pmd migration entries whenever pmd_trans_huge() is present,
3. makes pmd_none_or_trans_huge_or_clear_bad() pmd migration entry aware.
Since split_huge_page() uses split_huge_pmd() and pmd_trans_unstable()
is equivalent to pmd_none_or_trans_huge_or_clear_bad(), we do not change
them.
Until this commit, a pmd entry should be:
1. pointing to a pte page,
2. is_swap_pmd(),
3. pmd_trans_huge(),
4. pmd_devmap(), or
5. pmd_none().
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add thp migration's core code, including conversions between a PMD entry
and a swap entry, setting PMD migration entry, removing PMD migration
entry, and waiting on PMD migration entries.
This patch makes it possible to support thp migration. If you fail to
allocate a destination page as a thp, you just split the source thp as
we do now, and then enter the normal page migration. If you succeed to
allocate destination thp, you enter thp migration. Subsequent patches
actually enable thp migration for each caller of page migration by
allowing its get_new_page() callback to allocate thps.
[zi.yan@cs.rutgers.edu: fix gcc-4.9.0 -Wmissing-braces warning]
Link: http://lkml.kernel.org/r/A0ABA698-7486-46C3-B209-E95A9048B22C@cs.rutgers.edu
[akpm@linux-foundation.org: fix x86_64 allnoconfig warning]
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce CONFIG_ARCH_ENABLE_THP_MIGRATION to limit thp migration
functionality to x86_64, which should be safer at the first step.
Link: http://lkml.kernel.org/r/20170717193955.20207-5-zi.yan@sent.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
TTU_MIGRATION is used to convert pte into migration entry until thp
split completes. This behavior conflicts with thp migration added later
patches, so let's introduce a new TTU flag specifically for freezing.
try_to_unmap() is used both for thp split (via freeze_page()) and page
migration (via __unmap_and_move()). In freeze_page(), ttu_flag given
for head page is like below (assuming anonymous thp):
(TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS | TTU_RMAP_LOCKED | \
TTU_MIGRATION | TTU_SPLIT_HUGE_PMD)
and ttu_flag given for tail pages is:
(TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS | TTU_RMAP_LOCKED | \
TTU_MIGRATION)
__unmap_and_move() calls try_to_unmap() with ttu_flag:
(TTU_MIGRATION | TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS)
Now I'm trying to insert a branch for thp migration at the top of
try_to_unmap_one() like below
static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
unsigned long address, void *arg)
{
...
/* PMD-mapped THP migration entry */
if (!pvmw.pte && (flags & TTU_MIGRATION)) {
if (!PageAnon(page))
continue;
set_pmd_migration_entry(&pvmw, page);
continue;
}
...
}
so try_to_unmap() for tail pages called by thp split can go into thp
migration code path (which converts *pmd* into migration entry), while
the expectation is to freeze thp (which converts *pte* into migration
entry.)
I detected this failure as a "bad page state" error in a testcase where
split_huge_page() is called from queue_pages_pte_range().
Link: http://lkml.kernel.org/r/20170717193955.20207-4-zi.yan@sent.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas, megaraid_sas, zfcp and a host of minor updates.
The major driver change here is the elimination of the block based
cciss driver in favour of the SCSI based hpsa driver (which now drives
all the legacy cases cciss used to be required for). Plus a reset
handler clean up and the redo of the SAS SMP handler to use bsg lib.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJZscDNAAoJEAVr7HOZEZN4DWIQAK/UkkrvKpV/jLATM/yi7CoL
QidY86Hmwwl7A9HQ+2fjLfAsye0xcCzRwkucKK90IP5b4pefHhiJJfiMKAAe3TUW
xstnY5z5jaOhDG4nyJFoSm5fH5qXkMnJ8NZRK8f6Qg5yBN5dStEKqoBboNsz4KBI
md7idw0mbp5i2GXlJwSpc5eDS97GiPL6WkwgGaGKfXF1NDau0GbEdjijfz55haCD
pMhY7WJh/71RfOq/1ThXT1Z3khOlVcKXrkdO+602n7zh/klRBRtBC8m2a6xCfZPj
n7Pb/s0jhCQPd+e/Xtv7WEbY8uNOCrGoVgZ6U5EGrT5IeTfep24ackYqerjMhE63
esi4BJY8lUP9SGleLMgjYWyCHdmxBJRa7UI614DWN/H0QoGP6j/2EzGoi5Fw04vC
H8/+aqPPWZc9KUBioRYo8xWO8YgMqL2eyXY+Tc9cwxqAe2T6k/NC1zJVgDFKXfzb
QoWW4v9NNmYwf5vL/7tNgkeTMFQV66yUR7dR3SGTSk8UIrJ40ok0JyUAsDg86ZAH
BfMkWwhWQ6Byoel0Y7Ti88T49Cox/64r/I0ux06Qgg99+KpRLT7z20+GLIEHgXxg
116C39rgvYKqzc7W8RCyj8qSROuMVzg6QFbB6n+1PEsYIX2O8A2Re3jdS34q2LbX
aBDm/Lfdl4kkJrV9xY6P
=nQUG
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas,
megaraid_sas, zfcp and a host of minor updates.
The major driver change here is the elimination of the block based
cciss driver in favour of the SCSI based hpsa driver (which now drives
all the legacy cases cciss used to be required for). Plus a reset
handler clean up and the redo of the SAS SMP handler to use bsg lib"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (279 commits)
scsi: scsi-mq: Always unprepare before requeuing a request
scsi: Show .retries and .jiffies_at_alloc in debugfs
scsi: Improve requeuing behavior
scsi: Call scsi_initialize_rq() for filesystem requests
scsi: qla2xxx: Reset the logo flag, after target re-login.
scsi: qla2xxx: Fix slow mem alloc behind lock
scsi: qla2xxx: Clear fc4f_nvme flag
scsi: qla2xxx: add missing includes for qla_isr
scsi: qla2xxx: Fix an integer overflow in sysfs code
scsi: aacraid: report -ENOMEM to upper layer from aac_convert_sgraw2()
scsi: aacraid: get rid of one level of indentation
scsi: aacraid: fix indentation errors
scsi: storvsc: fix memory leak on ring buffer busy
scsi: scsi_transport_sas: switch to bsg-lib for SMP passthrough
scsi: smartpqi: remove the smp_handler stub
scsi: hpsa: remove the smp_handler stub
scsi: bsg-lib: pass the release callback through bsg_setup_queue
scsi: Rework handling of scsi_device.vpd_pg8[03]
scsi: Rework the code for caching Vital Product Data (VPD)
scsi: rcu: Introduce rcu_swap_protected()
...
running set*id processes. To do this, the bprm_secureexec LSM hook is
collapsed into the bprm_set_creds hook so the secureexec-ness of an exec
can be determined early enough to make decisions about rlimits and the
resulting memory layouts. Other logic acting on the secureexec-ness of an
exec is similarly consolidated. Capabilities needed some special handling,
but the refactoring removed other special handling, so that was a wash.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Kees Cook <kees@outflux.net>
iQIcBAABCgAGBQJZrwRKAAoJEIly9N/cbcAmhboP/iwLbYfWngIJdu3pYKrW+CEg
uUVY6RNnsumJ5yEhD/yQKXSPmZ8PkC8vexPYvf8TcPOlMRQuhVvdiR0FfSUvkMWy
pB8ZVCyAV1uSnW4BH61FCxHInrahy8jlvQwnAujvw+FNxhcQjyEGKupOLIMGLioQ
8G5Ihf+hOjiXRhKbXueQi89n8i4jEI5YTH1RnC+Gsy8jG11EC9BhPddKSMaUKZA3
HYYqUyV0daYpGuxTOxaRdDO5wb6rlS+B46hqtOsSsIBOQkCjnLCRcdeMCqvXjQmv
kyZj03cPlUjEHqh3d3nB6utvVWReGf/p986//kQjT1OZPhATbySAu7wUHoLik3dU
zuexudNTBROf6YXahMxSJp348GS++xoBFARa78402E++U7C4/eoclbLCWAylBwVA
H+QAHFYRC2WFoskejSYBRPz6HLr1SIaSYMsKbkHqP07zi6p3ic2Uq3XvOP2zL/5p
l/mXa1Fs2vcDOWPER8a8b9mVkJDvuXj6J11lG+q80UWAWC3sd9GkSwOen80ps3Xo
/7dd+h2BAJSSVxZQFxd5YCx99mT0ntQZ797PhjxOY6SX/xUdOCAp9x1zDU5OUovP
q2ty3UTd7tq8h1RnHOnrn9cKmMmI7kpBvEfPGM507cEVjyfsMu2jJtUxN9dXOAkB
aebEsg3C8M6z5OdGVpWH
=Yva4
-----END PGP SIGNATURE-----
Merge tag 'secureexec-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull secureexec update from Kees Cook:
"This series has the ultimate goal of providing a sane stack rlimit
when running set*id processes.
To do this, the bprm_secureexec LSM hook is collapsed into the
bprm_set_creds hook so the secureexec-ness of an exec can be
determined early enough to make decisions about rlimits and the
resulting memory layouts. Other logic acting on the secureexec-ness of
an exec is similarly consolidated. Capabilities needed some special
handling, but the refactoring removed other special handling, so that
was a wash"
* tag 'secureexec-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
exec: Consolidate pdeath_signal clearing
exec: Use sane stack rlimit under secureexec
exec: Consolidate dumpability logic
smack: Remove redundant pdeath_signal clearing
exec: Use secureexec for clearing pdeath_signal
exec: Use secureexec for setting dumpability
LSM: drop bprm_secureexec hook
commoncap: Move cap_elevated calculation into bprm_set_creds
commoncap: Refactor to remove bprm_secureexec hook
smack: Refactor to remove bprm_secureexec hook
selinux: Refactor to remove bprm_secureexec hook
apparmor: Refactor to remove bprm_secureexec hook
binfmt: Introduce secureexec flag
exec: Correct comments about "point of no return"
exec: Rename bprm->cred_prepared to called_set_creds
and defining more restrictive root directory DAC permissions default
(0750, which can be adjust after boot unlike the CAP_SYSLOG check).
Suggested by Nick Kralevich.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Kees Cook <kees@outflux.net>
iQIcBAABCgAGBQJZrv+iAAoJEIly9N/cbcAmZXcP/jZ7dW3zQiZ2q6YQDokaABT4
AZxGdDrogLQ6wWmV+ApHIYEOTcVvbswvBLwKIE7l9XpG41tIKUe4h9iCVvpBSARP
SpyeawztJ8KNw00EFZWP/hOxCXHeausilea/1zh/+Rt5VhU2YIw/fhew821bjLmh
3exBjoLcWSHHCUY/e9ByMB0mB0SYUmnqhFub77Z6zZMhaRw9/gvPibS1DdmjGPPI
Rq0zejFAqXy50rmbKVTT2QQPq/gQnUyb/Q216ytbSUntaAwfISDrwN74slupjG3S
Vrca+BxThJYZ+rnbqjMDoROgKAYNqyIlvFVCO3H6DUqnPnGROIAeGELAcGyncUo+
6Mdpumhy25K0+YbJkNYxm1cyH0w47EWpIqBqPTh1IhuedDB5cpdamR88dShmMzNA
XhvMhe9eNxI5ZzOg8X8qCEc/hRZoZj5F4m2R+Wh55YRH3rDtuaIzONPvGyJfYYVS
tY8ut/r8+qMID9I4qLtIAmVX2rzR/6BG7H3ofApY0OGFRmCt0nicUdN56JJ+GNRf
7XfpEXDL+sG3fkUk8oQSfSEhLuOseTazLuxrQAWJIZ3FZ4JnRW/a/izlbsI2+nvy
FcC1+tG43ISwir5jZzNznYNrGM01TdFwQ5izKE3E1U+xsBRbR7OT8Y0005Z+GUwW
6feSKts8UKq4tFNt1WY9
=+gsj
-----END PGP SIGNATURE-----
Merge tag 'pstore-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore update from Kees Cook:
"Make pstore permissions more versatile by removing CAP_SYSLOG
requirement and defining more restrictive root directory DAC
permissions default (0750, which can be adjust after boot unlike the
CAP_SYSLOG check).
Suggested by Nick Kralevich"
* tag 'pstore-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
Revert "pstore: Honor dmesg_restrict sysctl on dmesg dumps"
pstore: Make default pstorefs root dir perms 0750
Pull quota scaling updates from Jan Kara:
"This contains changes to make the quota subsystem more scalable.
Reportedly it improves number of files created per second on ext4
filesystem on fast storage by about a factor of 2x"
* 'quota_scaling' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (28 commits)
quota: Add lock annotations to struct members
quota: Reduce contention on dq_data_lock
fs: Provide __inode_get_bytes()
quota: Inline dquot_[re]claim_reserved_space() into callsite
quota: Inline inode_{incr,decr}_space() into callsites
quota: Inline functions into their callsites
ext4: Disable dirty list tracking of dquots when journalling quotas
quota: Allow disabling tracking of dirty dquots in a list
quota: Remove dq_wait_unused from dquot
quota: Move locking into clear_dquot_dirty()
quota: Do not dirty bad dquots
quota: Fix possible corruption of dqi_flags
quota: Propagate ->quota_read errors from v2_read_file_info()
quota: Fix error codes in v2_read_file_info()
quota: Push dqio_sem down to ->read_file_info()
quota: Push dqio_sem down to ->write_file_info()
quota: Push dqio_sem down to ->get_next_id()
quota: Push dqio_sem down to ->release_dqblk()
quota: Remove locking for writing to the old quota format
quota: Do not acquire dqio_sem for dquot overwrites in v2 format
...
- Convert more DT code to use of_property_read_* API.
- Improve DT overlay support when adding multiple overlays.
- Convert printk's to %pOF format specifiers. Most went via subsystem
trees, but picked up the remaining orphans.
- Correct unittests to use preferred "okay" for "status" property value.
- Add a KASLR seed property.
- Vendor prefixes for Mellanox, Theobroma System, Adaptrum, Moxa.
- Fix modalias buffer handling.
- Clean-up of include paths for building dtbs.
- Add bindings for amc6821, isl1208, tsl2x7x, srf02, and srf10 devices.
- Add nvmem bindings for MediaTek MT7623 and MT7622 SoC.
- Add compatible string for Allwinner H5 Mali-450 GPU.
- Fix links to old OpenFirmware docs with new mirror on devicetree.org.
- Remove status property from binding doc examples.
-----BEGIN PGP SIGNATURE-----
iQItBAABCAAXBQJZsVkbEBxyb2JoQGtlcm5lbC5vcmcACgkQ+vtdtY28YcPWPhAA
gi3Ekc3680YE1iLnXHkDkZHmzE0KvzhIyHrzqIkoxtISfmboVdydMQFnAfyhPJA4
UA5vBKiL4uhWSpHglQpyY2ld+S9tym3IQrGEhEsHxf6njfQpkiNqVKsTYxGAmwxW
E5B6sFl5O4q9e84pnselFsmx6TI+SlmPrqbN7BiluqczeUu358QlF2x8GZuJDN35
cLJKZSeE/w2xLIRIpHUoh7My8/d3jJ/OxuqXFyt/f42BtGp++WganCQS5XR0dxSA
SMdzHhWDTqCKsih5/80vqVXpDBn8iX6NEx7zKprSRc3mTCNIWHG70m/tNAk6/FQR
gvMR3BJOiA0MOIO3M3qaJeVuFkJDixaXmwL0V/Qpuon+6EMdRIfgcVTScAXNnamP
IHmN7fzFYE9tNCzkQjEHkQtVxyQi+1CAM61dZQD1rwi4M2YZHmNxdfLj4ilRb+q8
2SDugUjz3tEdEzi6huKc5oGwqmJwLQmSlgP+VGcJnt6kotLy+PEdPK1cYWtwSKmp
p/xhbXZSCFcwCHXGbyGE6yOXX4DKaLD11KmAMlJ2zwphfvwE4v/azuLmOtviiYTS
23KGIEZJYwRP1QG/BwsjKhl7x37NeKKKHomryMVF3R7M0mf1VtcdSSYROirNi2+t
AZZSyXoK8E/Fx4hR1YHxJ3TX4aBkJ2rBi3+RgABXa10=
=IBAh
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull DeviceTree updates from Rob Herring:
"There's a few orphans in the conversion to %pOF printf specifiers
included here that no one else picked up.
Summary:
- Convert more DT code to use of_property_read_* API.
- Improve DT overlay support when adding multiple overlays
- Convert printk's to %pOF format specifiers. Most went via subsystem
trees, but picked up the remaining orphans
- Correct unittests to use preferred "okay" for "status" property
value
- Add a KASLR seed property
- Vendor prefixes for Mellanox, Theobroma System, Adaptrum, Moxa
- Fix modalias buffer handling
- Clean-up of include paths for building dtbs
- Add bindings for amc6821, isl1208, tsl2x7x, srf02, and srf10
devices
- Add nvmem bindings for MediaTek MT7623 and MT7622 SoC
- Add compatible string for Allwinner H5 Mali-450 GPU
- Fix links to old OpenFirmware docs with new mirror on
devicetree.org
- Remove status property from binding doc examples"
* tag 'devicetree-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (45 commits)
devicetree: Adjust status "ok" -> "okay" under drivers/of/
dt-bindings: Remove "status" from examples
dt-bindings: pinctrl: sh-pfc: Use generic node name
dt-bindings: Add vendor Mellanox
dt-binding: net/phy: fix interrupts description
virt: Convert to using %pOF instead of full_name
macintosh: Convert to using %pOF instead of full_name
ide: pmac: Convert to using %pOF instead of full_name
microblaze: Convert to using %pOF instead of full_name
dt-bindings: usb: musb: Grammar s/the/to/, s/is/are/
of: Use PLATFORM_DEVID_NONE definition
of/device: Fix of_device_get_modalias() buffer handling
of/device: Prevent buffer overflow in of_device_modalias()
dt-bindings: add amc6821, isl1208 trivial bindings
dt-bindings: add vendor prefix for Theobroma Systems
of: search scripts/dtc/include-prefixes path for both CPP and DTC
of: remove arch/$(SRCARCH)/boot/dts from include search path for CPP
of: remove drivers/of/testcase-data from include search path for CPP
of: return of_get_cpu_node from of_cpu_device_node_get if CPUs are not registered
iio: srf08: add device tree binding for srf02 and srf10
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJZsZswAAoJEL1qUBy3i3wm2cwQANp9Hufui0BYxD7X9bB3sBvX
LJ2lbwQlZ93gmQo176fvE22wYw90wmf4CG12pdx8CB4TIBOuaEl1EUptaIxHjBkP
+MOmiGxT0YZgUDmSo+sCL+LHEDWU8gcs8MI5ZkAxgALUb0kYiQySH5UlCJJOJIJv
ww1A4K1bo6VONli+y8ZDa3eKgkArg1/pXYl2VejNMSkr88fyzf35fOKD44lMUGos
5KFVWdPMnz1OuVbh199x9vAPZygb4PNzJL1Tf2ntklBfU3mX+582e6uvIPqh4Tuy
D5ZianpBNtA0IBV4G0UE+n/9rOy9oAr34+Tq9Gv0BRLlQXsC8FMEM5xcCDd1kFCI
L6FeiGc1YnEtJpUVEhcWzmpww2EBBRKIB9lJK5c8VUlWf/yZME72Al5R8c7uffcE
hc78eUVwM2fHNmBhmCmK49flC67/LMzHMiaEQ/lSGtGnd8ratclshFuooiPzsJN/
MSTX2B3yxWvBjwmZMNixY6WasduKKJs4K5xHnHJzyLt0mvClLiwgOvi1Dks0t+Ko
imE6cdU3LCUeJJb4I9SgHTTjp33CTqq5ZBhbFsgqjNsfj4o/6GmnWxvSZ4ywqJMW
HwU4iyBKlx2AVfhzIPozuoE67+HJyRYIexsUZi4vt0Pw2qHxwp1RvH+BMQbXGLLg
rJ/H9ef5BAPRkPWxe1HM
=E1ff
-----END PGP SIGNATURE-----
Merge tag 'leds_for_4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds
Pull LED updates from Jacek Anaszewski:
"LED class drivers improvements:
leds-pca955x:
- add Device Tree support and bindings
- use devm_led_classdev_register()
- add GPIO support
- prevent crippled LED class device name
- check for I2C errors
leds-gpio:
- add optional retain-state-shutdown DT property
- allow LED to retain state at shutdown
leds-tlc591xx:
- merge conditional tests
- add missing of_node_put
leds-powernv:
- delete an error message for a failed memory allocation in
powernv_led_create()
leds-is31fl32xx.c
- convert to using custom %pOF printf format specifier
Constify attribute_group structures in:
- leds-blinkm
- leds-lm3533
Make several arrays static const in:
- leds-aat1290
- leds-lp5521
- leds-lp5562
- leds-lp8501"
* tag 'leds_for_4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
leds: pca955x: check for I2C errors
leds: gpio: Allow LED to retain state at shutdown
dt-bindings: leds: gpio: Add optional retain-state-shutdown property
leds: powernv: Delete an error message for a failed memory allocation in powernv_led_create()
leds: lp8501: make several arrays static const
leds: lp5562: make several arrays static const
leds: lp5521: make several arrays static const
leds: aat1290: make array max_mm_current_percent static const
leds: pca955x: Prevent crippled LED device name
leds: lm3533: constify attribute_group structure
dt-bindings: leds: add pca955x
leds: pca955x: add GPIO support
leds: pca955x: use devm_led_classdev_register
leds: pca955x: add device tree support
leds: Convert to using %pOF instead of full_name
leds: blinkm: constify attribute_group structures.
leds: tlc591xx: add missing of_node_put
leds: tlc591xx: merge conditional tests
- Removal of DMA_SG support as we have no users for this feature
- New driver for Altera / Intel mSGDMA IP core
- Support for memset in dmatest and qcom_hidma driver
- Update for non cyclic mode in k3dma, bunch of update in bam_dma, bcm sba-raid
- Constify device ids across drivers
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZsXteAAoJEHwUBw8lI4NHPwcP/iihF1n7jOQVtUm3zxPvUV+n
GzU7+rqAEDLKBaIttK28LIjgvg0AC/4aiEsosfCzTjpkzMHteRw00YyplwF7/wdM
O0owKOIub4PriDiL6d/SWFnhcWwv0/KLbyKscQcOwwvkksG/mwMn1VfW7alCrz1w
81TOQaW9SxLxL7guJU0aQHljkudT53l8Dgsp55iC9Ccz515Iuu7dQm3DnSG3sYjJ
Ct4u4MWWzDmmKKpbDoYe/Z+fiQT0WKuGfI7QHURVnw5qLo2sDKREWGbThhRG/lZj
YlnLQnkjWwLU5dyX1MyIWipPxe83sjf/7OwJ7XUlLjD6o+lNEuQxjmNkVAh0hNRc
dgrXRuqPRJMW40uOvAMDHTkexxikWc5ggt5LN9dIYDOdaS4Ch5ewf19SRi9pSDap
FZeIWY1FWwQCAU7HQMwSYyRLBjlmEmeSkElkXCd+2wu5aH2oKOMUMbUIYcqL4fjD
qMAR7kfn6e92fDT1gR1ZKL79Cfe9zsCQA3XmecpC/HwqiE3XtfZuDY/73cXD0MeO
SbJUCv4ldPGjrTKBHvs0wiWbxi5Mj5sXglmSaD0lEhtMsOfhPHY2BGatTzSmKKwO
WwmKAvM8qElQZy2Eh25dvlE04yAOofoJb6Pf/AraQOLTUkMyF8wRWEpltjUuttM9
VzQLvh8s25naKM5mOAM2
=88SI
-----END PGP SIGNATURE-----
Merge tag 'dmaengine-4.14-rc1' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine updates from Vinod Koul:
"This one features the usual updates to the drivers and one good part
of removing DA_SG from core as it has no users.
Summary:
- Remove DMA_SG support as we have no users for this feature
- New driver for Altera / Intel mSGDMA IP core
- Support for memset in dmatest and qcom_hidma driver
- Update for non cyclic mode in k3dma, bunch of update in bam_dma,
bcm sba-raid
- Constify device ids across drivers"
* tag 'dmaengine-4.14-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (52 commits)
dmaengine: sun6i: support V3s SoC variant
dmaengine: sun6i: make gate bit in sun8i's DMA engines a common quirk
dmaengine: rcar-dmac: document R8A77970 bindings
dmaengine: xilinx_dma: Fix error code format specifier
dmaengine: altera: Use macros instead of structs to describe the registers
dmaengine: ti-dma-crossbar: Fix dra7 reserve function
dmaengine: pl330: constify amba_id
dmaengine: pl08x: constify amba_id
dmaengine: bcm-sba-raid: Remove redundant SBA_REQUEST_STATE_COMPLETED
dmaengine: bcm-sba-raid: Explicitly ACK mailbox message after sending
dmaengine: bcm-sba-raid: Add debugfs support
dmaengine: bcm-sba-raid: Remove redundant SBA_REQUEST_STATE_RECEIVED
dmaengine: bcm-sba-raid: Re-factor sba_process_deferred_requests()
dmaengine: bcm-sba-raid: Pre-ack async tx descriptor
dmaengine: bcm-sba-raid: Peek mbox when we have no free requests
dmaengine: bcm-sba-raid: Alloc resources before registering DMA device
dmaengine: bcm-sba-raid: Improve sba_issue_pending() run duration
dmaengine: bcm-sba-raid: Increase number of free sba_request
dmaengine: bcm-sba-raid: Allow arbitrary number free sba_request
dmaengine: bcm-sba-raid: Remove reqs_free_count from sba_device
...
- RK805 Power Management IC (PMIC)
- ROHM BD9571MWV-M MFD Power Management IC (PMIC)
- Texas Instruments TPS68470 Power Management IC (PMIC) & LEDs
- New Device Support
- Add support for HiSilicon Hi6421v530 to hi6421-pmic-core
- Add support for X-Powers AXP806 to axp20x
- Add support for X-Powers AXP813 to axp20x
- Add support for Intel Sunrise Point LPSS to intel-lpss-pci
- New Functionality
- Amend API to provide register layout; atmel-smc
- Fix-ups
- DT re-work; omap, nokia
- Header file location change {I2C => MFD}; dm355evm_msp, tps65010
- Fix chip ID formatting issue(s); rk808
- Optionally register touchscreen devices; da9052-core
- Documentation improvements; twl-core
- Constification; rtsx_pcr, ab8500-core, da9055-i2c, da9052-spi
- Drop unnecessary static declaration; max8925-i2c
- Kconfig changes (missing deps and remove module support)
- Slim down oversized licence statement; hi6421-pmic-core
- Use managed resources (devm_*); lp87565
- Supply proper error checking/handling; t7l66xb
- Bug Fixes
- Fix counter duplication issue; da9052-core
- Fix potential NULL deference issue; max8998
- Leave SPI-NOR write-protection bit alone; lpc_ich
- Ensure device is put into reset during suspend; intel-lpss
- Correct register offset variable size; omap-usb-tll
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJZsP0YAAoJEFGvii+H/HdhrJUP/RB6BTCDMf3WCi5e6PN8IFST
JspCcf4bwKVc5lDvORQglVRfBhKY/uSr7F9xlfXtHx8V60ZNo1VOQcyJBTKIz+IJ
+FQQgM3lEMKIn3QCcu9lKSRomJx55YDnF5SrZ8FzkC8pGLrCYEru5HfqFqOTfPqq
OH2wZSqiX4H/jYdfVzp3bgqXkDff/nSEGTeFankFkv4wRvLGRxlpVuqkRJcvEJA3
d8N9MoBBxkZAtAn2j1H5cHyPx5NrBEM2gkXpDfdd+kJNnFzjL72xsXd6rp+N6rcm
d20eL+1fyJVyvGhGiDOhFwqRAZEqvjPSI4k5kQdRk8IdioGgbmaI74eUbv+rGAKp
P9QdR7n1ctYyVgwnawIwKTPMzdZo5+9kdagCtu8IBVT02zQqVSDKZM7dAYo2rJuF
yw24jONcwHFrKA25n1pLJmMbJGHq83kqqw3q5kl17nyArvOOcyspCTODIL9iskhZ
L0IoIMwQYEj/pnI+iuXl9bJ30v2FIJxyCzUR2u7OJnrH7G27rsoOL0WDqxbp3Dp9
7tD+6OzMiyIEDxtcd74kjg7g9p5HCmcY3FiDWirmQuZIR3abSET4ap+cTYPdFqVZ
widS5Pi4PP40ZFN6+4lbBHLlh6MgpHpig9M03kFAr1SyZnH8nf4TnCsFV+wYPyTb
LR3cKpFeTY8IyFWaLoSg
=TKIm
-----END PGP SIGNATURE-----
Merge tag 'mfd-next-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones:
"New Drivers
- RK805 Power Management IC (PMIC)
- ROHM BD9571MWV-M MFD Power Management IC (PMIC)
- Texas Instruments TPS68470 Power Management IC (PMIC) & LEDs
New Device Support:
- Add support for HiSilicon Hi6421v530 to hi6421-pmic-core
- Add support for X-Powers AXP806 to axp20x
- Add support for X-Powers AXP813 to axp20x
- Add support for Intel Sunrise Point LPSS to intel-lpss-pci
New Functionality:
- Amend API to provide register layout; atmel-smc
Fix-ups:
- DT re-work; omap, nokia
- Header file location change {I2C => MFD}; dm355evm_msp, tps65010
- Fix chip ID formatting issue(s); rk808
- Optionally register touchscreen devices; da9052-core
- Documentation improvements; twl-core
- Constification; rtsx_pcr, ab8500-core, da9055-i2c, da9052-spi
- Drop unnecessary static declaration; max8925-i2c
- Kconfig changes (missing deps and remove module support)
- Slim down oversized licence statement; hi6421-pmic-core
- Use managed resources (devm_*); lp87565
- Supply proper error checking/handling; t7l66xb
Bug Fixes:
- Fix counter duplication issue; da9052-core
- Fix potential NULL deference issue; max8998
- Leave SPI-NOR write-protection bit alone; lpc_ich
- Ensure device is put into reset during suspend; intel-lpss
- Correct register offset variable size; omap-usb-tll"
* tag 'mfd-next-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (61 commits)
mfd: intel_soc_pmic: Differentiate between Bay and Cherry Trail CRC variants
mfd: intel_soc_pmic: Export separate mfd-cell configs for BYT and CHT
dt-bindings: mfd: Add bindings for ZII RAVE devices
mfd: omap-usb-tll: Fix register offsets
mfd: da9052: Constify spi_device_id
mfd: intel-lpss: Put I2C and SPI controllers into reset state on suspend
mfd: da9055: Constify i2c_device_id
mfd: intel-lpss: Add missing PCI ID for Intel Sunrise Point LPSS devices
mfd: t7l66xb: Handle return value of clk_prepare_enable
mfd: Add ROHM BD9571MWV-M PMIC DT bindings
mfd: intel_soc_pmic_chtwc: Turn Kconfig option into a bool
mfd: lp87565: Convert to use devm_mfd_add_devices()
mfd: Add support for TPS68470 device
mfd: lpc_ich: Do not touch SPI-NOR write protection bit on Haswell/Broadwell
mfd: syscon: atmel-smc: Add helper to retrieve register layout
mfd: axp20x: Use correct platform device ID for many PEK
dt-bindings: mfd: axp20x: Introduce bindings for AXP813
mfd: axp20x: Add support for AXP813 PMIC
dt-bindings: mfd: axp20x: Add AXP806 to supported list of chips
mfd: Add ROHM BD9571MWV-M MFD PMIC driver
...
- Continue to refactor the mmc block code to prepare for blkmq
- Move mmc block debugfs into block module
- Next step for eMMC CMDQ by adding a new mmc host interface for it
- Move Kconfig option MMC_DEBUG from core to host
- Some additional minor improvements
MMC host:
- Declare structs as const when applicable
- Explicitly request exclusive reset control when applicable
- Improve some error paths and other various cleanups
- sdhci: Preparations to support SDHCI OMAP
- sdhci: Improve some PM related code
- sdhci: Re-factoring and modernizations
- sdhci-xenon: Add runtime PM and system sleep support
- sdhci-xenon: Add support for eMMC HS400 Enhanced Strobe
- sdhci-cadence: Add system sleep support
- sdhci-of-at91: Improve system sleep support
- dw_mmc: Add support for Hisilicon hi3660
- sunxi: Add support for A83T eMMC
- sunxi: Add support for DDR52 mode
- meson-gx: Add support for UHS-I SD-cards
- meson-gx: Cleanups and improvements
- tmio: Fix CMD12 (STOP) handling
- tmio: Cleanups and improvements
- renesas_sdhi: Add r8a7743/5 support
- renesas-sdhi: Add support for R-Car Gen3 SDHI DMAC
- renesas_sdhi: Cleanups and improvements
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZr7CEAAoJEP4mhCVzWIwp5WQQAK9l2Hg1k4tFzxQ5EmKB/9Sm
r3eS2GrosqrsCffR3vSSKnmva/lOrQuBzhqzx1MvWAByMUc5w8Yc8OowrKGhCWm9
pAzi/3Tnjf7A9nAq+0NeGhkwybckam9ZpGhMyC1E4bp63g6PCoHjTcqOMVnjYxHz
cUNNQUz7oCjW6tjtpvdJQWZuIGiScNuyxrTYKi8SUpQZ0LQo8nU9DujKcwsKsZed
gYEIimqOqZnGz1rWs/EP2Y5TSoPVxvnb6nc90gt8kh0nfXYumxKjEmHZ0PB7K97b
pioCN/THtkDgdYn8j3gnDXZYYa6JA4fKKOw+S6VZraLoVLeDtLo5zK353Rr3BscI
SddxLePp5WclRal+WulLLJs1FeY5PN3ji+mxC3FAG6cvCqIyosyU8HKG79Lhwwl6
7qlaDf27BhK71Sf17jzxtc5OwVTkSsY+9iKzVZAw5tIHSLR+nwhjM2vlAVU+oG2r
KAsuVO1CVAqYbeIBJ85R6bPzgRGxQ0Kmkqwxe1QDVhgXl3eC5Ot5N/bOifv7HzV+
m+6W1Wdw6/tUKD5g5c6s2WMijXgTdEnfj7dYXmHHN4q1abAKj0cOVjXtmVb90DHM
5tvfxNurQZCCLo2A88/BYXRd299vBzOy9HAWvMvt5effQfxgFfpC1gc9NkfUTfkA
FTOQ96vOpOmAH5uA0Xvm
=850Z
-----END PGP SIGNATURE-----
Merge tag 'mmc-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC updates from Ulf Hansson:
"MMC core:
- Continue to refactor the mmc block code to prepare for blkmq
- Move mmc block debugfs into block module
- Next step for eMMC CMDQ by adding a new mmc host interface for it
- Move Kconfig option MMC_DEBUG from core to host
- Some additional minor improvements
MMC host:
- Declare structs as const when applicable
- Explicitly request exclusive reset control when applicable
- Improve some error paths and other various cleanups
- sdhci: Preparations to support SDHCI OMAP
- sdhci: Improve some PM related code
- sdhci: Re-factoring and modernizations
- sdhci-xenon: Add runtime PM and system sleep support
- sdhci-xenon: Add support for eMMC HS400 Enhanced Strobe
- sdhci-cadence: Add system sleep support
- sdhci-of-at91: Improve system sleep support
- dw_mmc: Add support for Hisilicon hi3660
- sunxi: Add support for A83T eMMC
- sunxi: Add support for DDR52 mode
- meson-gx: Add support for UHS-I SD-cards
- meson-gx: Cleanups and improvements
- tmio: Fix CMD12 (STOP) handling
- tmio: Cleanups and improvements
- renesas_sdhi: Add r8a7743/5 support
- renesas-sdhi: Add support for R-Car Gen3 SDHI DMAC
- renesas_sdhi: Cleanups and improvements"
* tag 'mmc-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (145 commits)
mmc: renesas_sdhi: Add r8a7743/5 support
mmc: meson-gx: fix __ffsdi2 undefined on arm32
mmc: sdhci-xenon: add runtime pm support and reimplement standby
mmc: core: Move mmc_start_areq() declaration
mmc: mmci: stop building qcom dml as module
mmc: sunxi: Reset the device at probe time
clk: sunxi-ng: Provide a default reset hook
mmc: meson-gx: rework tuning function
mmc: meson-gx: change default tx phase
mmc: meson-gx: implement voltage switch callback
mmc: meson-gx: use CCF to handle the clock phases
mmc: meson-gx: implement card_busy callback
mmc: meson-gx: simplify interrupt handler
mmc: meson-gx: work around clk-stop issue
mmc: meson-gx: fix dual data rate mode frequencies
mmc: meson-gx: rework clock init function
mmc: meson-gx: rework clk_set function
mmc: meson-gx: rework set_ios function
mmc: meson-gx: cfg init overwrite values
mmc: meson-gx: initialize sane clk default before clock register
...
Pull block layer updates from Jens Axboe:
"This is the first pull request for 4.14, containing most of the code
changes. It's a quiet series this round, which I think we needed after
the churn of the last few series. This contains:
- Fix for a registration race in loop, from Anton Volkov.
- Overflow complaint fix from Arnd for DAC960.
- Series of drbd changes from the usual suspects.
- Conversion of the stec/skd driver to blk-mq. From Bart.
- A few BFQ improvements/fixes from Paolo.
- CFQ improvement from Ritesh, allowing idling for group idle.
- A few fixes found by Dan's smatch, courtesy of Dan.
- A warning fixup for a race between changing the IO scheduler and
device remova. From David Jeffery.
- A few nbd fixes from Josef.
- Support for cgroup info in blktrace, from Shaohua.
- Also from Shaohua, new features in the null_blk driver to allow it
to actually hold data, among other things.
- Various corner cases and error handling fixes from Weiping Zhang.
- Improvements to the IO stats tracking for blk-mq from me. Can
drastically improve performance for fast devices and/or big
machines.
- Series from Christoph removing bi_bdev as being needed for IO
submission, in preparation for nvme multipathing code.
- Series from Bart, including various cleanups and fixes for switch
fall through case complaints"
* 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
kernfs: checking for IS_ERR() instead of NULL
drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
drbd: Fix allyesconfig build, fix recent commit
drbd: switch from kmalloc() to kmalloc_array()
drbd: abort drbd_start_resync if there is no connection
drbd: move global variables to drbd namespace and make some static
drbd: rename "usermode_helper" to "drbd_usermode_helper"
drbd: fix race between handshake and admin disconnect/down
drbd: fix potential deadlock when trying to detach during handshake
drbd: A single dot should be put into a sequence.
drbd: fix rmmod cleanup, remove _all_ debugfs entries
drbd: Use setup_timer() instead of init_timer() to simplify the code.
drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
drbd: new disk-option disable-write-same
drbd: Fix resource role for newly created resources in events2
drbd: mark symbols static where possible
drbd: Send P_NEG_ACK upon write error in protocol != C
drbd: add explicit plugging when submitting batches
drbd: change list_for_each_safe to while(list_first_entry_or_null)
drbd: introduce drbd_recv_header_maybe_unplug
...
Nothing really major this release, despite quite a lot of activity. Just lots of
things all over the place.
Some things of note include:
- Access via perf to a new type of PMU (IMC) on Power9, which can count both
core events as well as nest unit events (Memory controller etc).
- Optimisations to the radix MMU TLB flushing, mostly to avoid unnecessary Page
Walk Cache (PWC) flushes when the structure of the tree is not changing.
- Reworks/cleanups of do_page_fault() to modernise it and bring it closer to
other architectures where possible.
- Rework of our page table walking so that THP updates only need to send IPIs
to CPUs where the affected mm has run, rather than all CPUs.
- The size of our vmalloc area is increased to 56T on 64-bit hash MMU systems.
This avoids problems with the percpu allocator on systems with very sparse
NUMA layouts.
- STRICT_KERNEL_RWX support on PPC32.
- A new sched domain topology for Power9, to capture the fact that pairs of
cores may share an L2 cache.
- Power9 support for VAS, which is a new mechanism for accessing coprocessors,
and initial support for using it with the NX compression accelerator.
- Major work on the instruction emulation support, adding support for many new
instructions, and reworking it so it can be used to implement the emulation
needed to fixup alignment faults.
- Support for guests under PowerVM to use the Power9 XIVE interrupt controller.
And probably that many things again that are almost as interesting, but I had to
keep the list short. Plus the usual fixes and cleanups as always.
Thanks to:
Alexey Kardashevskiy, Alistair Popple, Andreas Schwab, Aneesh Kumar K.V, Anju
T Sudhakar, Arvind Yadav, Balbir Singh, Benjamin Herrenschmidt, Bhumika Goyal,
Breno Leitao, Bryant G. Ly, Christophe Leroy, Cédric Le Goater, Dan Carpenter,
Dou Liyang, Frederic Barrat, Gautham R. Shenoy, Geliang Tang, Geoff Levand,
Hannes Reinecke, Haren Myneni, Ivan Mikhaylov, John Allen, Julia Lawall, LABBE
Corentin, Laurentiu Tudor, Madhavan Srinivasan, Markus Elfring, Masahiro
Yamada, Matt Brown, Michael Neuling, Murilo Opsfelder Araujo, Nathan Fontenot,
Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Rashmica
Gupta, Rob Herring, Rui Teng, Sam Bobroff, Santosh Sivaraj, Scott Wood,
Shilpasri G Bhat, Sukadev Bhattiprolu, Suraj Jitindar Singh, Tobin C. Harding,
Victor Aoqui.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZr83SAAoJEFHr6jzI4aWA6pUP/3CEaj2bSxNzWIwidqyYjuoS
O1moEsP0oYH7eBEWVHalYxvo0QPIIAhbFPaFyrOrgtfDH01Szwu9LcCALGb8orC5
Hg3IY8mpNG3Q1T8wEtTa56Ik4b5ZFty35S5+X9qLNSFoDUqSvGlSsLzhPNN7f2tl
XFm2hWqd8wXCwDsuVSFBCF61M3SAm+g6NMVNJ+VL2KIDCwBrOZLhKDPRoxLTAuMa
jjSdjVIozWyXjUrBFi8HVcoOWLxcT1HsNF0tRs51LwY/+Mlj2jAtFtsx+a06HZa6
f2p/Kcp/MEispSTk064Ap9cC1seXWI18zwZKpCUFqu0Ec2yTAiGdjOWDyYQldIp+
ttVPSHQ01YrVKwDFTtM9CiA0EET6fVPhWgAPkPfvH5TvtKwGkNdy0b+nQLuWrYip
BUmOXmjdIG3nujCzA9sv6/uNNhjhj2y+HWwuV7Qo002VFkhgZFL67u2SSUQLpYPj
PxdkY8pPVq+O+in94oDV3c36dYFF6+g6A6505Vn6eKUm/TLpszRFGkS3bKKA5vtn
74FR+guV/5RwYJcdZbfm04DgAocl7AfUDxpwRxibt6KtAK2VZKQuw4ugUTgYEd7W
mL2+AMmPKuajWXAMTHjCZPbUp9gFNyYyBQTFfGVX/XLiM8erKBnGfoa1/KzUJkhr
fVZLYIO/gzl34PiTIfgD
=UJtt
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Nothing really major this release, despite quite a lot of activity.
Just lots of things all over the place.
Some things of note include:
- Access via perf to a new type of PMU (IMC) on Power9, which can
count both core events as well as nest unit events (Memory
controller etc).
- Optimisations to the radix MMU TLB flushing, mostly to avoid
unnecessary Page Walk Cache (PWC) flushes when the structure of the
tree is not changing.
- Reworks/cleanups of do_page_fault() to modernise it and bring it
closer to other architectures where possible.
- Rework of our page table walking so that THP updates only need to
send IPIs to CPUs where the affected mm has run, rather than all
CPUs.
- The size of our vmalloc area is increased to 56T on 64-bit hash MMU
systems. This avoids problems with the percpu allocator on systems
with very sparse NUMA layouts.
- STRICT_KERNEL_RWX support on PPC32.
- A new sched domain topology for Power9, to capture the fact that
pairs of cores may share an L2 cache.
- Power9 support for VAS, which is a new mechanism for accessing
coprocessors, and initial support for using it with the NX
compression accelerator.
- Major work on the instruction emulation support, adding support for
many new instructions, and reworking it so it can be used to
implement the emulation needed to fixup alignment faults.
- Support for guests under PowerVM to use the Power9 XIVE interrupt
controller.
And probably that many things again that are almost as interesting,
but I had to keep the list short. Plus the usual fixes and cleanups as
always.
Thanks to: Alexey Kardashevskiy, Alistair Popple, Andreas Schwab,
Aneesh Kumar K.V, Anju T Sudhakar, Arvind Yadav, Balbir Singh,
Benjamin Herrenschmidt, Bhumika Goyal, Breno Leitao, Bryant G. Ly,
Christophe Leroy, Cédric Le Goater, Dan Carpenter, Dou Liyang,
Frederic Barrat, Gautham R. Shenoy, Geliang Tang, Geoff Levand, Hannes
Reinecke, Haren Myneni, Ivan Mikhaylov, John Allen, Julia Lawall,
LABBE Corentin, Laurentiu Tudor, Madhavan Srinivasan, Markus Elfring,
Masahiro Yamada, Matt Brown, Michael Neuling, Murilo Opsfelder Araujo,
Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran,
Paul Mackerras, Rashmica Gupta, Rob Herring, Rui Teng, Sam Bobroff,
Santosh Sivaraj, Scott Wood, Shilpasri G Bhat, Sukadev Bhattiprolu,
Suraj Jitindar Singh, Tobin C. Harding, Victor Aoqui"
* tag 'powerpc-4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (321 commits)
powerpc/xive: Fix section __init warning
powerpc: Fix kernel crash in emulation of vector loads and stores
powerpc/xive: improve debugging macros
powerpc/xive: add XIVE Exploitation Mode to CAS
powerpc/xive: introduce H_INT_ESB hcall
powerpc/xive: add the HW IRQ number under xive_irq_data
powerpc/xive: introduce xive_esb_write()
powerpc/xive: rename xive_poke_esb() in xive_esb_read()
powerpc/xive: guest exploitation of the XIVE interrupt controller
powerpc/xive: introduce a common routine xive_queue_page_alloc()
powerpc/sstep: Avoid used uninitialized error
axonram: Return directly after a failed kzalloc() in axon_ram_probe()
axonram: Improve a size determination in axon_ram_probe()
axonram: Delete an error message for a failed memory allocation in axon_ram_probe()
powerpc/powernv/npu: Move tlb flush before launching ATSD
powerpc/macintosh: constify wf_sensor_ops structures
powerpc/iommu: Use permission-specific DEVICE_ATTR variants
powerpc/eeh: Delete an error out of memory message at init time
powerpc/mm: Use seq_putc() in two functions
macintosh: Convert to using %pOF instead of full_name
...
Pull EFI updates from Ingo Molnar:
"The main changes in this cycle were:
- Transparently fall back to other poweroff method(s) if EFI poweroff
fails (and returns)
- Use separate PE/COFF section headers for the RX and RW parts of the
ARM stub loader so that the firmware can use strict mapping
permissions
- Add support for requesting the firmware to wipe RAM at warm reboot
- Increase the size of the random seed obtained from UEFI so CRNG
fast init can complete earlier
- Update the EFI framebuffer address if it points to a BAR that gets
moved by the PCI resource allocation code
- Enable "reset attack mitigation" of TPM environments: this is
enabled if the kernel is configured with
CONFIG_RESET_ATTACK_MITIGATION=y.
- Clang related fixes
- Misc cleanups, constification, refactoring, etc"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/bgrt: Use efi_mem_type()
efi: Move efi_mem_type() to common code
efi/reboot: Make function pointer orig_pm_power_off static
efi/random: Increase size of firmware supplied randomness
efi/libstub: Enable reset attack mitigation
firmware/efi/esrt: Constify attribute_group structures
firmware/efi: Constify attribute_group structures
firmware/dcdbas: Constify attribute_group structures
arm/efi: Split zImage code and data into separate PE/COFF sections
arm/efi: Replace open coded constants with symbolic ones
arm/efi: Remove pointless dummy .reloc section
arm/efi: Remove forbidden values from the PE/COFF header
drivers/fbdev/efifb: Allow BAR to be moved instead of claiming it
efi/reboot: Fall back to original power-off method if EFI_RESET_SHUTDOWN returns
efi/arm/arm64: Add missing assignment of efi.config_table
efi/libstub/arm64: Set -fpie when building the EFI stub
efi/libstub/arm64: Force 'hidden' visibility for section markers
efi/libstub/arm64: Use hidden attribute for struct screen_info reference
efi/arm: Don't mark ACPI reclaim memory as MEMBLOCK_NOMAP
Pull x86 platform updates from Ingo Molnar:
"The main changes include various Hyper-V optimizations such as faster
hypercalls and faster/better TLB flushes - and there's also some
Intel-MID cleanups"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tracing/hyper-v: Trace hyperv_mmu_flush_tlb_others()
x86/hyper-v: Support extended CPU ranges for TLB flush hypercalls
x86/platform/intel-mid: Make several arrays static, to make code smaller
MAINTAINERS: Add missed file for Hyper-V
x86/hyper-v: Use hypercall for remote TLB flush
hyper-v: Globalize vp_index
x86/hyper-v: Implement rep hypercalls
hyper-v: Use fast hypercall for HVCALL_SIGNAL_EVENT
x86/hyper-v: Introduce fast hypercall implementation
x86/hyper-v: Make hv_do_hypercall() inline
x86/hyper-v: Include hyperv/ only when CONFIG_HYPERV is set
x86/platform/intel-mid: Make 'bt_sfi_data' const
x86/platform/intel-mid: Make IRQ allocation a bit more flexible
x86/platform/intel-mid: Group timers callbacks together
Pull libata updates from Tejun Heo:
"Except for the ahci fix that fixes a boot issue, nothing major in this
pull request. Some new platform controller support and device specific
changes"
* 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
libata: zpodd: make arrays cdb static, reduces object code size
ahci: don't use MSI for devices with the silly Intel NVMe remapping scheme
dt-bindings: ata: add DT bindings for MediaTek SATA controller
ata: mediatek: add support for MediaTek SATA controller
pata_octeon_cf: use of_property_read_{bool|u32}()
cs5536: add support for IDE controller variant
ata: sata_gemini: Introduce explicit IDE pin control
ata: sata_gemini: Retire custom pin control
ata: ahci_platform: Add shutdown handler
ata: sata_gemini: explicitly request exclusive reset control
ata: Drop unnecessary static
ata: Convert to using %pOF instead of full_name