This patch enables to use thermal-zone on DT if it was calles as
"renesas,rcar-thermal-gen2".
Previous style (= non thermal-zone) is still supported by
"renesas,rcar-thermal" to keep compatibility for "git bisect".
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Use for_each_available_child_of_node() for iterating over each
available child instead of iterating over each child and then
checking their status.
Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
This reverts commit 874bbfe600.
Workqueue used to implicity guarantee that work items queued without
explicit CPU specified are put on the local CPU. Recent changes in
timer broke the guarantee and led to vmstat breakage which was fixed
by 176bed1de5 ("vmstat: explicitly schedule per-cpu work on the CPU
we need it to run on").
vmstat is the most likely to expose the issue and it's quite possible
that there are other similar problems which are a lot more difficult
to trigger. As a preventive measure, 874bbfe600 ("workqueue: make
sure delayed work run in local cpu") was applied to restore the local
CPU guarnatee. Unfortunately, the change exposed a bug in timer code
which got fixed by 22b886dd10 ("timers: Use proper base migration in
add_timer_on()"). Due to code restructuring, the commit couldn't be
backported beyond certain point and stable kernels which only had
874bbfe600 started crashing.
The local CPU guarantee was accidental more than anything else and we
want to get rid of it anyway. As, with the vmstat case fixed,
874bbfe600 is causing more problems than it's fixing, it has been
decided to take the chance and officially break the guarantee by
reverting the commit. A debug feature will be added to force foreign
CPU assignment to expose cases relying on the guarantee and fixes for
the individual cases will be backported to stable as necessary.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 874bbfe600 ("workqueue: make sure delayed work run in local cpu")
Link: http://lkml.kernel.org/g/20160120211926.GJ10810@quack.suse.cz
Cc: stable@vger.kernel.org
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Daniel Bilik <daniel.bilik@neosystem.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Shaohua Li <shli@fb.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Bilik <daniel.bilik@neosystem.cz>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
get_disk(),get_gendisk() calls have non explicit side effect: they
increase the reference on the disk owner module.
The following is the correct sequence how to get a disk reference and
to put it:
disk = get_gendisk(...);
/* use disk */
owner = disk->fops->owner;
put_disk(disk);
module_put(owner);
fs/block_dev.c is aware of this required module_put() call, but f.e.
blkg_conf_finish(), which is located in block/blk-cgroup.c, does not put
a module reference. To see a leakage in action cgroups throttle config
can be used. In the following script I'm removing throttle for /dev/ram0
(actually this is NOP, because throttle was never set for this device):
# lsmod | grep brd
brd 5175 0
# i=100; while [ $i -gt 0 ]; do echo "1:0 0" > \
/sys/fs/cgroup/blkio/blkio.throttle.read_bps_device; i=$(($i - 1)); \
done
# lsmod | grep brd
brd 5175 100
Now brd module has 100 references.
The issue is fixed by calling module_put() just right away put_disk().
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Gi-Oh Kim <gi-oh.kim@profitbricks.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
Pull crypto fixes from Herbert Xu:
"This fixes the following issues:
API:
- Fix async algif_skcipher, it was broken by recent fixes.
- Fix potential race condition in algif_skcipher with ctx.
- Fix potential memory corruption in algif_skcipher.
- Add missing lock to crypto_user when doing an alg dump.
Drivers:
- marvell/cesa was testing the wrong variable for NULL after
allocation.
- Fix potential double-free in atmel-sha.
- Fix illegal call to sleepin function from atomic context in
atmel-sha"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: marvell/cesa - fix test in mv_cesa_dev_dma_init()
crypto: atmel-sha - remove calls of clk_prepare() from atomic contexts
crypto: atmel-sha - fix atmel_sha_remove()
crypto: algif_skcipher - Do not set MAY_BACKLOG on the async path
crypto: algif_skcipher - Do not dereference ctx without socket lock
crypto: algif_skcipher - Do not assume that req is unchanged
crypto: user - lock crypto_alg_list on alg dump
Long ago, Dave Jones complained about CONFIG_LOCALVERSION_AUTO:
"I don't use the auto config, because I end up filling up /boot unless
I go through and clean them out by hand every time I install a new one
(which I do probably a dozen or so times a day). Is there some easy
way to prune old builds I'm missing?"
To which Bruce replied:
"I run this by hand every now and then. I'm probably doing it all wrong"
And if he is running it wrong, then so am I - because I've been using
this script ever since. It is true that CONFIG_LOCALVERSION_AUTO easily
ends up filling your /boot partition if you don't clean up old versions
regularly, and this script helps make that easier.
Checked with Bruce to see that it's fine to add this to the kernel
scripts. Maybe people will come up with enhancements, but more
importantly, this way I won't misplace this script whenever I install a
new machine and start doing custom kernels for it.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a process is doing Random Write with O_DSYNC flag
the I/O wait are not accounted in the kernel (get_cpu_iowait_time_us).
This is preventing the governor or the cpufreq driver to account for
I/O wait and thus use the right pstate
Signed-off-by: Stephane Gasparini <stephane.gasparini@linux.intel.com>
Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Commit c014d164f2 ("MIPS: Add platform callback before initializing
the L2 cache") added a platform_early_l2_init function in order to allow
platforms to probe for the CM before L2 initialisation is performed, so
that CM GCRs are available to mips_sc_probe.
That commit actually fails to do anything useful, since it checks
mips_cm_revision to determine whether it should call mips_cm_probe but
the result of mips_cm_revision will always be 0 until mips_cm_probe has
been called. Thus the "early" mips_cm_probe call never occurs.
Fix this & drop the useless weak platform_early_l2_init function by
simply calling mips_cm_probe from setup_arch. For platforms that don't
select CONFIG_MIPS_CM this will be a no-op, and for those that do it
removes the requirement for them to call mips_cm_probe manually
(although doing so isn't harmful for now).
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Cc: Andrzej Hajda <a.hajda@samsung.com>
Cc: Aaro Koskinen <aaro.koskinen@nokia.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
Cc: Jaedon Shin <jaedon.shin@gmail.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jonas Gorski <jogo@openwrt.org>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/12475/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
We have eager and lazy FPU modes, introduced in:
304bceda6a ("x86, fpu: use non-lazy fpu restore for processors supporting xsave")
The result is rather messy. There are two code paths in almost all
of the FPU code, and only one of them (the eager case) is tested
frequently, since most kernel developers have new enough hardware
that we use eagerfpu.
It seems that, on any remotely recent hardware, eagerfpu is a win:
glibc uses SSE2, so laziness is probably overoptimistic, and, in any
case, manipulating TS is far slower that saving and restoring the
full state. (Stores to CR0.TS are serializing and are poorly
optimized.)
To try to shake out any latent issues on old hardware, this changes
the default to eager on all CPUs. If no performance or functionality
problems show up, a subsequent patch could remove lazy mode entirely.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/ac290de61bf08d9cfc2664a4f5080257ffc1075a.1453675014.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If we have an FPU, there's no need to check CR0 for FPU emulation.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/980004297e233c27066d54e71382c44cdd36ef7c.1453675014.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In eager fpu mode, having deactivated FPU without immediately
reloading some other context is illegal. Therefore, to recover from
FNSAVE, we can't just deactivate the state -- we need to reload it
if we're not actively context switching.
We had this wrong in fpu__save() and fpu__copy(). Fix both.
__kernel_fpu_begin() was fine -- add a comment.
This fixes a warning triggerable with nofxsr eagerfpu=on.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/60662444e13c76f06e23c15c5dcdba31b4ac3d67.1453675014.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Systems without an FPU are generally old and therefore use lazy FPU
switching. Unsurprisingly, math emulation in eager FPU mode is a
bit buggy. Fix it.
There were two bugs involving kernel code trying to use the FPU
registers in eager mode even if they didn't exist and one BUG_ON()
that was incorrect.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/b4b8d112436bd6fab866e1b4011131507e8d7fbe.1453675014.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
track_pfn_insert() overwrites the pgprot that is passed in with a value
based on the VMA's page_prot. This is a problem for people trying to
do clever things with the new vm_insert_pfn_prot() as it will simply
overwrite the passed protection flags. If we use the current value of
the pgprot as the base, then it will behave as people are expecting.
Also fix track_pfn_remap() in the same way.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1453742717-10326-2-git-send-email-matthew.r.wilcox@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
I changed open coded zeroing loops to explicit memset()s in the
following commit:
5e9ebbd87a ("x86/boot: Micro-optimize reset_early_page_tables()")
The base for the size argument of memset was sizeof(pud_p/pmd_p), which
are pointers - but the initialized array has pud_t/pmd_t elements.
Luckily the two types had the same size, so this did not result in any
runtime misbehavior.
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1455025494-4063-1-git-send-email-kuleshovmail@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The comment is out of data. Also point out the performance drawback
of the barrier();__builtin_memcpy(); barrier() followed by another
copy from stack (__u) to lvalue;
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1453757600-11441-1-git-send-email-konrad.wilk@oracle.com
[ Made it a bit more readable. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The pseudo-interleaving in NUMA placement has a fundamental problem:
using hard usage thresholds to spread memory equally between nodes
can prevent workloads from converging, or keep memory "trapped" on
nodes where the workload is barely running any more.
In order for workloads to properly converge, the memory migration
should not be stopped when nodes reach parity, but instead be
distributed according to how heavily memory is used from each node.
This way memory migration and task migration reinforce each other,
instead of one putting the brakes on the other.
Remove the hard thresholds from the pseudo-interleaving code, and
instead use a more gradual policy on memory placement. This also
seems to improve convergence of workloads that do not run flat out,
but sleep in between bursts of activity.
We still want to slow down NUMA scanning and migration once a workload
has settled on a few actively used nodes, so keep the 3/4 hysteresis
in place. Keep track of whether a workload is actively running on
multiple nodes, so task_numa_migrate does a full scan of the system
for better task placement.
In the case of running 3 SPECjbb2005 instances on a 4 node system,
this code seems to result in fairer distribution of memory between
nodes, with more memory bandwidth for each instance.
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: mgorman@suse.de
Link: http://lkml.kernel.org/r/20160125170739.2fc9a641@annuminas.surriel.com
[ Minor readability tweaks. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
DMI cacheability is very confused on x86.
dmi_early_remap() uses early_ioremap(), which uses FIXMAP_PAGE_IO,
which is __PAGE_KERNEL_IO, which is __PAGE_KERNEL, which is cached.
Don't ask me why this makes any sense.
dmi_remap() uses ioremap(), which requests an uncached mapping.
However, on non-EFI systems, the DMI data generally lives between
0xf0000 and 0x100000, which is in the legacy ISA range, which
triggers a special case in the PAT code that overrides the cache
mode requested by ioremap() and forces a WB mapping.
On a UEFI boot, however, the DMI table can live at any physical
address. On my laptop, it's around 0x77dd0000. That's nowhere near
the legacy ISA range, so the ioremap() implicit uncached type is
honored and we end up with a UC- mapping.
UC- is a very, very slow way to read from main memory, so dmi_walk()
is likely to take much longer than necessary.
Given that, even on UEFI, we do early cached DMI reads, it seems
safe to just ask for cached access. Switch to ioremap_cache().
I haven't tried to benchmark this, but I'd guess it saves several
milliseconds of boot time.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jean Delvare <jdelvare@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/3147c38e51f439f3c8911db34c7d4ab22d854915.1453791969.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This adds a chicken bit to turn off INVPCID in case something goes
wrong. It's an early_param() because we do TLB flushes before we
parse __setup() parameters.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/f586317ed1bc2b87aee652267e515b90051af385.1454096309.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This adds helpers for each of the four currently-specified INVPCID
modes.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/8a62b23ad686888cee01da134c91409e22064db9.1454096309.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
After kasan_init() executed, no one is allowed to write to kasan_zero_page,
so write protect it.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1452516679-32040-3-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently we clear kasan_zero_page before __flush_tlb_all(). This
works with current implementation of native_flush_tlb[_global]()
because it doesn't cause do any writes to kasan shadow memory.
But any subtle change made in native_flush_tlb*() could break this.
Also current code seems doesn't work for paravirt guests (lguest).
Only after the TLB flush we can be sure that kasan_zero_page is not
used as early shadow anymore (instrumented code will not write to it).
So it should cleared it only after the TLB flush.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1452516679-32040-2-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch fixes an issue with unaligned accesses when using
eth_get_headlen on a page that was DMA aligned instead of being IP aligned.
The fact is when trying to check the length we don't need to be looking at
the flow label so we can reorder the checks to first check if we are
supposed to gather the flow label and then make the call to actually get
it.
v2: Updated path so that either STOP_AT_FLOW_LABEL or KEY_FLOW_LABEL can
cause us to check for the flow label.
Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
snd_timer_user_read() has a potential race among parallel reads, as
qhead and qused are updated outside the critical section due to
copy_to_user() calls. Move them into the critical section, and also
sanitize the relevant code a bit.
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
sound/firewire/digi00x/amdtp-dot.c:67: warning: type qualifiers ignored on function return type
Drop the bogus "const" type qualifier on the return type of dot_scrt()
to fix this.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The hda_jack_tbl entries are managed by snd_array for allowing
multiple jacks. It's good per se, but the problem is that struct
hda_jack_callback keeps the hda_jack_tbl pointer. Since snd_array
doesn't preserve each pointer at resizing the array, we can't keep the
original pointer but have to deduce the pointer at each time via
snd_array_entry() instead. Actually, this resulted in the deference
to the wrong pointer on codecs that have many pins such as CS4208.
This patch replaces the pointer to the NID value as the search key.
As an unexpected good side effect, this even simplifies the code, as
only NID is needed in most cases.
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Lockdep is initialized at compile time now. Get rid of lockdep_init().
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Krinkin <krinkin.m.u@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: mm-commits@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mike said:
: CONFIG_UBSAN_ALIGNMENT breaks x86-64 kernel with lockdep enabled, i.e.
: kernel with CONFIG_UBSAN_ALIGNMENT=y fails to load without even any error
: message.
:
: The problem is that ubsan callbacks use spinlocks and might be called
: before lockdep is initialized. Particularly this line in the
: reserve_ebda_region function causes problem:
:
: lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES);
:
: If i put lockdep_init() before reserve_ebda_region call in
: x86_64_start_reservations kernel loads well.
Fix this ordering issue permanently: change lockdep so that it uses hlists
for the hash tables. Unlike a list_head, an hlist_head is in its
initialized state when it is all-zeroes, so lockdep is ready for operation
immediately upon boot - lockdep_init() need not have run.
The patch will also save some memory.
Probably lockdep_init() and lockdep_initialized can be done away with now.
Suggested-by: Mike Krinkin <krinkin.m.u@gmail.com>
Reported-by: Mike Krinkin <krinkin.m.u@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: mm-commits@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A slave timer element also unlinks at snd_timer_stop() but it takes
only slave_active_lock. When a slave is assigned to a master,
however, this may become a race against the master's interrupt
handling, eventually resulting in a list corruption. The actual bug
could be seen with a syzkaller fuzzer test case in BugLink below.
As a fix, we need to take timeri->timer->lock when timer isn't NULL,
i.e. assigned to a master, while the assignment to a master itself is
protected by slave_active_lock.
BugLink: http://lkml.kernel.org/r/CACT4Y+Y_Bm+7epAb=8Wi=AaWd+DYS7qawX52qxdCfOfY49vozQ@mail.gmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
schedstats is very useful during debugging and performance tuning but it
incurs overhead to calculate the stats. As such, even though it can be
disabled at build time, it is often enabled as the information is useful.
This patch adds a kernel command-line and sysctl tunable to enable or
disable schedstats on demand (when it's built in). It is disabled
by default as someone who knows they need it can also learn to enable
it when necessary.
The benefits are dependent on how scheduler-intensive the workload is.
If it is then the patch reduces the number of cycles spent calculating
the stats with a small benefit from reducing the cache footprint of the
scheduler.
These measurements were taken from a 48-core 2-socket
machine with Xeon(R) E5-2670 v3 cpus although they were also tested on a
single socket machine 8-core machine with Intel i7-3770 processors.
netperf-tcp
4.5.0-rc1 4.5.0-rc1
vanilla nostats-v3r1
Hmean 64 560.45 ( 0.00%) 575.98 ( 2.77%)
Hmean 128 766.66 ( 0.00%) 795.79 ( 3.80%)
Hmean 256 950.51 ( 0.00%) 981.50 ( 3.26%)
Hmean 1024 1433.25 ( 0.00%) 1466.51 ( 2.32%)
Hmean 2048 2810.54 ( 0.00%) 2879.75 ( 2.46%)
Hmean 3312 4618.18 ( 0.00%) 4682.09 ( 1.38%)
Hmean 4096 5306.42 ( 0.00%) 5346.39 ( 0.75%)
Hmean 8192 10581.44 ( 0.00%) 10698.15 ( 1.10%)
Hmean 16384 18857.70 ( 0.00%) 18937.61 ( 0.42%)
Small gains here, UDP_STREAM showed nothing intresting and neither did
the TCP_RR tests. The gains on the 8-core machine were very similar.
tbench4
4.5.0-rc1 4.5.0-rc1
vanilla nostats-v3r1
Hmean mb/sec-1 500.85 ( 0.00%) 522.43 ( 4.31%)
Hmean mb/sec-2 984.66 ( 0.00%) 1018.19 ( 3.41%)
Hmean mb/sec-4 1827.91 ( 0.00%) 1847.78 ( 1.09%)
Hmean mb/sec-8 3561.36 ( 0.00%) 3611.28 ( 1.40%)
Hmean mb/sec-16 5824.52 ( 0.00%) 5929.03 ( 1.79%)
Hmean mb/sec-32 10943.10 ( 0.00%) 10802.83 ( -1.28%)
Hmean mb/sec-64 15950.81 ( 0.00%) 16211.31 ( 1.63%)
Hmean mb/sec-128 15302.17 ( 0.00%) 15445.11 ( 0.93%)
Hmean mb/sec-256 14866.18 ( 0.00%) 15088.73 ( 1.50%)
Hmean mb/sec-512 15223.31 ( 0.00%) 15373.69 ( 0.99%)
Hmean mb/sec-1024 14574.25 ( 0.00%) 14598.02 ( 0.16%)
Hmean mb/sec-2048 13569.02 ( 0.00%) 13733.86 ( 1.21%)
Hmean mb/sec-3072 12865.98 ( 0.00%) 13209.23 ( 2.67%)
Small gains of 2-4% at low thread counts and otherwise flat. The
gains on the 8-core machine were slightly different
tbench4 on 8-core i7-3770 single socket machine
Hmean mb/sec-1 442.59 ( 0.00%) 448.73 ( 1.39%)
Hmean mb/sec-2 796.68 ( 0.00%) 794.39 ( -0.29%)
Hmean mb/sec-4 1322.52 ( 0.00%) 1343.66 ( 1.60%)
Hmean mb/sec-8 2611.65 ( 0.00%) 2694.86 ( 3.19%)
Hmean mb/sec-16 2537.07 ( 0.00%) 2609.34 ( 2.85%)
Hmean mb/sec-32 2506.02 ( 0.00%) 2578.18 ( 2.88%)
Hmean mb/sec-64 2511.06 ( 0.00%) 2569.16 ( 2.31%)
Hmean mb/sec-128 2313.38 ( 0.00%) 2395.50 ( 3.55%)
Hmean mb/sec-256 2110.04 ( 0.00%) 2177.45 ( 3.19%)
Hmean mb/sec-512 2072.51 ( 0.00%) 2053.97 ( -0.89%)
In constract, this shows a relatively steady 2-3% gain at higher thread
counts. Due to the nature of the patch and the type of workload, it's
not a surprise that the result will depend on the CPU used.
hackbench-pipes
4.5.0-rc1 4.5.0-rc1
vanilla nostats-v3r1
Amean 1 0.0637 ( 0.00%) 0.0660 ( -3.59%)
Amean 4 0.1229 ( 0.00%) 0.1181 ( 3.84%)
Amean 7 0.1921 ( 0.00%) 0.1911 ( 0.52%)
Amean 12 0.3117 ( 0.00%) 0.2923 ( 6.23%)
Amean 21 0.4050 ( 0.00%) 0.3899 ( 3.74%)
Amean 30 0.4586 ( 0.00%) 0.4433 ( 3.33%)
Amean 48 0.5910 ( 0.00%) 0.5694 ( 3.65%)
Amean 79 0.8663 ( 0.00%) 0.8626 ( 0.43%)
Amean 110 1.1543 ( 0.00%) 1.1517 ( 0.22%)
Amean 141 1.4457 ( 0.00%) 1.4290 ( 1.16%)
Amean 172 1.7090 ( 0.00%) 1.6924 ( 0.97%)
Amean 192 1.9126 ( 0.00%) 1.9089 ( 0.19%)
Some small gains and losses and while the variance data is not included,
it's close to the noise. The UMA machine did not show anything particularly
different
pipetest
4.5.0-rc1 4.5.0-rc1
vanilla nostats-v2r2
Min Time 4.13 ( 0.00%) 3.99 ( 3.39%)
1st-qrtle Time 4.38 ( 0.00%) 4.27 ( 2.51%)
2nd-qrtle Time 4.46 ( 0.00%) 4.39 ( 1.57%)
3rd-qrtle Time 4.56 ( 0.00%) 4.51 ( 1.10%)
Max-90% Time 4.67 ( 0.00%) 4.60 ( 1.50%)
Max-93% Time 4.71 ( 0.00%) 4.65 ( 1.27%)
Max-95% Time 4.74 ( 0.00%) 4.71 ( 0.63%)
Max-99% Time 4.88 ( 0.00%) 4.79 ( 1.84%)
Max Time 4.93 ( 0.00%) 4.83 ( 2.03%)
Mean Time 4.48 ( 0.00%) 4.39 ( 1.91%)
Best99%Mean Time 4.47 ( 0.00%) 4.39 ( 1.91%)
Best95%Mean Time 4.46 ( 0.00%) 4.38 ( 1.93%)
Best90%Mean Time 4.45 ( 0.00%) 4.36 ( 1.98%)
Best50%Mean Time 4.36 ( 0.00%) 4.25 ( 2.49%)
Best10%Mean Time 4.23 ( 0.00%) 4.10 ( 3.13%)
Best5%Mean Time 4.19 ( 0.00%) 4.06 ( 3.20%)
Best1%Mean Time 4.13 ( 0.00%) 4.00 ( 3.39%)
Small improvement and similar gains were seen on the UMA machine.
The gain is small but it stands to reason that doing less work in the
scheduler is a good thing. The downside is that the lack of schedstats and
tracepoints may be surprising to experts doing performance analysis until
they find the existence of the schedstats= parameter or schedstats sysctl.
It will be automatically activated for latencytop and sleep profiling to
alleviate the problem. For tracepoints, there is a simple warning as it's
not safe to activate schedstats in the context when it's known the tracepoint
may be wanted but is unavailable.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <mgalbraith@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1454663316-22048-1-git-send-email-mgorman@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add some text and an example to Documentation/x86/early-microcode.txt
explaining how to build in microcode.
Tested-by: Thomas Voegtle <tv@lio96.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1454499225-21544-18-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Before this, we issued this message from save_microcode_in_initrd()
which is called from free_initrd_mem(), i.e., only when we have an
initrd enabled. However, we can update from builtin microcode too but
then we don't issue the update message.
Fix it by issuing that message on the generic driver init path.
Tested-by: Thomas Voegtle <tv@lio96.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1454499225-21544-17-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Rename it to mc_tmp_ptrs to denote better what it is - a temporary array
for saving pointers to microcode blobs. And "initrd" is not accurate
anymore since initrd is not the only source for early microcode.
Therefore, rename copy_initrd_ptrs() to copy_ptrs() simply and
"initrd_start" to "offset".
And then do the following convention: the global variable is called
"mc_tmp_ptrs" and the local function arguments "mc_ptrs" for
differentiation.
Tested-by: Thomas Voegtle <tv@lio96.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1454499225-21544-14-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
... and drop the 32-bit casting games which we had to do at the time
because wrmsr() was unforgiving then, see c3fd0bd5e19a from the
full history tree:
commit c3fd0bd5e19aaff9cdd104edff136a2023db657e
Author: Linus Torvalds <torvalds@home.osdl.org>
Date: Tue Feb 17 23:23:41 2004 -0800
Fix up the microcode update on regular 32-bit x86. Our wrmsr()
is a bit unforgiving and really doesn't like 64-bit values.
...
Tested-by: Thomas Voegtle <tv@lio96.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1454499225-21544-13-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Get rid of local variable cpu_num as it is equal to @cpu now. Deref
cpu_data() only when it is really needed at the end.
No functionality change.
Tested-by: Thomas Voegtle <tv@lio96.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1454499225-21544-12-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If we're going to BUG_ON() because we're running on the wrong CPU, we
better do it as the first thing we do when entering that function. And
also, turn it into a WARN_ON() because it is not worth to panic the
system if we apply the microcode on the wrong CPU - we're simply going
to exit early.
Tested-by: Thomas Voegtle <tv@lio96.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1454499225-21544-11-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Well, it is apparent what it points to - microcode. And since it is the
intel loader, no need for the "_intel" suffix. Use "!" for the 0/NULL
checks, while at it.
No functionality change.
Tested-by: Thomas Voegtle <tv@lio96.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1454499225-21544-10-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It is shorter and easier on the eyes. Change the "== 0" tests to "!..."
while at it.
Tested-by: Thomas Voegtle <tv@lio96.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1454499225-21544-9-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
So it is always a head-twister when trying to stare at code which has a
bunch of
struct mc_saved_data *mc_saved_data;
local function variables *and* a global mc_saved_data of the same name.
Rename all locals to "mcs" to differentiate from the global one.
No functionality change.
Tested-by: Thomas Voegtle <tv@lio96.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1454499225-21544-8-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This is especially annoying on large boxes:
x86: Booting SMP configuration:
.... node #0, CPUs: #1
microcode: CPU1 microcode updated early to revision 0x428, date = 2014-05-29
#2
microcode: CPU2 microcode updated early to revision 0x428, date = 2014-05-29
#3
...
so issue the update message only once.
$ grep microcode /proc/cpuinfo
shows whether every core got updated properly.
Reported-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Thomas Voegtle <tv@lio96.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1454499225-21544-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We do parse for the disable microcode loader chicken bit very early.
After the driver merge, the __setup() param parsing method is not needed
anymore so get rid of it.
In addition, fix a compiler warning from an old SLES11 gcc (4.3.4)
reported by Jan Beulich <jbeulich@suse.com>:
arch/x86/kernel/cpu/microcode/core.c: In function ‘load_ucode_bsp’:
arch/x86/kernel/cpu/microcode/core.c:96: warning: array subscript is above array bounds
Tested-by: Thomas Voegtle <tv@lio96.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1454499225-21544-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>