When a subsystem is offlined, its entry on @cgrp->subsys[] is cleared
asynchronously. If cgroup_subtree_control_write() is requested to
enable the subsystem again before the entry is cleared, it has to wait
for the previous offlining to finish and clear the @cgrp->subsys[]
entry before trying to enable the subsystem again.
This is currently done while verifying the input enable / disable
parameters. This used to be correct but f63070d350 ("cgroup: make
interface files visible iff enabled on cgroup->subtree_control")
breaks it. The commit is one of the commits implementing subsystem
dependency.
Through subsystem dependency, some subsystems may be enabled and
disabled implicitly in addition to the explicitly requested ones. The
actual subsystems to be enabled and disabled are determined during
@css_enable/disable calculation. The current offline wait logic skips
the ones which are already implicitly enabled and then waits for
subsystems in @enable; however, this misses the subsystems which may
be implicitly enabled through dependency from @enable. If such
implicitly subsystem hasn't yet finished offlining yet, the function
ends up trying to create a css when its @cgrp->subsys[] slot is
already occupied triggering BUG_ON() in init_and_link_css().
Fix it by moving the wait logic after @css_enable is calculated and
waiting for all the subsystems in @css_enable. This fixes the above
bug as the mask contains all subsystems which are to be enabled
including the ones enabled through dependencies.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: f63070d350 ("cgroup: make interface files visible iff enabled on cgroup->subtree_control")
Acked-by: Zefan Li <lizefan@huawei.com>
Make cgroup_subtree_control_write() first calculate new
subtree_control (new_sc), child_subsys_mask (new_ss) and
css_enable/disable masks before applying them to the cgroup. Also,
store the original subtree_control (old_sc) and child_subsys_mask
(old_ss) and use them to restore the orignal state after failure.
This patch shouldn't cause any behavior changes. This prepares for a
fix for a bug in the async css offline wait logic.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Zefan Li <lizefan@huawei.com>
cgroup_refresh_child_subsys_mask() calculates and updates the
effective @cgrp->child_subsys_maks according to the current
@cgrp->subtree_control. Separate out the calculation part into
cgroup_calc_child_subsys_mask(). This will be used to fix a bug in
the async css offline wait logic.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Zefan Li <lizefan@huawei.com>
Doing things like hci_conn_hash_flush() while holding the hdev lock is
risky since its synchronous pending work cancellation could cause the
L2CAP layer to try to reacquire the hdev lock. Right now there doesn't
seem to be any obvious places where this would for certain happen but
it's already enough to cause lockdep to start warning against the hdev
and the work struct locks being taken in the "wrong" order:
[ +0.000373] mgmt-tester/1603 is trying to acquire lock:
[ +0.000292] ((&conn->pending_rx_work)){+.+.+.}, at: [<c104266d>] flush_work+0x0/0x181
[ +0.000270]
but task is already holding lock:
[ +0.000000] (&hdev->lock){+.+.+.}, at: [<c13b9a80>] hci_dev_do_close+0x166/0x359
[ +0.000000]
which lock already depends on the new lock.
[ +0.000000]
the existing dependency chain (in reverse order) is:
[ +0.000000]
-> #1 (&hdev->lock){+.+.+.}:
[ +0.000000] [<c105ea8f>] lock_acquire+0xe3/0x156
[ +0.000000] [<c140c663>] mutex_lock_nested+0x54/0x375
[ +0.000000] [<c13d644b>] l2cap_recv_frame+0x293/0x1a9c
[ +0.000000] [<c13d7ca4>] process_pending_rx+0x50/0x5e
[ +0.000000] [<c1041a3f>] process_one_work+0x21c/0x436
[ +0.000000] [<c1041e3d>] worker_thread+0x1be/0x251
[ +0.000000] [<c1045a22>] kthread+0x94/0x99
[ +0.000000] [<c140f801>] ret_from_kernel_thread+0x21/0x30
[ +0.000000]
-> #0 ((&conn->pending_rx_work)){+.+.+.}:
[ +0.000000] [<c105e158>] __lock_acquire+0xa07/0xc89
[ +0.000000] [<c105ea8f>] lock_acquire+0xe3/0x156
[ +0.000000] [<c1042696>] flush_work+0x29/0x181
[ +0.000000] [<c1042864>] __cancel_work_timer+0x76/0x8f
[ +0.000000] [<c104288c>] cancel_work_sync+0xf/0x11
[ +0.000000] [<c13d4c18>] l2cap_conn_del+0x72/0x183
[ +0.000000] [<c13d8953>] l2cap_disconn_cfm+0x49/0x55
[ +0.000000] [<c13be37a>] hci_conn_hash_flush+0x7a/0xc3
[ +0.000000] [<c13b9af6>] hci_dev_do_close+0x1dc/0x359
[ +0.012038] [<c13bbe38>] hci_unregister_dev+0x6e/0x1a3
[ +0.000000] [<c12d33c1>] vhci_release+0x28/0x47
[ +0.000000] [<c10dd6a9>] __fput+0xd6/0x154
[ +0.000000] [<c10dd757>] ____fput+0xd/0xf
[ +0.000000] [<c1044bb2>] task_work_run+0x6b/0x8d
[ +0.000000] [<c1001bd2>] do_notify_resume+0x3c/0x3f
[ +0.000000] [<c140fa70>] work_notifysig+0x29/0x31
[ +0.000000]
other info that might help us debug this:
[ +0.000000] Possible unsafe locking scenario:
[ +0.000000] CPU0 CPU1
[ +0.000000] ---- ----
[ +0.000000] lock(&hdev->lock);
[ +0.000000] lock((&conn->pending_rx_work));
[ +0.000000] lock(&hdev->lock);
[ +0.000000] lock((&conn->pending_rx_work));
[ +0.000000]
*** DEADLOCK ***
Fully fixing this would require some quite heavy refactoring to change
how the hdev lock and hci_conn instances are handled together. A simpler
solution for now which this patch takes is to try ensure that the hdev
workqueue is empty before proceeding with the various cleanup calls,
including hci_conn_hash_flush().
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Scott says:
"Highlights include a bunch of 8xx optimizations, device tree bindings
for Freescale BMan, QMan, and FMan datapath components, misc device tree
updates, and inbound rio window support."
integrity_kernel_read() duplicates the file read operations code
in vfs_read(). This patch refactors vfs_read() code creating a
helper function __vfs_read(). It is used by both vfs_read() and
integrity_kernel_read().
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This patch provides CONFIG_IMA_APPRAISE_SIGNED_INIT kernel configuration
option to force IMA appraisal using signatures. This is useful, when EVM
key is not initialized yet and we want securely initialize integrity or
any other functionality.
It forces embedded policy to require signature. Signed initialization
script can initialize EVM key, update the IMA policy and change further
requirement of everything to be signed.
Changes in v3:
* kernel parameter fixed to configuration option in the patch description
Changes in v2:
* policy change of this patch separated from the key loading patch
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Keys can only be loaded once the rootfs is mounted. Initcalls
are not suitable for that. This patch defines a special hook
to load the x509 public keys onto the IMA keyring, before
attempting to access any file. The keys are required for
verifying the file's signature. The hook is called after the
root filesystem is mounted and before the kernel calls 'init'.
Changes in v3:
* added more explanation to the patch description (Mimi)
Changes in v2:
* Hook renamed as 'integrity_load_keys()' to handle both IMA and EVM
keys by integrity subsystem.
* Hook patch moved after defining loading functions
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Define configuration option to load X509 certificate into the
IMA trusted kernel keyring. It implements ima_load_x509() hook
to load X509 certificate into the .ima trusted kernel keyring
from the root filesystem.
Changes in v3:
* use ima_policy_flag in ima_get_action()
ima_load_x509 temporarily clears ima_policy_flag to disable
appraisal to load key. Use it to skip appraisal rules.
* Key directory path changed to /etc/keys (Mimi)
* Expand IMA_LOAD_X509 Kconfig help
Changes in v2:
* added '__init'
* use ima_policy_flag to disable appraisal to load keys
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Provide the function to load x509 certificates from the kernel into the
integrity kernel keyring.
Changes in v2:
* configuration option removed
* function declared as '__init'
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This patch defines a new function called integrity_read_file()
to read file from the kernel into a buffer. Subsequent patches
will read a file containing the public keys and load them onto
the IMA keyring.
This patch moves and renames ima_kernel_read(), the non-security
checking version of kernel_read(), to integrity_kernel_read().
Changes in v3:
* Patch descriptions improved (Mimi)
* Add missing cast (kbuild test robot)
Changes in v2:
* configuration option removed
* function declared as '__init'
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
The commit 543c043cba ("powerpc/fsl_msi: change the irq handler from
chained to normal") changes the msi cascade handler from chained to
normal. Since cascade handler must run in hard interrupt context, this
will cause kernel panic if we force threading of all the interrupt
handler via kernel command parameter 'threadirqs'. So mark the irq
handler IRQF_NO_THREAD explicitly.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Currently all interrupts generated by cxl are named "cxl". This is not very
informative as we can't distinguish between cards, AFUs, error interrupts, user
contexts and user interrupts numbers. Being able to distinguish them is useful
for setting affinity.
This patch gives each of these names in /proc/interrupts.
A two card CAPI system, with afu0.0 having 2 active contexts each with 4 user
IRQs each, will now look like this:
% grep cxl /proc/interrupts
444: 0 OPAL ICS 141312 Level cxl-card1-err
445: 0 OPAL ICS 141313 Level cxl-afu1.0-err
446: 0 OPAL ICS 141314 Level cxl-afu1.0
462: 0 OPAL ICS 2052 Level cxl-afu0.0-pe0-1
463: 75517 OPAL ICS 2053 Level cxl-afu0.0-pe0-2
468: 0 OPAL ICS 2054 Level cxl-afu0.0-pe0-3
469: 0 OPAL ICS 2055 Level cxl-afu0.0-pe0-4
470: 0 OPAL ICS 2056 Level cxl-afu0.0-pe1-1
471: 75506 OPAL ICS 2057 Level cxl-afu0.0-pe1-2
472: 0 OPAL ICS 2058 Level cxl-afu0.0-pe1-3
473: 0 OPAL ICS 2059 Level cxl-afu0.0-pe1-4
502: 1066 OPAL ICS 2050 Level cxl-afu0.0
514: 0 OPAL ICS 2048 Level cxl-card0-err
515: 0 OPAL ICS 2049 Level cxl-afu0.0-err
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If an AFU has a hardware bug that causes it to acknowledge a context
terminate or remove while that context has outstanding transactions, it
is possible for the kernel to receive an interrupt for that context
after we have removed it from the context list.
The kernel will not be able to demultiplex the interrupt (or worse - if
we have already reallocated the process handle we could mis-attribute it
to the new context), and printed a big scary warning.
It did not acknowledge the interrupt, which would effectively halt
further translation fault processing on the PSL.
This patch makes the warning clearer about the likely cause of the issue
(i.e. hardware bug) to make it obvious to future AFU designers of what
needs to be fixed. It also prints out the process handle which can then
be matched up with hardware and software traces for debugging.
It also acknowledges the interrupt to the PSL with either an address
error or acknowledge, so that the PSL can continue with other
translations.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If BL_SWITCHER is enabled but SUSPEND and CPU_IDLE is not enabled
we are getting following config warning.
warning: (BL_SWITCHER) selects CPU_PM which has unmet direct
dependencies (SUSPEND || CPU_IDLE)
It has been noticed that CPU_PM dependencies in this file are not really
required so let's remove these dependencies from CPU_PM.
Signed-off-by: Pankaj Dubey <pankaj.dubey@samsung.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The vfree() function performs also input parameter validation. Thus the test
around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CPUFreq driver's Kconfig entries are added in Kconfig.<arch> files and they are
all included from the main Kconfig file using a menu entry. This creates another
level of (unnecessary) hierarchy within the menuconfig entries.
The problem occurs when there are drivers usable across architectures. Either
their config entry is duplicated in all the supported architectures or is put
into the main Kconfig entry. With the later one, we have menuconfig entries for
drivers at two levels then.
Fix these issues by getting rid of another level of menuconfig hierarchy and
populate all drivers within the main cpufreq menu. To clearly distinguish where
the drivers start from, also add a comment that will appear in menuconfig.
Reported-by: Tang Yuantian <Yuantian.Tang@freescale.com>
Suggested-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
As Freescale IFC controller has been moved to driver to driver/memory.
So enable memory driver in powerpc config
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
We no longer need mpx.h in exec.c. This will obviously also
break the build for non-x86 builds. We get the MPX includes that
we need from mmu_context.h now.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141118003608.837015B3@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The common short form of "randomizer" is "rand" in many places
(including the Bluetooth specification). The shorter version also makes
for easier to read code with less forced line breaks. This patch renames
all occurences of "randomizer" to "rand" in the Bluetooth subsystem
code.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
For now the mgmt commands dealing with remote OOB data are strictly
BR/EDR-only. This patch fixes missing checks for the passed address type
so that any non-BR/EDR value triggers the appropriate error response.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Since config ACPI_REDUCED_HARDWARE_ONLY is already depended
on ACPI (inside if ACPI / endif), so depdens on ACPI is redundant,
remove it and fix the minor syntax problem also.
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The definition of the struct pm_domain_data better belongs in the
header for the PM domains, let's move it there.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The previous patch allocates bounds tables on-demand. As noted in
an earlier description, these can add up to *HUGE* amounts of
memory. This has caused OOMs in practice when running tests.
This patch adds support for freeing bounds tables when they are no
longer in use.
There are two types of mappings in play when unmapping tables:
1. The mapping with the actual data, which userspace is
munmap()ing or brk()ing away, etc...
2. The mapping for the bounds table *backing* the data
(is tagged with VM_MPX, see the patch "add MPX specific
mmap interface").
If userspace use the prctl() indroduced earlier in this patchset
to enable the management of bounds tables in kernel, when it
unmaps the first type of mapping with the actual data, the kernel
needs to free the mapping for the bounds table backing the data.
This patch hooks in at the very end of do_unmap() to do so.
We look at the addresses being unmapped and find the bounds
directory entries and tables which cover those addresses. If
an entire table is unused, we clear associated directory entry
and free the table.
Once we unmap the bounds table, we would have a bounds directory
entry pointing at empty address space. That address space might
now be allocated for some other (random) use, and the MPX
hardware might now try to walk it as if it were a bounds table.
That would be bad. So any unmapping of an enture bounds table
has to be accompanied by a corresponding write to the bounds
directory entry to invalidate it. That write to the bounds
directory can fault, which causes the following problem:
Since we are doing the freeing from munmap() (and other paths
like it), we hold mmap_sem for write. If we fault, the page
fault handler will attempt to acquire mmap_sem for read and
we will deadlock. To avoid the deadlock, we pagefault_disable()
when touching the bounds directory entry and use a
get_user_pages() to resolve the fault.
The unmapping of bounds tables happends under vm_munmap(). We
also (indirectly) call vm_munmap() to _do_ the unmapping of the
bounds tables. We avoid unbounded recursion by disallowing
freeing of bounds tables *for* bounds tables. This would not
occur normally, so should not have any practical impact. Being
strict about it here helps ensure that we do not have an
exploitable stack overflow.
Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151831.E4531C4A@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This is really the meat of the MPX patch set. If there is one patch to
review in the entire series, this is the one. There is a new ABI here
and this kernel code also interacts with userspace memory in a
relatively unusual manner. (small FAQ below).
Long Description:
This patch adds two prctl() commands to provide enable or disable the
management of bounds tables in kernel, including on-demand kernel
allocation (See the patch "on-demand kernel allocation of bounds tables")
and cleanup (See the patch "cleanup unused bound tables"). Applications
do not strictly need the kernel to manage bounds tables and we expect
some applications to use MPX without taking advantage of this kernel
support. This means the kernel can not simply infer whether an application
needs bounds table management from the MPX registers. The prctl() is an
explicit signal from userspace.
PR_MPX_ENABLE_MANAGEMENT is meant to be a signal from userspace to
require kernel's help in managing bounds tables.
PR_MPX_DISABLE_MANAGEMENT is the opposite, meaning that userspace don't
want kernel's help any more. With PR_MPX_DISABLE_MANAGEMENT, the kernel
won't allocate and free bounds tables even if the CPU supports MPX.
PR_MPX_ENABLE_MANAGEMENT will fetch the base address of the bounds
directory out of a userspace register (bndcfgu) and then cache it into
a new field (->bd_addr) in the 'mm_struct'. PR_MPX_DISABLE_MANAGEMENT
will set "bd_addr" to an invalid address. Using this scheme, we can
use "bd_addr" to determine whether the management of bounds tables in
kernel is enabled.
Also, the only way to access that bndcfgu register is via an xsaves,
which can be expensive. Caching "bd_addr" like this also helps reduce
the cost of those xsaves when doing table cleanup at munmap() time.
Unfortunately, we can not apply this optimization to #BR fault time
because we need an xsave to get the value of BNDSTATUS.
==== Why does the hardware even have these Bounds Tables? ====
MPX only has 4 hardware registers for storing bounds information.
If MPX-enabled code needs more than these 4 registers, it needs to
spill them somewhere. It has two special instructions for this
which allow the bounds to be moved between the bounds registers
and some new "bounds tables".
They are similar conceptually to a page fault and will be raised by
the MPX hardware during both bounds violations or when the tables
are not present. This patch handles those #BR exceptions for
not-present tables by carving the space out of the normal processes
address space (essentially calling the new mmap() interface indroduced
earlier in this patch set.) and then pointing the bounds-directory
over to it.
The tables *need* to be accessed and controlled by userspace because
the instructions for moving bounds in and out of them are extremely
frequent. They potentially happen every time a register pointing to
memory is dereferenced. Any direct kernel involvement (like a syscall)
to access the tables would obviously destroy performance.
==== Why not do this in userspace? ====
This patch is obviously doing this allocation in the kernel.
However, MPX does not strictly *require* anything in the kernel.
It can theoretically be done completely from userspace. Here are
a few ways this *could* be done. I don't think any of them are
practical in the real-world, but here they are.
Q: Can virtual space simply be reserved for the bounds tables so
that we never have to allocate them?
A: As noted earlier, these tables are *HUGE*. An X-GB virtual
area needs 4*X GB of virtual space, plus 2GB for the bounds
directory. If we were to preallocate them for the 128TB of
user virtual address space, we would need to reserve 512TB+2GB,
which is larger than the entire virtual address space today.
This means they can not be reserved ahead of time. Also, a
single process's pre-popualated bounds directory consumes 2GB
of virtual *AND* physical memory. IOW, it's completely
infeasible to prepopulate bounds directories.
Q: Can we preallocate bounds table space at the same time memory
is allocated which might contain pointers that might eventually
need bounds tables?
A: This would work if we could hook the site of each and every
memory allocation syscall. This can be done for small,
constrained applications. But, it isn't practical at a larger
scale since a given app has no way of controlling how all the
parts of the app might allocate memory (think libraries). The
kernel is really the only place to intercept these calls.
Q: Could a bounds fault be handed to userspace and the tables
allocated there in a signal handler instead of in the kernel?
A: (thanks to tglx) mmap() is not on the list of safe async
handler functions and even if mmap() would work it still
requires locking or nasty tricks to keep track of the
allocation state there.
Having ruled out all of the userspace-only approaches for managing
bounds tables that we could think of, we create them on demand in
the kernel.
Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151829.AD4310DE@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch sets bound violation fields of siginfo struct in #BR
exception handler by decoding the user instruction and constructing
the faulting pointer.
We have to be very careful when decoding these instructions. They
are completely controlled by userspace and may be changed at any
time up to and including the point where we try to copy them in to
the kernel. They may or may not be MPX instructions and could be
completely invalid for all we know.
Note: This code is based on Qiaowei Ren's specialized MPX
decoder, but uses the generic decoder whenever possible. It was
tested for robustness by generating a completely random data
stream and trying to decode that stream. I also unmapped random
pages inside the stream to test the "partial instruction" short
read code.
We kzalloc() the siginfo instead of stack allocating it because
we need to memset() it anyway, and doing this makes it much more
clear when it got initialized by the MPX instruction decoder.
Changes from the old decoder:
* Use the generic decoder instead of custom functions. Saved
~70 lines of code overall.
* Remove insn->addr_bytes code (never used??)
* Make sure never to possibly overflow the regoff[] array, plus
check the register range correctly in 32 and 64-bit modes.
* Allow get_reg() to return an error and have mpx_get_addr_ref()
handle when it sees errors.
* Only call insn_get_*() near where we actually use the values
instead if trying to call them all at once.
* Handle short reads from copy_from_user() and check the actual
number of read bytes against what we expect from
insn_get_length(). If a read stops in the middle of an
instruction, we error out.
* Actually check the opcodes intead of ignoring them.
* Dynamically kzalloc() siginfo_t so we don't leak any stack
data.
* Detect and handle decoder failures instead of ignoring them.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151828.5BDD0915@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We have chosen to perform the allocation of bounds tables in
kernel (See the patch "on-demand kernel allocation of bounds
tables") and to mark these VMAs with VM_MPX.
However, there is currently no suitable interface to actually do
this. Existing interfaces, like do_mmap_pgoff(), have no way to
set a modified ->vm_ops or ->vm_flags and don't hold mmap_sem
long enough to let a caller do it.
This patch wraps mmap_region() and hold mmap_sem long enough to
make the modifications to the VMA which we need.
Also note the 32/64-bit #ifdef in the header. We actually need
to do this at runtime eventually. But, for now, we don't support
running 32-bit binaries on 64-bit kernels. Support for this will
come in later patches.
Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151827.CE440F67@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
MPX-enabled applications using large swaths of memory can
potentially have large numbers of bounds tables in process
address space to save bounds information. These tables can take
up huge swaths of memory (as much as 80% of the memory on the
system) even if we clean them up aggressively. In the worst-case
scenario, the tables can be 4x the size of the data structure
being tracked. IOW, a 1-page structure can require 4 bounds-table
pages.
Being this huge, our expectation is that folks using MPX are
going to be keen on figuring out how much memory is being
dedicated to it. So we need a way to track memory use for MPX.
If we want to specifically track MPX VMAs we need to be able to
distinguish them from normal VMAs, and keep them from getting
merged with normal VMAs. A new VM_ flag set only on MPX VMAs does
both of those things. With this flag, MPX bounds-table VMAs can
be distinguished from other VMAs, and userspace can also walk
/proc/$pid/smaps to get memory usage for MPX.
In addition to this flag, we also introduce a special ->vm_ops
specific to MPX VMAs (see the patch "add MPX specific mmap
interface"), but currently different ->vm_ops do not by
themselves prevent VMA merging, so we still need this flag.
We understand that VM_ flags are scarce and are open to other
options.
Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151825.565625B3@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This allows us to use cpu_feature_enabled(X86_FEATURE_MPX) as
both a runtime and compile-time check.
When CONFIG_X86_INTEL_MPX is disabled,
cpu_feature_enabled(X86_FEATURE_MPX) will evaluate at
compile-time to 0. If CONFIG_X86_INTEL_MPX=y, then the cpuid
flag will be checked at runtime.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151823.B358EAD2@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
New fields about bound violation are added into general struct
siginfo. This will impact MIPS and IA64, which extend general
struct siginfo. This patch syncs this struct for IA64 with
general version.
Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151822.82B3B486@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
New fields about bound violation are added into general struct
siginfo. This will impact MIPS and IA64, which extend general
struct siginfo. This patch syncs this struct for MIPS with
general version.
Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151820.F7EDC3CC@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds new fields about bound violation into siginfo
structure. si_lower and si_upper are respectively lower bound
and upper bound when bound violation is caused.
Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151819.1908C900@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
According to Intel SDM extension, MPX configuration and status registers
should be BNDCFGU and BNDSTATUS. This patch renames cfg_reg_u and
status_reg to bndcfgu and bndstatus.
[ tglx: Renamed 'struct bndscr_struct' to 'struct bndscr' ]
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Link: http://lkml.kernel.org/r/20141114151817.031762AC@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Consider the bndX MPX registers. There 4 registers each
containing a 64-bit lower and a 64-bit upper bound. That's 8*64
bits and we declare it thusly:
struct bndregs_struct {
u64 bndregs[8];
}
Let's say you want to read the upper bound from the MPX register
bnd2 out of the xsave buf. You do:
bndregno = 2;
upper_bound = xsave_buf->bndregs.bndregs[2*bndregno+1];
That kinda sucks. Every time you access it, you need to know:
1. Each bndX register is two entries wide in "bndregs"
2. The lower comes first followed by upper. We do the +1 to get
upper vs. lower.
This replaces the old definition. You can now access them
indexed by the register number directly, and with a meaningful
name for the lower and upper bound:
bndregno = 2;
xsave_buf->bndreg[bndregno].upper_bound;
It's now *VERY* clear that there are 4 registers. The programmer
now doesn't have to care what order the lower and upper bounds
are in, and it's harder to get it wrong.
[ tglx: Changed ub/lb to upper_bound/lower_bound and renamed struct
bndreg_struct to struct bndreg ]
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: "Yu, Fenghua" <fenghua.yu@intel.com>
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141031215820.5EA5E0EC@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The current x86 instruction decoder steps along through the
instruction stream but always ensures that it never steps farther
than the largest possible instruction size (MAX_INSN_SIZE).
The MPX code is now going to be doing some decoding of userspace
instructions. We copy those from userspace in to the kernel and
they're obviously completely untrusted coming from userspace. In
addition to the constraint that instructions can only be so long,
we also have to be aware of how long the buffer is that came in
from userspace. This _looks_ to be similar to what the perf and
kprobes is doing, but it's unclear to me whether they are
affected.
The whole reason we need this is that it is perfectly valid to be
executing an instruction within MAX_INSN_SIZE bytes of an
unreadable page. We should be able to gracefully handle short
reads in those cases.
This adds support to the decoder to record how long the buffer
being decoded is and to refuse to "validate" the instruction if
we would have gone over the end of the buffer to decode it.
The kprobes code probably needs to be looked at here a bit more
carefully. This patch still respects the MAX_INSN_SIZE limit
there but the kprobes code does look like it might be able to
be a bit more strict than it currently is.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: x86@kernel.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20141114153957.E6B01535@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The kfree() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Jarkko Nikula <jarkko.nikula@bitmer.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Current driver can also run in I2S master mode, so remove the old comment.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Acked-by: Timur Tabi <timur@tabi.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Modify the default value of the MX-8E[4] to 1 for ASRC function. It could
prevent the pop noise with ASRC function.
Signed-off-by: Oder Chiou <oder_chiou@realtek.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
The version field defined in the audit status structure was found to have
limitations in terms of its expressibility of features supported. This is
distict from the get/set features call to be able to command those features
that are present.
Converting this field from a version number to a feature bitmap will allow
distributions to selectively backport and support certain features and will
allow upstream to be able to deprecate features in the future. It will allow
userspace clients to first query the kernel for which features are actually
present and supported. Currently, EINVAL is returned rather than EOPNOTSUP,
which isn't helpful in determining if there was an error in the command, or if
it simply isn't supported yet. Past features are not represented by this
bitmap, but their use may be converted to EOPNOTSUP if needed in the future.
Since "version" is too generic to convert with a #define, use a union in the
struct status, introducing the member "feature_bitmap" unionized with
"version".
Convert existing AUDIT_VERSION_* macros over to AUDIT_FEATURE_BITMAP*
counterparts, leaving the former for backwards compatibility.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: minor whitespace tweaks]
Signed-off-by: Paul Moore <pmoore@redhat.com>
As well as the usual driver fixes there's a few other things here:
One is a fix for a race in DPCM which is unfortuantely a rather large
diffstat, this is the result of growing usage of the mainline code and
hence more detailed testing so I'm relatively happy.
The other is a fix for non-DT machine driver matching following some of
the componentization work which is much more focused.
Both have had a while to cook in -next.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJUaifFAAoJECTWi3JdVIfQ9V0H/iE0R+vu2VyvTPt9PXcWNn2N
zCwwljFEc9p2tAKqCHXkq2JDZZDIUQBxQPMJF7m43iQ+JGg/6BjPnPCdNJsi7hZV
/prLtMUPhdExxRL/mdfvIS26FTHF3GhOZvE4ePAw/jJWlCCMWGt3iEruuoeDzbxg
TwHTlgVqBKcdDAAMe1kTkBCTcamt8SaLYOaAgICICJtwO6AFpf8UYrYnT4Y9l1XK
jOAPjfLNOtuIh0IKp7NBJlIdV3PZPpnx7l1wg8DN+63kwLStFrQjYvueu1UAArh8
Y65oQpkIxgqlJOvy59fY3J3F5HF9u6dWjtQv5wMOJm0Fr4y2GP+EsaHBlPYM0co=
=8pIU
-----END PGP SIGNATURE-----
Merge tag 'asoc-v3.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v3.18
As well as the usual driver fixes there's a few other things here:
One is a fix for a race in DPCM which is unfortuantely a rather large
diffstat, this is the result of growing usage of the mainline code and
hence more detailed testing so I'm relatively happy.
The other is a fix for non-DT machine driver matching following some of
the componentization work which is much more focused.
Both have had a while to cook in -next.
Certain ARM CPU implementations (e.g. Cortex-A15) may not raise a
floating- point exception whenever deprecated short-vector VFP
instructions are executed. Instead these instructions are treated
as UNALLOCATED. Change the VFP exception handling code to emulate
short-vector instructions even if FPEXC exception bits are not
set.
Signed-off-by: Stepan Moskovchenko <stepanm@codeaurora.org>
Tested-by: Will Deacon <will.deacon@arm.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch remove clear_thread_flag(TIF_UPROBE) in do_work_pending(),
because uprobe_notify_resume() have do this.
Signed-off-by: Yalin Wang <yalin.wang@sonymobile.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Currently, when a roc period expires, the offchannel
timer calls ieee80211_remain_on_channel_expired(), but
the roc state is cleared only when the queued work
to switch to the operating channel gets a chance to run.
This race is a problem because mac80211 can issue a
new roc request in this window. To avoid this, handle
roc completion in the offchannel timer itself.
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In a GO/STA setup, when we switch to the STA context,
the channel context timer is scheduled with a period of
half the beacon interval. If a beacon is received in
this duration, the timer is adjusted to accommodate
TSF sync done by the HW.
But, if the actual channel switch is delayed for some
reason, we end up rearming the timer every time a new
beacon is received. Avoid this by doing the adjustment
only once.
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>