Store the data version number indicated by an FS.FetchData op into the read
request structure so that it's accessible by the page reader.
Signed-off-by: David Howells <dhowells@redhat.com>
We no longer parse symlinks when we get the inode to determine if this
symlink is actually a mountpoint as we detect that by examining the mode
instead (symlinks are always 0777 and mountpoints 0644).
Access the cache after mapping the status so that we don't have to manually
set the inode size now.
Note that this may need adjusting if the disconnected operation is
implemented as the file metadata may have to be obtained from the cache.
Signed-off-by: David Howells <dhowells@redhat.com>
Introduce a proc file that displays a bunch of statistics for the AFS
filesystem in the current network namespace.
Signed-off-by: David Howells <dhowells@redhat.com>
The code that fixes the crashes in the following commit introduced a small
memory leak:
commit 6a2cf8d366 ("scsi: qla2xxx: Fix crashes in qla2x00_probe_one on probe failure")
Fixing this requires a bit of reworking, which I've explained. Also provide
some code cleanup.
There is a small window in qla2x00_probe_one where if qla2x00_alloc_queues
fails, we end up never freeing req and rsp and leak 0xc0 and 0xc8 bytes
respectively (the sizes of req and rsp).
I originally put in checks to test for this condition which were based on
the incorrect assumption that if ha->rsp_q_map and ha->req_q_map were
allocated, then rsp and req were allocated as well. This is incorrect.
There is a window between these allocations:
ret = qla2x00_mem_alloc(ha, req_length, rsp_length, &req, &rsp);
goto probe_hw_failed;
[if successful, both rsp and req allocated]
base_vha = qla2x00_create_host(sht, ha);
goto probe_hw_failed;
ret = qla2x00_request_irqs(ha, rsp);
goto probe_failed;
if (qla2x00_alloc_queues(ha, req, rsp)) {
goto probe_failed;
[if successful, now ha->rsp_q_map and ha->req_q_map allocated]
To simplify this, we should just set req and rsp to NULL after we free
them. Sounds simple enough? The problem is that req and rsp are pointers
defined in the qla2x00_probe_one and they are not always passed by reference
to the routines that free them.
Here are paths which can free req and rsp:
PATH 1:
qla2x00_probe_one
ret = qla2x00_mem_alloc(ha, req_length, rsp_length, &req, &rsp);
[req and rsp are passed by reference, but if this fails, we currently
do not NULL out req and rsp. Easily fixed]
PATH 2:
qla2x00_probe_one
failing in qla2x00_request_irqs or qla2x00_alloc_queues
probe_failed:
qla2x00_free_device(base_vha);
qla2x00_free_req_que(ha, req)
qla2x00_free_rsp_que(ha, rsp)
PATH 3:
qla2x00_probe_one:
failing in qla2x00_mem_alloc or qla2x00_create_host
probe_hw_failed:
qla2x00_free_req_que(ha, req)
qla2x00_free_rsp_que(ha, rsp)
PATH 1: This should currently work, but it doesn't because rsp and rsp are
not set to NULL in qla2x00_mem_alloc. Easily remedied.
PATH 2: req and rsp aren't passed in at all to qla2x00_free_device but are
derived from ha->req_q_map[0] and ha->rsp_q_map[0]. These are only set up if
qla2x00_alloc_queues succeeds.
In qla2x00_free_queues, we are protected from crashing if these don't exist
because req_qid_map and rsp_qid_map are only set on their allocation. We are
guarded in this way:
for (cnt = 0; cnt < ha->max_req_queues; cnt++) {
if (!test_bit(cnt, ha->req_qid_map))
continue;
PATH 3: This works. We haven't freed req or rsp yet (or they were never
allocated if qla2x00_mem_alloc failed), so we'll attempt to free them here.
To summarize, there are a few small changes to make this work correctly and
(and for some cleanup):
1) (For PATH 1) Set *rsp and *req to NULL in case of failure in
qla2x00_mem_alloc so these are correctly set to NULL back in
qla2x00_probe_one
2) After jumping to probe_failed: and calling qla2x00_free_device,
explicitly set rsp and req to NULL so further calls with these pointers do
not crash, i.e. the free queue calls in the probe_hw_failed section we fall
through to.
3) Fix return code check in the call to qla2x00_alloc_queues. We currently
drop the return code on the floor. The probe fails but the caller of the
probe doesn't have an error code, so it attaches to pci. This can result in
a crash on module shutdown.
4) Remove unnecessary NULL checks in qla2x00_free_req_que,
qla2x00_free_rsp_que, and the egregious NULL checks before kfrees and vfrees
in qla2x00_mem_free.
I tested this out running a scenario where the card breaks at various times
during initialization. I made sure I forced every error exit path in
qla2x00_probe_one.
Cc: <stable@vger.kernel.org> # v4.16
Fixes: 6a2cf8d366 ("scsi: qla2xxx: Fix crashes in qla2x00_probe_one on probe failure")
Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently scsi_dh_lookup() doesn't check for NULL as a device name. This
combined with nvme over dm-mpath results in the following messages
emitted by device-mapper:
device-mapper: multipath: Could not failover device 259:67: Handler scsi_dh_(null) error 14.
Let scsi_dh_lookup() fail fast on NULL names.
[mkp: typo fix]
Cc: <stable@vger.kernel.org> # v4.16
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The first assignment to shost->use_blk_mq is redundant as it is
overwritten by the following statement. Remove this redundant code.
Detected by CoverityScan, CID#1466993 ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Implement @cell substitution handling such that if @cell is seen as a name
in a dynamic root mount, then the name of the root cell for that network
namespace will be substituted for @cell during lookup.
The substitution of @cell for the current net namespace is set by writing
the cell name to /proc/fs/afs/rootcell. The value can be obtained by
reading the file.
For example:
# mount -t afs none /kafs -o dyn
# echo grand.central.org >/proc/fs/afs/rootcell
# ls /kafs/@cell
archive/ cvs/ doc/ local/ project/ service/ software/ user/ www/
# cat /proc/fs/afs/rootcell
grand.central.org
Signed-off-by: David Howells <dhowells@redhat.com>
Implement the AFS feature by which @sys at the end of a pathname component
may be substituted for one of a list of values, typically naming the
operating system. Up to 16 alternatives may be specified and these are
tried in turn until one works. Each network namespace has[*] a separate
independent list.
Upon creation of a new network namespace, the list of values is
initialised[*] to a single OpenAFS-compatible string representing arch type
plus "_linux26". For example, on x86_64, the sysname is "amd64_linux26".
[*] Or will, once network namespace support is finalised in kAFS.
The list may be set by:
# for i in foo bar linux-x86_64; do echo $i; done >/proc/fs/afs/sysname
for which separate writes to the same fd are amalgamated and applied on
close. The LF character may be used as a separator to specify multiple
items in the same write() call.
The list may be cleared by:
# echo >/proc/fs/afs/sysname
and read by:
# cat /proc/fs/afs/sysname
foo
bar
linux-x86_64
Signed-off-by: David Howells <dhowells@redhat.com>
When afs_lookup() is called, prospectively look up the next 50 uncached
fids also from that same directory and cache the results, rather than just
looking up the one file requested.
This allows us to use the FS.InlineBulkStatus RPC op to increase efficiency
by fetching up to 50 file statuses at a time.
Signed-off-by: David Howells <dhowells@redhat.com>
AFS cells that are added or set as the workstation cell through /proc are
pinned against removal by setting the AFS_CELL_FL_NO_GC flag on them and
taking a ref. The ref should be only taken if the flag wasn't already set.
Fix this by making it conditional.
Without this an assertion failure will occur during module removal
indicating that the refcount is too elevated.
Signed-off-by: David Howells <dhowells@redhat.com>
Fix warnings raised by checker, including:
(*) Warnings raised by unequal comparison for the purposes of sorting,
where the endianness doesn't matter:
fs/afs/addr_list.c:246:21: warning: restricted __be16 degrades to integer
fs/afs/addr_list.c:246:30: warning: restricted __be16 degrades to integer
fs/afs/addr_list.c:248:21: warning: restricted __be32 degrades to integer
fs/afs/addr_list.c:248:49: warning: restricted __be32 degrades to integer
fs/afs/addr_list.c:283:21: warning: restricted __be16 degrades to integer
fs/afs/addr_list.c:283:30: warning: restricted __be16 degrades to integer
(*) afs_set_cb_interest() is not actually used and can be removed.
(*) afs_cell_gc_delay() should be provided with a sysctl.
(*) afs_cell_destroy() needs to use rcu_access_pointer() to read
cell->vl_addrs.
(*) afs_init_fs_cursor() should be static.
(*) struct afs_vnode::permit_cache needs to be marked __rcu.
(*) afs_server_rcu() needs to use rcu_access_pointer().
(*) afs_destroy_server() should use rcu_access_pointer() on
server->addresses as the server object is no longer accessible.
(*) afs_find_server() casts __be16/__be32 values to int in order to
directly compare them for the purpose of finding a match in a list,
but is should also annotate the cast with __force to avoid checker
warnings.
(*) afs_check_permit() accesses vnode->permit_cache outside of the RCU
readlock, though it doesn't then access the value; the extraneous
access is deleted.
False positives:
(*) Conditional locking around the code in xdr_decode_AFSFetchStatus. This
can be dealt with in a separate patch.
fs/afs/fsclient.c:148:9: warning: context imbalance in 'xdr_decode_AFSFetchStatus' - different lock contexts for basic block
(*) Incorrect handling of seq-retry lock context balance:
fs/afs/inode.c:455:38: warning: context imbalance in 'afs_getattr' - different
lock contexts for basic block
fs/afs/server.c:52:17: warning: context imbalance in 'afs_find_server' - different lock contexts for basic block
fs/afs/server.c:128:17: warning: context imbalance in 'afs_find_server_by_uuid' - different lock contexts for basic block
Errors:
(*) afs_lookup_cell_rcu() needs to break out of the seq-retry loop, not go
round again if it successfully found the workstation cell.
(*) Fix UUID decode in afs_deliver_cb_probe_uuid().
(*) afs_cache_permit() has a missing rcu_read_unlock() before one of the
jumps to the someone_else_changed_it label. Move the unlock to after
the label.
(*) afs_vl_get_addrs_u() is using ntohl() rather than htonl() when
encoding to XDR.
(*) afs_deliver_yfsvl_get_endpoints() is using htonl() rather than ntohl()
when decoding from XDR.
Signed-off-by: David Howells <dhowells@redhat.com>
Remove the const marking from the actor function pointer in the dir_context
struct. The const prevents the structure from being used as part of a
kmalloc'd object as it makes the compiler require that the actor member be
set at object initialisation time (or not at all), incuring something like
the following error if you try and set it later:
fs/afs/dir.c:556:20: error: assignment of read-only member 'actor'
Marking the member const like this adds very little in the way of sanity
checking as the type checking system is likely to provide sufficient - and
if not, the kernel is very likely to oops repeatably in this case.
Fixes: ac6614b764 ("[readdir] constify ->actor")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Pull vfs namei updates from Al Viro:
- make lookup_one_len() safe with parent locked only shared(incoming
afs series wants that)
- fix of getname_kernel() regression from 2015 (-stable fodder, that
one).
* 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
getname_kernel() needs to make sure that ->name != ->iname in long case
make lookup_one_len() safe to use with directory locked shared
new helper: __lookup_slow()
merge common parts of lookup_one_len{,_unlocked} into common helper
+ Documentation cleanups
+ removal of unused code
+ cause some structs to be static
+ implement Orangefs vm_operations fault callout
+ eliminate two single-use functions and put their cleaned up code in line.
+ replace a vmalloc/memset instance with vzalloc
+ fix a race condition bug in wait code.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJayk8pAAoJEM9EDqnrzg2+92IP/2AqMOUodqF+AfKrAebQIAEk
DvQefRC2eQXN2Ak8KYSO7oMdqfY7xeVSaKllWsSHdnktI5L9l6AOBa7uYQYF4OQt
sqT5Wj5sbnDS54gOfFt95sUHEQXs5wegTXBE15t4uxaLSpv22mPjzqlTzyZcK7IX
nhSIxEi+utXOW1WodT2xLgS08ac693KXciNNN14NKi5DJ+h6a8Z2kZBq/6ssC1oV
nW2tIfRVWNVk6cfJ+Wq12JksEZOPemzjv2zojv3PZwCcwObOyxvf16nxnLjdkJvw
obYWmR8ZcF3TItBbkxOTfanr+UAhZ55YVYDpitJmV9fpPM9eoPAR0gNWBgq45kaS
s27pvMYf5uvGRJHgEO6hBpmi1dExDDMLfH6Y4IN+zyHKjNfgvfQKwIkzrjlmG6q6
AgO3mbbMJfid2x7lkShuP1uYqUfv03fo2xW3zxWjjxo3tEbADSgzJQ5kjPixDvcB
0ru2SlRVqgYIFqBN8VgWkXKbfcpSEcDITmFEkBKg/SkA08ry4iLk744UqIPa5n76
5DYB0u99nHPMf2BBVsz6L/mUfVuHBYUBx1iII95PqJhVlj2Yi0sIj3AHUzQ4neKS
YWGMk4VQ+B5zCObSohCRv9l3ECUqugpiJzCuZjZZS5LUkj9RO5EJ1LDhOU/31xDL
OUTgZl7f5NykxwIHh3bD
=zJ1Z
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.17-ofs' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs updates from Mike Marshall:
"Fixes and cleanups:
- Documentation cleanups
- removal of unused code
- make some structs static
- implement Orangefs vm_operations fault callout
- eliminate two single-use functions and put their cleaned up code in
line.
- replace a vmalloc/memset instance with vzalloc
- fix a race condition bug in wait code"
* tag 'for-linus-4.17-ofs' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
Orangefs: documentation updates
orangefs: document package install and xfstests procedure
orangefs: remove unused code
orangefs: make several *_operations structs static
orangefs: implement vm_ops->fault
orangefs: open code short single-use functions
orangefs: replace vmalloc and memset with vzalloc
orangefs: bug fix for a race condition when getting a slot
During BO teardown, an indirect list 'uniform_addr_offsets' wasn't being
freed leading to leaking many 128B allocations. Fix the memory leak by
releasing it at teardown time.
Cc: stable@vger.kernel.org
Fixes: 6d45c81d22 ("drm/vc4: Add support for branching in shader validation.")
Signed-off-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20180402071035.25356-1-daniel@quora.org
Commit 0619f0f5e3 ("selinux: wrap selinuxfs state") triggers a BUG
when SELinux is runtime-disabled (i.e. systemd or equivalent disables
SELinux before initial policy load via /sys/fs/selinux/disable based on
/etc/selinux/config SELINUX=disabled).
This does not manifest if SELinux is disabled via kernel command line
argument or if SELinux is enabled (permissive or enforcing).
Before:
SELinux: Disabled at runtime.
BUG: Dentry 000000006d77e5c7{i=17,n=null} still in use (1) [unmount of selinuxfs selinuxfs]
After:
SELinux: Disabled at runtime.
Fixes: 0619f0f5e3 ("selinux: wrap selinuxfs state")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- VHE optimizations
- EL2 address space randomization
- speculative execution mitigations ("variant 3a", aka execution past invalid
privilege register access)
- bugfixes and cleanups
PPC:
- improvements for the radix page fault handler for HV KVM on POWER9
s390:
- more kvm stat counters
- virtio gpu plumbing
- documentation
- facilities improvements
x86:
- support for VMware magic I/O port and pseudo-PMCs
- AMD pause loop exiting
- support for AMD core performance extensions
- support for synchronous register access
- expose nVMX capabilities to userspace
- support for Hyper-V signaling via eventfd
- use Enlightened VMCS when running on Hyper-V
- allow userspace to disable MWAIT/HLT/PAUSE vmexits
- usual roundup of optimizations and nested virtualization bugfixes
Generic:
- API selftest infrastructure (though the only tests are for x86 as of now)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJay19UAAoJEL/70l94x66DGKYIAIu9PTHAEwaX0et15fPW5y2x
rrtS355lSAmMrPJ1nePRQ+rProD/1B0Kizj3/9O+B9OTKKRsorRYNa4CSu9neO2k
N3rdE46M1wHAPwuJPcYvh3iBVXtgbMayk1EK5aVoSXaMXEHh+PWZextkl+F+G853
kC27yDy30jj9pStwnEFSBszO9ua/URdKNKBATNx8WUP6d9U/dlfm5xv3Dc3WtKt2
UMGmog2wh0i7ecXo7hRkMK4R7OYP3ZxAexq5aa9BOPuFp+ZdzC/MVpN+jsjq2J/M
Zq6RNyA2HFyQeP0E9QgFsYS2BNOPeLZnT5Jg1z4jyiD32lAZ/iC51zwm4oNKcDM=
=bPlD
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- VHE optimizations
- EL2 address space randomization
- speculative execution mitigations ("variant 3a", aka execution past
invalid privilege register access)
- bugfixes and cleanups
PPC:
- improvements for the radix page fault handler for HV KVM on POWER9
s390:
- more kvm stat counters
- virtio gpu plumbing
- documentation
- facilities improvements
x86:
- support for VMware magic I/O port and pseudo-PMCs
- AMD pause loop exiting
- support for AMD core performance extensions
- support for synchronous register access
- expose nVMX capabilities to userspace
- support for Hyper-V signaling via eventfd
- use Enlightened VMCS when running on Hyper-V
- allow userspace to disable MWAIT/HLT/PAUSE vmexits
- usual roundup of optimizations and nested virtualization bugfixes
Generic:
- API selftest infrastructure (though the only tests are for x86 as
of now)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (174 commits)
kvm: x86: fix a prototype warning
kvm: selftests: add sync_regs_test
kvm: selftests: add API testing infrastructure
kvm: x86: fix a compile warning
KVM: X86: Add Force Emulation Prefix for "emulate the next instruction"
KVM: X86: Introduce handle_ud()
KVM: vmx: unify adjacent #ifdefs
x86: kvm: hide the unused 'cpu' variable
KVM: VMX: remove bogus WARN_ON in handle_ept_misconfig
Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown"
kvm: Add emulation for movups/movupd
KVM: VMX: raise internal error for exception during invalid protected mode state
KVM: nVMX: Optimization: Dont set KVM_REQ_EVENT when VMExit with nested_run_pending
KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending
KVM: x86: Fix misleading comments on handling pending exceptions
KVM: x86: Rename interrupt.pending to interrupt.injected
KVM: VMX: No need to clear pending NMI/interrupt on inject realmode interrupt
x86/kvm: use Enlightened VMCS when running on Hyper-V
x86/hyper-v: detect nested features
x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits
...
Commit 3c8ba0d61d ("kernel.h: Retain constant expression output for
max()/min()") rewrote our min/max macros to be very clever, but in the
meantime resurrected a variable name shadow issue that we had had
previously fixed in commit 589a9785ee ("min/max: remove sparse
warnings when they're nested").
That commit talks about the sparse warnings that this shadowing causes,
which we ignored as just a minor annoyance. But it turns out that the
sparse warning is the least of our problems. We actually have a real
bug due to the shadowing through the interaction with "min_not_zero()",
which ends up doing
min(__x, __y)
internally, and then the new declaration of "__x" and "__y" as new
variables in __cmp_once() results in a complete mess of an expression,
and "min_not_zero()" doesn't work at all.
For some odd reason, this only ever caused (reported) problems on s390,
even though it is a generic issue and most of the (obviously successful)
testing of the problematic commit had happened on other architectures.
Quoting Sebastian Ott:
"What happened is that the bio build by the partition detection code
was attempted to be split by the block layer because the block queue
had a max_sector setting of 0. blk_queue_max_hw_sectors uses
min_not_zero."
So re-introduce the use of __UNIQUE_ID() to make sure that the min/max
macros do not have these kinds of clashes.
[ That said, __UNIQUE_ID() itself has several issues that make it less
than wonderful.
In particular, the "uniqueness" has a fallback on the line number,
which means that it's not actually unique in more complex cases if you
don't build with gcc or clang (which have working unique counters that
aren't tied to line numbers).
That historical broken fallback also means that we have that pointless
"prefix" argument that doesn't actually make much sense _except_ for
the known-broken case. Oh well. ]
Fixes: 3c8ba0d61d ("kernel.h: Retain constant expression output for max()/min()")
Reported-and-tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The filestreams allocator stores an xfs_fstrm_item structure in the MRU to
cache inode number to agno mappings for a particular length of time. Each
xfs_fstrm_item contains the internal MRU structure, an inode pointer and
agno value.
The inode pointer stored in the xfs_fstrm_item is not referenced, however,
which means the inode itself can be removed and reclaimed before the MRU
item is freed. If this occurs, xfs_fstrm_free_func() can access freed or
unrelated memory through xfs_fstrm_item->ip and crash.
The obvious solution is to grab an inode reference for xfs_fstrm_item.
The filestream mechanism only actually uses the inode pointer as a means
to access the xfs_mount, however. Rather than add unnecessary
complexity, simplify the implementation to store an xfs_mount pointer in
struct xfs_mru_cache, and pass it to the free callback. This also
requires updates to the tracepoint class to provide the associated data
via parameters rather than the inode and a minor hack to peek at the MRU
key to establish the inode number at free time.
Based on debugging work and an earlier patch from Brian Foster, who
also wrote most of this changelog.
Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The "normal" kernel page table creation mechanisms using
PAGE_KERNEL_* page protections will never set _PAGE_GLOBAL with PTI.
The few places in the kernel that always want _PAGE_GLOBAL must
avoid using PAGE_KERNEL_*.
Document that we want it here and its use is not accidental.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205507.BCF4D4F0@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The __PAGE_KERNEL_* page permissions are "raw". They contain bits
that may or may not be supported on the current processor. They need
to be filtered by a mask (currently __supported_pte_mask) to turn them
into a value that we can actually set in a PTE.
These __PAGE_KERNEL_* values all contain _PAGE_GLOBAL. But, with PTI,
we want to be able to support _PAGE_GLOBAL (have the bit set in
__supported_pte_mask) but not have it appear in any of these masks by
default.
This patch creates a new mask, __default_kernel_pte_mask, and applies
it when creating all of the PAGE_KERNEL_* masks. This makes
PAGE_KERNEL_* safe to use anywhere (they only contain supported bits).
It also ensures that PAGE_KERNEL_* contains _PAGE_GLOBAL on PTI=n
kernels but clears _PAGE_GLOBAL when PTI=y.
We also make __default_kernel_pte_mask a non-GPL exported symbol
because there are plenty of driver-available interfaces that take
PAGE_KERNEL_* permissions.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205506.030DB6B6@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When clearing _PAGE_PRESENT on a huge page, we need to be careful
to also clear _PAGE_PSE, otherwise it might still get confused
for a valid large page table entry.
We do that near the spot where we *set* _PAGE_PSE. That's fine,
but it's unnecessary. pgprot_large_2_4k() already did it.
BTW, I also noticed that pgprot_large_2_4k() and
pgprot_4k_2_large() are not symmetric. pgprot_large_2_4k() clears
_PAGE_PSE (because it is aliased to _PAGE_PAT) but
pgprot_4k_2_large() does not put _PAGE_PSE back. Bummer.
Also, add some comments and change "promote" to "move". "Promote"
seems an odd word to move when we are logically moving a bit to a
lower bit position. Also add an extra line return to make it clear
to which line the comment applies.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205504.9B0F44A9@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The pageattr code has a pattern repeated where it sets _PAGE_GLOBAL
for present PTEs but clears it for non-present PTEs. The intention
is to keep _PAGE_GLOBAL from getting confused with _PAGE_PROTNONE
since _PAGE_GLOBAL is for present PTEs and _PAGE_PROTNONE is for
non-present
But, this pattern makes no sense. Effectively, it says, if you use
the pageattr code, always set _PAGE_GLOBAL when _PAGE_PRESENT.
canon_pgprot() will clear it if unsupported (because it masks the
value with __supported_pte_mask) but we *always* set it. Even if
canon_pgprot() did not filter _PAGE_GLOBAL, it would be OK.
_PAGE_GLOBAL is ignored when CR4.PGE=0 by the hardware.
This unconditional setting of _PAGE_GLOBAL is a problem when we have
PTI and non-PTI and we want some areas to have _PAGE_GLOBAL and some
not.
This updated version of the code says:
1. Clear _PAGE_GLOBAL when !_PAGE_PRESENT
2. Never set _PAGE_GLOBAL implicitly
3. Allow _PAGE_GLOBAL to be in cpa.set_mask
4. Allow _PAGE_GLOBAL to be inherited from previous PTE
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205502.86E199DA@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull ARM SA1100 updates from Russell King:
"We have support for arbitary MMIO registers providing platform GPIOs,
which allows us to abstract some of the SA11x0 CF support.
This set of updates makes that change"
* 'for-linus-sa1100' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: sa1100/simpad: switch simpad CF to use gpiod APIs
ARM: sa1100/shannon: convert to generic CF sockets
ARM: sa1100/nanoengine: convert to generic CF sockets
ARM: sa1100/h3xxx: switch h3xxx PCMCIA to use gpiod APIs
ARM: sa1100/cerf: convert to generic CF sockets
ARM: sa1100/assabet: convert to generic CF sockets
ARM: sa1100: provide infrastructure to support generic CF sockets
pcmcia: sa1100: provide generic CF support
For non-paranoid entries, idtentry knows how to switch from the
kernel stack to the user stack, as does error_entry. This results
in pointless duplication and code bloat. Make idtentry stop
thinking about stacks for non-paranoid entries.
This reduces text size by 5377 bytes.
This goes back to the following commit:
7f2590a110 ("x86/entry/64: Use a per-CPU trampoline stack for IDT entries")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/90aab80c1f906e70742eaa4512e3c9b5e62d59d4.1522794757.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This Kconfig warning appeared after a fix to the Kconfig validation.
The GPIO_CS5535 driver depends on the MFD_CS5535 driver, but the former
is selected in places where the latter is not:
WARNING: unmet direct dependencies detected for GPIO_CS5535
Depends on [m]: GPIOLIB [=y] && (X86 [=y] || MIPS || COMPILE_TEST [=y]) && MFD_CS5535 [=m]
Selected by [y]:
- OLPC_XO1_SCI [=y] && X86_32 [=y] && OLPC [=y] && OLPC_XO1_PM [=y] && INPUT [=y]=y
The warning does seem appropriate, since the GPIO_CS5535 driver won't
work unless MFD_CS5535 is also present. However, there is no link time
dependency between the two, so this caused no problems during randconfig
testing before.
This changes the 'select GPIO_CS5535' to 'depends on GPIO_CS5535' to
avoid the issue, at the expense of making it harder to configure the
driver (one now has to select the dependencies first).
The 'select MFD_CORE' part is completely redundant, since we already
depend on MFD_CS5535 here, so I'm removing that as well.
Ideally, the private symbols exported by that cs5535 gpio driver would
just be converted to gpiolib interfaces so we could expletely avoid
this dependency.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kbuild@vger.kernel.org
Fixes: f622f82795 ("kconfig: warn unmet direct dependency of tristate symbols selected by y")
Link: http://lkml.kernel.org/r/20180404124539.3817101-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
swiotlb_alloc() calls dma_direct_alloc(), which can satisfy lower than 32-bit
DMA mask requests using GFP_DMA if the architecture supports it. Various
x86 drivers rely on that, so we need to support that. At the same time
the whole kernel expects a 32-bit DMA mask to just work, so the other magic
in swiotlb_dma_supported() isn't actually needed either.
Reported-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: iommu@lists.linux-foundation.org
Fixes: 6e4bf58677 ("x86/dma: Use generic swiotlb_ops")
Link: http://lkml.kernel.org/r/20180409091517.6619-2-hch@lst.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull ARM updates from Russell King:
"A number of core ARM changes:
- Refactoring linker script by Nicolas Pitre
- Enable source fortification
- Add support for Cortex R8"
* 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: decompressor: fix warning introduced in fortify patch
ARM: 8751/1: Add support for Cortex-R8 processor
ARM: 8749/1: Kconfig: Add ARCH_HAS_FORTIFY_SOURCE
ARM: simplify and fix linker script for TCM
ARM: linker script: factor out TCM bits
ARM: linker script: factor out vectors and stubs
ARM: linker script: factor out unwinding table sections
ARM: linker script: factor out stuff for the .text section
ARM: linker script: factor out stuff for the DISCARD section
ARM: linker script: factor out some common definitions between XIP and non-XIP
Pull m68knommu update from Greg Ungerer:
"Only a single fix to set the DMA masks in the ColdFire FEC platform
data structure.
This stops the warning from dma-mapping.h at boot time"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68k: set dma and coherent masks for platform FEC ethernets
Pull alpha updates from Matt Turner:
"A few small changes for alpha"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering
alpha: Implement CPU vulnerabilities sysfs functions.
alpha: rtc: stop validating rtc_time in .read_time
alpha: rtc: remove unused set_mmss ops
Stephen reports that an x86 allmodconfig build fails to build the
of_pmem driver due to a missing definition of of_node_to_nid(). That
helper is currently only exported in the OF_NUMA=y case. In other cases,
ppc and sparc, it is a weak symbol, and outside of those platforms it is
a static inline.
Until an OF_NUMA=n configuration can reliably support usage of
of_node_to_nid() in modules across architectures, mark this driver as
'bool' instead of 'tristate'.
Cc: Rob Herring <robh@kernel.org>
Cc: Oliver O'Halloran <oohall@gmail.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull s390 updates from Martin Schwidefsky:
- Improvements for the spectre defense:
* The spectre related code is consolidated to a single file
nospec-branch.c
* Automatic enable/disable for the spectre v2 defenses (expoline vs.
nobp)
* Syslog messages for specve v2 are added
* Enable CONFIG_GENERIC_CPU_VULNERABILITIES and define the attribute
functions for spectre v1 and v2
- Add helper macros for assembler alternatives and use them to shorten
the code in entry.S.
- Add support for persistent configuration data via the SCLP Store Data
interface. The H/W interface requires a page table that uses 4K pages
only, the code to setup such an address space is added as well.
- Enable virtio GPU emulation in QEMU. To do this the depends
statements for a few common Kconfig options are modified.
- Add support for format-3 channel path descriptors and add a binary
sysfs interface to export the associated utility strings.
- Add a sysfs attribute to control the IFCC handling in case of
constant channel errors.
- The vfio-ccw changes from Cornelia.
- Bug fixes and cleanups.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (40 commits)
s390/kvm: improve stack frame constants in entry.S
s390/lpp: use assembler alternatives for the LPP instruction
s390/entry.S: use assembler alternatives
s390: add assembler macros for CPU alternatives
s390: add sysfs attributes for spectre
s390: report spectre mitigation via syslog
s390: add automatic detection of the spectre defense
s390: move nobp parameter functions to nospec-branch.c
s390/cio: add util_string sysfs attribute
s390/chsc: query utility strings via fmt3 channel path descriptor
s390/cio: rename struct channel_path_desc
s390/cio: fix unbind of io_subchannel_driver
s390/qdio: split up CCQ handling for EQBS / SQBS
s390/qdio: don't retry EQBS after CCQ 96
s390/qdio: restrict buffer merging to eligible devices
s390/qdio: don't merge ERROR output buffers
s390/qdio: simplify math in get_*_buffer_frontier()
s390/decompressor: trim uncompressed image head during the build
s390/crypto: Fix kernel crash on aes_s390 module remove.
s390/defkeymap: fix global init to zero
...
Since our device is regulatory self managed it maintains its regulatory
rules by its own. However the wmm_rules values can't be set by the
device itself but only the indication about the need to set it.
In case the device set wmm indication, proactively query the regulatory
data base to get these values
Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
When nvram loading fails a double free occurred. Fix this and reorg the
code a little.
Fixes: d09ae51a4b ("brcmfmac: pass struct in brcmf_fw_get_firmwares()")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
dquot_init() is never called in atomic context.
This function is only set as a parameter of fs_initcall().
Despite never getting called from atomic context,
dquot_init() calls __get_free_pages() with GFP_ATOMIC,
which waits busily for allocation.
GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL,
to avoid busy waiting and improve the possibility of sucessful allocation.
This is found by a static analysis tool named DCNS written by myself.
And I also manually check it.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
We met a sync thread stuck as follows:
raid1_sync_request+0x2c9/0xb50
md_do_sync+0x983/0xfa0
md_thread+0x11c/0x160
kthread+0x111/0x130
ret_from_fork+0x35/0x40
0xffffffffffffffff
At the same time, there is a stuck mdadm thread (mdadm --manage
/dev/md2 --add /dev/sda). It is trying to stop the sync thread:
kthread_stop+0x42/0xf0
md_unregister_thread+0x3a/0x70
md_reap_sync_thread+0x15/0x160
action_store+0x142/0x2a0
md_attr_store+0x6c/0xb0
kernfs_fop_write+0x102/0x180
__vfs_write+0x33/0x170
vfs_write+0xad/0x1a0
SyS_write+0x52/0xc0
do_syscall_64+0x6e/0x190
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Debug tools show that the sync thread is waiting in raise_barrier(),
until raid1d() end all normal IO bios into bio_end_io_list(introduced
in commit 55ce74d4bf). But, raid1d() cannot end these bios if
MD_CHANGE_PENDING bit is set. It needs to get mddev->reconfig_mutex lock
and then clear the bit in md_check_recovery().
However, the lock is holding by mdadm in action_store().
Thus, there is a loop:
mdadm waiting for sync thread to stop, sync thread waiting for
raid1d() to end bios, raid1d() waiting for mdadm to release
mddev->reconfig_mutex lock and then it can end bios.
Fix this by checking MD_RECOVERY_INTR while waiting in raise_barrier(),
so that sync thread can exit while mdadm is stoping the sync thread.
Fixes: 55ce74d4bf ("md/raid1: ensure device failure recorded before write request returns.")
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Device could become faulty when clustered array handling
METADATA_UPDATED msg, so we don't need to call read_rdev
for this device.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
snd_pcm_hw_params() (more exactly snd_pcm_hw_params_choose()) contains
a check of the return error from snd_pcm_hw_param_first() and _last()
with snd_BUG_ON() -- i.e. it may trigger WARN_ON() depending on the
kconfig.
This was a valid check in the past, as these functions shouldn't
return any error if the parameters have been already refined via
snd_pcm_hw_refine() beforehand. However, the recent rewrite
introduced a kmalloc() in snd_pcm_hw_refine() for removing VLA, and
this brought a possibility to trigger an error. As a result, syzbot
caught lots of superfluous kernel WARN_ON() and paniced via fault
injection.
As the WARN_ON() is no longer valid with the introduction of
kmalloc(), let's drop snd_BUG_ON() check, in order to make the world
peaceful place again.
Reported-by: syzbot+803e0047ac3a3096bb4f@syzkaller.appspotmail.com
Fixes: 5730f9f744 ("ALSA: pcm: Remove VLA usage")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Dmitry reports 32bit ebtables on 64bit kernel got broken by
a recent change that returns -EINVAL when ruleset has no entries.
ebtables however only counts user-defined chains, so for the
initial table nentries will be 0.
Don't try to allocate the compat array in this case, as no user
defined rules exist no rule will need 64bit translation.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: 7d7d7e0211 ("netfilter: compat: reject huge allocation requests")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Callum Sinclair reported SIP IP Phone errors that he tracked down to
such phones sending session descriptions for different media types but
with same port numbers.
The expect core will only 'refresh' existing expectation if it is
from same master AND same expectation class (media type).
As expectation class is different, we get an error.
The SIP connection tracking code will then
1). drop the SDP packet
2). if an rtp expectation was already installed successfully,
error on rtcp expectation will cancel the rtp one.
Make the expect core report back to caller when the conflict is due
to different expectation class and have SIP tracker ignore soft-error.
Reported-by: Callum Sinclair <Callum.Sinclair@alliedtelesis.co.nz>
Tested-by: Callum Sinclair <Callum.Sinclair@alliedtelesis.co.nz>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>