linux-xiaomi-chiron

Author	SHA1	Message	Date
Christoph Hellwig	23475ad49e	h8300: use dma-direct Replace the bare-bones h8300 direct dma mapping implementation with the fully featured generic dma-direct one. Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-01-15 09:35:17 +01:00
Christoph Hellwig	f73337f4ce	cris: use dma-direct cris currently has an incomplete direct mapping dma_map_ops implementation if PCI support is enabled. Replace it with the fully feature generic dma-direct implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>	2018-01-15 09:35:16 +01:00
Christoph Hellwig	1a9777a8a0	dma-direct: reject too small dma masks Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>	2018-01-15 09:35:15 +01:00
Christoph Hellwig	19dca8c0ef	dma-direct: make dma_direct_{alloc,free} available to other implementations So that they don't need to indirect through the operation vector. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>	2018-01-15 09:35:14 +01:00
Christoph Hellwig	95f183916d	dma-direct: retry allocations using GFP_DMA for small masks If an attempt to allocate memory succeeded, but isn't inside the supported DMA mask, retry the allocation with GFP_DMA set as a last resort. Based on the x86 code, but an off by one error in what is now dma_coherent_ok has been fixed vs the x86 code. Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-01-15 09:35:13 +01:00
Christoph Hellwig	c61e963734	dma-direct: add support for allocation from ZONE_DMA and ZONE_DMA32 This allows to dip into zones for lower memory if they are available. If one of the zones is not available the corresponding GFP_* flag will evaluate to 0 so they won't change anything. We provide an arch tunable for those architectures that do not use GFP_DMA for the lowest 24-bits, given that there are a few. Roughly based on the x86 code. Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-01-15 09:35:12 +01:00
Christoph Hellwig	21f237e4d0	dma-direct: use node local allocations for coherent memory To preserve the x86 behavior. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>	2018-01-15 09:35:11 +01:00
Christoph Hellwig	080321d3b3	dma-direct: add support for CMA allocation Try the CMA allocator for coherent allocations if supported. Roughly modelled after the x86 code. Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-01-15 09:35:09 +01:00
Christoph Hellwig	2797596992	dma-direct: add dma address sanity checks Roughly based on the x86 pci-nommu implementation. Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-01-15 09:35:08 +01:00
Christoph Hellwig	2e86a04780	dma-direct: use phys_to_dma This means it uses whatever linear remapping scheme that the architecture provides is used in the generic dma_direct ops. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>	2018-01-15 09:35:07 +01:00
Christoph Hellwig	002e67454f	dma-direct: rename dma_noop to dma_direct The trivial direct mapping implementation already does a virtual to physical translation which isn't strictly a noop, and will soon learn to do non-direct but linear physical to dma translations through the device offset and a few small tricks. Rename it to a better fitting name. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>	2018-01-15 09:35:06 +01:00
Christoph Hellwig	c5cd037d1c	dma-mapping: provide a generic asm/dma-mapping.h For architectures that just use the generic dma_noop_ops we can provide a generic version of dma-mapping.h. Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-01-15 09:35:05 +01:00
Christoph Hellwig	cea9d03c82	dma-mapping: add an arch_dma_supported hook To implement the x86 forbid_dac and iommu_sac_force we want an arch hook so that it can apply the global options across all dma_map_ops implementations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2018-01-15 09:34:59 +01:00
Christoph Hellwig	57bf5a8963	dma-mapping: clear harmful GFP_* flags in common code Lift the code from x86 so that we behave consistently. In the future we should probably warn if any of these is set. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]	2018-01-15 09:34:55 +01:00
Johannes Berg	51a1aaa631	mac80211_hwsim: validate number of different channels When creating a new radio on the fly, hwsim allows this to be done with an arbitrary number of channels, but cfg80211 only supports a limited number of simultaneous channels, leading to a warning. Fix this by validating the number - this requires moving the define for the maximum out to a visible header file. Reported-by: syzbot+8dd9051ff19940290931@syzkaller.appspotmail.com Fixes: `b59ec8dd43` ("mac80211_hwsim: fix number of channels in interface combinations") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2018-01-15 09:34:45 +01:00
Christoph Hellwig	205e1b7f51	dma-mapping: warn when there is no coherent_dma_mask These days all devices should have a DMA coherent mask, and most dma_ops implementations rely on that fact. But just to be sure add an assert to ring the warning bell if that is not the case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2018-01-15 09:34:43 +01:00
Benjamin Beichler	b71d856ab5	mac80211_hwsim: add workqueue to wait for deferred radio deletion on mod unload When closing multiple wmediumd instances with many radios and try to unload the mac80211_hwsim module, it may happen that the work items live longer than the module. To wait especially for this deletion work items, add a work queue, otherwise flush_scheduled_work would be necessary. Signed-off-by: Benjamin Beichler <benjamin.beichler@uni-rostock.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2018-01-15 09:18:27 +01:00
Dominik Brodowski	7a94b8c2ee	nl80211: take RCU read lock when calling ieee80211_bss_get_ie() As ieee80211_bss_get_ie() derefences an RCU to return ssid_ie, both the call to this function and any operation on this variable need protection by the RCU read lock. Fixes: `44905265bc` ("nl80211: don't expose wdev->ssid for most interfaces") Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2018-01-15 09:15:04 +01:00
Johannes Berg	a48a52b7be	cfg80211: fully initialize old channel for event Paul reported that he got a report about undefined behaviour that seems to me to originate in using uninitialized memory when the channel structure here is used in the event code in nl80211 later. He never reported whether this fixed it, and I wasn't able to trigger this so far, but we should do the right thing and fully initialize the on-stack structure anyway. Reported-by: Paul Menzel <pmenzel+linux-wireless@molgen.mpg.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2018-01-15 09:15:03 +01:00
Vinod Koul	1ab8da1182	dmaengine: sprd: statify 'sprd_dma_prep_dma_memcpy' Sparse warns that 'sprd_dma_prep_dma_memcpy' should be static so make it static. drivers/dma/sprd-dma.c:713:32: warning: symbol'sprd_dma_prep_dma_memcpy' was not declared. Should it be static? Reviewed-by: Baolin Wang <baolin.wang@linaro.org> Signed-off-by: Vinod Koul <vinod.koul@intel.com>	2018-01-15 11:33:11 +05:30
Zhang Rui	134f401079	Merge branches 'thermal-core', 'thermal-intel' and 'thermal-soc' into next	2018-01-15 13:57:18 +08:00
Fabio Estevam	e782bc169c	thermal: thermal_hwmon: Convert to hwmon_device_register_with_info() Booting Linux on a mx6q based board leads to the following warning: (NULL device *): hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info(). , so do the conversion as suggested. Also, this results in the core taking care of creating the 'name' attribute, so drop the code doing that from the thermal driver. The initial attempt to convert this driver to hwmon_device_register_with_info() caused issues on the N900 platform in commit `7611fb6806` ("thermal: thermal_hwmon: Convert to hwmon_device_register_with_info()"): bq27xxx-battery 2-0055: failed to register battery bq27xxx-battery: probe of 2-0055 failed with error -22 ... rx51-battery: probe of n900-battery failed with error -22 , leading to a revert in commit `3feb479cea` ("Revert "thermal: thermal_hwmon: Convert to hwmon_device_register_with_info()""). The probe errors happened due to the '-' character being present in the name of the power supply devices: bq27200-0 and rx51-battery. Since commit `74d3b64197` ("hwmon: Relax name attribute validation for new APIs") hwmon will no longer treat these names as errors, allowing the transition for hwmon_device_register_with_info() to happen in a safely manner. Cc: Pavel Machek <pavel@ucw.cz> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Zhang Rui <rui.zhang@intel.com>	2018-01-15 13:56:33 +08:00
Trond Myklebust	0af3442af7	SUNRPC: Add explicit rescheduling points in the receive path When reading the reply from the server, insert an explicit cond_resched() to avoid starving higher priority tasks. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:30 -05:00
Trond Myklebust	3d188805f8	SUNRPC: Chunk reading of replies from the server Read the TCP data in chunks of max 2MB so that we do not hog the socket lock. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:30 -05:00
Chuck Lever	024fbf9c2e	SUNRPC: Remove rpc_protocol() Since nfs4_create_referral_server was the only call site of rpc_protocol, rpc_protocol can now be removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:30 -05:00
Chuck Lever	801b564309	nfs: Update server port after referral or migration After traversing a referral or recovering from a migration event, ensure that the server port reported in /proc/mounts is updated to the correct port setting for the new submount. Reported-by: Helen Chao <helen.chao@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:30 -05:00
Chuck Lever	530ea42192	nfs: Referrals should use the same proto setting as their parent Helen Chao <helen.chao@oracle.com> noticed that when a user traverses a referral on an NFS/RDMA mount, the resulting submount always uses TCP. This behavior does not match the vers= setting when traversing a referral (vers=4.1 is preserved). It also does not match the behavior of crossing from the pseudofs into a real filesystem (proto=rdma is preserved in that case). The Linux NFS client does not currently support the fs_locations_info attribute. The situation is similar for all NFSv4 servers I know of. Therefore until the community has broad support for fs_locations_info, when following a referral: - First try to connect with RPC-over-RDMA. This will fail quickly if the client has no RDMA-capable interfaces. - If connecting with RPC-over-RDMA fails, or the RPC-over-RDMA transport is not available, use TCP. Reported-by: Helen Chao <helen.chao@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:30 -05:00
Chuck Lever	fb455baad6	nfs: Define NFS_RDMA_PORT The NFS/RDMA port assignment is specified in Section 9 of RFC 8267. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:30 -05:00
Elena Reshetova	fbca30c513	lockd: convert nlm_rqst.a_count from atomic_t to refcount_t atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nlm_rqst.a_count is used as pure reference counter. Convert it to refcount_t and fix up the operations. **Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon in state to be merged to the documentation tree. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the nlm_rqst.a_count it might make a difference in following places: - nlmclnt_release_call() and nlmsvc_release_call(): decrement in refcount_dec_and_test() only provides RELEASE ordering and control dependency on success vs. fully ordered atomic counterpart Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:30 -05:00
Elena Reshetova	431f125b67	lockd: convert nlm_lockowner.count from atomic_t to refcount_t atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nlm_lockowner.count is used as pure reference counter. Convert it to refcount_t and fix up the operations. **Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon in state to be merged to the documentation tree. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the nlm_lockowner.count it might make a difference in following places: - nlm_put_lockowner(): decrement in refcount_dec_and_lock() only provides RELEASE ordering, control dependency on success and holds a spin lock on success vs. fully ordered atomic counterpart. No changes in spin lock guarantees. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:29 -05:00
Elena Reshetova	c751082cef	lockd: convert nsm_handle.sm_count from atomic_t to refcount_t atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nsm_handle.sm_count is used as pure reference counter. Convert it to refcount_t and fix up the operations. **Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon in state to be merged to the documentation tree. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the nsm_handle.sm_count it might make a difference in following places: - nsm_release(): decrement in refcount_dec_and_lock() only provides RELEASE ordering, control dependency on success and holds a spin lock on success vs. fully ordered atomic counterpart. No change for the spin lock guarantees. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:29 -05:00
Elena Reshetova	fee21fb587	lockd: convert nlm_host.h_count from atomic_t to refcount_t atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nlm_host.h_count is used as pure reference counter. Convert it to refcount_t and fix up the operations. **Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon in state to be merged to the documentation tree. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the nlm_host.h_count it might make a difference in following places: - nlmsvc_release_host(): decrement in refcount_dec() provides RELEASE ordering, while original atomic_dec() was fully unordered. Since the change is for better, it should not matter. - nlmclnt_release_host(): decrement in refcount_dec_and_test() only provides RELEASE ordering and control dependency on success vs. fully ordered atomic counterpart. It doesn't seem to matter in this case since object freeing happens under mutex lock anyway. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:29 -05:00
Scott Mayhew	ba4a76f703	nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the mds Currently when falling back to doing I/O through the MDS (via pnfs_{read\|write}_through_mds), the client frees the nfs_pgio_header without releasing the reference taken on the dreq via pnfs_generic_pg_{read\|write}pages -> nfs_pgheader_init -> nfs_direct_pgio_init. It then takes another reference on the dreq via nfs_generic_pg_pgios -> nfs_pgheader_init -> nfs_direct_pgio_init and as a result the requester will become stuck in inode_dio_wait. Once that happens, other processes accessing the inode will become stuck as well. Ensure that pnfs_read_through_mds() and pnfs_write_through_mds() clean up correctly by calling hdr->completion_ops->completion() instead of calling hdr->release() directly. This can be reproduced (sometimes) by performing "storage failover takeover" commands on NetApp filer while doing direct I/O from a client. This can also be reproduced using SystemTap to simulate a failure while doing direct I/O from a client (from Dave Wysochanski <dwysocha@redhat.com>): stap -v -g -e 'probe module("nfs_layout_nfsv41_files").function("nfs4_fl_prepare_ds").return { $return=NULL; exit(); }' Suggested-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Scott Mayhew <smayhew@redhat.com> Fixes: `1ca018d28d` ("pNFS: Fix a memory leak when attempted pnfs fails") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:29 -05:00
Benjamin Coddington	b3dce6a2f0	pnfs/blocklayout: handle transient devices PNFS block/SCSI layouts should gracefully handle cases where block devices are not available when a layout is retrieved, or the block devices are removed while the client holds a layout. While setting up a layout segment, keep a record of an unavailable or un-parsable block device in cache with a flag so that subsequent layouts do not spam the server with GETDEVINFO. We can reuse the current NFS_DEVICEID_UNAVAILABLE handling with one variation: instead of reusing the device, we will discard it and send a fresh GETDEVINFO after the timeout, since the lookup and validation of the device occurs within the GETDEVINFO response handling. A lookup of a layout segment that references an unavailable device will return a segment with the NFS_LSEG_UNAVAILABLE flag set. This will allow the pgio layer to mark the layout with the appropriate fail bit, which forces subsequent IO to the MDS, and prevents spamming the server with LAYOUTGET, LAYOUTRETURN. Finally, when IO to a block device fails, look up the block device(s) referenced by the pgio header, and mark them as unavailable. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:29 -05:00
Benjamin Coddington	d78471d32b	pnfs/blocklayout: set PNFS_LAYOUTRETURN_ON_ERROR If there's an error doing I/O to block device, and the client resends the I/O to the MDS, the MDS must recall the layout from the client before processing the I/O. Let's preempt that exchange by returning the layout before falling back to the MDS when there's an error. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:29 -05:00
Benjamin Coddington	ad6b0241c9	pnfs/blocklayout: Add module alias for LAYOUT4_SCSI The blocklayout module contains the client support for both block and SCSI layouts. Add a module alias for the SCSI layout type so that the module will be loaded for SCSI layouts. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:29 -05:00
Benjamin Coddington	e545735a32	NFS: remove unused offset arg in nfs_pgio_rpcsetup nfs_pgio_rpcsetup() is always called with an offset of 0, so we should be able to drop the arguement altogether. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:29 -05:00
NeilBrown	dce2630c7d	NFSv4: always set NFS_LOCK_LOST when a lock is lost. There are 2 comments in the NFSv4 code which suggest that SIGLOST should possibly be sent to a process. In these cases a lock has been lost. The current practice is to set NFS_LOCK_LOST so that read/write returns EIO when a lock is lost. So change these comments to code when sets NFS_LOCK_LOST. One case is when lock recovery after apparent server restart fails with NFS4ERR_DENIED, NFS4ERR_RECLAIM_BAD, or NFS4ERRO_RECLAIM_CONFLICT. The other case is when a lock attempt as part of lease recovery fails with NFS4ERR_DENIED. In an ideal world, these should not happen. However I have a packet trace showing an NFSv4.1 session getting NFS4ERR_BADSESSION after an extended network parition. The NFSv4.1 client treats this like server reboot until/unless it get NFS4ERR_NO_GRACE, in which case it switches over to "nograce" recovery mode. In this network trace, the client attempts to recover a lock and the server (incorrectly) reports NFS4ERR_DENIED rather than NFS4ERR_NO_GRACE. This leads to the ineffective comment and the client then continues to write using the OPEN stateid. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:29 -05:00
NeilBrown	aaa1500894	nfs: remove dead code from nfs_encode_fh() This code can never be used as the IS_AUTOMOUNT(inode) case has already been handled. So remove it to avoid confusion. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:29 -05:00
Trond Myklebust	9ccee940bd	Support statx() mask and query flags parameters Support the query flags AT_STATX_FORCE_SYNC by forcing an attribute revalidation, and AT_STATX_DONT_SYNC by returning cached attributes only. Use the mask to optimise away server revalidation for attributes that are not being requested by the user. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:29 -05:00
Trond Myklebust	8634ef5e05	NFS: Fix nfsstat breakage due to LOOKUPP The LOOKUPP operation was inserted into the nfs4_procedures array rather than being appended, which put /proc/net/rpc/nfs out of whack, and broke the nfsstat utility. Fix by moving the LOOKUPP operation to the end of the array, and by ensuring that it keeps the same length whether or not NFSV4.1 and NFSv4.2 are compiled in. Fixes: `5b5faaf6df` ("nfs4: add NFSv4 LOOKUPP handlers") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:29 -05:00
Trond Myklebust	82571552a0	NFSv4: Convert LOCKU to use nfs4_async_handle_exception() Convert CLOSE so that it specifies the correct stateid and inode for the error handling. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:29 -05:00
Trond Myklebust	e0dba0128a	NFSv4: Convert DELEGRETURN to use nfs4_handle_exception() Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:29 -05:00
Trond Myklebust	b8b8d22109	NFSv4: Convert CLOSE to use nfs4_async_handle_exception() Convert CLOSE so that it specifies the correct stateid, state and inode for the error handling. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:28 -05:00
Trond Myklebust	7f1bda447c	NFS: Add a cond_resched() to nfs_commit_release_pages() The commit list can get very large, and so we need a cond_resched() in nfs_commit_release_pages() in order to ensure we don't hog the CPU for excessive periods of time. Reported-by: Mike Galbraith <efault@gmx.de> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:28 -05:00
Rafael J. Wysocki	4918e1f87c	PM / runtime: Rework pm_runtime_force_suspend/resume() One of the limitations of pm_runtime_force_suspend/resume() is that if a parent driver wants to use these functions, all of its child drivers generally have to do that too because of the parent usage counter manipulations necessary to get the correct state of the parent during system-wide transitions to the working state (system resume). However, that limitation turns out to be artificial, so remove it. Namely, pm_runtime_force_suspend() only needs to update the children counter of its parent (if there's is a parent) when the device can stay in suspend after the subsequent system resume transition, as that counter is correct already otherwise. Now, if the parent's children counter is not updated, it is not necessary to increment the parent's usage counter in that case any more, as long as the children counters of devices are checked along with their usage counters in order to decide whether or not the devices may be left in suspend after the subsequent system resume transition. Accordingly, modify pm_runtime_force_suspend() to only call pm_runtime_set_suspended() for devices whose usage and children counters are at the "no references" level (the runtime PM status of the device needs to be updated to "suspended" anyway in case this function is called once again for the same device during the transition under way), drop the parent usage counter incrementation from it and update pm_runtime_force_resume() to compensate for these changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>	2018-01-15 01:36:33 +01:00
Rafael J. Wysocki	46fa233acf	Merge generic power domains (genpd) material for v4.16 into pm-core	2018-01-15 01:34:17 +01:00
Rafael J. Wysocki	17218e0092	PM / genpd: Stop/start devices without pm_runtime_force_suspend/resume() There are problems with calling pm_runtime_force_suspend/resume() to "stop" and "start" devices in genpd_finish_suspend() and genpd_resume_noirq() (and in analogous hibernation-specific genpd callbacks) after commit `122a22377a` (PM / Domains: Stop/start devices during system PM suspend/resume in genpd) as those routines do much more than just "stopping" and "starting" devices (which was the stated purpose of that commit) unnecessarily and may not play well with system-wide PM driver callbacks. First, consider the pm_runtime_force_suspend() in genpd_finish_suspend(). If the current runtime PM status of the device is "suspended", that function most likely does the right thing by ignoring the device, because it should have been "stopped" already and whatever needed to be done to deactivate it shoud have been done. In turn, if the runtime PM status of the device is "active", genpd_runtime_suspend() is called for it (indirectly) and (1) runs the ->runtime_suspend callback provided by the device's driver (assuming no bus type with ->runtime_suspend of its own), (2) "stops" the device and (3) checks if the domain can be powered down, and then (4) the device's runtime PM status is changed to "suspended". Out of the four actions above (1) is not necessary and it may be outright harmful, (3) is pointless and (4) is questionable. The only operation that needs to be carried out here is (2). The reason why (1) is not necessary is because the system-wide PM callbacks provided by the device driver for the transition in question have been run and they should have taken care of the driver's part of device suspend already. Moreover, it may be harmful, because the ->runtime_suspend callback may want to access the device which is partially suspended at that point and may not be responsive. Also, system-wide PM callbacks may have been run already (in the previous phases of the system transition under way) for the device's parent or for its supplier devices (if any) and the device may not be accessible because of that. There also is no reason to do (3), because genpd_finish_suspend() will repeat it anyway, and (4) potentially causes confusion to ensue during the subsequent system transition to the working state. Consider pm_runtime_force_resume() in genpd_resume_noirq() now. It runs genpd_runtime_resume() for all devices with runtime PM status set to "suspended", which includes all of the devices whose runtime PM status was changed by pm_runtime_force_suspend() before and may include some devices already suspended when the pm_runtime_force_suspend() was running, which may be confusing. The genpd_runtime_resume() first tries to power up the domain, which (again) is pointless, because genpd_resume_noirq() has done that already. Then, it "starts" the device and runs the ->runtime_resume callback (from the driver, say) for it. If all is well, the device is left with the runtime PM status set to "active". Unfortunately, running the driver's ->runtime_resume callback before its system-wide PM callbacks and possibly before some system-wide PM callbacks of the parent device's driver (let alone supplier drivers) is asking for trouble, especially if the device had been suspended before pm_runtime_force_suspend() ran previously or if the callbacks in question expect to be run back-to-back with their suspend-side counterparts. It also should not be necessary, because the system-wide PM driver callbacks that will be invoked for the device subsequently should take care of resuming it just fine. [Running the driver's ->runtime_resume callback in the "noirq" phase of the transition to the working state may be problematic even for devices whose drivers do use pm_runtime_force_resume() in (or as) their system-wide PM callbacks if they have suppliers other than their parents, because it may cause the supplier to be resumed after the consumer in some cases.] Because of the above, modify genpd as follows: 1. Change genpd_finish_suspend() to only "stop" devices with runtime PM status set to "active" (without invoking runtime PM callbacks for them, changing their runtime PM status and so on). That doesn't change the handling of devices whose drivers use pm_runtime_force_suspend/resume() in (or as) their system-wide PM callbacks and addresses the issues described above for the other devices. 2. Change genpd_resume_noirq() to only "start" devices with runtime PM status set to "active" (without invoking runtime PM callbacks for them, changing their runtime PM status and so on). Again, that doesn't change the handling of devices whose drivers use pm_runtime_force_suspend/resume() in (or as) their system-wide PM callbacks and addresses the described issues for the other devices. Devices with runtime PM status set to "suspended" are not started with the assumption that they will be resumed later, either by pm_runtime_force_resume() or via runtime PM. 3. Change genpd_restore_noirq() to follow genpd_resume_noirq(). That causes devices already suspended before hibernation to be left alone (which also is the case without the change) and avoids running the ->runtime_resume driver callback too early for the other devices. 4. Change genpd_freeze_noirq() and genpd_thaw_noirq() in accordance with the above modifications. Fixes: `122a22377a` (PM / Domains: Stop/start devices during system PM suspend/resume in genpd) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Ulf Hansson <ulf.hansson@linaro.org>	2018-01-15 01:33:37 +01:00
Tom Lendacky	28d437d550	x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros The PAUSE instruction is currently used in the retpoline and RSB filling macros as a speculation trap. The use of PAUSE was originally suggested because it showed a very, very small difference in the amount of cycles/time used to execute the retpoline as compared to LFENCE. On AMD, the PAUSE instruction is not a serializing instruction, so the pause/jmp loop will use excess power as it is speculated over waiting for return to mispredict to the correct target. The RSB filling macro is applicable to AMD, and, if software is unable to verify that LFENCE is serializing on AMD (possible when running under a hypervisor), the generic retpoline support will be used and, so, is also applicable to AMD. Keep the current usage of PAUSE for Intel, but add an LFENCE instruction to the speculation trap for AMD. The same sequence has been adopted by GCC for the GCC generated retpolines. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@alien8.de> Acked-by: David Woodhouse <dwmw@amazon.co.uk> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Kees Cook <keescook@google.com> Link: https://lkml.kernel.org/r/20180113232730.31060.36287.stgit@tlendack-t1.amdoffice.net	2018-01-15 00:32:55 +01:00
David Woodhouse	c995efd5a7	x86/retpoline: Fill RSB on context switch for affected CPUs On context switch from a shallow call stack to a deeper one, as the CPU does 'ret' up the deeper side it may encounter RSB entries (predictions for where the 'ret' goes to) which were populated in userspace. This is problematic if neither SMEP nor KPTI (the latter of which marks userspace pages as NX for the kernel) are active, as malicious code in userspace may then be executed speculatively. Overwrite the CPU's return prediction stack with calls which are predicted to return to an infinite loop, to "capture" speculation if this happens. This is required both for retpoline, and also in conjunction with IBRS for !SMEP && !KPTI. On Skylake+ the problem is slightly different, and an underflow of the RSB may cause errant branch predictions to occur. So there it's not so much overwrite, as filling the RSB to attempt to prevent it getting empty. This is only a partial solution for Skylake+ since there are many other conditions which may result in the RSB becoming empty. The full solution on Skylake+ is to use IBRS, which will prevent the problem even when the RSB becomes empty. With IBRS, the RSB-stuffing will not be required on context switch. [ tglx: Added missing vendor check and slighty massaged comments and changelog ] Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515779365-9032-1-git-send-email-dwmw@amazon.co.uk	2018-01-15 00:32:44 +01:00

... 92 93 94 95 96 ...

737480 commits