Commit graph

900375 commits

Author SHA1 Message Date
Dmitry Osipenko
9f42de8d4e i2c: tegra: Fix suspending in active runtime PM state
I noticed that sometime I2C clock is kept enabled during suspend-resume.
This happens because runtime PM defers dynamic suspension and thus it may
happen that runtime PM is in active state when system enters into suspend.
In particular I2C controller that is used for CPU's DVFS is often kept ON
during suspend because CPU's voltage scaling happens quite often.

Fixes: 8ebf15e9c8 ("i2c: tegra: Move suspend handling to NOIRQ phase")
Cc: <stable@vger.kernel.org> # v5.4+
Tested-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2020-01-15 18:16:51 +01:00
Kelvin Cao
7a30ebb9f2 PCI/switchtec: Add Gen4 device IDs
Now that Gen4 is properly supported, advertise support in the module's
device ID table and add the same IDs to the list of switchtec quirks.

[logang@deltatee.com: add commit message and quirk IDs]
Link: https://lore.kernel.org/r/20200115035648.2578-8-logang@deltatee.com
Signed-off-by: Kelvin Cao <kelvin.cao@microchip.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-01-15 11:00:40 -06:00
Kelvin Cao
ce7c88600b PCI/switchtec: Add Gen4 MRPC GAS access permission check
Gen4 hardware provides new MRPC commands to read and write directly from
any address in the PCI BAR (which Microsemi refers to as GAS). Since
accessing BARs can be dangerous and break the driver, we don't want
unprivileged users to have this ability.

Therefore, require CAP_SYS_ADMIN for the local and remote GAS access MRPC
commands. Privileged processes will already have access to the BAR through
the sysfs resource file so this doesn't give userspace any capabilities it
didn't already have.

[logang@deltatee.com: rework commit message]
Link: https://lore.kernel.org/r/20200106190337.2428-11-logang@deltatee.com
Signed-off-by: Kelvin Cao <kelvin.cao@microchip.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-01-15 11:00:39 -06:00
Kelvin Cao
4efa1d2e36 PCI/switchtec: Add Gen4 flash information interface support
Add the new flash_info registers struct and the implementation of
ioctl_flash_part_info() for the new Gen4 hardware.

[logang@deltatee.com: rewrote commit message]
Link: https://lore.kernel.org/r/20200115035648.2578-7-logang@deltatee.com
Signed-off-by: Kelvin Cao <kelvin.cao@microchip.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-01-15 11:00:39 -06:00
Logan Gunthorpe
a3321ca394 PCI/switchtec: Add Gen4 system info register support
Add the Gen4-specific system info registers and ensure their usage is
guarded by a check on the device's generation.

Link: https://lore.kernel.org/r/20200115035648.2578-6-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-01-15 11:00:38 -06:00
Logan Gunthorpe
993d208daa PCI/switchtec: Separate Gen3 register structures into unions
Since the sys_info and flash_info registers differ significantly in Gen4
hardware, separate out the Gen3 registers into their own structure with a
union in the main structure.

No functional changes intended.

Link: https://lore.kernel.org/r/20200115035648.2578-5-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-01-15 11:00:38 -06:00
Logan Gunthorpe
6a3d1b542c PCI/switchtec: Factor out Gen3 ioctl_flash_part_info()
Refactor ioctl_flash_part_info() into a Gen3-specific function because the
registers for flash partition information have changed significantly in
Gen4 and will require a completely different implementation.

No functional changes intended.

Co-developed-by: Kelvin Cao <kelvin.cao@microchip.com>
Link: https://lore.kernel.org/r/20200115035648.2578-4-logang@deltatee.com
Signed-off-by: Kelvin Cao <kelvin.cao@microchip.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-01-15 11:00:37 -06:00
Logan Gunthorpe
b13313a01a PCI/switchtec: Add 'generation' variable
Add a generation variable passed through the device ID table and test for
Gen3-specific registers.  This will allow us to add Gen4 and other devices
that extend the programming model.

Link: https://lore.kernel.org/r/20200115035648.2578-3-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-01-15 11:00:37 -06:00
Logan Gunthorpe
fcccd282b6 PCI/switchtec: Rename generation-specific constants
Gen4 hardware will have different values for the SWITCHTEC_X_RUNNING and
SWITCHTEC_IOCTL_NUM_PARTITIONS, so rename them with GEN3 in their name.

No functional changes intended.

Link: https://lore.kernel.org/r/20200115035648.2578-2-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-01-15 11:00:37 -06:00
Wesley Sheng
2085747d53 PCI/switchtec: Move check event ID from mask_event() to switchtec_event_isr()
The event ID check doesn't depend on anything in the mask_all_events() to
mask_event() path.  Do it in switchtec_event_isr() to avoid the CSR read in
mask_event().

Link: https://lore.kernel.org/r/20200106190337.2428-6-logang@deltatee.com
Signed-off-by: Wesley Sheng <wesley.sheng@microchip.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-01-15 11:00:36 -06:00
Wesley Sheng
7501a02a9d PCI/switchtec: Remove redundant valid PFF number count
Remove the redundant valid PFF number count from ioctl_event_summary(),
since init_pff() has already counted the valid PFFs.

Link: https://lore.kernel.org/r/20200106190337.2428-5-logang@deltatee.com
Signed-off-by: Wesley Sheng <wesley.sheng@microchip.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-01-15 11:00:36 -06:00
Logan Gunthorpe
a6b0ef9a7d PCI/switchtec: Add support for Intercomm Notify and Upstream Error Containment
Add support for the Inter Fabric Manager Communication (Intercomm) Notify
event in PAX variants of Switchtec hardware and the Upstream Error
Containment port in the MR1 release of Gen3 firmware.

Link: https://lore.kernel.org/r/20200106190337.2428-4-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-01-15 11:00:27 -06:00
Will Deacon
a569f5f372 arm64: Use register field helper in kaslr_requires_kpti()
Rather than open-code the extraction of the E0PD field from the MMFR2
register, we can use the cpuid_feature_extract_unsigned_field() helper
instead.

Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-15 16:49:48 +00:00
Will Deacon
ebac96ede6 arm64: Simplify early check for broken TX1 when KASLR is enabled
Now that the decision to use non-global mappings is stored in a variable,
the check to avoid enabling them for the terminally broken ThunderX1
platform can be simplified so that it is only keyed off the MIDR value.

Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-15 16:49:27 +00:00
Vincenzo Frascino
ca78eee7b4 xfs: Add __packed to xfs_dir2_sf_entry_t definition
xfs_check_ondisk_structs() verifies that the sizes of the data types
used by xfs are correct via the XFS_CHECK_STRUCT_SIZE() macro.

Since the structures padding can vary depending on the ABI (e.g. on
ARM OABI structures are padded to multiple of 32 bits), it may happen
that xfs_dir2_sf_entry_t size check breaks the compilation with the
assertion below:

In file included from linux/include/linux/string.h:6,
                 from linux/include/linux/uuid.h:12,
                 from linux/fs/xfs/xfs_linux.h:10,
                 from linux/fs/xfs/xfs.h:22,
                 from linux/fs/xfs/xfs_super.c:7:
In function ‘xfs_check_ondisk_structs’,
    inlined from ‘init_xfs_fs’ at linux/fs/xfs/xfs_super.c:2025:2:
linux/include/linux/compiler.h:350:38:
    error: call to ‘__compiletime_assert_107’ declared with attribute
    error: XFS: sizeof(xfs_dir2_sf_entry_t) is wrong, expected 3
    _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)

Restore the correct behavior adding __packed to the structure definition.

Cc: Darrick J. Wong <darrick.wong@oracle.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-01-15 08:45:51 -08:00
Vladimir Murzin
8bf9284d99 arm64: Turn "broken gas inst" into real config option
Use the new 'as-instr' Kconfig macro to define CONFIG_BROKEN_GAS_INST
directly, making it available everywhere.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
[will: Drop redundant 'y if' logic]
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-15 16:35:12 +00:00
Jean-Philippe Brucker
5a4549fd79 PCI/ATS: Add PASID stubs
The SMMUv3 driver, which may be built without CONFIG_PCI, will soon gain
PASID support.  Partially revert commit c6e9aefbf9 ("PCI/ATS: Remove
unused PRI and PASID stubs") to re-introduce the PASID stubs, and avoid
adding more #ifdefs to the SMMU driver.

Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-15 16:30:28 +00:00
Will Deacon
92c1d360dc iommu/arm-smmu-v3: Return -EBUSY when trying to re-add a device
Although we WARN in arm_smmu_add_device() if the device being added has
been added already without a subsequent call to arm_smmu_remove_device(),
we still continue half-heartedly, initialising the stream-table for any
new StreamIDs that may have magically appeared and re-establishing device
links that should still be there from last time.

Given that calling ->add_device() twice without removing the device in the
meantime is indicative of an error in the caller, just return -EBUSY after
warning.

Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Jean Philippe-Brucker <jean-philippe@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-15 16:30:28 +00:00
Jean-Philippe Brucker
a2be6218e6 iommu/arm-smmu-v3: Improve add_device() error handling
Let add_device() clean up after itself. The iommu_bus_init() function
does call remove_device() on error, but other sites (e.g. of_iommu) do
not.

Don't free level-2 stream tables because we'd have to track if we
allocated each of them or if they are used by other endpoints. It's not
worth the hassle since they are managed resources.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-15 16:30:27 +00:00
Will Deacon
d71e01716b iommu/arm-smmu-v3: Use WRITE_ONCE() when changing validity of an STE
If, for some bizarre reason, the compiler decided to split up the write
of STE DWORD 0, we could end up making a partial structure valid.

Although this probably won't happen, follow the example of the
context-descriptor code and use WRITE_ONCE() to ensure atomicity of the
write.

Reported-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-15 16:30:27 +00:00
Masahiro Yamada
9c9aa8fdf3 kbuild: remove 'Building modules, stage 2.' log
This log is displayed every time modules are built, but it is not
so important.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-01-16 01:18:35 +09:00
Jean-Philippe Brucker
73af06f589 iommu/arm-smmu-v3: Add second level of context descriptor table
The SMMU can support up to 20 bits of SSID. Add a second level of page
tables to accommodate this. Devices that support more than 1024 SSIDs now
have a table of 1024 L1 entries (8kB), pointing to tables of 1024 context
descriptors (64kB), allocated on demand.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-15 16:06:50 +00:00
Jean-Philippe Brucker
492ddc79e0 iommu/arm-smmu-v3: Prepare for handling arm_smmu_write_ctx_desc() failure
Second-level context descriptor tables will be allocated lazily in
arm_smmu_write_ctx_desc(). Help with handling allocation failure by
moving the CD write into arm_smmu_domain_finalise_s1().

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
[will: Add comment per discussion on list]
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-15 16:05:54 +00:00
Jean-Philippe Brucker
2505ec6f85 iommu/arm-smmu-v3: Propagate ssid_bits
Now that we support substream IDs, initialize s1cdmax with the number of
SSID bits supported by a master and the SMMU.

Context descriptor tables are allocated once for the first master
attached to a domain. Therefore attaching multiple devices with
different SSID sizes is tricky, and we currently don't support it.

As a future improvement it would be nice to at least support attaching a
SSID-capable device to a domain that isn't using SSID, by reallocating
the SSID table. This would allow supporting a SSID-capable device that
is in the same IOMMU group as a bridge, for example. Varying SSID size
is less of a concern, since the PCIe specification "highly recommends"
that devices supporting PASID implement all 20 bits of it.

Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-15 16:05:53 +00:00
Jean-Philippe Brucker
87f42391f6 iommu/arm-smmu-v3: Add support for Substream IDs
At the moment, the SMMUv3 driver implements only one stage-1 or stage-2
page directory per device. However SMMUv3 allows more than one address
space for some devices, by providing multiple stage-1 page directories. In
addition to the Stream ID (SID), that identifies a device, we can now have
Substream IDs (SSID) identifying an address space. In PCIe, SID is called
Requester ID (RID) and SSID is called Process Address-Space ID (PASID).
A complete stage-1 walk goes through the context descriptor table:

      Stream tables       Ctx. Desc. tables       Page tables
        +--------+   ,------->+-------+   ,------->+-------+
        :        :   |        :       :   |        :       :
        +--------+   |        +-------+   |        +-------+
   SID->|  STE   |---'  SSID->|  CD   |---'  IOVA->|  PTE  |--> IPA
        +--------+            +-------+            +-------+
        :        :            :       :            :       :
        +--------+            +-------+            +-------+

Rewrite arm_smmu_write_ctx_desc() to modify context descriptor table
entries. To keep things simple we only implement one level of context
descriptor tables here, but as with stream and page tables, an SSID can
be split to index multiple levels of tables.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-15 16:05:53 +00:00
Michal Koutný
3bc0bb36fa cgroup: Prevent double killing of css when enabling threaded cgroup
The test_cgcore_no_internal_process_constraint_on_threads selftest when
running with subsystem controlling noise triggers two warnings:

> [  597.443115] WARNING: CPU: 1 PID: 28167 at kernel/cgroup/cgroup.c:3131 cgroup_apply_control_enable+0xe0/0x3f0
> [  597.443413] WARNING: CPU: 1 PID: 28167 at kernel/cgroup/cgroup.c:3177 cgroup_apply_control_disable+0xa6/0x160

Both stem from a call to cgroup_type_write. The first warning was also
triggered by syzkaller.

When we're switching cgroup to threaded mode shortly after a subsystem
was disabled on it, we can see the respective subsystem css dying there.

The warning in cgroup_apply_control_enable is harmless in this case
since we're not adding new subsys anyway.
The warning in cgroup_apply_control_disable indicates an attempt to kill
css of recently disabled subsystem repeatedly.

The commit prevents these situations by making cgroup_type_write wait
for all dying csses to go away before re-applying subtree controls.
When at it, the locations of WARN_ON_ONCE calls are moved so that
warning is triggered only when we are about to misuse the dying css.

Reported-by: syzbot+5493b2a54d31d6aea629@syzkaller.appspotmail.com
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-01-15 08:04:29 -08:00
Daniel Jordan
e8ab20d9bc workqueue: remove workqueue_work event class
The trace event class workqueue_work now has only one consumer, so get
rid of it.  No functional change.

Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-01-15 08:02:59 -08:00
Daniel Jordan
1c5da0ec7f workqueue: add worker function to workqueue_execute_end tracepoint
It's surprising that workqueue_execute_end includes only the work when
its counterpart workqueue_execute_start has both the work and the worker
function.

You can't set a tracing filter or trigger based on the function, and
postprocessing scripts interested in specific functions are harder to
write since they have to remember the work from _start and match it up
with the same field in _end.

Add the function name, taking care to use the copy stashed in the
worker since the work is no longer safe to touch.

Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-01-15 08:02:47 -08:00
Jean-Philippe Brucker
a557aff0c7 iommu/arm-smmu-v3: Add context descriptor tables allocators
Support for SSID will require allocating context descriptor tables. Move
the context descriptor allocation to separate functions.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-15 16:00:57 +00:00
Jean-Philippe Brucker
7bc4f3fae9 iommu/arm-smmu-v3: Prepare arm_smmu_s1_cfg for SSID support
When adding SSID support to the SMMUv3 driver, we'll need to manipulate
leaf pasid tables and context descriptors. Extract the context
descriptor structure and align with the way stream tables are handled.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-15 16:00:57 +00:00
Jean-Philippe Brucker
da22565d1d ACPI/IORT: Parse SSID property of named component node
Named component nodes in the IORT tables describe the number of
Substream ID bits (aka. PASID) supported by the device. Propagate this
value to the fwspec structure in order to enable PASID for platform
devices.

Acked-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-15 16:00:57 +00:00
Jean-Philippe Brucker
89535821c0 iommu/arm-smmu-v3: Parse PASID devicetree property of platform devices
For platform devices that support SubstreamID (SSID), firmware provides
the number of supported SSID bits. Restrict it to what the SMMU supports
and cache it into master->ssid_bits, which will also be used for PCI
PASID.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-15 16:00:57 +00:00
Jean-Philippe Brucker
2e981b9468 dt-bindings: document PASID property for IOMMU masters
On Arm systems, some platform devices behind an SMMU may support the PASID
feature, which offers multiple address space. Let the firmware tell us
when a device supports PASID.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-15 15:59:13 +00:00
Jean-Philippe Brucker
9bb9069cfb iommu/arm-smmu-v3: Drop __GFP_ZERO flag from DMA allocation
Since commit 518a2f1925 ("dma-mapping: zero memory returned from
dma_alloc_*"), dma_alloc_* always initializes memory to zero, so there
is no need to use dma_zalloc_* or pass the __GFP_ZERO flag anymore.

The flag was introduced by commit 04fa26c71b ("iommu/arm-smmu: Convert
DMA buffer allocations to the managed API"), since the managed API
didn't provide a dmam_zalloc_coherent() function.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-15 15:59:13 +00:00
Chen Zhou
75ea91cd3e cgroup: fix function name in comment
Function name cgroup_rstat_cpu_pop_upated() in comment should be
cgroup_rstat_cpu_pop_updated().

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-01-15 07:58:13 -08:00
Su Yanjun
fe1e8dbec1 NFSv3: FIx bug when using chacl and chmod to change acl
We find a bug when running test under nfsv3  as below.
1)
chacl u::r--,g::rwx,o:rw- file1
2)
chmod u+w file1
3)
chacl -l file1

We expect u::rw-, but it shows u::r--, more likely it returns the
cached acl in inode.

We dig the code find that the code path is different.

chacl->..->__nfs3_proc_setacls->nfs_zap_acl_cache
Then nfs_zap_acl_cache clears the NFS_INO_INVALID_ACL in
NFS_I(inode)->cache_validity.

chmod->..->nfs3_proc_setattr
Because NFS_INO_INVALID_ACL has been cleared by chacl path,
nfs_zap_acl_cache wont be called.

nfs_setattr_update_inode will set NFS_INO_INVALID_ACL so let it
before nfs_zap_acl_cache call.

Signed-off-by: Su Yanjun <suyanjun218@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:33 -05:00
Olga Kornievskaia
d826e5b827 NFSv4.x recover from pre-mature loss of openstateid
Ever since the commit 0e0cb35b41, it's possible to lose an open stateid
while retrying a CLOSE due to ERR_OLD_STATEID. Once that happens,
operations that require openstateid fail with EAGAIN which is propagated
to the application then tests like generic/446 and generic/168 fail with
"Resource temporarily unavailable".

Instead of returning this error, initiate state recovery when possible to
recover the open stateid and then try calling nfs4_select_rw_stateid()
again.

Fixes: 0e0cb35b41 ("NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:33 -05:00
Olga Kornievskaia
62a1573fcf NFSv4 fix acl retrieval over krb5i/krb5p mounts
For the krb5i and krb5p mount, it was problematic to truncate the
received ACL to the provided buffer because an integrity check
could not be preformed.

Instead, provide enough pages to accommodate the largest buffer
bounded by the largest RPC receive buffer size.

Note: I don't think it's possible for the ACL to be truncated now.
Thus NFS4_ACL_TRUNC flag and related code could be possibly
removed but since I'm unsure, I'm leaving it.

v2: needs +1 page.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:33 -05:00
Trond Myklebust
c74dfe97c1 NFS: Add mount option 'softreval'
Add a mount option 'softreval' that allows attribute revalidation 'getattr'
calls to time out, and causes them to fall back to using the cached
attributes.
The use case for this option is for ensuring that we can still (slowly)
traverse paths and use cached information even when the server is down.
Once the server comes back up again, the getattr calls start succeeding,
and the caches will revalidate as usual.

The 'softreval' mount option is automatically enabled if you have
specified 'softerr'.  It can be turned off using the options
'nosoftreval', or 'hard'.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:33 -05:00
Trond Myklebust
5c965db86e NFS: Trust cached access if we've already revalidated the inode once
If we've already revalidated the inode once then don't distrust the
access cache unless the NFS_INO_INVALID_ACCESS flag is actually set.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:33 -05:00
Trond Myklebust
4daaeba938 NFS: Fix nfs_direct_write_reschedule_io()
The 'hdr->good_bytes' is defined as the number of bytes we expect to
read or write starting at offset hdr->io_start. In the case of a partial
read/write we may end up adjusting hdr->args.offset and hdr->args.count
to skip I/O for data that was already read/written, and so we must ensure
the calculation takes that into account.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:33 -05:00
Trond Myklebust
8c9cb71491 NFS: When resending after a short write, reset the reply count to zero
If we're resending a write due to a short read or write, ensure we
reset the reply count to zero.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:33 -05:00
Trond Myklebust
e8194b7dd3 NFS: Improve tracing of permission calls
On exit from nfs_do_access(), record the mask representing the requested
permissions, as well as the server-supplied set of access rights for
this user.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:33 -05:00
Trond Myklebust
088f3e68d8 pNFS/flexfiles: Add tracing for layout errors
Trace layout errors for pNFS/flexfiles on read/write/commit operations.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:33 -05:00
Trond Myklebust
7bdd297ea6 NFS: Clean up generic file commit tracepoint
Clean up the generic file commit tracepoints to use a 64-bit value
for the verifier, and to display the pNFS filehandle, if it exists.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:33 -05:00
Trond Myklebust
5bb2a7cb9f NFS: Clean up generic writeback tracepoints
Clean up the generic writeback tracepoints so they do pass the
full structures as arguments. Also ensure we report the number
of bytes actually written.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:32 -05:00
Trond Myklebust
2343172d34 NFS: Clean up generic file read tracepoints
Clean up the generic file read tracepoints so they do pass the
full structures as arguments. Also ensure we report the number
of bytes actually read.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:32 -05:00
Trond Myklebust
0722dc9fea pNFS/flexfiles: Record resend attempts on I/O failure
If the attempt to do pNFS fails, then record what action we
take to recover (resend, reset to pnfs or reset to mds).

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:32 -05:00
Trond Myklebust
118b629219 NFS: Fix fix of show_nfs_errors
Casting a negative value to an unsigned long is not the same as
converting it to its absolute value.

Fixes: 96650e2eff ("NFS: Fix show_nfs_errors macros again")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:32 -05:00
Trond Myklebust
25925b00a9 NFSv4: Improve read/write/commit tracing
Ensure we always return the number of bytes read/written. Also display
the pnfs filehandle if it is in use.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:32 -05:00