Required for follow-up patch adding new UUID needing new function
mask.
Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Tested-by: Julian Sikorski <belegdol@gmail.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Refactor common code to prepare for upcoming changes.
* Remove unused struct.
* Print error before returning.
* Frees ACPI obj if _DSM type is not as expected.
* Treat lps0_dsm_func_mask as an integer rather than character
* Remove extra out_obj
* Move rev_id
Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Tested-by: Julian Sikorski <belegdol@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
AMD spec mentions only revision 0. With this change,
device constraint list is populated properly.
Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Tested-by: Julian Sikorski <belegdol@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
arm64 cache management function cleanup from Fuad Tabba,
shared with the arm64 tree.
* arm64/for-next/caches:
arm64: Rename arm64-internal cache maintenance functions
arm64: Fix cache maintenance function comments
arm64: sync_icache_aliases to take end parameter instead of size
arm64: __clean_dcache_area_pou to take end parameter instead of size
arm64: __clean_dcache_area_pop to take end parameter instead of size
arm64: __clean_dcache_area_poc to take end parameter instead of size
arm64: __flush_dcache_area to take end parameter instead of size
arm64: dcache_by_line_op to take end parameter instead of size
arm64: __inval_dcache_area to take end parameter instead of size
arm64: Fix comments to refer to correct function __flush_icache_range
arm64: Move documentation of dcache_by_line_op
arm64: assembler: remove user_alt
arm64: Downgrade flush_icache_range to invalidate
arm64: Do not enable uaccess for invalidate_icache_range
arm64: Do not enable uaccess for flush_icache_range
arm64: Apply errata to swsusp_arch_suspend_exit
arm64: assembler: add conditional cache fixups
arm64: assembler: replace `kaddr` with `addr`
Signed-off-by: Marc Zyngier <maz@kernel.org>
The tail return statement is redundant in void functions. Remove it.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We have a few open-coded __ATTR_RO() and __ATTR_RW() macros.
Replace the custom code with generic macros.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Remove the unused ptype struct value, which makes table init easier for
the zero entries, and use ranged initializer to remove a bunch of code
(works with gcc and clang). There is no significant functional change.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Remove the unused ptype struct value, which makes table init easier for
the zero entries, and use ranged initializer to remove a bunch of code
(works with gcc and clang). There is no significant functional change.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The hardware is reporting the type of the hash used for RSS
as a PTYPE field in the receive descriptor. Use this value to set
the skb packet hash type by extending the hash type table to
cover all 10-bits of possible values (requiring some variables
to be changed from u8 to u16), and then use that table to convert
to one of the possible values in enum pkt_hash_types.
While we're here, remove the unused ptype struct value, which
makes table init easier for the zero entries, and use ranged
initializer to remove a bunch of code (works with gcc and clang).
Without this change, the kernel will recalculate the hash in software,
which can consume extra CPU cycles.
Co-developed-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
ARM timer (Jisheng Zhang)
- Minor cleanups (whitespace, constification, ...) for the Samsung pwm
timer (Krzysztof Kozlowski)
- Acknowledge and disable the timer interrupt at suspend time to
prevent the suspend to be aborted by the ATF if there is a pending
one on the Mediatek timer (Evan Benn)
- Save and restore the configuration register at suspend/resume time
for TI dm timer (Tony Lindgren)
- Set the scene for the next timers support by renaming the array
variables on the Ingenic time (Zhou Yanjie)
- Add the clock rate change notification to adjust the prescalar value
and compensate the clock source on the ARM global timer (Andrea
Merello)
- Add missing variable static annotation on the ARM global timer (Zou
Wei)
- Remove a duplicate argument when building the bits field on the ARM
global timer (Wan Jiabing)
- Improve the timer workaround function by reducing the loop on the
Allwinner A64 timer (Samuel Holland)
- Do no restore the register context in case of error on the TI dm
timer (Tony Lindgren)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAmDKNmQACgkQqDIjiipP
6E9iAAf+Jbh1rIdxRBFYvJz7R78/AR8H4JTPNXdyrHGkPa4Dx/1tuYJRVsLzLELk
28yWRjU/rl0lBzZjoUmszsYuyTXghjqEdTMU8KYkU5/9n8LfGXuA5d0y1qulYh6a
UoK0RNiVzgDs49v7gj5khDBQQDCDf8RVdgaUyNiE7ciXsBxE6jir3UMiUT5fvVwW
0utGbXLUEr/FpVXOb/9XBem6BUGAWvkNM2566qXgVRp7t+Y0py75GZklLrJ3/3lI
+UI6uAQrBwOr54kqKp5/+GMuRIXtnGy4jud391X9rbIwr/tPgnGJXwlqg+Ups7x5
ICCU42f5JX9ukKBT+XwoXcp3sk91OA==
=XeyC
-----END PGP SIGNATURE-----
Merge tag 'timers-v5.14' of https://git.linaro.org/people/daniel.lezcano/linux into timers/core
Pull clockevent/source updates from Daniel Lezcano:
- Remove arch_timer_rate1 variable as it is unused in the architected
ARM timer (Jisheng Zhang)
- Minor cleanups (whitespace, constification, ...) for the Samsung pwm
timer (Krzysztof Kozlowski)
- Acknowledge and disable the timer interrupt at suspend time to
prevent the suspend to be aborted by the ATF if there is a pending
one on the Mediatek timer (Evan Benn)
- Save and restore the configuration register at suspend/resume time
for TI dm timer (Tony Lindgren)
- Set the scene for the next timers support by renaming the array
variables on the Ingenic time (Zhou Yanjie)
- Add the clock rate change notification to adjust the prescalar value
and compensate the clock source on the ARM global timer (Andrea
Merello)
- Add missing variable static annotation on the ARM global timer (Zou
Wei)
- Remove a duplicate argument when building the bits field on the ARM
global timer (Wan Jiabing)
- Improve the timer workaround function by reducing the loop on the
Allwinner A64 timer (Samuel Holland)
- Do no restore the register context in case of error on the TI dm
timer (Tony Lindgren)
Link: https://lore.kernel.org/r/65ed5f60-d7a5-b4ae-ff78-0382d4671cc5@linaro.org
Trying to start a new PIO transfer by writing value 0 in PIO_START register
when previous transfer has not yet completed (which is indicated by value 1
in PIO_START) causes an External Abort on CPU, which results in kernel
panic:
SError Interrupt on CPU0, code 0xbf000002 -- SError
Kernel panic - not syncing: Asynchronous SError Interrupt
To prevent kernel panic, it is required to reject a new PIO transfer when
previous one has not finished yet.
If previous PIO transfer is not finished yet, the kernel may issue a new
PIO request only if the previous PIO transfer timed out.
In the past the root cause of this issue was incorrectly identified (as it
often happens during link retraining or after link down event) and special
hack was implemented in Trusted Firmware to catch all SError events in EL3,
to ignore errors with code 0xbf000002 and not forwarding any other errors
to kernel and instead throw panic from EL3 Trusted Firmware handler.
Links to discussion and patches about this issue:
https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/?id=3c7dcdac5c50https://lore.kernel.org/linux-pci/20190316161243.29517-1-repk@triplefau.lt/https://lore.kernel.org/linux-pci/971be151d24312cc533989a64bd454b4@www.loen.fr/https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/1541
But the real cause was the fact that during link retraining or after link
down event the PIO transfer may take longer time, up to the 1.44s until it
times out. This increased probability that a new PIO transfer would be
issued by kernel while previous one has not finished yet.
After applying this change into the kernel, it is possible to revert the
mentioned TF-A hack and SError events do not have to be caught in TF-A EL3.
Link: https://lore.kernel.org/r/20210608203655.31228-1-pali@kernel.org
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Marek Behún <kabel@kernel.org>
Cc: stable@vger.kernel.org # 7fbcb5da81 ("PCI: aardvark: Don't rely on jiffies while holding spinlock")
Although the AMD RS690 chipset has 64-bit DMA support, BIOS implementations
sometimes fail to configure the memory limit registers correctly.
The Acer F690GVM mainboard uses this chipset and a Marvell 88E8056 NIC. The
sky2 driver programs the NIC to use 64-bit DMA, which will not work:
sky2 0000:02:00.0: error interrupt status=0x8
sky2 0000:02:00.0 eth0: tx timeout
sky2 0000:02:00.0 eth0: transmit ring 0 .. 22 report=0 done=0
Other drivers required by this mainboard either don't support 64-bit DMA,
or have it disabled using driver specific quirks. For example, the ahci
driver has quirks to enable or disable 64-bit DMA depending on the BIOS
version (see ahci_sb600_enable_64bit() in ahci.c). This ahci quirk matches
against the SB600 SATA controller, but the real issue is almost certainly
with the RS690 PCI host that it was commonly attached to.
To avoid this issue in all drivers with 64-bit DMA support, fix the
configuration of the PCI host. If the kernel is aware of physical memory
above 4GB, but the BIOS never configured the PCI host with this
information, update the registers with our values.
[bhelgaas: drop PCI_DEVICE_ID_ATI_RS690 definition]
Link: https://lore.kernel.org/r/20210611214823.4898-1-mikel@mikelr.com
Signed-off-by: Mikel Rychliski <mikel@mikelr.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The Broadcom BCM57414 NIC may be a multi-function device. While it does
not advertise an ACS capability, peer-to-peer transactions are not possible
between the individual functions, so it is safe to treat them as fully
isolated.
Add an ACS quirk for this device so the functions can be in independent
IOMMU groups and attached individually to userspace applications using
VFIO.
[bhelgaas: commit log]
Link: https://lore.kernel.org/r/1621645997-16251-1-git-send-email-michael.chan@broadcom.com
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Observed unexpected GPU hang during runpm stress test on 0x7341 rev 0x00.
Further debugging shows broken ATS is related.
Disable ATS on this part. Similar issues on other devices:
a2da5d8cc0 ("PCI: Mark AMD Raven iGPU ATS as broken in some platforms")
45beb31d3a ("PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken")
5e89cd303e ("PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken")
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20210602021255.939090-1-evan.quan@amd.com
Signed-off-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
Cc: stable@vger.kernel.org
pcie_flr() starts a Function Level Reset (FLR), waits 100ms (the maximum
time allowed for FLR completion by PCIe r5.0, sec 6.6.2), and waits for the
FLR to complete. It assumes the FLR is complete when a config read returns
valid data.
When we do an FLR on several Huawei Intelligent NIC VFs at the same time,
firmware on the NIC processes them serially. The VF may respond to config
reads before the firmware has completed its reset processing. If we bind a
driver to the VF (e.g., by assigning the VF to a virtual machine) in the
interval between the successful config read and completion of the firmware
reset processing, the NIC VF driver may fail to load.
Prevent this driver failure by waiting for the NIC firmware to complete its
reset processing. Not all NIC firmware supports this feature.
[bhelgaas: commit log]
Link: https://support.huawei.com/enterprise/en/doc/EDOC1100063073/87950645/vm-oss-occasionally-fail-to-load-the-in200-driver-when-the-vf-performs-flr
Link: https://lore.kernel.org/r/20210414132301.1793-1-chiqijun@huawei.com
Signed-off-by: Chiqijun <chiqijun@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Cc: stable@vger.kernel.org
Some NVIDIA GPU devices do not work with SBR. Triggering SBR leaves the
device inoperable for the current system boot. It requires a system
hard-reboot to get the GPU device back to normal operating condition
post-SBR. For the affected devices, enable NO_BUS_RESET quirk to avoid the
issue.
This issue will be fixed in the next generation of hardware.
Link: https://lore.kernel.org/r/20210608054857.18963-8-ameynarkhede03@gmail.com
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
Cc: stable@vger.kernel.org
Some TI KeyStone C667X devices do not support bus/hot reset. The PCIESS
automatically disables LTSSM when Secondary Bus Reset is received and
device stops working. Prevent bus reset for these devices. With this
change, the device can be assigned to VMs with VFIO, but it will leak state
between VMs.
Reference: https://e2e.ti.com/support/processors/f/791/t/954382
Link: https://lore.kernel.org/r/20210315102606.17153-1-antti.jarvinen@gmail.com
Signed-off-by: Antti Järvinen <antti.jarvinen@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kishon Vijay Abraham I <kishon@ti.com>
Cc: stable@vger.kernel.org
7f10074474 ("PCI: tegra: Add Tegra194 MCFG quirks for ECAM errata")
caused a few build regressions:
- 7f10074474 removed the Makefile rule for CONFIG_PCIE_TEGRA194, so
pcie-tegra.c can no longer be built as a module. Restore that rule.
- 7f10074474 added "#ifdef CONFIG_PCIE_TEGRA194" around the native
driver, but that's only set when the driver is built-in (for a module,
CONFIG_PCIE_TEGRA194_MODULE is defined).
The ACPI quirk is completely independent of the rest of the native
driver, so move the quirk to its own file and remove the #ifdef in the
native driver.
- 7f10074474 added symbols that are always defined but used only when
CONFIG_PCIEASPM, which causes warnings when CONFIG_PCIEASPM is not set:
drivers/pci/controller/dwc/pcie-tegra194.c:259:18: warning: ‘event_cntr_data_offset’ defined but not used [-Wunused-const-variable=]
drivers/pci/controller/dwc/pcie-tegra194.c:250:18: warning: ‘event_cntr_ctrl_offset’ defined but not used [-Wunused-const-variable=]
drivers/pci/controller/dwc/pcie-tegra194.c:243:27: warning: ‘pcie_gen_freq’ defined but not used [-Wunused-const-variable=]
Fixes: 7f10074474 ("PCI: tegra: Add Tegra194 MCFG quirks for ECAM errata")
Link: https://lore.kernel.org/r/20210610064134.336781-1-jonathanh@nvidia.com
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>
Alexandru and Qu reported this resource allocation failure on ROCKPro64 v2
and ROCK Pi 4B, both based on the RK3399:
pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff 64bit]
pci 0000:00:00.0: PCI bridge to [bus 01]
pci 0000:00:00.0: BAR 14: no space for [mem size 0x00100000]
pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
"BAR 14" is the PCI bridge's 32-bit non-prefetchable window, and our PCI
allocation code isn't smart enough to allocate it in a host bridge window
marked as 64-bit, even though this should work fine.
A DT host bridge description includes the windows from the CPU address
space to the PCI bus space. On a few architectures (microblaze, powerpc,
sparc), the DT may also describe PCI devices themselves, including their
BARs.
Before 9d57e61bf7 ("of/pci: Add IORESOURCE_MEM_64 to resource flags for
64-bit memory addresses"), of_bus_pci_get_flags() ignored the fact that
some DT addresses described 64-bit windows and BARs. That was a problem
because the virtio virtual NIC has a 32-bit BAR and a 64-bit BAR, and the
driver couldn't distinguish them.
9d57e61bf7 set IORESOURCE_MEM_64 for those 64-bit DT ranges, which fixed
the virtio driver. But it also set IORESOURCE_MEM_64 for host bridge
windows, which exposed the fact that the PCI allocator isn't smart enough
to put 32-bit resources in those 64-bit windows.
Clear IORESOURCE_MEM_64 from host bridge windows since we don't need that
information.
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Fixes: 9d57e61bf7 ("of/pci: Add IORESOURCE_MEM_64 to resource flags for 64-bit memory addresses")
Link: https://lore.kernel.org/r/20210614230457.752811-1-punitagrawal@gmail.com
Reported-at: https://lore.kernel.org/lkml/7a1e2ebc-f7d8-8431-d844-41a9c36a8911@arm.com/
Reported-at: https://lore.kernel.org/lkml/YMyTUv7Jsd89PGci@m4/T/#u
Reported-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reported-by: Qu Wenruo <wqu@suse.com>
Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Domenico Andreoli <domenico.andreoli@linux.com>
Signed-off-by: Punit Agrawal <punitagrawal@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
To allow for iclog IO device cache flush behaviour to be optimised,
we first need to separate out the commit record iclog IO from the
rest of the checkpoint so we can wait for the checkpoint IO to
complete before we issue the commit record.
This separation is only necessary if the commit record is being
written into a different iclog to the start of the checkpoint as the
upcoming cache flushing changes requires completion ordering against
the other iclogs submitted by the checkpoint.
If the entire checkpoint and commit is in the one iclog, then they
are both covered by the one set of cache flush primitives on the
iclog and hence there is no need to separate them for ordering.
Otherwise, we need to wait for all the previous iclogs to complete
so they are ordered correctly and made stable by the REQ_PREFLUSH
that the commit record iclog IO issues. This guarantees that if a
reader sees the commit record in the journal, they will also see the
entire checkpoint that commit record closes off.
This also provides the guarantee that when the commit record IO
completes, we can safely unpin all the log items in the checkpoint
so they can be written back because the entire checkpoint is stable
in the journal.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
On 32-bit (e.g. m68k):
ERROR: modpost: "__udivdi3" [fs/xfs/xfs.ko] undefined!
Fix this by using a uint32_t intermediate, like before.
Reported-by: noreply@ellerman.id.au
Fixes: 7660a5b48fbef958 ("xfs: log stripe roundoff is a property of the log")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
If task_state is cleared, io_req_task_work_add() will go the slow path
adding a task_work, setting the task_state, waking up the task and so
on. Not to mention it's expensive. tctx_task_work() first clears the
state and then executes all the work items queued, so if any of them
resubmits or adds new task_work items, it would unnecessarily go through
the slow path of io_req_task_work_add().
Let's clear the ->task_state at the end. We still have to check
->task_list for emptiness afterward to synchronise with
io_req_task_work_add(), do that, and set the state back if we're going
to retry, because clearing not-ours task_state on the next iteration
would be buggy.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1ef72cdac7022adf0cd7ce4bfe3bb5c82a62eb93.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Entering tctx_task_work() with empty task_list is a strange scenario,
that can happen only on rare occasion during task exit, so let's not
check for task_list emptiness in advance and do it do-while style. The
code still correct for the empty case, just would do extra work about
which we don't care.
Do extra step and do the check before cond_resched(), so we don't
resched if have nothing to execute.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c4173e288e69793d03c7d7ce826f9d28afba718a.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We don't need a full copy of tctx->task_list in tctx_task_work(), but
only a first one, so just assign node directly.
Taking into account that task_works are run in a context of a task,
it's very unlikely to first see non-empty tctx->task_list and then
splice it empty, can only happen with task_work cancellations that is
not-normal slow path anyway. Hence, get rid of the check in the end,
it's there not for validity but "performance" purposes.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d076c83fedb8253baf43acb23b8fafd7c5da1714.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
tctx_task_work() tries to fetch a next batch of requests, but before it
would flush completions from the previous batch that may be sub-optimal.
E.g. io_req_task_queue() executes a head of the link where all the
linked may be enqueued through the same io_req_task_queue(). And there
are more cases for that.
Do the flushing at the end, so it can cache completions of several waves
of a single tctx_task_work(), and do the flush at the very end.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3cac83934e4fbce520ff8025c3524398b3ae0270.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently, if req->creds is not NULL, then there are creds assigned.
Track the invariant with a new flag in req->flags. No need to clear the
field at init, and also cleanup can be efficiently moved into
io_clean_op().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5f8baeb8d3b909487f555542350e2eac97005556.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We don't need to look at the xfs_mount and superblock every time we
need to do an iclog roundoff calculation. The property is fixed for
the life of the log, so store the roundoff in the log at mount time
and use that everywhere.
On a debug build:
$ size fs/xfs/xfs_log.o.*
text data bss dec hex filename
27360 560 8 27928 6d18 fs/xfs/xfs_log.o.orig
27219 560 8 27787 6c8b fs/xfs/xfs_log.o.patched
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
'error' will be initialized, so clean up the redundant initialization.
Cc: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Dan Carpenter's static checker reported:
The patch 7b13c51551: "xfs: use perag for ialloc btree cursors"
from Jun 2, 2021, leads to the following Smatch complaint:
fs/xfs/libxfs/xfs_ialloc.c:2403 xfs_imap()
error: we previously assumed 'pag' could be null (see line 2294)
And it's right. Fix it.
Fixes: 7b13c51551 ("xfs: use perag for ialloc btree cursors")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
xfs: Delay Ready Attributes
Hi all,
This set is a subset of a larger series for Dealyed Attributes. Which is a
subset of a yet larger series for parent pointers. Delayed attributes allow
attribute operations (set and remove) to be logged and committed in the same
way that other delayed operations do. This allows more complex operations (like
parent pointers) to be broken up into multiple smaller transactions. To do
this, the existing attr operations must be modified to operate as a delayed
operation. This means that they cannot roll, commit, or finish transactions.
Instead, they return -EAGAIN to allow the calling function to handle the
transaction. In this series, we focus on only the delayed attribute portion.
We will introduce parent pointers in a later set.
The set as a whole is a bit much to digest at once, so I usually send out the
smaller sub series to reduce reviewer burn out. But the entire extended series
is visible through the included github links.
Updates since v19: Added Darricks fix for the remote block accounting as well as
some minor nits about the default assert in xfs_attr_set_iter. Spent quite
a bit of time testing this cycle to weed out any more unexpected bugs. No new
test failures were observed with the addition of this set.
xfs: Fix default ASSERT in xfs_attr_set_iter
Replaced the assert with ASSERT(0);
xfs: Add delay ready attr remove routines
Added Darricks fix for remote block accounting
This series can be viewed on github here:
https://github.com/allisonhenderson/xfs_work/tree/delay_ready_attrs_v20
As well as the extended delayed attribute and parent pointer series:
https://github.com/allisonhenderson/xfs_work/tree/delay_ready_attrs_v20_extended
And the test cases:
https://github.com/allisonhenderson/xfs_work/tree/pptr_xfstestsv3
In order to run the test cases, you will need have the corresponding xfsprogs
changes as well. Which can be found here:
https://github.com/allisonhenderson/xfs_work/tree/delay_ready_attrs_xfsprogs_v20https://github.com/allisonhenderson/xfs_work/tree/delay_ready_attrs_xfsprogs_v20_extended
To run the xfs attributes tests run:
check -g attr
To run as delayed attributes run:
export MOUNT_OPTIONS="-o delattr"
check -g attr
To run parent pointer tests:
check -g parent
I've also made the corresponding updates to the user space side as well, and ported anything
they need to seat correctly.
Questions, comment and feedback appreciated!
Thanks all!
Allison
* tag 'xfs-delay-ready-attrs-v20.1' of https://github.com/allisonhenderson/xfs_work:
xfs: Make attr name schemes consistent
xfs: Fix default ASSERT in xfs_attr_set_iter
xfs: Clean up xfs_attr_node_addname_clear_incomplete
xfs: Remove xfs_attr_rmtval_set
xfs: Add delay ready attr set routines
xfs: Add delay ready attr remove routines
xfs: Hoist node transaction handling
xfs: Hoist xfs_attr_leaf_addname
xfs: Hoist xfs_attr_node_addname
xfs: Add helper xfs_attr_node_addname_find_attr
xfs: Separate xfs_attr_node_addname and xfs_attr_node_addname_clear_incomplete
xfs: Refactor xfs_attr_set_shortform
xfs: Add xfs_attr_node_remove_name
xfs: Reverse apply 72b97ea40d
The vmlinux ".BTF_ids" ELF section is declared in btf_ids.h to hold a list
of zero-filled BTF IDs, which is then patched at link-time with correct
values by resolv_btfids. The section is flagged as "allocable" to preclude
compression, but notably the section contents (BTF IDs) are untyped.
When patching the BTF IDs, resolve_btfids writes in host-native endianness
and relies on libelf for any required translation on reading and updating
vmlinux. However, since the type of the .BTF_ids section content defaults
to ELF_T_BYTE (i.e. unsigned char), no translation occurs. This results in
incorrect patched values when cross-compiling to non-native endianness,
and can manifest as kernel Oops and test failures which are difficult to
troubleshoot [1].
Explicitly set the type of patched data to ELF_T_WORD, the architecture-
neutral ELF type corresponding to the u32 BTF IDs. This enables libelf to
transparently perform any needed endian conversions.
Fixes: fbbb68de80 ("bpf: Add resolve_btfids tool to resolve BTF IDs in ELF object")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Frank Eigler <fche@redhat.com>
Cc: Mark Wielaard <mark@klomp.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/CAPGftE_eY-Zdi3wBcgDfkz_iOr1KF10n=9mJHm1_a_PykcsoeA@mail.gmail.com [1]
Link: https://lore.kernel.org/bpf/20210618061404.818569-1-Tony.Ambardar@gmail.com
Fix broken Tx ring validation for AF_XDP. The commit under the Fixes
tag, fixed an off-by-one error in the validation but introduced
another error. Descriptors are now let through even if they straddle a
chunk boundary which they are not allowed to do in aligned mode. Worse
is that they are let through even if they straddle the end of the umem
itself, tricking the kernel to read data outside the allowed umem
region which might or might not be mapped at all.
Fix this by reintroducing the old code, but subtract the length by one
to fix the off-by-one error that the original patch was
addressing. The test chunk != chunk_end makes sure packets do not
straddle chunk boundraries. Note that packets of zero length are
allowed in the interface, therefore the test if the length is
non-zero.
Fixes: ac31565c21 ("xsk: Fix for xp_aligned_validate_desc() when len == chunk_size")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20210618075805.14412-1-magnus.karlsson@gmail.com
Fix a missing validation of a Tx descriptor when executing in skb mode
and the umem is in unaligned mode. A descriptor could point to a
buffer straddling the end of the umem, thus effectively tricking the
kernel to read outside the allowed umem region. This could lead to a
kernel crash if that part of memory is not mapped.
In zero-copy mode, the descriptor validation code rejects such
descriptors by checking a bit in the DMA address that tells us if the
next page is physically contiguous or not. For the last page in the
umem, this bit is not set, therefore any descriptor pointing to a
packet straddling this last page boundary will be rejected. However,
the skb path does not use this bit since it copies out data and can do
so to two different pages. (It also does not have the array of DMA
address, so it cannot even store this bit.) The code just returned
that the packet is always physically contiguous. But this is
unfortunately also returned for the last page in the umem, which means
that packets that cross the end of the umem are being allowed, which
they should not be.
Fix this by introducing a check for this in the SKB path only, not
penalizing the zero-copy path.
Fixes: 2b43470add ("xsk: Introduce AF_XDP buffer allocation API")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20210617092255.3487-1-magnus.karlsson@gmail.com
Without calling loop_config_discard() the discard flag and parameters
aren't set/updated for the loop device and worst-case they could
indicate discard support when it isn't the case (ex: if the
LOOP_SET_STATUS ioctl was used with a different file prior to
LOOP_CONFIGURE).
Cc: <stable@vger.kernel.org> # 5.8.x-
Fixes: 3448914e8c ("loop: Add LOOP_CONFIGURE ioctl")
Signed-off-by: Kristian Klausen <kristian@klausen.dk>
Link: https://lore.kernel.org/r/20210618115157.31452-1-kristian@klausen.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The insert_requests and dispatch_request elevator operations are
mandatory for the correct execution of an elevator, and all implemented
elevators (bfq, kyber and mq-deadline) implement them. As a result,
there is no need to check for these operations before calling them when
a queue has an elevator set. This simplifies the code in
__blk_mq_sched_dispatch_requests() and blk_mq_sched_insert_request().
To avoid out-of-tree elevators to crash the kernel in case of bad
implementation, add a check in elv_register() to verify that these
operations are implemented.
A small, probably not significant, IOPS improvement of 0.1% is observed
with this patch applied (4.117 MIOPS to 4.123 MIOPS, average of 20 fio
runs doing 4K random direct reads with psync and 32 jobs).
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210618015922.713999-1-damien.lemoal@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
tagset can't be used after blk_cleanup_queue() is returned because
freeing tagset usually follows blk_clenup_queue(). Commit d97e594c51
("blk-mq: Use request queue-wide tags for tagset-wide sbitmap") adds
check on q->tag_set->flags in blk_mq_exit_sched(), and causes
use-after-free.
Fixes it by using hctx->flags.
Reported-by: syzbot+77ba3d171a25c56756ea@syzkaller.appspotmail.com
Fixes: d97e594c51 ("blk-mq: Use request queue-wide tags for tagset-wide sbitmap")
Cc: John Garry <john.garry@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/20210609063046.122843-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Two carveout reserved memory nodes each have been added for each of the
R5F remote processor devices within the MAIN domain on the TI AM642 EVM
and SK boards. These nodes are assigned to the respective rproc device
nodes as well. The first region will be used as the DMA pool for the rproc
devices, and the second region will furnish the static carveout regions
for the firmware memory.
An additional reserved memory node is also added to reserve a portion of
the DDR memory to be used for performing inter-processor communication
between all the remote processors running RTOS or baremetal firmwares.
8 MB of memory is reserved for this purpose, and this accounts for all
the vrings and vring buffers between all the possible pairs of remote
processors.
The current carveout addresses and sizes are defined statically for each
rproc device. The R5F processors do not have an MMU, and as such require
the exact memory used by the firmwares to be set-aside. The firmware
images do not require any RSC_CARVEOUT entries in their resource tables
to allocate the memory for firmware memory segments.
NOTE:
1. The R5F1 carveouts are needed only if the R5F cluster is running in
Split (non Single-CPU) mode. The reserved memory nodes can be disabled
later on if there is no use-case defined to use the corresponding
remote processor.
2. The AM64x SoCs do not have any DSPs and one less R5F cluster compared
to J721E SoCs. So, while the carveout memories reserved for the R5F
clusters present on the SoC match to those on J721E, the overall
memory map reserved for firmwares is quite different. The number of
R5F clusters on AM64x SoCs are same as on J7200 SoCs, but the AM64x
SoCs also have an additional M4F core, so the RTOS IPC memory region
is 1 MB higher than on J7200 SoCs.
Signed-off-by: Suman Anna <s-anna@ti.com>
Reviewed-by: Praneeth Bajjuri <praneeth@ti.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Link: https://lore.kernel.org/r/20210615195718.15898-4-s-anna@ti.com
Add the required 'mboxes' property to all the R5F processors for the
TI AM642 EVM and SK boards. The mailboxes and some shared memory are
required for running the Remote Processor Messaging (RPMsg) stack
between the host processor and each of the R5Fs.
The chosen sub-mailboxes match the values used in the current firmware
images. This can be changed, if needed, as per the system integration
needs after making appropriate changes on the firmware side as well.
Note that any R5F Core1 resources are needed and used only when that
R5F cluster is configured for Split-mode.
Signed-off-by: Suman Anna <s-anna@ti.com>
Reviewed-by: Praneeth Bajjuri <praneeth@ti.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Link: https://lore.kernel.org/r/20210615195718.15898-3-s-anna@ti.com
The AM64x SoCs have 2 dual-core Arm Cortex-R5F processor (R5FSS)
subsystems/clusters. Both the R5F clusters are present within the
MAIN domain (MAIN_R5FSS0 & MAIN_R5FSS1). Each of these can be
configured at boot time to be either run in a new "Single-CPU" mode
or in an Asymmetric Multi Processing (AMP) fashion in Split-mode.
The mode is restricted to "Single-CPU" on some devices with the
appropriate eFuse bit set, but the most common devices support both
modes. These subsystems have 64 KB each Tightly-Coupled Memory (TCM)
internal memories for each core split between two banks - ATCM and
BTCM (further interleaved into two banks). The TCMs of both Cores
are combined in Single-CPU mode to provide a larger 128 KB of memory.
The other notable difference is that the TCMs are spaced 1 MB apart
on these SoCs unlike the existing SoCs.
Add the DT nodes for both these MAIN domain R5F cluster/subsystems,
the two R5F cores are added as child nodes to each of the corresponding
R5F cluster node. Both the clusters are configured to run in Split mode
by default, with the ATCMs enabled to allow the R5 cores to execute
code from DDR with boot-strapping code from ATCM. The inter-processor
communication between the main A72 cores and these processors is
achieved through shared memory and Mailboxes.
The following firmware names are used by default for these cores, and
can be overridden in a board dts file if desired:
MAIN R5FSS0 Core0: am64-main-r5f0_0-fw (both in Single-CPU & Split modes)
MAIN R5FSS0 Core1: am64-main-r5f0_1-fw (needed only in Split mode)
MAIN R5FSS1 Core0: am64-main-r5f1_0-fw (both in Single-CPU & Split modes)
MAIN R5FSS1 Core1: am64-main-r5f1_1-fw (needed only in Split mode)
NOTE:
A R5FSS cluster can be configured in "Single-CPU" mode by using a
value of 2 for the "ti,cluster-mode" property. Value of 1 is not
permitted (fails the dtbs_check).
Signed-off-by: Suman Anna <s-anna@ti.com>
Reviewed-by: Praneeth Bajjuri <praneeth@ti.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Link: https://lore.kernel.org/r/20210615195718.15898-2-s-anna@ti.com
The tprot() inline asm temporarily changes the program check new psw
to redirect a potential program check on the diag instruction.
Restoring of the program check new psw is done in C code behind the
inline asm.
This can be problematic, especially if the function is inlined, since
the compiler can reorder instructions in such a way that a different
instruction, which may result in a program check, might be executed
before the program check new psw has been restored.
To avoid such a scenario move restoring into the inline asm. For
consistency reasons move also saving of the original program check new
psw into the inline asm.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>