div_u64_rem provides the result of the division and additionally the
remainder; don't use this function to solely calculate the remainder
while calculating the division again with div_u64.
A similar improvement was applied earlier to the 10nm pll in
5c191fef4c ("drm/msm/dsi_pll_10nm: Fix dividing the same numbers
twice").
Signed-off-by: Marijn Suijten <marijn.suijten@somainline.org>
Reviewed-By: AngeloGioacchino Del Regno <angelogioacchino.delregno@somainline.org>
Reviewed-by: Abhinav Kumar <abhinavk@codeaurora.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20211011201642.167700-1-marijn.suijten@somainline.org
Signed-off-by: Rob Clark <robdclark@chromium.org>
[why]
vsync_cnt atomic counter increments for every hw vsync. On the other
hand, frame count is a register that increments when the frame gets
actually pushed out. We cannnot read this register whenever the timing
engine is off, but vblank counter should still return a valid number.
This behavior also matches the downstream driver.
[How]
Read the encoder vsync count instead of the dpu_encoder_phys frame
count.
Suggested-by: Abhinav Kumar <abhinavk@codeaurora.org>
CC: Rob Clark <robdclark@chromium.org>
Signed-off-by: Mark Yacoub <markyacoub@chromium.org>
Reviewed-by: Abhinav Kumar <abhinavk@codeaurora.org>
Link: https://lore.kernel.org/r/20210830181359.124267-1-markyacoub@chromium.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
As dpu_core_irq was merged into dpu_hw_intr, merge data structures too,
removing the need for a separate data structure.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Abhinav Kumar <abhinavk@codeaurora.org>
Link: https://lore.kernel.org/r/20210617222029.463045-4-dmitry.baryshkov@linaro.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
We already clear the IRQ status register before processing IRQs, so do
not clear the register again. Especially do not clear the IRQ status
_after_ processing the IRQ as this way we can loose the event.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Abhinav Kumar <abhinavk@codeaurora.org>
Link: https://lore.kernel.org/r/20210617222029.463045-3-dmitry.baryshkov@linaro.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
With dpu_core_irq being the wrapper around dpu_hw_interrupts, there is
little sense in having them separate. Squash them together to remove
another layer of abstraction (hw_intr ops).
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Abhinav Kumar <abhinavk@codeaurora.org>
Link: https://lore.kernel.org/r/20210617222029.463045-2-dmitry.baryshkov@linaro.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Modernize how vfio is creating the group char dev and sysfs presence.
These days drivers with state should use cdev_device_add() and
cdev_device_del() to manage the cdev and sysfs lifetime.
This API requires the driver to put the struct device and struct cdev
inside its state struct (vfio_group), and then use the usual
device_initialize()/cdev_device_add()/cdev_device_del() sequence.
Split the code to make this possible:
- vfio_group_alloc()/vfio_group_release() are pair'd functions to
alloc/free the vfio_group. release is done under the struct device
kref.
- vfio_create_group()/vfio_group_put() are pairs that manage the
sysfs/cdev lifetime. Once the uses count is zero the vfio group's
userspace presence is destroyed.
- The IDR is replaced with an IDA. container_of(inode->i_cdev)
is used to get back to the vfio_group during fops open. The IDA
assigns unique minor numbers.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/5-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The next patch adds a struct device to the struct vfio_group, and it is
confusing/bad practice to have two krefs in the same struct. This kref is
controlling the period when the vfio_group is registered in sysfs, and
visible in the internal lookup. Switch it to a refcount_t instead.
The refcount_dec_and_mutex_lock() is still required because we need
atomicity of the list searches and sysfs presence.
Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
If vfio_create_group() searches the group list and returns an already
existing group it does not put back the iommu_group reference that the
caller passed in.
Change the semantic of vfio_create_group() to not move the reference in
from the caller, but instead obtain a new reference inside and leave the
caller's reference alone. The two callers must now call iommu_group_put().
This is an unlikely race as the only caller that could hit it has already
searched the group list before attempting to create the group.
Fixes: cba3345cc4 ("vfio: VFIO core")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Split vfio_group_get_from_iommu() into __vfio_group_get_from_iommu() so
that vfio_create_group() can call it to consolidate this duplicated code.
Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
iommu_group_register_notifier()/iommu_group_unregister_notifier() are
built using a blocking_notifier_chain which integrates a rwsem. The
notifier function cannot be running outside its registration.
When considering how the notifier function interacts with create/destroy
of the group there are two fringe cases, the notifier starts before
list_add(&vfio.group_list) and the notifier runs after the kref
becomes 0.
Prior to vfio_create_group() unlocking and returning we have
container_users == 0
device_list == empty
And this cannot change until the mutex is unlocked.
After the kref goes to zero we must also have
container_users == 0
device_list == empty
Both are required because they are balanced operations and a 0 kref means
some caller became unbalanced. Add the missing assertion that
container_users must be zero as well.
These two facts are important because when checking each operation we see:
- IOMMU_GROUP_NOTIFY_ADD_DEVICE
Empty device_list avoids the WARN_ON in vfio_group_nb_add_dev()
0 container_users ends the call
- IOMMU_GROUP_NOTIFY_BOUND_DRIVER
0 container_users ends the call
Finally, we have IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER, which only deletes
items from the unbound list. During creation this list is empty, during
kref == 0 nothing can read this list, and it will be freed soon.
Since the vfio_group_release() doesn't hold the appropriate lock to
manipulate the unbound_list and could race with the notifier, move the
cleanup to directly before the kfree.
This allows deleting all of the deferred group put code.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Hi Mark
We already have Audio-Graph-Card which is Of-Graph base general sound
card driver. Basically it supports basic CPU-Codec connection, and is
also supporting DPCM connection. Because it was forcibly expanded to
DPCM, DT parsing is very limited and very difficult to add new features
on it, for example Multi-CPU/Codec support, Codec2Codec support, etc.
This patch adds more flexible new Audio-Graph-Card2 driver for it.
Audio-Graph-Card and Audio-Graph-Card2 are similar, but don't have
full compatibility.
The reason why I need Audio-Graph-Card2 instead of updating Audio-Graph-Card
is that it is very difficult to keep compatibility.
Audio-Graph-Card2 supports Normal/DPCM/Codec2Codec Connection wich
Single/Multi DAIs. And it is possible to Customizing.
This patch-set adds Audio-Graph-Card2 driver and its custom driver
sample, and DT settings sample which can be used for testing.
To enable testing/debuging, this patch-set also adds Test-Component
driver. We already have Dummy Component and/or Dummy DAI on soc-utils,
but 1) we can't use it from DT, 2) it do nothing.
Added new Test-Component can be used from DT, and it can indicate called
function name. We can use it to trace callback order, understanding
ALSA SoC behavior, etc, etc...
Sample DT settings of Audio Graph Card2 is using Test-Component as CPU/Codec DAI.
You can easily try to use/test it if you added below line to your DT file.
Your .config needs to have below CONFIGs to use/test it.
It will probe sample Sound Card which has Normal/DPCM/Multi/Codec2Codec
connections.
#include "../../../../../sound/soc/generic/audio-graph-card2-custom-sample.dtsi"
CONFIG_SND_AUDIO_GRAPH_CARD2
CONFIG_SND_AUDIO_GRAPH_CARD2_CUSTOM_SAMPLE
CONFIG_SND_TEST_COMPONENT
Because Audio Graph Card2 is still under experimental stage, it will
indicate such warning when probing, and the DT might be updated/exchanged.
It can use Codec2Codec, but it will start automatically when probed,
and can't stop it so far. It should be updated.
Link: https://lore.kernel.org/r/87k0xszlep.wl-kuninori.morimoto.gx@renesas.com
Link: https://lore.kernel.org/r/871r8u4s6q.wl-kuninori.morimoto.gx@renesas.com
Link: https://lore.kernel.org/r/87a6mhwyqn.wl-kuninori.morimoto.gx@renesas.com
Link: https://lore.kernel.org/r/87tuitusy4.wl-kuninori.morimoto.gx@renesas.com
Link: https://lore.kernel.org/r/87a6jn56x0.wl-kuninori.morimoto.gx@renesas.com
v1 -> v2
- don't use "port" base for_each loop
v2 -> v3
- Rename audio-graph-card2 to rich-graph-card
- Rename DSP to DPCM not to confuse
- Normal/DPCM/Codec2Codec can use Single/Multi DAIs.
- use dpcm/multi/codec2codec node instead of using extra compatible
- Sample DTSI patch is separated to Single/Multi.
v3 -> v4
- Rename rich-graph-card to audio-graph-card2
- fixup custom sample driver's connection bug
- test-component compatible uses "verbose" instead of "vv"
v4 -> v5
- tidyup git-log comment at
- tidyup Custom Sample comment
Kuninori Morimoto (16):
ASoC: test-component: add Test Component YAML bindings
ASoC: test-component: add Test Component for Sound debug/test
ASoC: simple-card-utils: add asoc_graph_is_ports0()
ASoC: simple-card-utils: add codec2codec support
ASoC: add Audio Graph Card2 driver
ASoC: audio-graph-card2: add Multi CPU/Codec support
ASoC: audio-graph-card2: add DPCM support
ASoC: audio-graph-card2: add Codec2Codec support
ASoC: add Audio Graph Card2 Yaml Document
ASoC: add Audio Graph Card2 Custom Sample
ASoC: audio-graph-card2-custom-sample.dtsi: add Sample DT for Normal (Single)
ASoC: audio-graph-card2-custom-sample.dtsi: add Sample DT for Normal (Nulti)
ASoC: audio-graph-card2-custom-sample.dtsi: add DPCM sample (Single)
ASoC: audio-graph-card2-custom-sample.dtsi: add DPCM sample (Multi)
ASoC: audio-graph-card2-custom-sample.dtsi: add Codec2Codec sample (Single)
ASoC: audio-graph-card2-custom-sample.dtsi: add Codec2Codec sample (Multi)
.../bindings/sound/audio-graph-card2.yaml | 57 +
.../bindings/sound/test-component.yaml | 33 +
include/sound/graph_card.h | 21 +
include/sound/simple_card_utils.h | 4 +
sound/soc/generic/Kconfig | 20 +
sound/soc/generic/Makefile | 6 +
.../generic/audio-graph-card2-custom-sample.c | 183 +++
.../audio-graph-card2-custom-sample.dtsi | 227 +++
sound/soc/generic/audio-graph-card2.c | 1281 +++++++++++++++++
sound/soc/generic/simple-card-utils.c | 46 +-
sound/soc/generic/test-component.c | 659 +++++++++
11 files changed, 2536 insertions(+), 1 deletion(-)
create mode 100644 Documentation/devicetree/bindings/sound/audio-graph-card2.yaml
create mode 100644 Documentation/devicetree/bindings/sound/test-component.yaml
create mode 100644 sound/soc/generic/audio-graph-card2-custom-sample.c
create mode 100644 sound/soc/generic/audio-graph-card2-custom-sample.dtsi
create mode 100644 sound/soc/generic/audio-graph-card2.c
create mode 100644 sound/soc/generic/test-component.c
--
2.25.1
Commit 3e41a317ae ("PCI/AER: Remove unused aer_error_resume()")
removed the resume err_handler from AER. Since no other port service
implements the callback, support for it can be removed from portdrv.
It can be revived later if need be, preferably by re-using the
pcie_port_device_iter() iterator.
Link: https://lore.kernel.org/r/25334149b604e005058aeb0fdf51e01f991d5d74.1627638184.git.lukas@wunner.de
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Keith Busch <kbusch@kernel.org>
Stuart Hayes reports that an error handled by DPC at a Root Port results
in pciehp gratuitously bringing down a subordinate hotplug port:
RP -- UP -- DP -- UP -- DP (hotplug) -- EP
pciehp brings the slot down because the Link to the Endpoint goes down.
That is caused by a Hot Reset being propagated as a result of DPC.
Per PCIe Base Spec 5.0, section 6.6.1 "Conventional Reset":
For a Switch, the following must cause a hot reset to be sent on all
Downstream Ports: [...]
* The Data Link Layer of the Upstream Port reporting DL_Down status.
In Switches that support Link speeds greater than 5.0 GT/s, the
Upstream Port must direct the LTSSM of each Downstream Port to the
Hot Reset state, but not hold the LTSSMs in that state. This permits
each Downstream Port to begin Link training immediately after its
hot reset completes. This behavior is recommended for all Switches.
* Receiving a hot reset on the Upstream Port.
Once DPC recovers, pcie_do_recovery() walks down the hierarchy and
invokes pcie_portdrv_slot_reset() to restore each port's config space.
At that point, a hotplug interrupt is signaled per PCIe Base Spec r5.0,
section 6.7.3.4 "Software Notification of Hot-Plug Events":
If the Port is enabled for edge-triggered interrupt signaling using
MSI or MSI-X, an interrupt message must be sent every time the logical
AND of the following conditions transitions from FALSE to TRUE: [...]
* The Hot-Plug Interrupt Enable bit in the Slot Control register is
set to 1b.
* At least one hot-plug event status bit in the Slot Status register
and its associated enable bit in the Slot Control register are both
set to 1b.
Prevent pciehp from gratuitously bringing down the slot by clearing the
error-induced Data Link Layer State Changed event before restoring
config space. Afterwards, check whether the link has unexpectedly
failed to retrain and synthesize a DLLSC event if so.
Allow each pcie_port_service_driver (one of them being pciehp) to define
a slot_reset callback and re-use the existing pm_iter() function to
iterate over the callbacks.
Thereby, the Endpoint driver remains bound throughout error recovery and
may restore the device to working state.
Surprise removal during error recovery is detected through a Presence
Detect Changed event. The hotplug port is expected to not signal that
event as a result of a Hot Reset.
The issue isn't DPC-specific, it also occurs when an error is handled by
AER through aer_root_reset(). So while the issue was noticed only now,
it's been around since 2006 when AER support was first introduced.
[bhelgaas: drop PCI_ERROR_RECOVERY Kconfig, split pm_iter() rename to
preparatory patch]
Link: https://lore.kernel.org/linux-pci/08c046b0-c9f2-3489-eeef-7e7aca435bb9@gmail.com/
Fixes: 6c2b374d74 ("PCI-Express AER implemetation: AER core and aerdriver")
Link: https://lore.kernel.org/r/251f4edcc04c14f873ff1c967bc686169cd07d2d.1627638184.git.lukas@wunner.de
Reported-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v2.6.19+: ba952824e6: PCI/portdrv: Report reset for frozen channel
Cc: Keith Busch <kbusch@kernel.org>
Power-on reset after the insertion of a battery does not always complete
successfully, leading to corrupted register content. The EXT_TEST bit
will stop the clock from running, but currently the driver will never
recover.
Safely handle the erroneous state by clearing EXT_TEST as part of the
usual set_time method.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20211015111208.1757110-1-phil@raspberrypi.com
TQ-Systems' TQMa8Mx module (SoM) uses a pcf85063 as RTC. The default output
is 32768Hz. This is to provide the i.MX8M CKIL clock. Once the RTC driver
is probed, the clock is disabled and all i.MX8M functionality depending on
the 32 KHz clock will halt. In our case the whole system halts and a power
cycle is required.
Referencing the pcf85063 directly results in a deadlock. The kernel
will see, that i.MX8M system clock needs the RTC clock and do probe
deferral. But the i.MX8M I2C module never becomes usable without the
i.MX8M CKIL clock and thus the RTC's clock will not be probed. So
from the kernel's perspective this is a chicken-and-egg problem.
Technically everything is fine by not touching anything, since
the RTC clock correctly enables the clock on reset (i.e. on
battery backup power loss).
A workaround for this issue is describing the square wave pin
as fixed-clock, which is registered early and basically how
this pin is used on the i.MX8M.
This addresses the exact same issue as in commit f765e349c3 ("rtc:
m41t80: add support for fixed clock").
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
[Fixed return value 0 -> NULL]
Link: https://lore.kernel.org/r/20211013074954.997445-1-alexander.stein@ew.tq-group.com
Do not call rv3032_exit_eerd() if the enter function fails but don't
forget to call the exit when the enter succeeds.
Fixes: 2eeaa532ac ("rtc: rv3032: Add a driver for Microcrystal RV-3032")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20211012101028.GT2083@kadam
I got a null-ptr-deref report when doing fault injection test:
BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:strcmp+0xc/0x20
Call Trace:
__devm_rtc_register_device.cold.7+0x16a/0x2df
rv3029_probe+0x4b1/0x770 [rtc_rv3029c2]
rv3029_i2c_probe+0x141/0x180 [rtc_rv3029c2]
i2c_device_probe+0xa07/0xbb0
really_probe+0x285/0xc30
If dev_set_name() fails, dev_name() is null, it causes null-ptr-deref,
we need check the return value of dev_set_name().
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20211012041629.2504158-1-yangyingliang@huawei.com
I got a null-ptr-deref report when doing fault injection test:
general protection fault, probably for non-canonical address 0xdffffc0000000022: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000110-0x0000000000000117]
RIP: 0010:device_del+0x132/0xdc0
Call Trace:
cdev_device_del+0x1a/0x80
devm_rtc_unregister_device+0x37/0x80
release_nodes+0xc3/0x3b0
If cdev_device_add() fails, 'dev->p' is not set, it causes
null-ptr-deref when calling cdev_device_del(). Registering
character device is optional, we don't return error code
here, so introduce a new flag 'RTC_NO_CDEV' to indicate
if it has character device, cdev_device_del() is called
when this bit is not set.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20211011132114.3663509-1-yangyingliang@huawei.com
I don't know if that Solaris behavior matters any more or if it's still
possible to look up that bug ID any more. The XFS behavior's definitely
still relevant, though; any but the most recent XFS filesystems will
lose the top bits.
Reported-by: Frank S. Filz <ffilzlnx@mindspring.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
During the review I focused on stop the using of the "+"
to reference the newer platforms, but I forgot that we are
in a process of making things more clear and differentiate
graphics and display versions. So, let me to clean up this
a bit. Also, we don't need any version mentioned in the
config menu entry, only in the help.
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211015090916.82968-1-rodrigo.vivi@intel.com
Parallel submission create composite fences (dma_fence_array) for excl /
shared slots in objects. The I915_GEM_BUSY IOCTL checks these slots to
determine the busyness of the object. Prior to patch it only check if
the fence in the slot was a i915_request. Update the check to understand
composite fences and correctly report the busyness.
v2:
(Tvrtko)
- Remove duplicate BUILD_BUG_ON
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-24-matthew.brost@intel.com
If an object in the excl or shared slot is a composite fence from a
parallel submit and the current request in the conflict tracking is from
the same parallel context there is no need to enforce ordering as the
ordering is already implicit. Make the request conflict tracking
understand this by comparing a parallel submit's parent context and
skipping conflict insertion if the values match.
v2:
(John Harrison)
- Reword commit message
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-23-matthew.brost@intel.com
If an error occurs in the front end when multi-lrc requests are getting
generated we need to skip these in the backend but we still need to
emit the breadcrumbs seqno. An issues arises because with multi-lrc
breadcrumbs there is a handshake between the parent and children to make
forward progress. If all the requests are not present this handshake
doesn't work. To work around this, if multi-lrc request has an error we
skip the handshake but still emit the breadcrumbs seqno.
v2:
(John Harrison)
- Add comment explaining the skipping of the handshake logic
- Fix typos in the commit message
v3:
(John Harrison)
- Fix up some comments about the math to NOP the ring
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-22-matthew.brost@intel.com
Allow multiple batch buffers to be submitted in a single execbuf IOCTL
after a context has been configured with the 'set_parallel' extension.
The number batches is implicit based on the contexts configuration.
This is implemented with a series of loops. First a loop is used to find
all the batches, a loop to pin all the HW contexts, a loop to create all
the requests, a loop to submit (emit BB start, etc...) all the requests,
a loop to tie the requests to the VMAs they touch, and finally a loop to
commit the requests to the backend.
A composite fence is also created for the generated requests to return
to the user and to stick in dma resv slots.
No behavior from the existing IOCTL should be changed aside from when
throttling because the ring for a context is full. In this situation,
i915 will now wait while holding the object locks. This change was done
because the code is much simpler to wait while holding the locks and we
believe there isn't a huge benefit of dropping these locks. If this
proves false we can restructure the code to drop the locks during the
wait.
IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1
media UMD: https://github.com/intel/media-driver/pull/1252
v2:
(Matthew Brost)
- Return proper error value if i915_request_create fails
v3:
(John Harrison)
- Add comment explaining create / add order loops + locking
- Update commit message explaining different in IOCTL behavior
- Line wrap some comments
- eb_add_request returns void
- Return -EINVAL rather triggering BUG_ON if cmd parser used
(Checkpatch)
- Check eb->batch_len[*current_batch]
v4:
(CI)
- Set batch len if passed if via execbuf args
- Call __i915_request_skip after __i915_request_commit
(Kernel test robot)
- Initialize rq to NULL in eb_pin_timeline
v5:
(John Harrison)
- Fix typo in comments near bb order loops
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-21-matthew.brost@intel.com
For some users of multi-lrc, e.g. split frame, it isn't safe to preempt
mid BB. To safely enable preemption at the BB boundary, a handshake
between parent and child is needed, syncing the set of BBs at the
beginning and end of each batch. This is implemented via custom
emit_bb_start & emit_fini_breadcrumb functions and enabled by default if
a context is configured by set parallel extension.
Lastly, this patch updates the process descriptor to the correct size as
the memory used in the handshake is directly after the process
descriptor.
v2:
(John Harrison)
- Fix a few comments wording
- Add struture for parent page layout
v3:
(John Harrison)
- A structure for sync semaphore
- Use offsetof to calc address
- Update commit message
v4:
(John Harrison)
- Fix typos in comment explaining memory map of scratch page
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-20-matthew.brost@intel.com
Introduce 'set parallel submit' extension to connect UAPI to GuC
multi-lrc interface. Kernel doc in new uAPI should explain it all.
IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1
media UMD: https://github.com/intel/media-driver/pull/1252
v2:
(Daniel Vetter)
- Add IGT link and placeholder for media UMD link
v3:
(Kernel test robot)
- Fix warning in unpin engines call
(John Harrison)
- Reword a bunch of the kernel doc
v4:
(John Harrison)
- Add comment why perma-pin is done after setting gem context
- Update some comments / docs for proto contexts
v5:
(John Harrison)
- Rework perma-pin comment
- Add BUG_IN if context is pinned when setting gem context
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-17-matthew.brost@intel.com
Display the workqueue status in debugfs for GuC contexts that are in
parent-child relationship.
v2:
(John Harrison)
- Output number children in debugfs
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-16-matthew.brost@intel.com
Update context and full GPU reset to work with multi-lrc. The idea is
parent context tracks all the active requests inflight for itself and
its children. The parent context owns the reset replaying / canceling
requests as needed.
v2:
(John Harrison)
- Simply loop in find active request
- Add comments to find ative request / reset loop
v3:
(John Harrison)
- s/its'/its/g
- Fix comment when searching for active request
- Reorder if state in __guc_reset_context
v4:
(Kernel test robot)
- Delete unused is_multi_lrc function
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-15-matthew.brost@intel.com
The GuC must receive requests in the order submitted for contexts in a
parent-child relationship to function correctly. To ensure this, insert
a submit fence between the current request and last request submitted
for requests / contexts in a parent child relationship. This is
conceptually similar to a single timeline.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-14-matthew.brost@intel.com
Implement multi-lrc submission via a single workqueue entry and single
H2G. The workqueue entry contains an updated tail value for each
request, of all the contexts in the multi-lrc submission, and updates
these values simultaneously. As such, the tasklet and bypass path have
been updated to coalesce requests into a single submission.
v2:
(John Harrison)
- s/wqe/wqi
- Use FIELD_PREP macros
- Add GEM_BUG_ONs ensures length fits within field
- Add comment / white space to intel_guc_write_barrier
(Kernel test robot)
- Make need_tasklet a static function
v3:
(Docs)
- A comment for submission_stall_reason
v4:
(Kernel test robot)
- Initialize return value in bypass tasklt submit function
(John Harrison)
- Add comment near work queue defs
- Add BUILD_BUG_ON to ensure WQ_SIZE is a power of 2
- Update write_barrier comment to talk about work queue
v5:
(John Harrison)
- Fix typo in work queue comment
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-13-matthew.brost@intel.com
Parallel contexts are perma-pinned by the upper layers which makes the
backend implementation rather simple. The parent pins the guc_id and
children increment the parent's pin count on pin to ensure all the
contexts are unpinned before we disable scheduling with the GuC / or
deregister the context.
v2:
(Daniel Vetter)
- Perma-pin parallel contexts
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-12-matthew.brost@intel.com
Assign contexts in parent-child relationship consecutive guc_ids. This
is accomplished by partitioning guc_id space between ones that need to
be consecutive (1/16 available guc_ids) and ones that do not (15/16 of
available guc_ids). The consecutive search is implemented via the bitmap
API.
This is a precursor to the full GuC multi-lrc implementation but aligns
to how GuC mutli-lrc interface is defined - guc_ids must be consecutive
when using the GuC multi-lrc interface.
v2:
(Daniel Vetter)
- Explicitly state why we assign consecutive guc_ids
v3:
(John Harrison)
- Bring back in spin lock
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-11-matthew.brost@intel.com
In GuC parent-child contexts the parent context controls the scheduling,
ensure only the parent does the scheduling operations.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-10-matthew.brost@intel.com
Add multi-lrc context registration H2G. In addition a workqueue and
process descriptor are setup during multi-lrc context registration as
these data structures are needed for multi-lrc submission.
v2:
(John Harrison)
- Move GuC specific fields into sub-struct
- Clean up WQ defines
- Add comment explaining math to derive WQ / PD address
v3:
(John Harrison)
- Add PARENT_SCRATCH_SIZE define
- Update comment explaining multi-lrc register
v4:
(John Harrison)
- Move PARENT_SCRATCH_SIZE to common file
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-9-matthew.brost@intel.com
Introduce context parent-child relationship. Once this relationship is
created all pinning / unpinning operations are directed to the parent
context. The parent context is responsible for pinning all of its
children and itself.
This is a precursor to the full GuC multi-lrc implementation but aligns
to how GuC mutli-lrc interface is defined - a single H2G is used
register / deregister all of the contexts simultaneously.
Subsequent patches in the series will implement the pinning / unpinning
operations for parent / child contexts.
v2:
(Daniel Vetter)
- Add kernel doc, add wrapper to access parent to ensure safety
v3:
(John Harrison)
- Fix comment explaing GEM_BUG_ON in to_parent()
- Make variable names generic (non-GuC specific)
v4:
(John Harrison)
- s/its'/its/g
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-8-matthew.brost@intel.com
Expose logical engine instance to user via query engine info IOCTL. This
is required for split-frame workloads as these needs to be placed on
engines in a logically contiguous order. The logical mapping can change
based on fusing. Rather than having user have knowledge of the fusing we
simply just expose the logical mapping with the existing query engine
info IOCTL.
IGT: https://patchwork.freedesktop.org/patch/445637/?series=92854&rev=1
media UMD: https://github.com/intel/media-driver/pull/1252
v2:
(Daniel Vetter)
- Add IGT link, placeholder for media UMD
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-7-matthew.brost@intel.com
Add logical engine mapping. This is required for split-frame, as
workloads need to be placed on engines in a logically contiguous manner.
v2:
(Daniel Vetter)
- Add kernel doc for new fields
v3:
(Tvrtko)
- Update comment for new logical_mask field
v4:
(John Harrison)
- Update comment for new logical_mask field
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-6-matthew.brost@intel.com
Calling switch_to_kernel_context isn't needed if the engine PM reference
is taken while all user contexts are pinned as if don't have PM ref that
guarantees that all user contexts scheduling is disabled. By not calling
switch_to_kernel_context we save on issuing a request to the engine.
v2:
(Daniel Vetter)
- Add FIXME comment about pushing switch_to_kernel_context to backend
v3:
(John Harrison)
- Update commit message
- Fix workding comment
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-5-matthew.brost@intel.com
Taking a PM reference to prevent intel_gt_wait_for_idle from short
circuiting while any user context has scheduling enabled. Returning GT
idle when it is not can cause all sorts of issues throughout the stack.
v2:
(Daniel Vetter)
- Add might_lock annotations to pin / unpin function
v3:
(CI)
- Drop intel_engine_pm_might_put from unpin path as an async put is
used
v4:
(John Harrison)
- Make intel_engine_pm_might_get/put work with GuC virtual engines
- Update commit message
v5:
- Update commit message again
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-4-matthew.brost@intel.com
Taking a PM reference to prevent intel_gt_wait_for_idle from short
circuiting while a deregister context H2G is in flight. To do this must
issue the deregister H2G from a worker as context can be destroyed from
an atomic context and taking GT PM ref blows up. Previously we took a
runtime PM from this atomic context which worked but will stop working
once runtime pm autosuspend in enabled.
So this patch is two fold, stop intel_gt_wait_for_idle from short
circuting and fix runtime pm autosuspend.
v2:
(John Harrison)
- Split structure changes out in different patch
(Tvrtko)
- Don't drop lock in deregister_destroyed_contexts
v3:
(John Harrison)
- Flush destroyed contexts before destroying context reg pool
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-3-matthew.brost@intel.com
Move guc_id allocation under submission state sub-struct as a future
patch will reuse the spin lock as a global submission state lock. Moving
this into sub-struct makes ownership of fields / lock clear.
v2:
(Docs)
- Add comment for submission_state sub-structure
v3:
(John Harrison)
- Fixup a few comments
v4:
(John Harrison)
- Fix typo
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-2-matthew.brost@intel.com
Struct pci_driver contains a struct device_driver, so for PCI devices, it's
easy to convert a device_driver * to a pci_driver * with to_pci_driver().
The device_driver * is in struct device, so we don't need to also keep
track of the pci_driver * in struct pci_dev.
Replace pdev->driver with to_pci_driver(). This is a step toward removing
pci_dev->driver.
[bhelgaas: split to separate patch]
Link: https://lore.kernel.org/r/20211004125935.2300113-11-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>