
Merge tag 'vfio-v6.1-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Prune private items from vfio_pci_core.h to a new internal header,
   fix missed function rename, and refactor vfio-pci interrupt defines
   (Jason Gunthorpe)

 - Create consistent naming and handling of ioctls with a function per
   ioctl for vfio-pci and vfio group handling, use proper type args
   where available (Jason Gunthorpe)

 - Implement a set of low power device feature ioctls allowing userspace
   to make use of power states such as D3cold where supported (Abhishek
   Sahu)

 - Remove device counter on vfio groups, which had restricted the page
   pinning interface to singleton groups to account for limitations in
   the type1 IOMMU backend. Document usage as limited to emulated IOMMU
   devices, i.e. traditional mdev devices where this restriction is
   consistent (Jason Gunthorpe)

 - Correct function prefix in hisi_acc driver incurred during previous
   refactoring (Shameer Kolothum)

 - Correct typo and remove redundant warning triggers in vfio-fsl driver
   (Christophe JAILLET)

 - Introduce device level DMA dirty tracking uAPI and implementation in
   the mlx5 variant driver (Yishai Hadas & Joao Martins)

 - Move much of the vfio_device life cycle management into vfio core,
   simplifying and avoiding duplication across drivers. This also
   facilitates adding a struct device to vfio_device which begins the
   introduction of device rather than group level user support and fills
   a gap allowing userspace to identify devices as vfio capable without
   implicit knowledge of the driver (Kevin Tian & Yi Liu)

 - Split vfio container handling to a separate file, creating a more
   well defined API between the core and container code, masking IOMMU
   backend implementation from the core, allowing for an easier future
   transition to an iommufd based implementation of the same (Jason
   Gunthorpe)

 - Attempt to resolve a race accessing the iommu_group for a device
   between vfio releasing DMA ownership and removal of the device from
   the IOMMU driver. Follow up with support to allow vfio_group to exist
   with NULL iommu_group pointer to support existing userspace use cases
   of holding the group file open (Jason Gunthorpe)

 - Fix error code and hi/lo register manipulation issues in the hisi_acc
   variant driver, along with various code cleanups (Longfang Liu)

 - Fix a prior regression in GVT-g group teardown, resulting in
   unreleased resources (Jason Gunthorpe)

 - A significant cleanup and simplification of the mdev interface,
   consolidating much of the open coded per driver sysfs interface
   support into the mdev core (Christoph Hellwig)

 - Simplification of tracking and locking around vfio_groups that falls
   out of previous refactoring (Jason Gunthorpe)

 - Replace trivial open coded f_ops tests with new helper (Alex
   Williamson)

* tag 'vfio-v6.1-rc1' of https://github.com/awilliam/linux-vfio: (77 commits)
  vfio: More vfio_file_is_group() use cases
  vfio: Make the group FD disassociate from the iommu_group
  vfio: Hold a reference to the iommu_group in kvm for SPAPR
  vfio: Add vfio_file_is_group()
  vfio: Change vfio_group->group_rwsem to a mutex
  vfio: Remove the vfio_group->users and users_comp
  vfio/mdev: add mdev available instance checking to the core
  vfio/mdev: consolidate all the description sysfs into the core code
  vfio/mdev: consolidate all the available_instance sysfs into the core code
  vfio/mdev: consolidate all the name sysfs into the core code
  vfio/mdev: consolidate all the device_api sysfs into the core code
  vfio/mdev: remove mtype_get_parent_dev
  vfio/mdev: remove mdev_parent_dev
  vfio/mdev: unexport mdev_bus_type
  vfio/mdev: remove mdev_from_dev
  vfio/mdev: simplify mdev_type handling
  vfio/mdev: embedd struct mdev_parent in the parent data structure
  vfio/mdev: make mdev.h standalone includable
  drm/i915/gvt: simplify vgpu configuration management
  drm/i915/gvt: fix a memory leak in intel_gvt_init_vgpu_types
  ...
Linus Torvalds 2022-10-12 14:46:48 -07:00
commit d3cf405133
51 changed files with 4940 additions and 2773 deletions


@@ -986,6 +986,148 @@ enum vfio_device_mig_state {
	VFIO_DEVICE_STATE_RUNNING_P2P = 5,
};
/*
* Upon VFIO_DEVICE_FEATURE_SET, allow the device to be moved into a low power
* state with the platform-based power management. Device use of lower power
* states depends on factors managed by the runtime power management core,
* including system level support and coordinating support among dependent
* devices. Enabling device low power entry does not guarantee lower power
* usage by the device, nor is a mechanism provided through this feature to
* know the current power state of the device. If any device access happens
* (either from the host or through the vfio uAPI) when the device is in the
* low power state, then the host will move the device out of the low power
* state as necessary prior to the access. Once the access is completed, the
* device may re-enter the low power state. For single shot low power support
* with wake-up notification, see
* VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP below. Access to mmap'd
* device regions is disabled on LOW_POWER_ENTRY and may only be resumed after
* calling LOW_POWER_EXIT.
*/
#define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY 3
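Here is a minimal sketch of enabling this feature from userspace. It assumes device_fd is an already-open vfio device file descriptor. The request uses only the generic struct vfio_device_feature header and the VFIO_DEVICE_FEATURE ioctl, both of which predate this diff in <linux/vfio.h>. The #ifndef fallback copies the feature index shown above for older headers. vfio_feature_hdr_init() and vfio_low_power_entry() are hypothetical helper names.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Fallback for uAPI headers that predate v6.1 (value from the diff above). */
#ifndef VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY
#define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY 3
#endif

/* Hypothetical helper: fill a header-only (no payload) feature request. */
static void vfio_feature_hdr_init(struct vfio_device_feature *hdr,
				  __u32 op, __u32 feature)
{
	memset(hdr, 0, sizeof(*hdr));
	hdr->argsz = sizeof(*hdr);	/* no data[] follows the header */
	hdr->flags = op | feature;	/* operation flag | feature index */
}

/* Hypothetical helper: allow runtime low power; returns 0 or -errno. */
static int vfio_low_power_entry(int device_fd)
{
	struct vfio_device_feature hdr;

	vfio_feature_hdr_init(&hdr, VFIO_DEVICE_FEATURE_SET,
			      VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY);
	return ioctl(device_fd, VFIO_DEVICE_FEATURE, &hdr) ? -errno : 0;
}
```

LOW_POWER_ENTRY carries no payload, so argsz covers only the 8-byte header. A caller would invoke vfio_low_power_entry(device_fd). Per the comment above, it should not rely on mmap'd region access again until LOW_POWER_EXIT.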
/*
* This device feature has the same behavior as
* VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY with the exception that the user
* provides an eventfd for wake-up notification. When the device moves out of
* the low power state for the wake-up, the host will not allow the device to
* re-enter a low power state without a subsequent user call to one of the low
* power entry device feature IOCTLs. Access to mmap'd device regions is
* disabled on LOW_POWER_ENTRY_WITH_WAKEUP and may only be resumed after the
* low power exit. The low power exit can happen either through LOW_POWER_EXIT
* or through any other access (where the wake-up notification has been
* generated). The access to mmap'd device regions will not trigger low power
* exit.
*
* The notification through the provided eventfd will be generated only when
* the device has entered and is resumed from a low power state after
* calling this device feature IOCTL. A device that has not entered low power
* state, as managed through the runtime power management core, will not
* generate a notification through the provided eventfd on access. Calling the
* LOW_POWER_EXIT feature is optional in the case where notification has been
* signaled on the provided eventfd that a resume from low power has occurred.
*/
struct vfio_device_low_power_entry_with_wakeup {
	__s32 wakeup_eventfd;
	__u32 reserved;
};
#define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP 4
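The wake-up variant places struct vfio_device_low_power_entry_with_wakeup immediately after the feature header in a single buffer. The sketch below makes the same assumption of an open vfio device fd and uses hypothetical helpers build_lp_wakeup_req() and vfio_low_power_entry_with_wakeup(). It creates a non-blocking eventfd, leaves reserved zeroed, and hands the eventfd back for the caller to poll. The #ifndef block mirrors the definitions above only for headers that predate them.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

#ifndef VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP
/* Mirror of the v6.1 definitions above, for older uAPI headers. */
struct vfio_device_low_power_entry_with_wakeup {
	__s32 wakeup_eventfd;
	__u32 reserved;
};
#define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP 4
#endif

/* Hypothetical helper: header + payload in one buffer; caller frees it. */
static struct vfio_device_feature *build_lp_wakeup_req(int efd)
{
	size_t sz = sizeof(struct vfio_device_feature) +
		    sizeof(struct vfio_device_low_power_entry_with_wakeup);
	struct vfio_device_feature *req = calloc(1, sz);
	struct vfio_device_low_power_entry_with_wakeup *entry;

	if (!req)
		return NULL;
	req->argsz = sz;
	req->flags = VFIO_DEVICE_FEATURE_SET |
		     VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP;
	entry = (void *)req->data;
	entry->wakeup_eventfd = efd;	/* reserved stays zero from calloc */
	return req;
}

/* Hypothetical helper: returns the eventfd to watch, or -errno. */
static int vfio_low_power_entry_with_wakeup(int device_fd)
{
	struct vfio_device_feature *req;
	int efd, ret = 0;

	efd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
	if (efd < 0)
		return -errno;
	req = build_lp_wakeup_req(efd);
	if (!req)
		ret = -ENOMEM;
	else if (ioctl(device_fd, VFIO_DEVICE_FEATURE, req))
		ret = -errno;
	free(req);
	if (ret) {
		close(efd);
		return ret;
	}
	return efd;	/* signalled when the device leaves low power */
}
```

Using EFD_NONBLOCK is a design choice of this sketch, not a requirement of the uAPI. It lets an event loop check for the notification without stalling.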
/*
* Upon VFIO_DEVICE_FEATURE_SET, disallow use of device low power states as
* previously enabled via VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY or
* VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device features.
* This device feature IOCTL may itself generate a wakeup eventfd notification
* in the latter case if the device had previously entered a low power state.
*/
#define VFIO_DEVICE_FEATURE_LOW_POWER_EXIT 5
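LOW_POWER_EXIT can itself signal the wake-up eventfd. A caller that used the wake-up variant may therefore want to clear any pending notification after exiting. The sketch below uses hypothetical helpers vfio_low_power_exit() and drain_wakeup_eventfd(). It assumes the eventfd was created with EFD_NONBLOCK and reads it with glibc's eventfd_read().

```c
#include <errno.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

#ifndef VFIO_DEVICE_FEATURE_LOW_POWER_EXIT
#define VFIO_DEVICE_FEATURE_LOW_POWER_EXIT 5
#endif

/* Hypothetical helper: disallow further low power use; 0 or -errno. */
static int vfio_low_power_exit(int device_fd)
{
	struct vfio_device_feature hdr;

	memset(&hdr, 0, sizeof(hdr));
	hdr.argsz = sizeof(hdr);	/* feature carries no payload */
	hdr.flags = VFIO_DEVICE_FEATURE_SET | VFIO_DEVICE_FEATURE_LOW_POWER_EXIT;
	return ioctl(device_fd, VFIO_DEVICE_FEATURE, &hdr) ? -errno : 0;
}

/*
 * Hypothetical helper: consume a pending wake-up on a non-blocking
 * eventfd. Returns the accumulated count (0 if nothing was pending)
 * or -errno on an unexpected error.
 */
static long drain_wakeup_eventfd(int efd)
{
	eventfd_t count;

	if (eventfd_read(efd, &count) == 0)
		return (long)count;
	return errno == EAGAIN ? 0 : -errno;
}
```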
/*
* Upon VFIO_DEVICE_FEATURE_SET start/stop device DMA logging.
* VFIO_DEVICE_FEATURE_PROBE can be used to detect if the device supports
* DMA logging.
*
* DMA logging allows a device to internally record what DMAs the device is
* initiating and report them back to userspace. It is part of the VFIO
* migration infrastructure that allows implementing dirty page tracking
* during the pre-copy phase of live migration. Only DMA WRITEs are logged,
* and this API is not connected to VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE.
*
* When DMA logging is started, a range of IOVAs to monitor is provided and the
* device can optimize its logging to cover only the IOVA range given. Each
* DMA that the device initiates inside the range will be logged by the device
* for later retrieval.
*
* page_size is an input that hints what tracking granularity the device
* should try to achieve. If the device cannot do the hinted page size, then
* it is the driver's choice which page size to pick based on its support.
* On output, the device will return the page size it selected.
*
* ranges is a pointer to an array of
* struct vfio_device_feature_dma_logging_range.
*
* The core kernel code guarantees to support, at minimum, as many ranges
* (num_ranges) as fit into a single kernel page. User space can try higher
* values but should give up if they cannot be achieved due to driver
* limitations.
*
* A single call to start device DMA logging can be issued and a matching stop
* should follow at the end. Another start is not allowed in the meantime.
*/
struct vfio_device_feature_dma_logging_control {
	__aligned_u64 page_size;
	__u32 num_ranges;
	__u32 __reserved;
	__aligned_u64 ranges;
};
struct vfio_device_feature_dma_logging_range {
	__aligned_u64 iova;
	__aligned_u64 length;
};
#define VFIO_DEVICE_FEATURE_DMA_LOGGING_START 6
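Below is a sketch of the START request, assuming an open vfio device fd. It uses three hypothetical helpers:

* build_dma_logging_start() builds the control structure. Its ranges field holds the user address of the range array.
* dma_logging_min_ranges() computes the guaranteed minimum range count. It assumes sysconf(_SC_PAGESIZE) matches the kernel page size.
* vfio_dma_logging_probe() probes support by combining VFIO_DEVICE_FEATURE_PROBE with VFIO_DEVICE_FEATURE_SET. The existing VFIO_DEVICE_FEATURE documentation describes this combination for probing an operation.

The #ifndef block mirrors the definitions above for older headers.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

#ifndef VFIO_DEVICE_FEATURE_DMA_LOGGING_START
/* Mirror of the v6.1 definitions above, for older uAPI headers. */
struct vfio_device_feature_dma_logging_control {
	__aligned_u64 page_size;
	__u32 num_ranges;
	__u32 __reserved;
	__aligned_u64 ranges;
};
struct vfio_device_feature_dma_logging_range {
	__aligned_u64 iova;
	__aligned_u64 length;
};
#define VFIO_DEVICE_FEATURE_DMA_LOGGING_START 6
#endif

/*
 * Hypothetical helper: build a SET request for DMA_LOGGING_START. The
 * caller keeps ownership of ranges[], which must stay valid until the
 * ioctl returns because only its address is stored in the request.
 */
static struct vfio_device_feature *
build_dma_logging_start(__u64 page_size,
			const struct vfio_device_feature_dma_logging_range *ranges,
			__u32 num_ranges)
{
	size_t sz = sizeof(struct vfio_device_feature) +
		    sizeof(struct vfio_device_feature_dma_logging_control);
	struct vfio_device_feature *req = calloc(1, sz);
	struct vfio_device_feature_dma_logging_control *ctrl;

	if (!req)
		return NULL;
	req->argsz = sz;
	req->flags = VFIO_DEVICE_FEATURE_SET |
		     VFIO_DEVICE_FEATURE_DMA_LOGGING_START;
	ctrl = (void *)req->data;
	ctrl->page_size = page_size;	/* hint; replaced by the chosen size */
	ctrl->num_ranges = num_ranges;
	ctrl->ranges = (uintptr_t)ranges;	/* user pointer passed as u64 */
	return req;
}

/* Hypothetical helper: ranges guaranteed to fit, assuming one page. */
static long dma_logging_min_ranges(void)
{
	return sysconf(_SC_PAGESIZE) /
	       (long)sizeof(struct vfio_device_feature_dma_logging_range);
}

/* Hypothetical helper: 0 if DMA logging can be started, else -errno. */
static int vfio_dma_logging_probe(int device_fd)
{
	struct vfio_device_feature hdr = {
		.argsz = sizeof(hdr),
		.flags = VFIO_DEVICE_FEATURE_PROBE | VFIO_DEVICE_FEATURE_SET |
			 VFIO_DEVICE_FEATURE_DMA_LOGGING_START,
	};

	return ioctl(device_fd, VFIO_DEVICE_FEATURE, &hdr) ? -errno : 0;
}
```

After a successful START ioctl, the page_size field of the control structure holds the granularity the driver picked. A matching VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP (a SET with no payload) must follow before another START.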
/*
* Upon VFIO_DEVICE_FEATURE_SET stop device DMA logging that was started
* by VFIO_DEVICE_FEATURE_DMA_LOGGING_START
*/
#define VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP 7
/*
* Upon VFIO_DEVICE_FEATURE_GET read back and clear the device DMA log
*
* Query the device's DMA log for written pages within the given IOVA range.
* The log is cleared for the IOVA range as part of the query.
*
* bitmap is a pointer to an array of u64s that will hold the output bitmap
* with 1 bit reporting a page_size unit of IOVA. The mapping of IOVA to bits
* is given by:
*  bit = (addr - iova) / page_size;
*  bitmap[bit / 64] & (1ULL << (bit % 64))
*
* The input page_size can be any power of two value and does not have to
* match the value given to VFIO_DEVICE_FEATURE_DMA_LOGGING_START. The driver
* will format its internal logging to match the reporting page size, possibly
* by replicating bits if the internal page size is larger than requested.
*
* The LOGGING_REPORT will only set bits in the bitmap and never clear or
* perform any initialization of the user provided bitmap.
*
* If any error is returned, userspace should assume that the dirty log is
* corrupted. Error recovery is to consider all memory dirty and try to
* restart the dirty tracking, or to abort/restart the whole migration.
*
* If DMA logging is not enabled, an error will be returned.
*
*/
struct vfio_device_feature_dma_logging_report {
	__aligned_u64 iova;
	__aligned_u64 length;
	__aligned_u64 page_size;
	__aligned_u64 bitmap;
};
#define VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT 8
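Reading the log back is a GET carrying struct vfio_device_feature_dma_logging_report. The sketch below uses hypothetical helpers report_bitmap_words(), report_page_dirty() and build_dma_logging_report(). It assumes the bitmap packs one bit per page_size unit of IOVA into u64 words, starting from the least significant bit. Under that assumption, page n of the range is bit n % 64 of word n / 64. The sketch sizes a zeroed bitmap, builds the request, and tests whether an address was written.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <linux/vfio.h>

#ifndef VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT
/* Mirror of the v6.1 definitions above, for older uAPI headers. */
struct vfio_device_feature_dma_logging_report {
	__aligned_u64 iova;
	__aligned_u64 length;
	__aligned_u64 page_size;
	__aligned_u64 bitmap;
};
#define VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT 8
#endif

/* Hypothetical helper: u64 words needed for one bit per page_size unit. */
static size_t report_bitmap_words(__u64 length, __u64 page_size)
{
	__u64 pages = (length + page_size - 1) / page_size;

	return (size_t)((pages + 63) / 64);
}

/* Hypothetical helper: was the page containing addr reported dirty? */
static bool report_page_dirty(const __u64 *bitmap, __u64 iova,
			      __u64 page_size, __u64 addr)
{
	__u64 bit = (addr - iova) / page_size;	/* page index in the range */

	return bitmap[bit / 64] & (1ULL << (bit % 64));
}

/*
 * Hypothetical helper: build a GET request for DMA_LOGGING_REPORT over
 * [iova, iova + length). bitmap must be zeroed by the caller because the
 * kernel only ever sets bits in it.
 */
static struct vfio_device_feature *
build_dma_logging_report(__u64 iova, __u64 length, __u64 page_size,
			 __u64 *bitmap)
{
	size_t sz = sizeof(struct vfio_device_feature) +
		    sizeof(struct vfio_device_feature_dma_logging_report);
	struct vfio_device_feature *req = calloc(1, sz);
	struct vfio_device_feature_dma_logging_report *rep;

	if (!req)
		return NULL;
	req->argsz = sz;
	req->flags = VFIO_DEVICE_FEATURE_GET |
		     VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT;
	rep = (void *)req->data;
	rep->iova = iova;
	rep->length = length;
	rep->page_size = page_size;	/* any power of two */
	rep->bitmap = (uintptr_t)bitmap;
	return req;
}
```

If the ioctl fails, the comment above says to treat the whole range as dirty rather than trust a partially filled bitmap.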
/* -------- API for Type1 VFIO IOMMU -------- */
/**