Commit graph

85453 commits

Author SHA1 Message Date
Christoph Hellwig
685a1537f4 vfio/mdev: consolidate all the description sysfs into the core code
Every driver just emits a string, simply add a method to the mdev_driver
to return it and provide a standard sysfs show function.

Remove the now unused types_attrs field in struct mdev_driver and the
support code for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Link: https://lore.kernel.org/r/20220923092652.100656-14-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04 12:06:58 -06:00
Christoph Hellwig
f2fbc72e6d vfio/mdev: consolidate all the available_instance sysfs into the core code
Every driver just print a number, simply add a method to the mdev_driver
to return it and provide a standard sysfs show function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Link: https://lore.kernel.org/r/20220923092652.100656-13-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04 12:06:58 -06:00
Christoph Hellwig
0bc79069cc vfio/mdev: consolidate all the name sysfs into the core code
Every driver just emits a static string, simply add a field to the
mdev_type for the driver to fill out or fall back to the sysfs name and
provide a standard sysfs show function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Link: https://lore.kernel.org/r/20220923092652.100656-12-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04 12:06:58 -06:00
Jason Gunthorpe
290aac5df8 vfio/mdev: consolidate all the device_api sysfs into the core code
Every driver just emits a static string, simply feed it through the ops
and provide a standard sysfs show function.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Link: https://lore.kernel.org/r/20220923092652.100656-11-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04 12:06:58 -06:00
Christoph Hellwig
c7c1f38f6c vfio/mdev: remove mtype_get_parent_dev
Just open code the dereferences in the only user.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Link: https://lore.kernel.org/r/20220923092652.100656-10-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04 12:06:58 -06:00
Christoph Hellwig
062e720cd2 vfio/mdev: remove mdev_parent_dev
Just open code the dereferences in the only user.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Link: https://lore.kernel.org/r/20220923092652.100656-9-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04 12:06:58 -06:00
Christoph Hellwig
2815fe149f vfio/mdev: unexport mdev_bus_type
mdev_bus_type is only used in mdev.ko now, so unexport it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Link: https://lore.kernel.org/r/20220923092652.100656-8-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04 12:06:58 -06:00
Christoph Hellwig
cbf3bb28aa vfio/mdev: remove mdev_from_dev
Just open code it in the only caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Link: https://lore.kernel.org/r/20220923092652.100656-7-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04 12:06:58 -06:00
Christoph Hellwig
da44c340c4 vfio/mdev: simplify mdev_type handling
Instead of abusing struct attribute_group to control initialization of
struct mdev_type, just define the actual attributes in the mdev_driver,
allocate the mdev_type structures in the caller and pass them to
mdev_register_parent.

This allows the caller to use container_of to get at the containing
structure and thus significantly simplify the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Link: https://lore.kernel.org/r/20220923092652.100656-6-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04 12:06:58 -06:00
Christoph Hellwig
89345d5177 vfio/mdev: embedd struct mdev_parent in the parent data structure
Simplify mdev_{un}register_device by requiring the caller to pass in
a structure allocate as part of the parent device structure.  This
removes the need for a list of parents and the separate mdev_parent
refcount as we can simplify rely on the reference to the parent device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Link: https://lore.kernel.org/r/20220923092652.100656-5-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04 12:06:58 -06:00
Christoph Hellwig
bdef2b7896 vfio/mdev: make mdev.h standalone includable
Include <linux/device.h> and <linux/uuid.h> so that users of this headers
don't need to do that and remove those includes that aren't needed
any more.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Link: https://lore.kernel.org/r/20220923092652.100656-4-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04 12:06:58 -06:00
Yi Liu
3c28a76124 vfio: Add struct device to vfio_device
and replace kref. With it a 'vfio-dev/vfioX' node is created under the
sysfs path of the parent, indicating the device is bound to a vfio
driver, e.g.:

/sys/devices/pci0000\:6f/0000\:6f\:01.0/vfio-dev/vfio0

It is also a preparatory step toward adding cdev for supporting future
device-oriented uAPI.

Add Documentation/ABI/testing/sysfs-devices-vfio-dev.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220921104401.38898-16-kevin.tian@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21 14:15:11 -06:00
Kevin Tian
ebb72b765f vfio/ccw: Use the new device life cycle helpers
ccw is the only exception which cannot use vfio_alloc_device() because
its private device structure is designed to serve both mdev and parent.
Life cycle of the parent is managed by css_driver so vfio_ccw_private
must be allocated/freed in css_driver probe/remove path instead of
conforming to vfio core life cycle for mdev.

Given that use a wait/completion scheme so the mdev remove path waits
after vfio_put_device() until receiving a completion notification from
@release. The completion indicates that all active references on
vfio_device have been released.

After that point although free of vfio_ccw_private is delayed to
css_driver it's at least guaranteed to have no parallel reference on
released vfio device part from other code paths.

memset() in @probe is removed. vfio_device is either already cleared
when probed for the first time or cleared in @release from last probe.

The right fix is to introduce separate structures for mdev and parent,
but this won't happen in short term per prior discussions.

Remove vfio_init/uninit_group_dev() as no user now.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Link: https://lore.kernel.org/r/20220921104401.38898-14-kevin.tian@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21 14:15:11 -06:00
Yi Liu
27aeb91559 vfio/hisi_acc: Use the new device life cycle helpers
Tidy up @probe so all migration specific initialization logic is moved
to migration specific @init callback.

Remove vfio_pci_core_{un}init_device() given no user now.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Link: https://lore.kernel.org/r/20220921104401.38898-5-kevin.tian@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21 14:15:10 -06:00
Yi Liu
63d7c77989 vfio/pci: Use the new device life cycle helpers
Also introduce two pci core helpers as @init/@release for pci drivers:

 - vfio_pci_core_init_dev()
 - vfio_pci_core_release_dev()

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220921104401.38898-3-kevin.tian@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21 14:15:10 -06:00
Kevin Tian
cb9ff3f3b8 vfio: Add helpers for unifying vfio_device life cycle
The idea is to let vfio core manage the vfio_device life cycle instead
of duplicating the logic cross drivers. This is also a preparatory
step for adding struct device into vfio_device.

New pair of helpers together with a kref in vfio_device:

 - vfio_alloc_device()
 - vfio_put_device()

Drivers can register @init/@release callbacks to manage any private
state wrapping the vfio_device.

However vfio-ccw doesn't fit this model due to a life cycle mess
that its private structure mixes both parent and mdev info hence must
be allocated/freed outside of the life cycle of vfio device.

Per prior discussions this won't be fixed in short term by IBM folks.

Instead of waiting for those modifications introduce another helper
vfio_init_device() so ccw can call it to initialize a pre-allocated
vfio_device.

Further implication of the ccw trick is that vfio_device cannot be
freed uniformly in vfio core. Instead, require *EVERY* driver to
implement @release and free vfio_device inside. Then ccw can choose
to delay the free at its own discretion.

Another trick down the road is that kvzalloc() is used to accommodate
the need of gvt which uses vzalloc() while all others use kzalloc().
So drivers should call a helper vfio_free_device() to free the
vfio_device instead of assuming that kfree() or vfree() is appliable.

Later once the ccw mess is fixed we can remove those tricks and
fully handle structure alloc/free in vfio core.

Existing vfio_{un}init_group_dev() will be deprecated after all
existing usages are converted to the new model.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Co-developed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20220921104401.38898-2-kevin.tian@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21 14:15:10 -06:00
Yishai Hadas
80c4b92a2d vfio: Introduce the DMA logging feature support
Introduce the DMA logging feature support in the vfio core layer.

It includes the processing of the device start/stop/report DMA logging
UAPIs and calling the relevant driver 'op' to do the work.

Specifically,
Upon start, the core translates the given input ranges into an interval
tree, checks for unexpected overlapping, non aligned ranges and then
pass the translated input to the driver for start tracking the given
ranges.

Upon report, the core translates the given input user space bitmap and
page size into an IOVA kernel bitmap iterator. Then it iterates it and
call the driver to set the corresponding bits for the dirtied pages in a
specific IOVA range.

Upon stop, the driver is called to stop the previous started tracking.

The next patches from the series will introduce the mlx5 driver
implementation for the logging ops.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20220908183448.195262-6-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08 12:59:00 -06:00
Joao Martins
58ccf0190d vfio: Add an IOVA bitmap support
The new facility adds a bunch of wrappers that abstract how an IOVA range
is represented in a bitmap that is granulated by a given page_size. So it
translates all the lifting of dealing with user pointers into its
corresponding kernel addresses backing said user memory into doing finally
the (non-atomic) bitmap ops to change various bits.

The formula for the bitmap is:

   data[(iova / page_size) / 64] & (1ULL << (iova % 64))

Where 64 is the number of bits in a unsigned long (depending on arch)

It introduces an IOVA iterator that uses a windowing scheme to minimize the
pinning overhead, as opposed to pinning it on demand 4K at a time. Assuming
a 4K kernel page and 4K requested page size, we can use a single kernel
page to hold 512 page pointers, mapping 2M of bitmap, representing 64G of
IOVA space.

An example usage of these helpers for a given @base_iova, @page_size,
@length and __user @data:

   bitmap = iova_bitmap_alloc(base_iova, page_size, length, data);
   if (IS_ERR(bitmap))
       return -ENOMEM;

   ret = iova_bitmap_for_each(bitmap, arg, dirty_reporter_fn);

   iova_bitmap_free(bitmap);

Each iteration of the @dirty_reporter_fn is called with a unique @iova
and @length argument, indicating the current range available through the
iova_bitmap. The @dirty_reporter_fn uses iova_bitmap_set() to mark dirty
areas (@iova_length) within that provided range, as following:

   iova_bitmap_set(bitmap, iova, iova_length);

The facility is intended to be used for user bitmaps representing dirtied
IOVAs by IOMMU (via IOMMUFD) and PCI Devices (via vfio-pci).

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20220908183448.195262-5-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08 12:59:00 -06:00
Alex Williamson
71aef261e0 Merge remote-tracking branch 'mlx5/mlx5-vfio' into v6.1/vfio/next
Merge net/mlx5 depedencies for device DMA logging and mlx5 variant
driver suppport.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08 10:44:34 -06:00
Yishai Hadas
939838632b net/mlx5: Query ADV_VIRTUALIZATION capabilities
Query ADV_VIRTUALIZATION capabilities which provide information for
advanced virtualization related features.

Current capabilities refer to the page tracker object which is used for
tracking the pages that are dirtied by the device.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20220905105852.26398-3-yishaih@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-07 11:57:27 +03:00
Yishai Hadas
a1be74c538 net/mlx5: Introduce ifc bits for page tracker
Introduce ifc related stuff to enable using page tracker.

A page tracker is a dirty page tracking object used by the device to
report the tracking log.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20220905105852.26398-2-yishaih@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-07 11:57:27 +03:00
Linus Torvalds
42cf58c272 TTY/Serial driver fixes for 6.0-rc4
Here are some small tty/serial/vt driver fixes for 6.0-rc4 that resolve
 a number of reported issues:
   - n_gsm fixups for previous changes that caused problems
   - much-reported serdev crash fix that showed up in 6.0-rc1
   - vt font selection bugfix
   - kerneldoc build warning fixes
   - other tiny serial core fixes
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYxL+aQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykpugCfdekXPy61pZbfme6hrdnJ6cu7DzQAn3DNqhNU
 DWpSYr7To0W8qReR2kB1
 =yhvz
 -----END PGP SIGNATURE-----

Merge tag 'tty-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver fixes from Greg KH:
 "Here are some small tty/serial/vt driver fixes for 6.0-rc4 that
  resolve a number of reported issues:

   - n_gsm fixups for previous changes that caused problems

   - much-reported serdev crash fix that showed up in 6.0-rc1

   - vt font selection bugfix

   - kerneldoc build warning fixes

   - other tiny serial core fixes

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'tty-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: n_gsm: avoid call of sleeping functions from atomic context
  tty: n_gsm: replace kicktimer with delayed_work
  tty: n_gsm: initialize more members at gsm_alloc_mux()
  tty: n_gsm: add sanity check for gsm->receive in gsm_receive_buf()
  tty: serial: atmel: Preserve previous USART mode if RS485 disabled
  tty: serial: lpuart: disable flow control while waiting for the transmit engine to complete
  tty: Fix lookahead_buf crash with serdev
  serial: fsl_lpuart: RS485 RTS polariy is inverse
  vt: Clear selection before changing the font
  serial: document start_rx member at struct uart_ops
2022-09-03 10:34:02 -07:00
Linus Torvalds
0b3acd1cc0 Driver core fixes for 6.0-rc4
Here are some small driver core fixes for some oft-reported problems in
 6.0-rc1.  They include:
   - a bunch of reverts to handle driver_deferred_probe_check_state()
     problems that were part of the 6.0-rc1 merge.
   - firmware_loader bugfixes now that the code is being properly tested
     and used by others
   - arch_topology fix
   - deferred driver probe bugfix to solve a long-suffering amba bus
     problem that many people have reported.
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYxIhbg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykhDACgr/u/okZMF3Kds56Rg/2onnEzTScAmgPVBzQY
 NxzzAxoNfcCnHuYIh3z3
 =s9FB
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are some small driver core fixes for some oft-reported problems
  in 6.0-rc1.  They include:

   - a bunch of reverts to handle driver_deferred_probe_check_state()
     problems that were part of the 6.0-rc1 merge.

   - firmware_loader bugfixes now that the code is being properly tested
     and used by others

   - arch_topology fix

   - deferred driver probe bugfix to solve a long-suffering amba bus
     problem that many people have reported.

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'driver-core-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  firmware_loader: Fix memory leak in firmware upload
  firmware_loader: Fix use-after-free during unregister
  arch_topology: Silence early cacheinfo errors when non-existent
  driver core: Don't probe devices after bus_type.match() probe deferral
  Revert "iommu/of: Delete usage of driver_deferred_probe_check_state()"
  Revert "PM: domains: Delete usage of driver_deferred_probe_check_state()"
  Revert "net: mdio: Delete usage of driver_deferred_probe_check_state()"
  Revert "driver core: Delete driver_deferred_probe_check_state()"
2022-09-02 10:55:23 -07:00
Linus Torvalds
fd59585c42 USB/Thunderbolt driver fixes for 6.0-rc4
Here are a lot of small USB and Thunderbolt driver fixes for 6.0-rc4 for
 reported problems.  Included in here are:
   - new usb-serial driver ids
   - dwc3 driver bugfixes for reported problems with 6.0-rc1
   - new device quirks, and reverts of some quirks that were incorrect
   - gadget driver bugfixes for reported problems
   - USB host controller bugfixes (xhci and others)
   - other small USB fixes, details in the shortlog
   - small thunderbolt driver fixes
 
 All of these have been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYxIfKA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yk7iQCeN8Lg6Sm+2bdO7sh38QDEKCIfOKEAoM+yHA3f
 2kDkCyueaTn7ra388hSy
 =iJh8
 -----END PGP SIGNATURE-----

Merge tag 'usb-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB/Thunderbolt driver fixes from Greg KH:
 "Here are a lot of small USB and Thunderbolt driver fixes for 6.0-rc4
  for reported problems. Included in here are:

   - new usb-serial driver ids

   - dwc3 driver bugfixes for reported problems with 6.0-rc1

   - new device quirks, and reverts of some quirks that were incorrect

   - gadget driver bugfixes for reported problems

   - USB host controller bugfixes (xhci and others)

   - other small USB fixes, details in the shortlog

   - small thunderbolt driver fixes

  All of these have been in linux-next with no reported issues"

* tag 'usb-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (51 commits)
  Revert "usb: gadget: udc-xilinx: replace memcpy with memcpy_toio"
  usb: storage: Add ASUS <0x0b05:0x1932> to IGNORE_UAS
  USB: serial: ch341: fix disabled rx timer on older devices
  USB: serial: ch341: fix lost character on LCR updates
  USB: serial: cp210x: add Decagon UCA device id
  Revert "usb: add quirks for Lenovo OneLink+ Dock"
  usb: cdns3: fix issue with rearming ISO OUT endpoint
  usb: cdns3: fix incorrect handling TRB_SMM flag for ISOC transfer
  usb: gadget: mass_storage: Fix cdrom data transfers on MAC-OS
  media: mceusb: Use new usb_control_msg_*() routines
  USB: core: Prevent nested device-reset calls
  USB: gadget: Fix obscure lockdep violation for udc_mutex
  usb: dwc2: fix wrong order of phy_power_on and phy_init
  usb: gadget: udc-xilinx: replace memcpy with memcpy_toio
  usb: typec: Remove retimers properly
  usb: dwc3: disable USB core PHY management
  usb: add quirks for Lenovo OneLink+ Dock
  USB: serial: option: add support for Cinterion MV32-WA/WB RmNet mode
  USB: serial: ftdi_sio: add Omron CS1W-CIF31 device id
  USB: serial: option: add Quectel EM060K modem
  ...
2022-09-02 10:43:46 -07:00
Linus Torvalds
1a2f6a3722 platform-drivers-x86 for v6.0-2
Various small fixes and hardware-id additions.
 
 The following is an automated git shortlog grouped by driver:
 
 acer-wmi:
  -  Acer Aspire One AOD270/Packard Bell Dot keymap fixes
 
 asus-wmi:
  -  Increase FAN_CURVE_BUF_LEN to 32
  -  Fix the name of the mic-mute LED classdev
 
 p2sb:
  -  Fix UAF when caller uses resource name
 
 platform/mellanox:
  -  Remove redundant 'NULL' check
  -  Remove unnecessary code
  -  mlxreg-lc: Fix locking issue
  -  mlxreg-lc: Fix coverity warning
 
 platform/surface:
  -  aggregator_registry: Add HID devices for sensors and UCSI client to SP8
  -  aggregator_registry: Rename HID device nodes based on new findings
  -  aggregator_registry: Rename HID device nodes based on their function
  -  aggregator_registry: Add support for Surface Laptop Go 2
 
 pmc_atom:
  -  Fix SLP_TYPx bitfield mask
 
 thinkpad_acpi:
  -  Explicitly set to balanced mode on startup
 
 x86-android-tablets:
  -  Fix broken touchscreen on Chuwi Hi8 with Windows BIOS
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmMQ1usUHGhkZWdvZWRl
 QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9x/AwgAvDcelUki0Roaphxpak1dnC0hRLUt
 P3PXf6NT5Xx0qHClC/0eCnuk9BzhASxAy5IW4huR8oq2nRWiMyym4PJh1e+aYniq
 IYulfky/eq1nqj64mMd50FNaGQc6Q5i0oeYwasLLdN6c/buToI9IzTEqLTGNg5Zs
 ZKvREyKHeiOYnXrpQHZE9h5zCG+dEr2zj0cdOSuijJnaee6aYt1TwghWTisPIrqk
 HJEFaRguR/HoAYoMpQP485lXdKcJFE8kIsT2qrrEsxHkL2WQkWuRpKmw+b1g5cQd
 0RO0Q4ZFtjlT13KLuKtTuVnkwSbLbHVxgCXftQxkWkZXP4nKABAXpRvKfg==
 =1CbV
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:
 "Various small fixes and hardware-id additions"

* tag 'platform-drivers-x86-v6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: p2sb: Fix UAF when caller uses resource name
  platform/x86: asus-wmi: Increase FAN_CURVE_BUF_LEN to 32
  platform/mellanox: Remove redundant 'NULL' check
  platform/mellanox: Remove unnecessary code
  platform/mellanox: mlxreg-lc: Fix locking issue
  platform/mellanox: mlxreg-lc: Fix coverity warning
  platform/x86: acer-wmi: Acer Aspire One AOD270/Packard Bell Dot keymap fixes
  platform/x86: thinkpad_acpi: Explicitly set to balanced mode on startup
  platform/x86: asus-wmi: Fix the name of the mic-mute LED classdev
  platform/surface: aggregator_registry: Add HID devices for sensors and UCSI client to SP8
  platform/surface: aggregator_registry: Rename HID device nodes based on new findings
  platform/surface: aggregator_registry: Rename HID device nodes based on their function
  platform/surface: aggregator_registry: Add support for Surface Laptop Go 2
  platform/x86: x86-android-tablets: Fix broken touchscreen on Chuwi Hi8 with Windows BIOS
  platform/x86: pmc_atom: Fix SLP_TYPx bitfield mask
2022-09-02 10:35:51 -07:00
Abhishek Sahu
453e6c98fd vfio/pci: Implement VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP
This patch implements VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP
device feature. In the VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY, if there is
any access for the VFIO device on the host side, then the device will
be moved out of the low power state without the user's guest driver
involvement. Once the device access has been finished, then the host
can move the device again into low power state. With the low power
entry happened through VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP,
the device will not be moved back into the low power state and
a notification will be sent to the user by triggering wakeup eventfd.

vfio_pci_core_pm_entry() will be called for both the variants of low
power feature entry so add an extra argument for wakeup eventfd context
and store locally in 'struct vfio_pci_core_device'.

For the entry happened without wakeup eventfd, all the exit related
handling will be done by the LOW_POWER_EXIT device feature only.
When the LOW_POWER_EXIT will be called, then the vfio core layer
vfio_device_pm_runtime_get() will increment the usage count and will
resume the device. In the driver runtime_resume callback, the
'pm_wake_eventfd_ctx' will be NULL. Then vfio_pci_core_pm_exit()
will call vfio_pci_runtime_pm_exit() and all the exit related handling
will be done.

For the entry happened with wakeup eventfd, in the driver resume
callback, eventfd will be triggered and all the exit related handling will
be done. When vfio_pci_runtime_pm_exit() will be called by
vfio_pci_core_pm_exit(), then it will return early.
But if the runtime suspend has not happened on the host side, then
all the exit related handling will be done in vfio_pci_core_pm_exit()
only.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Link: https://lore.kernel.org/r/20220829114850.4341-6-abhsahu@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-01 15:29:11 -06:00
Abhishek Sahu
cc2742fe36 vfio/pci: Implement VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY/EXIT
Currently, if the runtime power management is enabled for vfio-pci
based devices in the guest OS, then the guest OS will do the register
write for PCI_PM_CTRL register. This write request will be handled in
vfio_pm_config_write() where it will do the actual register write of
PCI_PM_CTRL register. With this, the maximum D3hot state can be
achieved for low power. If we can use the runtime PM framework, then
we can achieve the D3cold state (on the supported systems) which will
help in saving maximum power.

1. D3cold state can't be achieved by writing PCI standard
   PM config registers. This patch implements the following
   newly added low power related device features:
    - VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY
    - VFIO_DEVICE_FEATURE_LOW_POWER_EXIT

   The VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY feature will allow the
   device to make use of low power platform states on the host
   while the VFIO_DEVICE_FEATURE_LOW_POWER_EXIT will prevent
   further use of those power states.

2. The vfio-pci driver uses runtime PM framework for low power entry and
   exit. On the platforms where D3cold state is supported, the runtime
   PM framework will put the device into D3cold otherwise, D3hot or some
   other power state will be used.

   There are various cases where the device will not go into the runtime
   suspended state. For example,

   - The runtime power management is disabled on the host side for
     the device.
   - The user keeps the device busy after calling LOW_POWER_ENTRY.
   - There are dependent devices that are still in runtime active state.

   For these cases, the device will be in the same power state that has
   been configured by the user through PCI_PM_CTRL register.

3. The hypervisors can implement virtual ACPI methods. For example,
   in guest linux OS if PCI device ACPI node has _PR3 and _PR0 power
   resources with _ON/_OFF method, then guest linux OS invokes
   the _OFF method during D3cold transition and then _ON during D0
   transition. The hypervisor can tap these virtual ACPI calls and then
   call the low power device feature IOCTL.

4. The 'pm_runtime_engaged' flag tracks the entry and exit to
   runtime PM. This flag is protected with 'memory_lock' semaphore.

5. All the config and other region access are wrapped under
   pm_runtime_resume_and_get() and pm_runtime_put(). So, if any
   device access happens while the device is in the runtime suspended
   state, then the device will be resumed first before access. Once the
   access has been finished, then the device will again go into the
   runtime suspended state.

6. The memory region access through mmap will not be allowed in the low
   power state. Since __vfio_pci_memory_enabled() is a common function,
   so check for 'pm_runtime_engaged' has been added explicitly in
   vfio_pci_mmap_fault() to block only mmap'ed access.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Link: https://lore.kernel.org/r/20220829114850.4341-5-abhsahu@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-01 15:29:11 -06:00
Abhishek Sahu
4813724c4b vfio/pci: Mask INTx during runtime suspend
This patch adds INTx handling during runtime suspend/resume.
All the suspend/resume related code for the user to put the device
into the low power state will be added in subsequent patches.

The INTx lines may be shared among devices. Whenever any INTx
interrupt comes for the VFIO devices, then vfio_intx_handler() will be
called for each device sharing the interrupt. Inside vfio_intx_handler(),
it calls pci_check_and_mask_intx() and checks if the interrupt has
been generated for the current device. Now, if the device is already
in the D3cold state, then the config space can not be read. Attempt
to read config space in D3cold state can cause system unresponsiveness
in a few systems. To prevent this, mask INTx in runtime suspend callback,
and unmask the same in runtime resume callback. If INTx has been already
masked, then no handling is needed in runtime suspend/resume callbacks.
'pm_intx_masked' tracks this, and vfio_pci_intx_mask() has been updated
to return true if the INTx vfio_pci_irq_ctx.masked value is changed
inside this function.

For the runtime suspend which is triggered for the no user of VFIO
device, the 'irq_type' will be VFIO_PCI_NUM_IRQS and these
callbacks won't do anything.

The MSI/MSI-X are not shared so similar handling should not be
needed for MSI/MSI-X. vfio_msihandler() triggers eventfd_signal()
without doing any device-specific config access. When the user performs
any config access or IOCTL after receiving the eventfd notification,
then the device will be moved to the D0 state first before
servicing any request.

Another option was to check this flag 'pm_intx_masked' inside
vfio_intx_handler() instead of masking the interrupts. This flag
is being set inside the runtime_suspend callback but the device
can be in non-D3cold state (for example, if the user has disabled D3cold
explicitly by sysfs, the D3cold is not supported in the platform, etc.).
Also, in D3cold supported case, the device will be in D0 till the
PCI core moves the device into D3cold. In this case, there is
a possibility that the device can generate an interrupt. Adding check
in the IRQ handler will not clear the IRQ status and the interrupt
line will still be asserted. This can cause interrupt flooding.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Link: https://lore.kernel.org/r/20220829114850.4341-4-abhsahu@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-01 15:29:11 -06:00
Jason Gunthorpe
1e979ef5df vfio/pci: Rename vfio_pci_register_dev_region()
As this is part of the vfio_pci_core component it should be called
vfio_pci_core_register_dev_region() like everything else exported from
this module.

Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/2-v2-1bd95d72f298+e0e-vfio_pci_priv_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-01 15:29:11 -06:00
Jason Gunthorpe
e34a0425b8 vfio/pci: Split linux/vfio_pci_core.h
The header in include/linux should have only the exported interface for
other vfio_pci modules to use.  Internal definitions for vfio_pci.ko
should be in a "priv" header along side the .c files.

Move the internal declarations out of vfio_pci_core.h. They either move to
vfio_pci_priv.h or to the C file that is the only user.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/1-v2-1bd95d72f298+e0e-vfio_pci_priv_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-01 15:29:11 -06:00
Jann Horn
2555283eb4 mm/rmap: Fix anon_vma->degree ambiguity leading to double-reuse
anon_vma->degree tracks the combined number of child anon_vmas and VMAs
that use the anon_vma as their ->anon_vma.

anon_vma_clone() then assumes that for any anon_vma attached to
src->anon_vma_chain other than src->anon_vma, it is impossible for it to
be a leaf node of the VMA tree, meaning that for such VMAs ->degree is
elevated by 1 because of a child anon_vma, meaning that if ->degree
equals 1 there are no VMAs that use the anon_vma as their ->anon_vma.

This assumption is wrong because the ->degree optimization leads to leaf
nodes being abandoned on anon_vma_clone() - an existing anon_vma is
reused and no new parent-child relationship is created.  So it is
possible to reuse an anon_vma for one VMA while it is still tied to
another VMA.

This is an issue because is_mergeable_anon_vma() and its callers assume
that if two VMAs have the same ->anon_vma, the list of anon_vmas
attached to the VMAs is guaranteed to be the same.  When this assumption
is violated, vma_merge() can merge pages into a VMA that is not attached
to the corresponding anon_vma, leading to dangling page->mapping
pointers that will be dereferenced during rmap walks.

Fix it by separately tracking the number of child anon_vmas and the
number of VMAs using the anon_vma as their ->anon_vma.

Fixes: 7a3ef208e6 ("mm: prevent endless growth of anon_vma hierarchy")
Cc: stable@kernel.org
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-08-31 15:45:10 -07:00
Linus Torvalds
c5e4d5e991 fscache/cachefiles fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmMPhkAACgkQ+7dXa6fL
 C2u6pg//erdjg+DhN1TxV0+ZCHGfeiV0NCLJVDHgYuAhWZGEWbq3x58bJU3nrUkE
 JFg0/En3Any3prxQ13FzCDEYGoA9bdgrE03d1EWPXH1Q7dK0dfCk8FoHSEBuIlk2
 3M0sx0pY/TpeY8IeppXT2UI89wwVc+xCVOZtRDbjit0FU0QaAOSghgFtTrujLBpP
 ohIj83h41WzqSwB+WsyW170Sa0IWTizP+piziaxIIlt4JmCx0glwSU+JWiLj2s11
 PgnkYjxmSvsYjPsYuxZOwU6vnefrfcOcy/ftYG701FUWdXbpqRMYlr9WhvwvFTUR
 WE3+XXx7sGiUrN9Cu67rhWNu7d1CshoJTmT8H6vPHd/w+vmOHWn84ReMaX3wowLp
 e+ePEglbYFEEeF2yJX3lvVV2vVr6hiUZ2NsOKoGFrPc8ynR2ec+sjMPkMDFSsHBX
 LI4jTIvzkiBpItcOt1gZei27BhwlFNX9Kj47/3lDPM5y6y5GTOEFoC/t79An9Zdj
 sc8ArAfaik618NO9V8Mw8PW4aNGHHnmD179q+Go+6eime0M6BA2ie70mcw+2ruMn
 PvlDRKR+IiwqGEuvOrm25Erswy2gslXVBvHtqCM8XmfLS6dGtAC2Qd4C8D7PwwT+
 /LU0sCPlLg68juWDGDKF2Fh/Dvi0dnxUQdntS/d6+h7Iot/a42s=
 =z6wT
 -----END PGP SIGNATURE-----

Merge tag 'fscache-fixes-20220831' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull fscache/cachefiles fixes from David Howells:

 - Fix kdoc on fscache_use/unuse_cookie().

 - Fix the error returned by cachefiles_ondemand_copen() from an upcall
   result.

 - Fix the distribution of requests in on-demand mode in cachefiles to
   be fairer by cycling through them rather than picking the one with
   the lowest ID each time (IDs being reused).

* tag 'fscache-fixes-20220831' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  cachefiles: make on-demand request distribution fairer
  cachefiles: fix error return code in cachefiles_ondemand_copen()
  fscache: fix misdocumented parameter
2022-08-31 10:13:34 -07:00
Linus Torvalds
9c9d1896fa lsm/stable-6.0 PR 20220829
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmMNEC8UHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXN6uA//Wvoj5l33ngi5p6CNAfxrZiOeeki7
 ylMO9NF4BZY+BOKtWDcrUvpZoLCEEEtLihQ8vz7Iyedtpd34KBzI+H+36JDC9jei
 dWZiXYzzmaN6JVQ2pIGWr9kTfRPbbE4X91bI2jhDOBv64zCqZu2qDoXshud5WHU1
 XhMMtAsQHKrdZa29y6nj6xHYuVA/fkpL5rg5LDrFDYwS7fV+g02ATmRnEsGefRNu
 JbjrapAnl6lWO6peRuyLNzf6NNgLLsXAmYOdyJGERKx23TSwqVMGhK6eODYBttiH
 E9OfFDz3oqbLfVrL6uBlr30T1lnns+WyRWdRvAP36L9wbQ/0o24mGsf5E20wo1T9
 rwPNsFelI66Eu2S1v/DQWtGtzeaed5IrWMtQc93x4I1PQIxwMSP4znWEKg/2zDNQ
 tBVVjs6bIzWHbeYozmKK9xvtqL08F5H6t+cS7BDVWPfb8nAfiXvyrwgCRY36xHfO
 LJWb125lbDflkPRiIgf81IAE6SZLH/PFLowNXZUSAo0CTALhlGZXmhNr6Oz7Xr2A
 NIwKvuFNqGav0Rcsk+Qy0ir6jRKOj9854U4y3kAVOAhPSyBVZAoN1Y3wtiOpmdI0
 taLNKv9W46ZxQtqQNOm31/py3N4bZl0y2JvS4lvwbDMqCjCqVE7236GjQ0vtYQQi
 8thpb268VJTby8Y=
 =/7Pp
 -----END PGP SIGNATURE-----

Merge tag 'lsm-pr-20220829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull LSM support for IORING_OP_URING_CMD from Paul Moore:
 "Add SELinux and Smack controls to the io_uring IORING_OP_URING_CMD.

  These are necessary as without them the IORING_OP_URING_CMD remains
  outside the purview of the LSMs (Luis' LSM patch, Casey's Smack patch,
  and my SELinux patch). They have been discussed at length with the
  io_uring folks, and Jens has given his thumbs-up on the relevant
  patches (see the commit descriptions).

  There is one patch that is not strictly necessary, but it makes
  testing much easier and is very trivial: the /dev/null
  IORING_OP_URING_CMD patch."

* tag 'lsm-pr-20220829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  Smack: Provide read control for io_uring_cmd
  /dev/null: add IORING_OP_URING_CMD support
  selinux: implement the security_uring_cmd() LSM hook
  lsm,io_uring: add LSM hooks for the new uring_cmd file op
2022-08-31 09:23:16 -07:00
Khalid Masum
ec1bd37123 fscache: fix misdocumented parameter
This patch fixes two warnings generated by make docs. The functions
fscache_use_cookie and fscache_unuse_cookie, both have a parameter
named cookie. But they are documented with the name "object" with
unclear description. Which generates the warning when creating docs.

This commit will replace the currently misdocumented parameter names
with the correct ones while adding proper descriptions.

CC: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Khalid Masum <khalid.masum.92@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20220521142446.4746-1-khalid.masum.92@gmail.com/ # v1
Link: https://lore.kernel.org/r/20220818040738.12036-1-khalid.masum.92@gmail.com/ # v2
Link: https://lore.kernel.org/r/880d7d25753fb326ee17ac08005952112fcf9bdb.1657360984.git.mchehab@kernel.org/ # Mauro's version
2022-08-31 14:57:28 +01:00
Alan Stern
9c6d778800 USB: core: Prevent nested device-reset calls
Automatic kernel fuzzing revealed a recursive locking violation in
usb-storage:

============================================
WARNING: possible recursive locking detected
5.18.0 #3 Not tainted
--------------------------------------------
kworker/1:3/1205 is trying to acquire lock:
ffff888018638db8 (&us_interface_key[i]){+.+.}-{3:3}, at:
usb_stor_pre_reset+0x35/0x40 drivers/usb/storage/usb.c:230

but task is already holding lock:
ffff888018638db8 (&us_interface_key[i]){+.+.}-{3:3}, at:
usb_stor_pre_reset+0x35/0x40 drivers/usb/storage/usb.c:230

...

stack backtrace:
CPU: 1 PID: 1205 Comm: kworker/1:3 Not tainted 5.18.0 #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Workqueue: usb_hub_wq hub_event
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_deadlock_bug kernel/locking/lockdep.c:2988 [inline]
check_deadlock kernel/locking/lockdep.c:3031 [inline]
validate_chain kernel/locking/lockdep.c:3816 [inline]
__lock_acquire.cold+0x152/0x3ca kernel/locking/lockdep.c:5053
lock_acquire kernel/locking/lockdep.c:5665 [inline]
lock_acquire+0x1ab/0x520 kernel/locking/lockdep.c:5630
__mutex_lock_common kernel/locking/mutex.c:603 [inline]
__mutex_lock+0x14f/0x1610 kernel/locking/mutex.c:747
usb_stor_pre_reset+0x35/0x40 drivers/usb/storage/usb.c:230
usb_reset_device+0x37d/0x9a0 drivers/usb/core/hub.c:6109
r871xu_dev_remove+0x21a/0x270 drivers/staging/rtl8712/usb_intf.c:622
usb_unbind_interface+0x1bd/0x890 drivers/usb/core/driver.c:458
device_remove drivers/base/dd.c:545 [inline]
device_remove+0x11f/0x170 drivers/base/dd.c:537
__device_release_driver drivers/base/dd.c:1222 [inline]
device_release_driver_internal+0x1a7/0x2f0 drivers/base/dd.c:1248
usb_driver_release_interface+0x102/0x180 drivers/usb/core/driver.c:627
usb_forced_unbind_intf+0x4d/0xa0 drivers/usb/core/driver.c:1118
usb_reset_device+0x39b/0x9a0 drivers/usb/core/hub.c:6114

This turned out not to be an error in usb-storage but rather a nested
device reset attempt.  That is, as the rtl8712 driver was being
unbound from a composite device in preparation for an unrelated USB
reset (that driver does not have pre_reset or post_reset callbacks),
its ->remove routine called usb_reset_device() -- thus nesting one
reset call within another.

Performing a reset as part of disconnect processing is a questionable
practice at best.  However, the bug report points out that the USB
core does not have any protection against nested resets.  Adding a
reset_in_progress flag and testing it will prevent such errors in the
future.

Link: https://lore.kernel.org/all/CAB7eexKUpvX-JNiLzhXBDWgfg2T9e9_0Tw4HQ6keN==voRbP0g@mail.gmail.com/
Cc: stable@vger.kernel.org
Reported-and-tested-by: Rondreis <linhaoguo86@gmail.com>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/YwkflDxvg0KWqyZK@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-30 15:32:04 +02:00
Bart Van Assche
dcf8e5633e tracing: Define the is_signed_type() macro once
There are two definitions of the is_signed_type() macro: one in
<linux/overflow.h> and a second definition in <linux/trace_events.h>.

As suggested by Linus, move the definition of the is_signed_type() macro
into the <linux/compiler.h> header file.  Change the definition of the
is_signed_type() macro to make sure that it does not trigger any sparse
warnings with future versions of sparse for bitwise types.

Link: https://lore.kernel.org/all/CAHk-=whjH6p+qzwUdx5SOVVHjS3WvzJQr6mDUwhEyTf6pJWzaQ@mail.gmail.com/
Link: https://lore.kernel.org/all/CAHk-=wjQGnVfb4jehFR0XyZikdQvCZouE96xR_nnf5kqaM5qqQ@mail.gmail.com/
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-08-29 13:29:40 -07:00
Linus Torvalds
b467192ec7 Seventeen hotfixes. Mostly memory management things. Ten patches are
cc:stable, addressing pre-6.0 issues.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYwvgrAAKCRDdBJ7gKXxA
 jlweAQC9dzE08Elxl4F7Uvxe+62JWVeflBRrT7sJ6jU1Gu3QcQEAhhI1Xit3/MGq
 pRytDBObGADxlA67c9eNq6J5pCT/7gE=
 =pD67
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more hotfixes from Andrew Morton:
 "Seventeen hotfixes.  Mostly memory management things.

  Ten patches are cc:stable, addressing pre-6.0 issues"

* tag 'mm-hotfixes-stable-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  .mailmap: update Luca Ceresoli's e-mail address
  mm/mprotect: only reference swap pfn page if type match
  squashfs: don't call kmalloc in decompressors
  mm/damon/dbgfs: avoid duplicate context directory creation
  mailmap: update email address for Colin King
  asm-generic: sections: refactor memory_intersects
  bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem
  ocfs2: fix freeing uninitialized resource on ocfs2_dlm_shutdown
  Revert "memcg: cleanup racy sum avoidance code"
  mm/zsmalloc: do not attempt to free IS_ERR handle
  binder_alloc: add missing mmap_lock calls when using the VMA
  mm: re-allow pinning of zero pfns (again)
  vmcoreinfo: add kallsyms_num_syms symbol
  mailmap: update Guilherme G. Piccoli's email addresses
  writeback: avoid use-after-free after removing device
  shmem: update folio if shmem_replace_page() updates the page
  mm/hugetlb: avoid corrupting page->mapping in hugetlb_mcopy_atomic_pte
2022-08-28 14:49:59 -07:00
Shakeel Butt
dbb16df644 Revert "memcg: cleanup racy sum avoidance code"
This reverts commit 96e51ccf1a.

Recently we started running the kernel with rstat infrastructure on
production traffic and begin to see negative memcg stats values. 
Particularly the 'sock' stat is the one which we observed having negative
value.

$ grep "sock " /mnt/memory/job/memory.stat
sock 253952
total_sock 18446744073708724224

Re-run after couple of seconds

$ grep "sock " /mnt/memory/job/memory.stat
sock 253952
total_sock 53248

For now we are only seeing this issue on large machines (256 CPUs) and
only with 'sock' stat.  I think the networking stack increase the stat on
one cpu and decrease it on another cpu much more often.  So, this negative
sock is due to rstat flusher flushing the stats on the CPU that has seen
the decrement of sock but missed the CPU that has increments.  A typical
race condition.

For easy stable backport, revert is the most simple solution.  For long
term solution, I am thinking of two directions.  First is just reduce the
race window by optimizing the rstat flusher.  Second is if the reader sees
a negative stat value, force flush and restart the stat collection. 
Basically retry but limited.

Link: https://lkml.kernel.org/r/20220817172139.3141101-1-shakeelb@google.com
Fixes: 96e51ccf1a ("memcg: cleanup racy sum avoidance code")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: <stable@vger.kernel.org>	[5.15]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-28 14:02:44 -07:00
Alex Williamson
fcab34b433 mm: re-allow pinning of zero pfns (again)
The below referenced commit makes the same error as 1c56343258 ("mm: fix
is_pinnable_page against a cma page"), re-interpreting the logic to
exclude pinning of the zero page, which breaks device assignment with
vfio.

To avoid further subtle mistakes, split the logic into discrete tests.

[akpm@linux-foundation.org: simplify comment, per John]
Link: https://lkml.kernel.org/r/166015037385.760108.16881097713975517242.stgit@omen
Link: https://lore.kernel.org/all/165490039431.944052.12458624139225785964.stgit@omen
Fixes: f25cbb7a95 ("mm: add zone device coherent type memory support")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Suggested-by: Felix Kuehling <felix.kuehling@amd.com>
Tested-by: Slawomir Laba <slawomirx.laba@intel.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-28 14:02:44 -07:00
Mikulas Patocka
8238b45798 wait_on_bit: add an acquire memory barrier
There are several places in the kernel where wait_on_bit is not followed
by a memory barrier (for example, in drivers/md/dm-bufio.c:new_read).

On architectures with weak memory ordering, it may happen that memory
accesses that follow wait_on_bit are reordered before wait_on_bit and
they may return invalid data.

Fix this class of bugs by introducing a new function "test_bit_acquire"
that works like test_bit, but has acquire memory ordering semantics.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-08-26 09:30:25 -07:00
Luis Chamberlain
2a58401240 lsm,io_uring: add LSM hooks for the new uring_cmd file op
io-uring cmd support was added through ee692a21e9 ("fs,io_uring:
add infrastructure for uring-cmd"), this extended the struct
file_operations to allow a new command which each subsystem can use
to enable command passthrough. Add an LSM specific for the command
passthrough which enables LSMs to inspect the command details.

This was discussed long ago without no clear pointer for something
conclusive, so this enables LSMs to at least reject this new file
operation.

[0] https://lkml.kernel.org/r/8adf55db-7bab-f59d-d612-ed906b948d19@schaufler-ca.com

Cc: stable@vger.kernel.org
Fixes: ee692a21e9 ("fs,io_uring: add infrastructure for uring-cmd")
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-08-26 11:19:43 -04:00
Linus Torvalds
4c612826be Including fixes from ipsec and netfilter (with one broken Fixes tag).
Current release - new code bugs:
 
  - dsa: don't dereference NULL extack in dsa_slave_changeupper()
 
  - dpaa: fix <1G ethernet on LS1046ARDB
 
  - neigh: don't call kfree_skb() under spin_lock_irqsave()
 
 Previous releases - regressions:
 
  - r8152: fix the RX FIFO settings when suspending
 
  - dsa: microchip: keep compatibility with device tree blobs with
    no phy-mode
 
  - Revert "net: macsec: update SCI upon MAC address change."
 
  - Revert "xfrm: update SA curlft.use_time", comply with RFC 2367
 
 Previous releases - always broken:
 
  - netfilter: conntrack: work around exceeded TCP receive window
 
  - ipsec: fix a null pointer dereference of dst->dev on a metadata
    dst in xfrm_lookup_with_ifid
 
  - moxa: get rid of asymmetry in DMA mapping/unmapping
 
  - dsa: microchip: make learning configurable and keep it off
    while standalone
 
  - ice: xsk: prohibit usage of non-balanced queue id
 
  - rxrpc: fix locking in rxrpc's sendmsg
 
 Misc:
 
  - another chunk of sysctl data race silencing
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmMH1scACgkQMUZtbf5S
 IrtzTA//as5jbKepxBLqWjmDtTXTzkR9AZwD3pz/y2eRYYZz97N5R6TYLXh03zc0
 OoB7yNIsjOtYu0aB0KosF+mqeGSzIG8MZ5W6eecQVRhUL270OD/kJ0G89CeHyuKP
 BYUQE2S8z+55qM6IQ0DKbR4F038J2OeR6HdV7VUDFYRGfxDZsTZU4q3aY5bklAuz
 TvpDAEsw0818a2lTdgqFUeRwbcU8ZIAJhiE/LQmqxhjsGyPkK02907Ccn06IrcAy
 UHRBc6Cbjn8IcNNSL0hChjAkUdHtk7iHAqU8Nr2QnxKbE0FHGVOW8BsmY5GYvLAC
 hH7t/dJAu3WUxubImZG6rnp3YD3YNZoaJrDgg6jSCJeUL6MKO2rJf8Q5HGiTJOWH
 8vyPfCrB9IQVnef6Im0u9EFTyu9+W4MGVN4hyhttv2OykZwSQfdpjceGZgELiwSC
 +od2p8TSXkZix//cTdWeO5THSnpHeMudh+0DEm10Uzf4+ybqIVuPn2ZCSy6piYJX
 nsAIac1j7onWEyKQQ/nqy0o6rlZwLe+h0BraHHp3sApWVjyFwS4p6Z6VADed4kga
 n/BsINdIW56pBT2nSrBTG5/RirlVfUTOaqiry0t6oak2qooEs0Gmm8DEbgTkncbs
 BRLZTVzn6X3XWq52SXf7/v36xEJ/LRooY7MqUEMPg4emgGoNuC4=
 =azH5
 -----END PGP SIGNATURE-----

Merge tag 'net-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from ipsec and netfilter (with one broken Fixes tag).

  Current release - new code bugs:

   - dsa: don't dereference NULL extack in dsa_slave_changeupper()

   - dpaa: fix <1G ethernet on LS1046ARDB

   - neigh: don't call kfree_skb() under spin_lock_irqsave()

  Previous releases - regressions:

   - r8152: fix the RX FIFO settings when suspending

   - dsa: microchip: keep compatibility with device tree blobs with no
     phy-mode

   - Revert "net: macsec: update SCI upon MAC address change."

   - Revert "xfrm: update SA curlft.use_time", comply with RFC 2367

  Previous releases - always broken:

   - netfilter: conntrack: work around exceeded TCP receive window

   - ipsec: fix a null pointer dereference of dst->dev on a metadata dst
     in xfrm_lookup_with_ifid

   - moxa: get rid of asymmetry in DMA mapping/unmapping

   - dsa: microchip: make learning configurable and keep it off while
     standalone

   - ice: xsk: prohibit usage of non-balanced queue id

   - rxrpc: fix locking in rxrpc's sendmsg

  Misc:

   - another chunk of sysctl data race silencing"

* tag 'net-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
  net: lantiq_xrx200: restore buffer if memory allocation failed
  net: lantiq_xrx200: fix lock under memory pressure
  net: lantiq_xrx200: confirm skb is allocated before using
  net: stmmac: work around sporadic tx issue on link-up
  ionic: VF initial random MAC address if no assigned mac
  ionic: fix up issues with handling EAGAIN on FW cmds
  ionic: clear broken state on generation change
  rxrpc: Fix locking in rxrpc's sendmsg
  net: ethernet: mtk_eth_soc: fix hw hash reporting for MTK_NETSYS_V2
  MAINTAINERS: rectify file entry in BONDING DRIVER
  i40e: Fix incorrect address type for IPv6 flow rules
  ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter
  net: Fix a data-race around sysctl_somaxconn.
  net: Fix a data-race around netdev_unregister_timeout_secs.
  net: Fix a data-race around gro_normal_batch.
  net: Fix data-races around sysctl_devconf_inherit_init_net.
  net: Fix data-races around sysctl_fb_tunnels_only_for_init_net.
  net: Fix a data-race around netdev_budget_usecs.
  net: Fix data-races around sysctl_max_skb_frags.
  net: Fix a data-race around netdev_budget.
  ...
2022-08-25 14:03:58 -07:00
Jakub Kicinski
24c7a64ea4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Fix crash with malformed ebtables blob which do not provide all
   entry points, from Florian Westphal.

2) Fix possible TCP connection clogging up with default 5-days
   timeout in conntrack, from Florian.

3) Fix crash in nf_tables tproxy with unsupported chains, also from Florian.

4) Do not allow to update implicit chains.

5) Make table handle allocation per-netns to fix data race.

6) Do not truncated payload length and offset, and checksum offset.
   Instead report EINVAl.

7) Enable chain stats update via static key iff no error occurs.

8) Restrict osf expression to ip, ip6 and inet families.

9) Restrict tunnel expression to netdev family.

10) Fix crash when trying to bind again an already bound chain.

11) Flowtable garbage collector might leave behind pending work to
    delete entries. This patch comes with a previous preparation patch
    as dependency.

12) Allow net.netfilter.nf_conntrack_frag6_high_thresh to be lowered,
    from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_defrag_ipv6: allow nf_conntrack_frag6_high_thresh increases
  netfilter: flowtable: fix stuck flows on cleanup due to pending work
  netfilter: flowtable: add function to invoke garbage collection immediately
  netfilter: nf_tables: disallow binding to already bound chain
  netfilter: nft_tunnel: restrict it to netdev family
  netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families
  netfilter: nf_tables: do not leave chain stats enabled on error
  netfilter: nft_payload: do not truncate csum_offset and csum_type
  netfilter: nft_payload: report ERANGE for too long offset and length
  netfilter: nf_tables: make table handle allocation per-netns friendly
  netfilter: nf_tables: disallow updates of implicit chain
  netfilter: nft_tproxy: restrict to prerouting hook
  netfilter: conntrack: work around exceeded receive window
  netfilter: ebtables: reject blobs that don't provide all entry points
====================

Link: https://lore.kernel.org/r/20220824220330.64283-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-24 19:18:10 -07:00
Kuniyuki Iwashima
a5612ca10d net: Fix data-races around sysctl_devconf_inherit_init_net.
While reading sysctl_devconf_inherit_init_net, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: 856c395cfa ("net: introduce a knob to control whether to inherit devconf config")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24 13:46:58 +01:00
Kuniyuki Iwashima
af67508ea6 net: Fix data-races around sysctl_fb_tunnels_only_for_init_net.
While reading sysctl_fb_tunnels_only_for_init_net, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: 79134e6ce2 ("net: do not create fallback tunnels for non-default namespaces")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24 13:46:58 +01:00
Linus Torvalds
c40e8341e3 cgroup fixes for v6.0-rc2
Contains fixes for the following issues:
 
 * psi data structure was changed to be allocated dynamically but it wasn't
   being cleared leading to reporting garbage values and triggering spurious
   oom kills.
 
 * A deadlock involving cpuset and cpu hotplug.
 
 * When a controller is moved across cgroup hierarchies, css->rstat_css_node
   didn't get RCU drained properly from the previous list.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCYwVmRg4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGb/eAP44dr9/OQtapKm63H/qmLF39LWE6nC99RYHECl5
 ncuZvwD/XIkZt212nr/qC1C0ggB5qCGG7tIZG6tIgkS+J5huqg4=
 =CC/Y
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:

 - The psi data structure was changed to be allocated dynamically but
   it wasn't being cleared leading to it reporting garbage values and
   triggering spurious oom kills.

 - A deadlock involving cpuset and cpu hotplug.

 - When a controller is moved across cgroup hierarchies,
   css->rstat_css_node didn't get RCU drained properly from the previous
   list.

* tag 'cgroup-for-6.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Fix race condition at rebind_subsystems()
  cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock
  sched/psi: Remove redundant cgroup_psi() when !CONFIG_CGROUPS
  sched/psi: Remove unused parameter nbytes of psi_trigger_create()
  sched/psi: Zero the memory of struct psi_group
2022-08-23 19:33:28 -07:00
Linus Torvalds
95607ad99b Thirteen fixes, almost all for MM. Seven of these are cc:stable and the
remainder fix up the changes which went into this -rc cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYwQZcgAKCRDdBJ7gKXxA
 jnCxAQCk8L6PPm0L2KvKr5Vu3M/T0o9SvfxfM5yho80zM68fHQD/eLxz+nd3m+N5
 K7Mdbcb2u6F46qQaS+S5RialEWKpsw8=
 =WtBo
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2022-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "Thirteen fixes, almost all for MM.

  Seven of these are cc:stable and the remainder fix up the changes
  which went into this -rc cycle"

* tag 'mm-hotfixes-stable-2022-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  kprobes: don't call disarm_kprobe() for disabled kprobes
  mm/shmem: shmem_replace_page() remember NR_SHMEM
  mm/shmem: tmpfs fallocate use file_modified()
  mm/shmem: fix chattr fsflags support in tmpfs
  mm/hugetlb: support write-faults in shared mappings
  mm/hugetlb: fix hugetlb not supporting softdirty tracking
  mm/uffd: reset write protection when unregister with wp-mode
  mm/smaps: don't access young/dirty bit if pte unpresent
  mm: add DEVICE_ZONE to FOR_ALL_ZONES
  kernel/sys_ni: add compat entry for fadvise64_64
  mm/gup: fix FOLL_FORCE COW security issue and remove FOLL_COW
  Revert "zram: remove double compression logic"
  get_maintainer: add Alan to .get_maintainer.ignore
2022-08-23 13:33:08 -07:00
Florian Westphal
7997eff828 netfilter: ebtables: reject blobs that don't provide all entry points
Harshit Mogalapalli says:
 In ebt_do_table() function dereferencing 'private->hook_entry[hook]'
 can lead to NULL pointer dereference. [..] Kernel panic:

general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
[..]
RIP: 0010:ebt_do_table+0x1dc/0x1ce0
Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 5c 16 00 00 48 b8 00 00 00 00 00 fc ff df 49 8b 6c df 08 48 8d 7d 2c 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 88
[..]
Call Trace:
 nf_hook_slow+0xb1/0x170
 __br_forward+0x289/0x730
 maybe_deliver+0x24b/0x380
 br_flood+0xc6/0x390
 br_dev_xmit+0xa2e/0x12c0

For some reason ebtables rejects blobs that provide entry points that are
not supported by the table, but what it should instead reject is the
opposite: blobs that DO NOT provide an entry point supported by the table.

t->valid_hooks is the bitmask of hooks (input, forward ...) that will see
packets.  Providing an entry point that is not support is harmless
(never called/used), but the inverse isn't: it results in a crash
because the ebtables traverser doesn't expect a NULL blob for a location
its receiving packets for.

Instead of fixing all the individual checks, do what iptables is doing and
reject all blobs that differ from the expected hooks.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2022-08-23 18:23:15 +02:00
Saravana Kannan
13a8e0f6b0 Revert "driver core: Delete driver_deferred_probe_check_state()"
This reverts commit 9cbffc7a59.

There are a few more issues to fix that have been reported in the thread
for the original series [1]. We'll need to fix those before this will work.
So, revert it for now.

[1] - https://lore.kernel.org/lkml/20220601070707.3946847-1-saravanak@google.com/

Fixes: 9cbffc7a59 ("driver core: Delete driver_deferred_probe_check_state()")
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Peng Fan <peng.fan@nxp.com>
Tested-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20220819221616.2107893-2-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-23 13:14:02 +02:00
Moshe Shemesh
d59b73a66e net/mlx5: Avoid false positive lockdep warning by adding lock_class_key
Add a lock_class_key per mlx5 device to avoid a false positive
"possible circular locking dependency" warning by lockdep, on flows
which lock more than one mlx5 device, such as adding SF.

kernel log:
 ======================================================
 WARNING: possible circular locking dependency detected
 5.19.0-rc8+ #2 Not tainted
 ------------------------------------------------------
 kworker/u20:0/8 is trying to acquire lock:
 ffff88812dfe0d98 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_init_one+0x2e/0x490 [mlx5_core]

 but task is already holding lock:
 ffff888101aa7898 (&(&notifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&(&notifier->n_head)->rwsem){++++}-{3:3}:
        down_write+0x90/0x150
        blocking_notifier_chain_register+0x53/0xa0
        mlx5_sf_table_init+0x369/0x4a0 [mlx5_core]
        mlx5_init_one+0x261/0x490 [mlx5_core]
        probe_one+0x430/0x680 [mlx5_core]
        local_pci_probe+0xd6/0x170
        work_for_cpu_fn+0x4e/0xa0
        process_one_work+0x7c2/0x1340
        worker_thread+0x6f6/0xec0
        kthread+0x28f/0x330
        ret_from_fork+0x1f/0x30

 -> #0 (&dev->intf_state_mutex){+.+.}-{3:3}:
        __lock_acquire+0x2fc7/0x6720
        lock_acquire+0x1c1/0x550
        __mutex_lock+0x12c/0x14b0
        mlx5_init_one+0x2e/0x490 [mlx5_core]
        mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core]
        auxiliary_bus_probe+0x9d/0xe0
        really_probe+0x1e0/0xaa0
        __driver_probe_device+0x219/0x480
        driver_probe_device+0x49/0x130
        __device_attach_driver+0x1b8/0x280
        bus_for_each_drv+0x123/0x1a0
        __device_attach+0x1a3/0x460
        bus_probe_device+0x1a2/0x260
        device_add+0x9b1/0x1b40
        __auxiliary_device_add+0x88/0xc0
        mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core]
        blocking_notifier_call_chain+0xd5/0x130
        mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core]
        process_one_work+0x7c2/0x1340
        worker_thread+0x59d/0xec0
        kthread+0x28f/0x330
        ret_from_fork+0x1f/0x30

  other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&(&notifier->n_head)->rwsem);
                                lock(&dev->intf_state_mutex);
                                lock(&(&notifier->n_head)->rwsem);
   lock(&dev->intf_state_mutex);

  *** DEADLOCK ***

 4 locks held by kworker/u20:0/8:
  #0: ffff888150612938 ((wq_completion)mlx5_events){+.+.}-{0:0}, at: process_one_work+0x6e2/0x1340
  #1: ffff888100cafdb8 ((work_completion)(&work->work)#3){+.+.}-{0:0}, at: process_one_work+0x70f/0x1340
  #2: ffff888101aa7898 (&(&notifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130
  #3: ffff88813682d0e8 (&dev->mutex){....}-{3:3}, at:__device_attach+0x76/0x460

 stack backtrace:
 CPU: 6 PID: 8 Comm: kworker/u20:0 Not tainted 5.19.0-rc8+
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 Workqueue: mlx5_events mlx5_vhca_state_work_handler [mlx5_core]
 Call Trace:
  <TASK>
  dump_stack_lvl+0x57/0x7d
  check_noncircular+0x278/0x300
  ? print_circular_bug+0x460/0x460
  ? lock_chain_count+0x20/0x20
  ? register_lock_class+0x1880/0x1880
  __lock_acquire+0x2fc7/0x6720
  ? register_lock_class+0x1880/0x1880
  ? register_lock_class+0x1880/0x1880
  lock_acquire+0x1c1/0x550
  ? mlx5_init_one+0x2e/0x490 [mlx5_core]
  ? lockdep_hardirqs_on_prepare+0x400/0x400
  __mutex_lock+0x12c/0x14b0
  ? mlx5_init_one+0x2e/0x490 [mlx5_core]
  ? mlx5_init_one+0x2e/0x490 [mlx5_core]
  ? _raw_read_unlock+0x1f/0x30
  ? mutex_lock_io_nested+0x1320/0x1320
  ? __ioremap_caller.constprop.0+0x306/0x490
  ? mlx5_sf_dev_probe+0x269/0x370 [mlx5_core]
  ? iounmap+0x160/0x160
  mlx5_init_one+0x2e/0x490 [mlx5_core]
  mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core]
  ? mlx5_sf_dev_remove+0x130/0x130 [mlx5_core]
  auxiliary_bus_probe+0x9d/0xe0
  really_probe+0x1e0/0xaa0
  __driver_probe_device+0x219/0x480
  ? auxiliary_match_id+0xe9/0x140
  driver_probe_device+0x49/0x130
  __device_attach_driver+0x1b8/0x280
  ? driver_allows_async_probing+0x140/0x140
  bus_for_each_drv+0x123/0x1a0
  ? bus_for_each_dev+0x1a0/0x1a0
  ? lockdep_hardirqs_on_prepare+0x286/0x400
  ? trace_hardirqs_on+0x2d/0x100
  __device_attach+0x1a3/0x460
  ? device_driver_attach+0x1e0/0x1e0
  ? kobject_uevent_env+0x22d/0xf10
  bus_probe_device+0x1a2/0x260
  device_add+0x9b1/0x1b40
  ? dev_set_name+0xab/0xe0
  ? __fw_devlink_link_to_suppliers+0x260/0x260
  ? memset+0x20/0x40
  ? lockdep_init_map_type+0x21a/0x7d0
  __auxiliary_device_add+0x88/0xc0
  ? auxiliary_device_init+0x86/0xa0
  mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core]
  blocking_notifier_call_chain+0xd5/0x130
  mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core]
  ? mlx5_vhca_event_arm+0x100/0x100 [mlx5_core]
  ? lock_downgrade+0x6e0/0x6e0
  ? lockdep_hardirqs_on_prepare+0x286/0x400
  process_one_work+0x7c2/0x1340
  ? lockdep_hardirqs_on_prepare+0x400/0x400
  ? pwq_dec_nr_in_flight+0x230/0x230
  ? rwlock_bug.part.0+0x90/0x90
  worker_thread+0x59d/0xec0
  ? process_one_work+0x1340/0x1340
  kthread+0x28f/0x330
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x1f/0x30
  </TASK>

Fixes: 6a32732174 ("net/mlx5: SF, Port function state change support")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-08-22 12:57:08 -07:00