Commit graph

78350 commits

Author SHA1 Message Date
Linus Torvalds
4f9701057a IOMMU Updates for Linux v5.13
Including:
 
 	- Big cleanup of almost unsused parts of the IOMMU API by
 	  Christoph Hellwig. This mostly affects the Freescale PAMU
 	  driver.
 
 	- New IOMMU driver for Unisoc SOCs
 
 	- ARM SMMU Updates from Will:
 
 	  - SMMUv3: Drop vestigial PREFETCH_ADDR support
 	  - SMMUv3: Elide TLB sync logic for empty gather
 	  - SMMUv3: Fix "Service Failure Mode" handling
     	  - SMMUv2: New Qualcomm compatible string
 
 	- Removal of the AMD IOMMU performance counter writeable check
 	  on AMD. It caused long boot delays on some machines and is
 	  only needed to work around an errata on some older (possibly
 	  pre-production) chips. If someone is still hit by this
 	  hardware issue anyway the performance counters will just
 	  return 0.
 
 	- Support for targeted invalidations in the AMD IOMMU driver.
 	  Before that the driver only invalidated a single 4k page or the
 	  whole IO/TLB for an address space. This has been extended now
 	  and is mostly useful for emulated AMD IOMMUs.
 
 	- Several fixes for the Shared Virtual Memory support in the
 	  Intel VT-d driver
 
 	- Mediatek drivers can now be built as modules
 
 	- Re-introduction of the forcedac boot option which got lost
 	  when converting the Intel VT-d driver to the common dma-iommu
 	  implementation.
 
 	- Extension of the IOMMU device registration interface and
 	  support iommu_ops to be const again when drivers are built as
 	  modules.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmCMEIoACgkQK/BELZcB
 GuOu9xAAvg6aR0uHlxvRq6cgNnHN9Ltp5+t3qFYtRRrauY0iOPMO62k0QQli5shX
 CGeczD0e59KAZqI0zNJnQn8hMY5dg7XVkFCC5BrSzuCDCtwJZ0N5Tq3pfUlaV1rw
 BJf41t79Fd+jp7kn53tu+vRAfYZ3+sLOx/6U3c15pqKRZSkyFWbQllOtD3J5LnLu
 1PyPlfiNpMwCajiS7aQbN+fuJ/lKIFeA2MDPOsCBzhbfxiJUqJxZOKAZO3rOjFfK
 feTibqQ+3Zz6MPXt9st1cvPpy8jCosv81OY6Knqvxf/oB5q+fEdi2uNrKISonb/t
 Fw331oOIwg2A+HOpwC9MN1AumOIqiHSWWENAMk9SlP+TMIWKQ8kZreyI6IEB23dV
 +QvP3DVA+CfLwtNY/Zh0IqKh28D+IHlKbpWNU1m+9AUe468mV/MTjfwxr9Yfffhm
 LZ6C0DgFdmtqv8jPuDGUOgo3RNeN8bLnUSEHG9gHibA+RKujl5BWDjKkwILqMQTt
 Ysdsu8TiNtFIULomizqCpgqEbQfW8TLFvASXCM1VMQ/PDURxvchZPxFDJonYXy+K
 z2HGaG3eUE07YrAdRKH69aMVIbmS+sjEhvmi4xZ1Lh7wWcIE2AZVvO8qNb+Ckcp3
 4tLPPDksm/iQngnFf6gdgH3qv4rgbzE4+74GXqeANiQCjY9dSJI=
 =qF2C
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:

 - Big cleanup of almost unsused parts of the IOMMU API by Christoph
   Hellwig. This mostly affects the Freescale PAMU driver.

 - New IOMMU driver for Unisoc SOCs

 - ARM SMMU Updates from Will:
     - Drop vestigial PREFETCH_ADDR support (SMMUv3)
     - Elide TLB sync logic for empty gather (SMMUv3)
     - Fix "Service Failure Mode" handling (SMMUv3)
     - New Qualcomm compatible string (SMMUv2)

 - Removal of the AMD IOMMU performance counter writeable check on AMD.
   It caused long boot delays on some machines and is only needed to
   work around an errata on some older (possibly pre-production) chips.
   If someone is still hit by this hardware issue anyway the performance
   counters will just return 0.

 - Support for targeted invalidations in the AMD IOMMU driver. Before
   that the driver only invalidated a single 4k page or the whole IO/TLB
   for an address space. This has been extended now and is mostly useful
   for emulated AMD IOMMUs.

 - Several fixes for the Shared Virtual Memory support in the Intel VT-d
   driver

 - Mediatek drivers can now be built as modules

 - Re-introduction of the forcedac boot option which got lost when
   converting the Intel VT-d driver to the common dma-iommu
   implementation.

 - Extension of the IOMMU device registration interface and support
   iommu_ops to be const again when drivers are built as modules.

* tag 'iommu-updates-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (84 commits)
  iommu: Streamline registration interface
  iommu: Statically set module owner
  iommu/mediatek-v1: Add error handle for mtk_iommu_probe
  iommu/mediatek-v1: Avoid build fail when build as module
  iommu/mediatek: Always enable the clk on resume
  iommu/fsl-pamu: Fix uninitialized variable warning
  iommu/vt-d: Force to flush iotlb before creating superpage
  iommu/amd: Put newline after closing bracket in warning
  iommu/vt-d: Fix an error handling path in 'intel_prepare_irq_remapping()'
  iommu/vt-d: Fix build error of pasid_enable_wpe() with !X86
  iommu/amd: Remove performance counter pre-initialization test
  Revert "iommu/amd: Fix performance counter initialization"
  iommu/amd: Remove duplicate check of devid
  iommu/exynos: Remove unneeded local variable initialization
  iommu/amd: Page-specific invalidations for more than one page
  iommu/arm-smmu-v3: Remove the unused fields for PREFETCH_CONFIG command
  iommu/vt-d: Avoid unnecessary cache flush in pasid entry teardown
  iommu/vt-d: Invalidate PASID cache when root/context entry changed
  iommu/vt-d: Remove WO permissions on second-level paging entries
  iommu/vt-d: Report the right page fault address
  ...
2021-05-01 09:33:00 -07:00
Linus Torvalds
f34b2cf178 RDMA merge window pull request
This is significantly bug fixes and general cleanups. The noteworthy new
 features are fairly small:
 
 - XRC support for HNS and improves RQ operations
 
 - Bug fixes and updates for hns, mlx5, bnxt_re, hfi1, i40iw, rxe, siw and
   qib
 
 - Quite a few general cleanups on spelling, error handling, static checker
   detections, etc
 
 - Increase the number of device ports supported beyond 255. High port
   count software switches now exist
 
 - Several bug fixes for rtrs
 
 - mlx5 Device Memory support for host controlled atomics
 
 - Report SRQ tables through to rdma-tool
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmCMMHEACgkQOG33FX4g
 mxri3Q//RAgIExCGHebQ9xkptZHVyTLLJMpiMl2cqk3ZVRdDZ7QdiQjIqY2KqlUK
 nxBj7EXJeX6rV5a1xqCcOO1gBetB28TSwnCNE2ZqrXP5B59ISW8D052IWza3UkUz
 WmHLARxHQlyKBWA4+ZAgfoUGL0NmWA8QPf56t/RK/3/OsuYnGzcnWmmFbt8XKFcH
 NtO3KC45mKWDqqG0A0XRrLbEQz/ElO3OuPBqlBKgB3ZgGPzgsOUTOGkm1tCcZ89L
 /pvZGB7SklKZdCX8TxdpVGd9h0zHl8pqh1yEzvTA1ypNAYSUId2mvZXluU8J5yJl
 FLk7E1IxE5050FNEc7T5uZdUVntulYiqL2558coRI34l5w26pKGjIMxw/nTB8hg8
 4ZfBtKVemIG6yzW5Up6iBpK7qWYpvLWVShwYAWhbNsjN7JGzJuh1gJnjbmYgyz2P
 RTMU9wjFPLL2wZxg4LDHACVJNBb82j6KKuE+kZWpk11ro7INw9+7YwRuTo7/ezxC
 BwXKu8wF4igwSigV55jM+WnGXLhxdC3qmx/2cbtWyLM/PzdRL96tM0RWW5v8/Nv7
 teFhkt+f3RVqcfYH5K1qCXy3UFrxG6bxFSvcHHSBx2bdIrqhuTY5FqszAYImeW2j
 iHoyIsuSuGu79HQgOzAQZsEyksWi6OYDvA9Q9VBoPP4bJ3DOAa4=
 =vsXA
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This is significantly bug fixes and general cleanups. The noteworthy
  new features are fairly small:

   - XRC support for HNS and improves RQ operations

   - Bug fixes and updates for hns, mlx5, bnxt_re, hfi1, i40iw, rxe, siw
     and qib

   - Quite a few general cleanups on spelling, error handling, static
     checker detections, etc

   - Increase the number of device ports supported beyond 255. High port
     count software switches now exist

   - Several bug fixes for rtrs

   - mlx5 Device Memory support for host controlled atomics

   - Report SRQ tables through to rdma-tool"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (145 commits)
  IB/qib: Remove redundant assignment to ret
  RDMA/nldev: Add copy-on-fork attribute to get sys command
  RDMA/bnxt_re: Fix a double free in bnxt_qplib_alloc_res
  RDMA/siw: Fix a use after free in siw_alloc_mr
  IB/hfi1: Remove redundant variable rcd
  RDMA/nldev: Add QP numbers to SRQ information
  RDMA/nldev: Return SRQ information
  RDMA/restrack: Add support to get resource tracking for SRQ
  RDMA/nldev: Return context information
  RDMA/core: Add CM to restrack after successful attachment to a device
  RDMA/cma: Skip device which doesn't support CM
  RDMA/rxe: Fix a bug in rxe_fill_ip_info()
  RDMA/mlx5: Expose private query port
  RDMA/mlx4: Remove an unused variable
  RDMA/mlx5: Fix type assignment for ICM DM
  IB/mlx5: Set right RoCE l3 type and roce version while deleting GID
  RDMA/i40iw: Fix error unwinding when i40iw_hmc_sd_one fails
  RDMA/cxgb4: add missing qpid increment
  IB/ipoib: Remove unnecessary struct declaration
  RDMA/bnxt_re: Get rid of custom module reference counting
  ...
2021-05-01 09:15:05 -07:00
Linus Torvalds
9f67672a81 New features for ext4 this cycle include support for encrypted
casefold, ensure that deleted file names are cleared in directory
 blocks by zeroing directory entries when they are unlinked or moved as
 part of a hash tree node split.  We also improve the block allocator's
 performance on a freshly mounted file system by prefetching block
 bitmaps.
 
 There are also the usual cleanups and bug fixes, including fixing a
 page cache invalidation race when there is mixed buffered and direct
 I/O and the block size is less than page size, and allow the dax flag
 to be set and cleared on inline directories.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmCLei4ACgkQ8vlZVpUN
 gaPZkgf/VH08xjMf3VthC+BpvVmChQXfV4yjigHbO2pmPyYWZhyJzkEGCQD8u2eB
 b7ShW+B1NCifcTU34xAkKHwEtakzzEv3WIMrT1oZNWrpfo8tt850EkwQggaGGDpd
 /HnP1/wLtziJ5hE6DwutmX7qB4VFghVj898MjDrEPSOBqItOjWps9mn/JWL7SHyI
 Dqzhf5XZTYPaXWuJmSmKw3q8O70JDHnZe/rRWlfX1jLI5KDtqp71Nw1B+gszUB66
 IUdncyZKvInsyjYhkbCQ8U6WFih82MrbKeuGYDp/RFvg5eMELEYkwT9j0ofuDHq8
 zn62sAlbOXv1DiqkPDHKVm9GkHx8/g==
 =UpnH
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "New features for ext4 this cycle include support for encrypted
  casefold, ensure that deleted file names are cleared in directory
  blocks by zeroing directory entries when they are unlinked or moved as
  part of a hash tree node split. We also improve the block allocator's
  performance on a freshly mounted file system by prefetching block
  bitmaps.

  There are also the usual cleanups and bug fixes, including fixing a
  page cache invalidation race when there is mixed buffered and direct
  I/O and the block size is less than page size, and allow the dax flag
  to be set and cleared on inline directories"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (32 commits)
  ext4: wipe ext4_dir_entry2 upon file deletion
  ext4: Fix occasional generic/418 failure
  fs: fix reporting supported extra file attributes for statx()
  ext4: allow the dax flag to be set and cleared on inline directories
  ext4: fix debug format string warning
  ext4: fix trailing whitespace
  ext4: fix various seppling typos
  ext4: fix error return code in ext4_fc_perform_commit()
  ext4: annotate data race in jbd2_journal_dirty_metadata()
  ext4: annotate data race in start_this_handle()
  ext4: fix ext4_error_err save negative errno into superblock
  ext4: fix error code in ext4_commit_super
  ext4: always panic when errors=panic is specified
  ext4: delete redundant uptodate check for buffer
  ext4: do not set SB_ACTIVE in ext4_orphan_cleanup()
  ext4: make prefetch_block_bitmaps default
  ext4: add proc files to monitor new structures
  ext4: improve cr 0 / cr 1 group scanning
  ext4: add MB_NUM_ORDERS macro
  ext4: add mballoc stats proc file
  ...
2021-04-30 15:35:30 -07:00
Linus Torvalds
d42f323a7d Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "A few misc subsystems and some of MM.

  175 patches.

  Subsystems affected by this patch series: ia64, kbuild, scripts, sh,
  ocfs2, kfifo, vfs, kernel/watchdog, and mm (slab-generic, slub,
  kmemleak, debug, pagecache, msync, gup, memremap, memcg, pagemap,
  mremap, dma, sparsemem, vmalloc, documentation, kasan, initialization,
  pagealloc, and memory-failure)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (175 commits)
  mm/memory-failure: unnecessary amount of unmapping
  mm/mmzone.h: fix existing kernel-doc comments and link them to core-api
  mm: page_alloc: ignore init_on_free=1 for debug_pagealloc=1
  net: page_pool: use alloc_pages_bulk in refill code path
  net: page_pool: refactor dma_map into own function page_pool_dma_map
  SUNRPC: refresh rq_pages using a bulk page allocator
  SUNRPC: set rq_page_end differently
  mm/page_alloc: inline __rmqueue_pcplist
  mm/page_alloc: optimize code layout for __alloc_pages_bulk
  mm/page_alloc: add an array-based interface to the bulk page allocator
  mm/page_alloc: add a bulk page allocator
  mm/page_alloc: rename alloced to allocated
  mm/page_alloc: duplicate include linux/vmalloc.h
  mm, page_alloc: avoid page_to_pfn() in move_freepages()
  mm/Kconfig: remove default DISCONTIGMEM_MANUAL
  mm: page_alloc: dump migrate-failed pages
  mm/mempolicy: fix mpol_misplaced kernel-doc
  mm/mempolicy: rewrite alloc_pages_vma documentation
  mm/mempolicy: rewrite alloc_pages documentation
  mm/mempolicy: rename alloc_pages_current to alloc_pages
  ...
2021-04-30 14:38:01 -07:00
Linus Torvalds
65ec0a7d24 This is the bulk of the pin control changes for the v5.13 kernel cycle
Core changes:
 
 - A semantic change to handle pinmux and pinconf in explicit order
   while up until now we depended on the semantic order in the
   device tree. The device tree is a functional programming
   language and does not imply any order, so the right thing is
   for the pin control core to provide these semantics.
 
 - Add a new pinmux-select debugfs file which makes it possible to
   go in and select functions for a pin manually (iteratively, at
   the prompt) for debugging purposes.
 
 - Fixes to gpio regmap handling for a new pin control driver
   making use of regmap-gpio.
 
 - Use octal permissions on debugfs files.
 
 New drivers:
 
 - A massive rewrite of the former custom pin control driver for
   MIPS Broadcom devices to instead use the pin control subsystem.
   New pin control drivers for BCM6345, BCM6328, BCM6358, BCM6362,
   BCM6368, BCM63268 and BCM6318 SoC variants are implemented.
 
 - Support for PM8350, PM8350B, PM8350C, PMK8350, PMR735A and
   PMR735B in the Qualcomm PMIC GPIO driver. Also the two GPIOs
   on PM8008 are supported.
 
 - Support for the Rockchip RK3568/RK3566 pin controller.
 
 - Support for Ingenic JZ4730, JZ4750, JZ4755, JZ4775 and
   X2000.
 
 - Support for Mediatek MTK8195.
 
 - Add a new Xilinx ZynqMP pin control driver.
 
 Driver improvements and non-urgent fixes:
 
 - Modularization and improvements of the Rockchip drivers.
 
 - Some new pins added to the description of new Renesas SoCs.
 
 - Clarifications of the GPIO base calculation in the Intel driver.
 
 - Fix the function names for the MPP54 and MPP55 pins in the Armada
   CP110 pin controller.
 
 - GPIO wakeup interrupt map for Qualcomm SC7280 and SM8350.
 
 - Support for ACPI probing of the Qualcomm SC8180x.
 
 - Fix interrupt clear status on rockchip
 
 - Fix some missing pins on the Ingenic JZ4770, some semantic
   fixes for the behaviour of the Ingenic pin controller.
   Add DMIC pins for JZ4780, X1000, X1500 and X1830.
 
 - A slew of janitorial like of_node_put() calls.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAmCL5sAACgkQQRCzN7AZ
 XXNX5RAAtdPvDrPzzWdeqNLyodnJu/SyeA2xbmsvywrSvgpSx3FojFW9AXY/sr7w
 RuhGGA5KhnrovwiabRKoZ0d0lC/JtKdx5g2o9ePFHDy+7BzFnVacBjL38UftSKy0
 4QpDNJ3zock/XTUgJdaJEsbHhP/N4fOF/SbLpguYzGz7JpybNrZ+2M73yeQSL6uE
 yuhn/AgFMLgWS47nSAH91Yt387+XCEfB75nftXyFSN9GpQ9i3VixWsG3Um/Stoma
 aR7IIknvHdpCrOHH1IKohYcdlOkE7Wh2wHXSJVv26M49Ri5KSXu17lsUknebQ/oq
 UeDYdd/2q/wFjxdEbG2tqinEYHs3e1RPmatVesgyibtYHGwjnSFo/G6UtG4948ii
 1exwBi+0fw58YWLu/z4bhnNtZx6VsOev6mJ5GF7pyYzGIJy3r5J/9KCDzOJEoLom
 YTVmgZRjzJuH/i0rPgyg3lSxlP/pdvdk1YUMlIYN1zWdPnRqj7/q+qaxPOkltqD+
 20NFkvhQuuq+dLn4jtNK9xr2+vIKxIRPClT3D/lAihEPC5MUaFw/+y/V7c1hEJfS
 d1dh5DwgHK7i55/lqLFaXeNNYsmY/SiFecoB8xyFnOJFsHlSHe/6NfjmRhOMUn6V
 IX2GG4CBAzaheIWtN/ub/DcQ1vwA2n9hO5WX+Y3CXkIxXUFPmJY=
 =QrEn
 -----END PGP SIGNATURE-----

Merge tag 'pinctrl-v5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control updates from Linus Walleij:
 "There is a lot going on!

  Core changes:

   - A semantic change to handle pinmux and pinconf in explicit order
     while up until now we depended on the semantic order in the device
     tree. The device tree is a functional programming language and does
     not imply any order, so the right thing is for the pin control core
     to provide these semantics.

   - Add a new pinmux-select debugfs file which makes it possible to go
     in and select functions for a pin manually (iteratively, at the
     prompt) for debugging purposes.

   - Fixes to gpio regmap handling for a new pin control driver making
     use of regmap-gpio.

   - Use octal permissions on debugfs files.

  New drivers:

   - A massive rewrite of the former custom pin control driver for MIPS
     Broadcom devices to instead use the pin control subsystem. New pin
     control drivers for BCM6345, BCM6328, BCM6358, BCM6362, BCM6368,
     BCM63268 and BCM6318 SoC variants are implemented.

   - Support for PM8350, PM8350B, PM8350C, PMK8350, PMR735A and PMR735B
     in the Qualcomm PMIC GPIO driver. Also the two GPIOs on PM8008 are
     supported.

   - Support for the Rockchip RK3568/RK3566 pin controller.

   - Support for Ingenic JZ4730, JZ4750, JZ4755, JZ4775 and X2000.

   - Support for Mediatek MTK8195.

   - Add a new Xilinx ZynqMP pin control driver.

  Driver improvements and non-urgent fixes:

   - Modularization and improvements of the Rockchip drivers.

   - Some new pins added to the description of new Renesas SoCs.

   - Clarifications of the GPIO base calculation in the Intel driver.

   - Fix the function names for the MPP54 and MPP55 pins in the Armada
     CP110 pin controller.

   - GPIO wakeup interrupt map for Qualcomm SC7280 and SM8350.

   - Support for ACPI probing of the Qualcomm SC8180x.

   - Fix interrupt clear status on rockchip

   - Fix some missing pins on the Ingenic JZ4770, some semantic fixes
     for the behaviour of the Ingenic pin controller. Add DMIC pins for
     JZ4780, X1000, X1500 and X1830.

   - A slew of janitorial like of_node_put() calls"

* tag 'pinctrl-v5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (99 commits)
  pinctrl: Add Xilinx ZynqMP pinctrl driver support
  firmware: xilinx: Add pinctrl support
  pinctrl: rockchip: do coding style for mux route struct
  pinctrl: Add PIN_CONFIG_MODE_PWM to enum pin_config_param
  pinctrl: Introduce MODE group in enum pin_config_param
  pinctrl: Keep enum pin_config_param ordered by name
  dt-bindings: pinctrl: Add binding for ZynqMP pinctrl driver
  pinctrl: core: Fix kernel doc string for pin_get_name()
  pinctrl: mediatek: use spin lock in mtk_rmw
  pinctrl: add drive for I2C related pins on MT8195
  pinctrl: add pinctrl driver on mt8195
  dt-bindings: pinctrl: mt8195: add pinctrl file and binding document
  pinctrl: Ingenic: Add pinctrl driver for X2000.
  pinctrl: Ingenic: Add pinctrl driver for JZ4775.
  pinctrl: Ingenic: Add pinctrl driver for JZ4755.
  pinctrl: Ingenic: Add pinctrl driver for JZ4750.
  pinctrl: Ingenic: Add pinctrl driver for JZ4730.
  dt-bindings: pinctrl: Add bindings for new Ingenic SoCs.
  pinctrl: Ingenic: Reformat the code.
  pinctrl: Ingenic: Add DMIC pins support for Ingenic SoCs.
  ...
2021-04-30 13:04:30 -07:00
Linus Torvalds
592fa9532d Merge branch 'i2c/for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:

 - new drivers for Silicon Labs CP2615 and the HiSilicon I2C unit

 - bigger refactoring for the MPC driver

 - support for full software nodes - no need to work around with only
   properties anymore

 - we now have 'devm_i2c_add_adapter', too

 - sub-system wide fixes for the RPM refcounting problem which often
   caused a leak when an error was encountered during probe

 - the rest is usual driver updates and improvements

* 'i2c/for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (77 commits)
  i2c: mediatek: Use scl_int_delay_ns to compensate clock-stretching
  i2c: mediatek: Fix wrong dma sync flag
  i2c: mediatek: Fix send master code at more than 1MHz
  i2c: sh7760: fix IRQ error path
  i2c: i801: Add support for Intel Alder Lake PCH-M
  i2c: core: Fix spacing error by checkpatch
  i2c: s3c2410: simplify getting of_device_id match data
  i2c: nomadik: Fix space errors
  i2c: iop3xx: Fix coding style issues
  i2c: amd8111: Fix coding style issues
  i2c: mpc: Drop duplicate message from devm_platform_ioremap_resource()
  i2c: mpc: Use device_get_match_data() helper
  i2c: mpc: Remove CONFIG_PM_SLEEP ifdeffery
  i2c: mpc: Use devm_clk_get_optional()
  i2c: mpc: Update license and copyright
  i2c: mpc: Interrupt driven transfer
  i2c: sh7760: add IRQ check
  i2c: rcar: add IRQ check
  i2c: mlxbf: add IRQ check
  i2c: jz4780: add IRQ check
  ...
2021-04-30 13:01:02 -07:00
Linus Torvalds
efd8929b9e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID updates from Jiri Kosina:

 - Surface Aggregator Module support from Maximilian Luz

 - Apple Magic Mouse 2 support from John Chen

 - Support for newer Quad/BT 2.0 Logitech receivers in HID proxy mode
   from Hans de Goede

 - Thinkpad X1 Tablet keyboard support from Hans de Goede

 - Support for FTDI FT260 I2C host adapter from Michael Zaidman

 - other various small device-specific quirks, fixes and cleanups

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (46 commits)
  HID: wacom: Setup pen input capabilities to the targeted tools
  HID: hid-sensor-hub: Move 'hsdev' description to correct struct definition
  HID: hid-sensor-hub: Remove unused struct member 'quirks'
  HID: wacom_sys: Demote kernel-doc abuse
  HID: hid-sensor-custom: Remove unused variable 'ret'
  HID: hid-uclogic-params: Ensure function names are present and correct in kernel-doc headers
  HID: hid-uclogic-rdesc: Kernel-doc is for functions and structs
  HID: hid-logitech-hidpp: Fix conformant kernel-doc header and demote abuses
  HID: hid-picolcd_core: Remove unused variable 'ret'
  HID: hid-kye: Fix incorrect function name for kye_tablet_enable()
  HID: hid-core: Fix incorrect function name in header
  HID: hid-alps: Correct struct misnaming
  HID: usbhid: hid-pidff: Demote a couple kernel-doc abuses
  HID: usbhid: Repair a formatting issue in a struct description
  HID: hid-thrustmaster: Demote a bunch of kernel-doc abuses
  HID: input: map battery capacity (00850065)
  HID: magicmouse: fix reconnection of Magic Mouse 2
  HID: magicmouse: fix 3 button emulation of Mouse 2
  HID: magicmouse: add Apple Magic Mouse 2 support
  HID: lenovo: Add support for Thinkpad X1 Tablet Thin keyboard
  ...
2021-04-30 12:53:02 -07:00
Linus Torvalds
b71428d7ab sound updates for 5.13
No surprises in this development cycle, and most of works are about
 the fixes and the improvements of the existing code, while a new LED
 control layer and a few new drivers have been introduced.
 
 Here are some highlights:
 
 Core:
 - A common mute-LED framework was introduced;
   used by HD-audio for now, more adaption will follow later.
   The former "Mic Mute-LED Mode" mixer control has been replaced with
   the corresponding sysfs now.
 - User-control management was changed to count consumed bytes instead
   of capping by number of elements;
   this will allow more controls in the normal usage pattern while
   avoiding the possible memory exhaustion DoS
 
 ASoC:
 - Continued refactoring and cleanups in ASoC core and generic card
   drivers
 - Wide range of small cppcheck and warning fixes
 - New drivers for Freescale i.MX DMA over rpmsg, Mediatek MT6358
    accessory detection, and Realtek RT1019, RT1316, RT711 and RT715
 
 USB-audio:
 - Continued improvements and fixes of the implicit feedback mode,
   including better support for Pioneer and Roland/BOSS devices
 
 HD-audio:
 - Default back to non-buffer preallocation on x86
 - Cirrus codec improvements, more quirks for Realtek codecs
 
 Others:
 - New virtio sound driver
 - FireWire Bebob updates
 
 Note that this PR includes a couple of changes in reset and SPI
 drivers, too, and some merge conflicts might happen.
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmCMJaAOHHRpd2FpQHN1
 c2UuZGUACgkQLtJE4w1nLE/T6A//Ti0SAWYnAr5l/7ccuwS4zljHcuHngwvIxRPY
 BokU1ZUlagi+Ro2HLUq13G8T4AlUAQ8r2ecz7EJQHHl9tkrIg7Cc0+fiBPHju1Yu
 0F3Vjc78/JsJHvAR2DPll2rwhsdD3usSQXFo181k38J098X02iNcrzsj3kW5Bpzb
 DBvXzOBIAg/PPfPa4edSYsSurqYeZTkhshedTohlwOCnVbW9NN5b5T9yoXP+t5na
 rvK1Vhu0He8nVMBPDrzjKgE5rjm7Kn0FNXZ6CMDekU9sRVzm/PbgAqqmRnn6bUKa
 GDpcQzlaiDrw8a7/uTVgUZy85F9kMXMMnfYpBy4bBXOt6RWOplXY1yMxy1RXV+op
 3qC9k5R+IsjSWFQZ2z5bIHtGBNCG0698z9fQcvpsWTv+R68rUyfj+jeO/G9zzvpi
 qpQTloBfI28NoP+iGis7wtrlQ15ut47YMCQS8QiOEvLmd5/3xKXRut4Ac/VmvDpS
 q7fLivL8MZ/SMoXY74q/kByMBkXNpryQCAN+xAslaJ5P0aefNYJJdBt/sJlsDd9J
 Ya2VIxHoP+Sb1MG6OLq1Y8c53Di9lwY80pOtF3plcz/ZWgzipirf6BhFj0OttiKP
 a6+VewXA7zZcWEdw+Ik4dWP2dybWL+CuNl7Bwug8SyG9iWqg8Ph7FgoCTWAi93Fx
 KKUJxsc=
 =YT3U
 -----END PGP SIGNATURE-----

Merge tag 'sound-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound updates from Takashi Iwai:
 "No surprises in this development cycle, and most of work is about the
  fixes and the improvements of the existing code, while a new LED
  control layer and a few new drivers have been introduced.

  Here are some highlights:

  Core:
   - A common mute-LED framework was introduced. It is used by HD-audio
     for now, more adaption will follow later. The former "Mic Mute-LED
     Mode" mixer control has been replaced with the corresponding sysfs
     now.
   - User-control management was changed to count consumed bytes instead
     of capping by number of elements; this will allow more controls in
     the normal usage pattern while avoiding the possible memory
     exhaustion DoS

  ASoC:
   - Continued refactoring and cleanups in ASoC core and generic card
     drivers
   - Wide range of small cppcheck and warning fixes
   - New drivers for Freescale i.MX DMA over rpmsg, Mediatek MT6358
     accessory detection, and Realtek RT1019, RT1316, RT711 and RT715

  USB-audio:
   - Continued improvements and fixes of the implicit feedback mode,
     including better support for Pioneer and Roland/BOSS devices

  HD-audio:
   - Default back to non-buffer preallocation on x86
   - Cirrus codec improvements, more quirks for Realtek codecs

  Others:
   - New virtio sound driver
   - FireWire Bebob updates"

* tag 'sound-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (587 commits)
  ALSA: hda/conexant: Re-order CX5066 quirk table entries
  ALSA: hda/realtek: Remove redundant entry for ALC861 Haier/Uniwill devices
  ALSA: hda/realtek: Re-order ALC662 quirk table entries
  ALSA: hda/realtek: Re-order remaining ALC269 quirk table entries
  ALSA: hda/realtek: Re-order ALC269 Lenovo quirk table entries
  ALSA: hda/realtek: Re-order ALC269 Sony quirk table entries
  ALSA: hda/realtek: Re-order ALC269 ASUS quirk table entries
  ALSA: hda/realtek: Re-order ALC269 Dell quirk table entries
  ALSA: hda/realtek: Re-order ALC269 Acer quirk table entries
  ALSA: hda/realtek: Re-order ALC269 HP quirk table entries
  ALSA: hda/realtek: Re-order ALC882 Clevo quirk table entries
  ALSA: hda/realtek: Re-order ALC882 Sony quirk table entries
  ALSA: hda/realtek: Re-order ALC882 Acer quirk table entries
  ALSA: usb-audio: Remove redundant assignment to len
  ALSA: hda/realtek: Add quirk for Intel Clevo PCx0Dx
  ALSA: virtio: fix kernel-doc
  ALSA: hda/cirrus: Use CS8409 filter to fix abnormal sounds on Bullseye
  ALSA: hda/cirrus: Set Initial DMIC volume for Bullseye to -26 dB
  ALSA: sb: Fix two use after free in snd_sb_qsound_build
  ALSA: emu8000: Fix a use after free in snd_emu8000_create_mixer
  ...
2021-04-30 12:48:14 -07:00
Linus Torvalds
95275402f6 drm tegra-next + fixes for 5.13-rc1
tegra:
 - Tegra186 hardware cursor support
 - better capability reporting for different SoC
 - better framebuffer modifier support
 - host1x fixes
 
 ttm:
 - fix unswappable BO handling
 
 efifb:
 - check for PCI before using it
 
 amdgpu:
 - Fixes for Aldebaran
 - Display LTTPR fixes
 - eDP fixes
 - Fixes for Vangogh
 - RAS fixes
 - ASPM support
 - Renoir SMU fixes
 - Modifier fixes
 - Misc code cleanups
 - Freesync fixes
 
 i915:
 - Several fixes to GLK handling in recent display refactoring
 - Rare watchdog timer race fix
 - Cppcheck redundant condition fix
 - Overlay error code propagation fix
 - Documentation fix
 - gvt: Remove one unused function warning
 - gvt: Fix intel_gvt_init_device() return type
 - gvt: Remove one duplicated register accessible check
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJgi1hjAAoJEAx081l5xIa+ydkP/1Etl4S07UiLkWtHNOl5lj33
 RCCfjjeqTc5Md2qXuCknEsju38P493Pqhh1j7Nl02QHE1pIyTEjhKTdv/zt7aC0R
 jVUw4ZvtkqN2U1azS8Wfa+YBqrkerrdGIvaIJa/s9eLN+r+/Vb7Crid5NJg5/RQL
 NLN2Riw95MrvrsGFnZRli8qiLbIK2pj3553KZWJim4o9Fp+1TAesXVLI1+qXoIRk
 WWLuhwLxPLSb+ONCbrcOFSAwxVZLwRM90B+jwnhRYLS/Zo1hcWhn9ua5RBdyC5zU
 rGuOA6dWmI4o4L0X+IaNLD4qOGAWTp1dnJtvKj4rQVfSR82hWcTlCnVcZgxjjydx
 m56TlBLGYvilClM5tmRUdW5QKW4c9DP7dAr3XiKyYOuL2pRkg904ziQs4fVwmOgV
 PfDNm/aA+40UKPwLu0+JM7aQO5OgL8pg1FpOck5qOIx9fAQrulPp9OHDbxXVAzax
 t6YmmBnlXShZSooEkwfoCILkdZuNwmj1sXwUh3QzeQj6owk4tg+LWJIqK9mHt+lQ
 YuyueS+hOn1uSzhtLs9QdQvNssLVd490i0tyU/GGNMw/QVyi9gg4Iq2dVk00L9if
 lFQ7PgPslT+ExIdVutsnuLsy9N2by0AcxFJLdYKELTTicTVhYKzArGeCxni/A35r
 7ZmtSPvbuY7g1//n03Nb
 =4Wh+
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2021-04-30' of git://anongit.freedesktop.org/drm/drm

Pull more drm updates from Dave Airlie:
 "Looks like I missed a tegra feature request for next, but should still
  be fine since it's pretty self contained.

  Apart from that got a set of i915 and amdgpu fixes as per usual along
  with a few misc fixes.

  tegra:
   - Tegra186 hardware cursor support
   - better capability reporting for different SoC
   - better framebuffer modifier support
   - host1x fixes

  ttm:
   - fix unswappable BO handling

  efifb:
   - check for PCI before using it

  amdgpu:
   - Fixes for Aldebaran
   - Display LTTPR fixes
   - eDP fixes
   - Fixes for Vangogh
   - RAS fixes
   - ASPM support
   - Renoir SMU fixes
   - Modifier fixes
   - Misc code cleanups
   - Freesync fixes

  i915:
   - Several fixes to GLK handling in recent display refactoring
   - Rare watchdog timer race fix
   - Cppcheck redundant condition fix
   - Overlay error code propagation fix
   - Documentation fix
   - gvt: Remove one unused function warning
   - gvt: Fix intel_gvt_init_device() return type
   - gvt: Remove one duplicated register accessible check"

* tag 'drm-next-2021-04-30' of git://anongit.freedesktop.org/drm/drm: (111 commits)
  efifb: Check efifb_pci_dev before using it
  drm/i915: Fix docbook descriptions for i915_gem_shrinker
  drm/i915: fix an error code in intel_overlay_do_put_image()
  drm/i915/display/psr: Fix cppcheck warnings
  drm/i915: Disable LTTPR detection on GLK once again
  drm/i915: Restore lost glk ccs w/a
  drm/i915: Restore lost glk FBC 16bpp w/a
  drm/i915: Take request reference before arming the watchdog timer
  drm/ttm: fix error handling if no BO can be swapped out v4
  drm/i915/gvt: Remove duplicated register accessible check
  drm/amdgpu/gmc9: remove dummy read workaround for newer chips
  drm/amdgpu: Add mem sync flag for IB allocated by SA
  drm/amdgpu: Fix SDMA RAS error reporting on Aldebaran
  drm/amdgpu: Reset RAS error count and status regs
  Revert "drm/amdgpu: workaround the TMR MC address issue (v2)"
  drm/amd/display: 3.2.132
  drm/amd/display: [FW Promotion] Release 0.0.62
  drm/amd/display: add helper for enabling mst stream features
  drm/amd/display: Report Proper Quantization Range in AVI Infoframe
  drm/amd/display: Fix call to pass bpp in 16ths of a bit
  ...
2021-04-30 12:44:02 -07:00
Heiner Kallweit
4cf0abbce6 PCI/VPD: Remove pci_vpd_find_tag() 'offset' argument
All callers pass 0 as offset. Therefore remove the parameter and use a
fixed offset 0 in pci_vpd_find_tag().

Link: https://lore.kernel.org/r/f62e6e19-5423-2ead-b2bd-62844b23ef8f@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-04-30 14:38:32 -05:00
Heiner Kallweit
384d0c6820 PCI/VPD: Remove pci_set_vpd_size()
24a1720a08 ("cxgb4: collect serial config version from register") removed
the only usage of pci_set_vpd_size(). If a device needs to override the
auto-detected VPD size, then this can be done with a PCI quirk, as is done
for Chelsio devices. There's no need to allow drivers to change the VPD
size.

Remove pci_set_vpd_size().

[bhelgaas: squash in Arnd's fix for "'pci_vpd_set_size' defined but not
used" from
https://lore.kernel.org/r/20210421140334.3847155-1-arnd@kernel.org]
Link: https://lore.kernel.org/r/47d86e52-9bcf-7da7-1edb-0d988a7a82ab@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
2021-04-30 14:37:11 -05:00
Linus Torvalds
c70a4be130 powerpc updates for 5.13
- Enable KFENCE for 32-bit.
 
  - Implement EBPF for 32-bit.
 
  - Convert 32-bit to do interrupt entry/exit in C.
 
  - Convert 64-bit BookE to do interrupt entry/exit in C.
 
  - Changes to our signal handling code to use user_access_begin/end() more extensively.
 
  - Add support for time namespaces (CONFIG_TIME_NS)
 
  - A series of fixes that allow us to reenable STRICT_KERNEL_RWX.
 
  - Other smaller features, fixes & cleanups.
 
 Thanks to: Alexey Kardashevskiy, Andreas Schwab, Andrew Donnellan, Aneesh Kumar K.V,
   Athira Rajeev, Bhaskar Chowdhury, Bixuan Cui, Cédric Le Goater, Chen Huang, Chris
   Packham, Christophe Leroy, Christopher M. Riedl, Colin Ian King, Dan Carpenter, Daniel
   Axtens, Daniel Henrique Barboza, David Gibson, Davidlohr Bueso, Denis Efremov,
   dingsenjie, Dmitry Safonov, Dominic DeMarco, Fabiano Rosas, Ganesh Goudar, Geert
   Uytterhoeven, Geetika Moolchandani, Greg Kurz, Guenter Roeck, Haren Myneni, He Ying,
   Jiapeng Chong, Jordan Niethe, Laurent Dufour, Lee Jones, Leonardo Bras, Li Huafei,
   Madhavan Srinivasan, Mahesh Salgaonkar, Masahiro Yamada, Nathan Chancellor, Nathan
   Lynch, Nicholas Piggin, Oliver O'Halloran, Paul Menzel, Pu Lehui, Randy Dunlap, Ravi
   Bangoria, Rosen Penev, Russell Currey, Santosh Sivaraj, Sebastian Andrzej Siewior,
   Segher Boessenkool, Shivaprasad G Bhat, Srikar Dronamraju, Stephen Rothwell, Thadeu Lima
   de Souza Cascardo, Thomas Gleixner, Tony Ambardar, Tyrel Datwyler, Vaibhav Jain,
   Vincenzo Frascino, Xiongwei Song, Yang Li, Yu Kuai, Zhang Yunkai.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmCLV1kTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgLUyD/4jrTolG4sVec211hYO+0VuJzoqN4Cf
 j2CA2Ju39butnSMiq4LJUPRB7QRZY1OofkoNFpZeDQspjfZXPz2ulpYAz+SxHWE2
 ReHPmWH1rOABlUPXFboePF4OLwmAs9eR5mN2z9HpKXbT3k78HaToLqiONyB4fVCr
 Q5TkJeRn/Y7ZJLdyPLTpczHHleQ8KoM6kT7ncXnTm6p97JOBJSrGaJ5N/8X5a4+e
 6jtgB7Pvw8jNDShSr8BDLBgBZZcmoTiuG8KfgwRZ+m+mKB1yI2X8S/a54w/lDi9g
 UcSv3jQcFLJuW+T/pYe4R330uWDYa0cwjJOtMmsJ98S4EYOevoe9fZuL97qNshme
 xtBr4q1i03G1icYOJJ8dXtvabG2rUzj8t1SCDpwYfrynzTWVRikiQYTXUBhRSFoK
 nsoklvKd2IZa485XYJ2ljSyClMy8S4yJJ9RuzZ94DTXDSJUesKuyRWGnso4mhkcl
 wvl4wwMTJvnCMKVo6dsJyV24QWfd6dABxzm04uPA94CKhG33UwK8252jXVeaohSb
 WSO7qWBONgDXQLJ0mXRcEYa9NHvFS4Jnp6APbxnHr1gS+K+PNkD4gPBf34FoyN0E
 9s27kvEYk5vr8APUclETF6+FkbGUD5bFbusjt3hYloFpAoHQ/k5pFVDsOZNPA8sW
 fDIRp05KunDojw==
 =dfKL
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Enable KFENCE for 32-bit.

 - Implement EBPF for 32-bit.

 - Convert 32-bit to do interrupt entry/exit in C.

 - Convert 64-bit BookE to do interrupt entry/exit in C.

 - Changes to our signal handling code to use user_access_begin/end()
   more extensively.

 - Add support for time namespaces (CONFIG_TIME_NS)

 - A series of fixes that allow us to reenable STRICT_KERNEL_RWX.

 - Other smaller features, fixes & cleanups.

Thanks to Alexey Kardashevskiy, Andreas Schwab, Andrew Donnellan, Aneesh
Kumar K.V, Athira Rajeev, Bhaskar Chowdhury, Bixuan Cui, Cédric Le
Goater, Chen Huang, Chris Packham, Christophe Leroy, Christopher M.
Riedl, Colin Ian King, Dan Carpenter, Daniel Axtens, Daniel Henrique
Barboza, David Gibson, Davidlohr Bueso, Denis Efremov, dingsenjie,
Dmitry Safonov, Dominic DeMarco, Fabiano Rosas, Ganesh Goudar, Geert
Uytterhoeven, Geetika Moolchandani, Greg Kurz, Guenter Roeck, Haren
Myneni, He Ying, Jiapeng Chong, Jordan Niethe, Laurent Dufour, Lee
Jones, Leonardo Bras, Li Huafei, Madhavan Srinivasan, Mahesh Salgaonkar,
Masahiro Yamada, Nathan Chancellor, Nathan Lynch, Nicholas Piggin,
Oliver O'Halloran, Paul Menzel, Pu Lehui, Randy Dunlap, Ravi Bangoria,
Rosen Penev, Russell Currey, Santosh Sivaraj, Sebastian Andrzej Siewior,
Segher Boessenkool, Shivaprasad G Bhat, Srikar Dronamraju, Stephen
Rothwell, Thadeu Lima de Souza Cascardo, Thomas Gleixner, Tony Ambardar,
Tyrel Datwyler, Vaibhav Jain, Vincenzo Frascino, Xiongwei Song, Yang Li,
Yu Kuai, and Zhang Yunkai.

* tag 'powerpc-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (302 commits)
  powerpc/signal32: Fix erroneous SIGSEGV on RT signal return
  powerpc: Avoid clang uninitialized warning in __get_user_size_allowed
  powerpc/papr_scm: Mark nvdimm as unarmed if needed during probe
  powerpc/kvm: Fix build error when PPC_MEM_KEYS/PPC_PSERIES=n
  powerpc/kasan: Fix shadow start address with modules
  powerpc/kernel/iommu: Use largepool as a last resort when !largealloc
  powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs
  powerpc/44x: fix spelling mistake in Kconfig "varients" -> "variants"
  powerpc/iommu: Annotate nested lock for lockdep
  powerpc/iommu: Do not immediately panic when failed IOMMU table allocation
  powerpc/iommu: Allocate it_map by vmalloc
  selftests/powerpc: remove unneeded semicolon
  powerpc/64s: remove unneeded semicolon
  powerpc/eeh: remove unneeded semicolon
  powerpc/selftests: Add selftest to test concurrent perf/ptrace events
  powerpc/selftests/perf-hwbreak: Add testcases for 2nd DAWR
  powerpc/selftests/perf-hwbreak: Coalesce event creation code
  powerpc/selftests/ptrace-hwbreak: Add testcases for 2nd DAWR
  powerpc/configs: Add IBMVNIC to some 64-bit configs
  selftests/powerpc: Add uaccess flush test
  ...
2021-04-30 12:22:28 -07:00
Mike Rapoport
198fba4137 mm/mmzone.h: fix existing kernel-doc comments and link them to core-api
There are a couple of kernel-doc comments in include/linux/mmzone.h but
they have minor formatting issues that would cause kernel-doc warnings.

Fix the formatting of those comments, add missing Return: descriptions and
link include/linux/mmzone.h to Documentation/core-api/mm-api.rst

Link: https://lkml.kernel.org/r/20210426141927.1314326-2-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:43 -07:00
Mel Gorman
0f87d9d30f mm/page_alloc: add an array-based interface to the bulk page allocator
The proposed callers for the bulk allocator store pages from the bulk
allocator in an array.  This patch adds an array-based interface to the
API to avoid multiple list iterations.  The page list interface is
preserved to avoid requiring all users of the bulk API to allocate and
manage enough storage to store the pages.

[akpm@linux-foundation.org: remove now unused local `allocated']

Link: https://lkml.kernel.org/r/20210325114228.27719-4-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: David Miller <davem@davemloft.net>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:43 -07:00
Mel Gorman
387ba26fb1 mm/page_alloc: add a bulk page allocator
This patch adds a new page allocator interface via alloc_pages_bulk, and
__alloc_pages_bulk_nodemask.  A caller requests a number of pages to be
allocated and added to a list.

The API is not guaranteed to return the requested number of pages and
may fail if the preferred allocation zone has limited free memory, the
cpuset changes during the allocation or page debugging decides to fail
an allocation.  It's up to the caller to request more pages in batch if
necessary.

Note that this implementation is not very efficient and could be
improved but it would require refactoring.  The intent is to make it
available early to determine what semantics are required by different
callers.  Once the full semantics are nailed down, it can be refactored.

[mgorman@techsingularity.net: fix alloc_pages_bulk() return type, per Matthew]
  Link: https://lkml.kernel.org/r/20210325123713.GQ3697@techsingularity.net
[mgorman@techsingularity.net: fix uninit var warning]
  Link: https://lkml.kernel.org/r/20210330114847.GX3697@techsingularity.net
[mgorman@techsingularity.net: fix comment, per Vlastimil]
  Link: https://lkml.kernel.org/r/20210412110255.GV3697@techsingularity.net

Link: https://lkml.kernel.org/r/20210325114228.27719-3-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Tested-by: Colin Ian King <colin.king@canonical.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: David Miller <davem@davemloft.net>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:43 -07:00
Matthew Wilcox (Oracle)
d7f946d0fa mm/mempolicy: rename alloc_pages_current to alloc_pages
When CONFIG_NUMA is enabled, alloc_pages() is a wrapper around
alloc_pages_current().  This is pointless, just implement alloc_pages()
directly.

Link: https://lkml.kernel.org/r/20210225150642.2582252-5-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:43 -07:00
Matthew Wilcox (Oracle)
84172f4bb7 mm/page_alloc: combine __alloc_pages and __alloc_pages_nodemask
There are only two callers of __alloc_pages() so prune the thicket of
alloc_page variants by combining the two functions together.  Current
callers of __alloc_pages() simply add an extra 'NULL' parameter and
current callers of __alloc_pages_nodemask() call __alloc_pages() instead.

Link: https://lkml.kernel.org/r/20210225150642.2582252-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:42 -07:00
Yu Zhao
1587db62d8 include/linux/page-flags-layout.h: cleanups
Tidy things up and delete comments stating the obvious with typos or
making no sense.

Link: https://lkml.kernel.org/r/20210303071609.797782-2-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:42 -07:00
Yu Zhao
f73c6c8805 include/linux/page-flags-layout.h: correctly determine LAST_CPUPID_WIDTH
The naming convention used in include/linux/page-flags-layout.h:
  *_SHIFT: the number of bits trying to allocate
  *_WIDTH: the number of bits successfully allocated

So when it comes to LAST_CPUPID_WIDTH, we need to check whether all
previous *_WIDTH and LAST_CPUPID_SHIFT can fit into page flags. This
means we need to use NODES_WIDTH, not NODES_SHIFT.

Link: https://lkml.kernel.org/r/20210303071609.797782-1-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:42 -07:00
Kefeng Wang
1f9d03c5e9 mm: move mem_init_print_info() into mm_init()
mem_init_print_info() is called in mem_init() on each architecture, and
pass NULL argument, so using void argument and move it into mm_init().

Link: https://lkml.kernel.org/r/20210317015210.33641-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>	[x86]
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>	[powerpc]
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Anatoly Pugachev <matorola@gmail.com>	[sparc64]
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>	[arm]
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Guo Ren <guoren@kernel.org>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:42 -07:00
Andrey Konovalov
d57a964e09 kasan, mm: integrate slab init_on_free with HW_TAGS
This change uses the previously added memory initialization feature of
HW_TAGS KASAN routines for slab memory when init_on_free is enabled.

With this change, memory initialization memset() is no longer called when
both HW_TAGS KASAN and init_on_free are enabled.  Instead, memory is
initialized in KASAN runtime.

For SLUB, the memory initialization memset() is moved into
slab_free_hook() that currently directly follows the initialization loop.
A new argument is added to slab_free_hook() that indicates whether to
initialize the memory or not.

To avoid discrepancies with which memory gets initialized that can be
caused by future changes, both KASAN hook and initialization memset() are
put together and a warning comment is added.

Combining setting allocation tags with memory initialization improves
HW_TAGS KASAN performance when init_on_free is enabled.

Link: https://lkml.kernel.org/r/190fd15c1886654afdec0d19ebebd5ade665b601.1615296150.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:41 -07:00
Andrey Konovalov
da844b7872 kasan, mm: integrate slab init_on_alloc with HW_TAGS
This change uses the previously added memory initialization feature of
HW_TAGS KASAN routines for slab memory when init_on_alloc is enabled.

With this change, memory initialization memset() is no longer called when
both HW_TAGS KASAN and init_on_alloc are enabled.  Instead, memory is
initialized in KASAN runtime.

The memory initialization memset() is moved into slab_post_alloc_hook()
that currently directly follows the initialization loop.  A new argument
is added to slab_post_alloc_hook() that indicates whether to initialize
the memory or not.

To avoid discrepancies with which memory gets initialized that can be
caused by future changes, both KASAN hook and initialization memset() are
put together and a warning comment is added.

Combining setting allocation tags with memory initialization improves
HW_TAGS KASAN performance when init_on_alloc is enabled.

Link: https://lkml.kernel.org/r/c1292aeb5d519da221ec74a0684a949b027d7720.1615296150.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:41 -07:00
Andrey Konovalov
1bb5eab30d kasan, mm: integrate page_alloc init with HW_TAGS
This change uses the previously added memory initialization feature of
HW_TAGS KASAN routines for page_alloc memory when init_on_alloc/free is
enabled.

With this change, kernel_init_free_pages() is no longer called when both
HW_TAGS KASAN and init_on_alloc/free are enabled.  Instead, memory is
initialized in KASAN runtime.

To avoid discrepancies with which memory gets initialized that can be
caused by future changes, both KASAN and kernel_init_free_pages() hooks
are put together and a warning comment is added.

This patch changes the order in which memory initialization and page
poisoning hooks are called.  This doesn't lead to any side-effects, as
whenever page poisoning is enabled, memory initialization gets disabled.

Combining setting allocation tags with memory initialization improves
HW_TAGS KASAN performance when init_on_alloc/free is enabled.

[andreyknvl@google.com: fix for "integrate page_alloc init with HW_TAGS"]
  Link: https://lkml.kernel.org/r/65b6028dea2e9a6e8e2cb779b5115c09457363fc.1617122211.git.andreyknvl@google.com

Link: https://lkml.kernel.org/r/e77f0d5b1b20658ef0b8288625c74c2b3690e725.1615296150.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Sergei Trofimovich <slyfox@gentoo.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:41 -07:00
Andrey Konovalov
a064cb00d3 kasan: initialize shadow to TAG_INVALID for SW_TAGS
Currently, KASAN_SW_TAGS uses 0xFF as the default tag value for
unallocated memory.  The underlying idea is that since that memory hasn't
been allocated yet, it's only supposed to be dereferenced through a
pointer with the native 0xFF tag.

While this is a good idea in terms on consistency, practically it doesn't
bring any benefit.  Since the 0xFF pointer tag is a match-all tag, it
doesn't matter what tag the accessed memory has.  No accesses through
0xFF-tagged pointers are considered buggy by KASAN.

This patch changes the default tag value for unallocated memory to 0xFE,
which is the tag KASAN uses for inaccessible memory.  This doesn't affect
accesses through 0xFF-tagged pointer to this memory, but this allows KASAN
to detect wild and large out-of-bounds invalid memory accesses through
otherwise-tagged pointers.

This is a prepatory patch for the next one, which changes the tag-based
KASAN modes to not poison the boot memory.

Link: https://lkml.kernel.org/r/c8e93571c18b3528aac5eb33ade213bf133d10ad.1613692950.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:41 -07:00
Lukas Bulwahn
91ab1a4119 pagewalk: prefix struct kernel-doc descriptions
The script './scripts/kernel-doc -none ./include/linux/pagewalk.h' reports:

  include/linux/pagewalk.h:37: warning: cannot understand function prototype: 'struct mm_walk_ops '
  include/linux/pagewalk.h:85: warning: cannot understand function prototype: 'struct mm_walk '

A kernel-doc description for a structure requires to prefix the struct
name with the keyword 'struct'.  So, do that such that no further
kernel-doc warnings are reported for this file.

Link: https://lkml.kernel.org/r/20210322122542.15072-3-lukas.bulwahn@gmail.com
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ralf Ramsauer <ralf.ramsauer@oth-regensburg.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:41 -07:00
Matthew Wilcox (Oracle)
da2f5eb3d3 mm/doc: turn fault flags into an enum
The kernel-doc script complains about

  include/linux/mm.h:425: warning: wrong kernel-doc identifier on line:
   * Fault flag definitions.

I don't know how to document a series of #defines, so turn these
definitions into an enum and document that instead.

Link: https://lkml.kernel.org/r/20210322195022.2143603-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:41 -07:00
Matthew Wilcox (Oracle)
136dfc9949 mm/doc: fix page_maybe_dma_pinned kerneldoc
make htmldocs reports:

  include/linux/mm.h:1341: warning: Excess function parameter 'Return' description in 'page_maybe_dma_pinned'

Fix a few other formatting nits while I'm editing this description.

Link: https://lkml.kernel.org/r/20210322195022.2143603-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:41 -07:00
Matthew Wilcox (Oracle)
78f4841e34 mm/doc: fix fault_flag_allow_retry_first kerneldoc
make htmldocs reports:

  include/linux/mm.h:496: warning: Function parameter or member 'flags' not described in 'fault_flag_allow_retry_first'

Add a description.

Link: https://lkml.kernel.org/r/20210322195022.2143603-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:41 -07:00
Nicholas Piggin
4ad0ae8c64 mm/vmalloc: remove unmap_kernel_range
This is a shim around vunmap_range, get rid of it.

Move the main API comment from the _noflush variant to the normal
variant, and make _noflush internal to mm/.

[npiggin@gmail.com: fix nommu builds and a comment bug per sfr]
  Link: https://lkml.kernel.org/r/1617292598.m6g0knx24s.astroid@bobo.none
[akpm@linux-foundation.org: move vunmap_range_noflush() stub inside !CONFIG_MMU, not !CONFIG_NUMA]
[npiggin@gmail.com: fix nommu builds]
  Link: https://lkml.kernel.org/r/1617292497.o1uhq5ipxp.astroid@bobo.none

Link: https://lkml.kernel.org/r/20210322021806.892164-5-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Cédric Le Goater <clg@kaod.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:40 -07:00
Nicholas Piggin
b67177ecd9 mm/vmalloc: remove map_kernel_range
Patch series "mm/vmalloc: cleanup after hugepage series", v2.

Christoph pointed out some overdue cleanups required after the huge
vmalloc series, and I had another failure error message improvement as
well.

This patch (of 5):

This is a shim around vmap_pages_range, get rid of it.

Move the main API comment from the _noflush variant to the normal variant,
and make _noflush internal to mm/.

Link: https://lkml.kernel.org/r/20210322021806.892164-1-npiggin@gmail.com
Link: https://lkml.kernel.org/r/20210322021806.892164-2-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:40 -07:00
Nicholas Piggin
121e6f3258 mm/vmalloc: hugepage vmalloc mappings
Support huge page vmalloc mappings.  Config option HAVE_ARCH_HUGE_VMALLOC
enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
supports PMD sized vmap mappings.

vmalloc will attempt to allocate PMD-sized pages if allocating PMD size or
larger, and fall back to small pages if that was unsuccessful.

Architectures must ensure that any arch specific vmalloc allocations that
require PAGE_SIZE mappings (e.g., module allocations vs strict module rwx)
use the VM_NOHUGE flag to inhibit larger mappings.

This can result in more internal fragmentation and memory overhead for a
given allocation, an option nohugevmalloc is added to disable at boot.

[colin.king@canonical.com: fix read of uninitialized pointer area]
  Link: https://lkml.kernel.org/r/20210318155955.18220-1-colin.king@canonical.com

Link: https://lkml.kernel.org/r/20210317062402.533919-14-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:40 -07:00
Nicholas Piggin
5e9e3d777b mm: move vmap_range from mm/ioremap.c to mm/vmalloc.c
This is a generic kernel virtual memory mapper, not specific to ioremap.

Code is unchanged other than making vmap_range non-static.

Link: https://lkml.kernel.org/r/20210317062402.533919-12-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:40 -07:00
Nicholas Piggin
6f680e70b6 mm/vmalloc: provide fallback arch huge vmap support functions
If an architecture doesn't support a particular page table level as a huge
vmap page size then allow it to skip defining the support query function.

Link: https://lkml.kernel.org/r/20210317062402.533919-11-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:40 -07:00
Nicholas Piggin
bbc180a5ad mm: HUGE_VMAP arch support cleanup
This changes the awkward approach where architectures provide init
functions to determine which levels they can provide large mappings for,
to one where the arch is queried for each call.

This removes code and indirection, and allows constant-folding of dead
code for unsupported levels.

This also adds a prot argument to the arch query.  This is unused
currently but could help with some architectures (e.g., some powerpc
processors can't map uncacheable memory with large pages).

Link: https://lkml.kernel.org/r/20210317062402.533919-7-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:40 -07:00
Brian Geffon
14d071134c Revert "mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio"
This reverts commit cd544fd1dc.

As discussed in [1] this commit was a no-op because the mapping type was
checked in vma_to_resize before move_vma is ever called.  This meant that
vm_ops->mremap() would never be called on such mappings.  Furthermore,
we've since expanded support of MREMAP_DONTUNMAP to non-anonymous
mappings, and these special mappings are still protected by the existing
check of !VM_DONTEXPAND and !VM_PFNMAP which will result in a -EINVAL.

1. https://lkml.org/lkml/2020/12/28/2340

Link: https://lkml.kernel.org/r/20210323182520.2712101-2-bgeffon@google.com
Signed-off-by: Brian Geffon <bgeffon@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Alejandro Colomar <alx.manpages@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: "Michael S . Tsirkin" <mst@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:39 -07:00
Christoph Hellwig
1fbaf8fc12 mm: add a io_mapping_map_user helper
Add a helper that calls remap_pfn_range for an struct io_mapping, relying
on the pgprot pre-validation done when creating the mapping instead of
doing it at runtime.

Link: https://lkml.kernel.org/r/20210326055505.1424432-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:39 -07:00
Christoph Hellwig
74ffa5a3e6 mm: add remap_pfn_range_notrack
Patch series "add remap_pfn_range_notrack instead of reinventing it in i915", v2.

i915 has some reason to want to avoid the track_pfn_remap overhead in
remap_pfn_range.  Add a function to the core VM to do just that rather
than reinventing the functionality poorly in the driver.

Note that the remap_io_sg path does get exercises when using Xorg on my
Thinkpad X1, so this should be considered lightly tested, I've not managed
to hit the remap_io_mapping path at all.

This patch (of 4):

Add a version of remap_pfn_range that does not call track_pfn_range.  This
will be used to fix horrible abuses of VM internals in the i915 driver.

Link: https://lkml.kernel.org/r/20210326055505.1424432-1-hch@lst.de
Link: https://lkml.kernel.org/r/20210326055505.1424432-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:39 -07:00
Wan Jiabing
a10e995749 linux/memcontrol.h: remove duplicate struct declaration
struct mem_cgroup is declared twice.  One has been declared at forward
struct declaration.  Remove the duplicate.

Link: https://lkml.kernel.org/r/20210330020246.2265371-1-wanjiabing@vivo.com
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:38 -07:00
Muchun Song
bd290e1e75 mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM
The page only can be marked as kmem when CONFIG_MEMCG_KMEM is enabled.
So move PageMemcgKmem() to the scope of the CONFIG_MEMCG_KMEM.

As a bonus, on !CONFIG_MEMCG_KMEM build some code can be compiled out.

Link: https://lkml.kernel.org/r/20210319163821.20704-8-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:38 -07:00
Muchun Song
b4e0b68fbd mm: memcontrol: use obj_cgroup APIs to charge kmem pages
Since Roman's series "The new cgroup slab memory controller" applied.
All slab objects are charged via the new APIs of obj_cgroup.  The new
APIs introduce a struct obj_cgroup to charge slab objects.  It prevents
long-living objects from pinning the original memory cgroup in the
memory.  But there are still some corner objects (e.g.  allocations
larger than order-1 page on SLUB) which are not charged via the new
APIs.  Those objects (include the pages which are allocated from buddy
allocator directly) are charged as kmem pages which still hold a
reference to the memory cgroup.

We want to reuse the obj_cgroup APIs to charge the kmem pages.  If we do
that, we should store an object cgroup pointer to page->memcg_data for
the kmem pages.

Finally, page->memcg_data will have 3 different meanings.

  1) For the slab pages, page->memcg_data points to an object cgroups
     vector.

  2) For the kmem pages (exclude the slab pages), page->memcg_data
     points to an object cgroup.

  3) For the user pages (e.g. the LRU pages), page->memcg_data points
     to a memory cgroup.

We do not change the behavior of page_memcg() and page_memcg_rcu().  They
are also suitable for LRU pages and kmem pages.  Why?

Because memory allocations pinning memcgs for a long time - it exists at a
larger scale and is causing recurring problems in the real world: page
cache doesn't get reclaimed for a long time, or is used by the second,
third, fourth, ...  instance of the same job that was restarted into a new
cgroup every time.  Unreclaimable dying cgroups pile up, waste memory, and
make page reclaim very inefficient.

We can convert LRU pages and most other raw memcg pins to the objcg
direction to fix this problem, and then the page->memcg will always point
to an object cgroup pointer.  At that time, LRU pages and kmem pages will
be treated the same.  The implementation of page_memcg() will remove the
kmem page check.

This patch aims to charge the kmem pages by using the new APIs of
obj_cgroup.  Finally, the page->memcg_data of the kmem page points to an
object cgroup.  We can use the __page_objcg() to get the object cgroup
associated with a kmem page.  Or we can use page_memcg() to get the memory
cgroup associated with a kmem page, but caller must ensure that the
returned memcg won't be released (e.g.  acquire the rcu_read_lock or
css_set_lock).

  Link: https://lkml.kernel.org/r/20210401030141.37061-1-songmuchun@bytedance.com

Link: https://lkml.kernel.org/r/20210319163821.20704-6-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
[songmuchun@bytedance.com: fix forget to obtain the ref to objcg in split_page_memcg]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:38 -07:00
Shakeel Butt
0add0c77a9 memcg: charge before adding to swapcache on swapin
Currently the kernel adds the page, allocated for swapin, to the
swapcache before charging the page.  This is fine but now we want a
per-memcg swapcache stat which is essential for folks who wants to
transparently migrate from cgroup v1's memsw to cgroup v2's memory and
swap counters.  In addition charging a page before exposing it to other
parts of the kernel is a step in the right direction.

To correctly maintain the per-memcg swapcache stat, this patch has
adopted to charge the page before adding it to swapcache.  One challenge
in this option is the failure case of add_to_swap_cache() on which we
need to undo the mem_cgroup_charge().  Specifically undoing
mem_cgroup_uncharge_swap() is not simple.

To resolve the issue, this patch decouples the charging for swapin pages
from mem_cgroup_charge().  Two new functions are introduced,
mem_cgroup_swapin_charge_page() for just charging the swapin page and
mem_cgroup_swapin_uncharge_swap() for uncharging the swap slot once the
page has been successfully added to the swapcache.

[shakeelb@google.com: set page->private before calling swap_readpage]
  Link: https://lkml.kernel.org/r/20210318015959.2986837-1-shakeelb@google.com

Link: https://lkml.kernel.org/r/20210305212639.775498-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Hugh Dickins <hughd@google.com>
Tested-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:38 -07:00
Johannes Weiner
2d146aa3aa mm: memcontrol: switch to rstat
Replace the memory controller's custom hierarchical stats code with the
generic rstat infrastructure provided by the cgroup core.

The current implementation does batched upward propagation from the
write side (i.e.  as stats change).  The per-cpu batches introduce an
error, which is multiplied by the number of subgroups in a tree.  In
systems with many CPUs and sizable cgroup trees, the error can be large
enough to confuse users (e.g.  32 batch pages * 32 CPUs * 32 subgroups
results in an error of up to 128M per stat item).  This can entirely
swallow allocation bursts inside a workload that the user is expecting
to see reflected in the statistics.

In the past, we've done read-side aggregation, where a memory.stat read
would have to walk the entire subtree and add up per-cpu counts.  This
became problematic with lazily-freed cgroups: we could have large
subtrees where most cgroups were entirely idle.  Hence the switch to
change-driven upward propagation.  Unfortunately, it needed to trade
accuracy for speed due to the write side being so hot.

Rstat combines the best of both worlds: from the write side, it cheaply
maintains a queue of cgroups that have pending changes, so that the read
side can do selective tree aggregation.  This way the reported stats
will always be precise and recent as can be, while the aggregation can
skip over potentially large numbers of idle cgroups.

The way rstat works is that it implements a tree for tracking cgroups
with pending local changes, as well as a flush function that walks the
tree upwards.  The controller then drives this by 1) telling rstat when
a local cgroup stat changes (e.g.  mod_memcg_state) and 2) when a flush
is required to get uptodate hierarchy stats for a given subtree (e.g.
when memory.stat is read).  The controller also provides a flush
callback that is called during the rstat flush walk for each cgroup and
aggregates its local per-cpu counters and propagates them upwards.

This adds a second vmstats to struct mem_cgroup (MEMCG_NR_STAT +
NR_VM_EVENT_ITEMS) to track pending subtree deltas during upward
aggregation.  It removes 3 words from the per-cpu data.  It eliminates
memcg_exact_page_state(), since memcg_page_state() is now exact.

[akpm@linux-foundation.org: merge fix]
[hannes@cmpxchg.org: fix a sleep in atomic section problem]
  Link: https://lkml.kernel.org/r/20210315234100.64307-1-hannes@cmpxchg.org

Link: https://lkml.kernel.org/r/20210209163304.77088-7-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:38 -07:00
Johannes Weiner
a18e6e6e15 mm: memcontrol: privatize memcg_page_state query functions
There are no users outside of the memory controller itself. The rest
of the kernel cares either about node or lruvec stats.

Link: https://lkml.kernel.org/r/20210209163304.77088-4-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:37 -07:00
Johannes Weiner
a3747b53b1 mm: memcontrol: kill mem_cgroup_nodeinfo()
No need to encapsulate a simple struct member access.

Link: https://lkml.kernel.org/r/20210209163304.77088-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:37 -07:00
Yang Shi
4066c11948 mm: gup: remove FOLL_SPLIT
Since commit 5a52c9df62 ("uprobe: use FOLL_SPLIT_PMD instead of
FOLL_SPLIT") and commit ba925fa350 ("s390/gmap: improve THP splitting")
FOLL_SPLIT has not been used anymore.  Remove the dead code.

Link: https://lkml.kernel.org/r/20210330203900.9222-1-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:37 -07:00
Joao Martins
458a4f788f mm/gup: add a range variant of unpin_user_pages_dirty_lock()
Add an unpin_user_page_range_dirty_lock() API which takes a starting page
and how many consecutive pages we want to unpin and optionally dirty.

To that end, define another iterator for_each_compound_range() that
operates in page ranges as opposed to page array.

For users (like RDMA mr_dereg) where each sg represents a contiguous set
of pages, we're able to more efficiently unpin pages without having to
supply an array of pages much of what happens today with
unpin_user_pages().

Link: https://lkml.kernel.org/r/20210212130843.13865-4-joao.m.martins@oracle.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:37 -07:00
Matthew Wilcox (Oracle)
842ca547f7 mm: move page_mapping_file to pagemap.h
page_mapping_file() is only used by some architectures, and then it
is usually only used in one place.  Make it a static inline function
so other architectures don't have to carry this dead code.

Link: https://lkml.kernel.org/r/20210317123011.350118-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:37 -07:00
Johannes Weiner
1c824a680b mm: page-writeback: simplify memcg handling in test_clear_page_writeback()
Page writeback doesn't hold a page reference, which allows truncate to
free a page the second PageWriteback is cleared.  This used to require
special attention in test_clear_page_writeback(), where we had to be
careful not to rely on the unstable page->memcg binding and look up all
the necessary information before clearing the writeback flag.

Since commit 073861ed77 ("mm: fix VM_BUG_ON(PageTail) and
BUG_ON(PageWriteback)") test_clear_page_writeback() is called with an
explicit reference on the page, and this dance is no longer needed.

Use unlock_page_memcg() and dec_lruvec_page_state() directly.

This removes the last user of the lock_page_memcg() return value, change
it to void.  Touch up the comments in there as well.  This also removes
the last extern user of __unlock_page_memcg(), make it static.  Further,
it removes the last user of dec_lruvec_state(), delete it, along with a
few other unused helpers.

Link: https://lkml.kernel.org/r/YCQbYAWg4nvBFL6h@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:37 -07:00
Jens Axboe
63135aa386 mm: provide filemap_range_needs_writeback() helper
Patch series "Improve IOCB_NOWAIT O_DIRECT reads", v3.

An internal workload complained because it was using too much CPU, and
when I took a look, we had a lot of io_uring workers going to town.

For an async buffered read like workload, I am normally expecting _zero_
offloads to a worker thread, but this one had tons of them.  I'd drop
caches and things would look good again, but then a minute later we'd
regress back to using workers.  Turns out that every minute something
was reading parts of the device, which would add page cache for that
inode.  I put patches like these in for our kernel, and the problem was
solved.

Don't -EAGAIN IOCB_NOWAIT dio reads just because we have page cache
entries for the given range.  This causes unnecessary work from the
callers side, when the IO could have been issued totally fine without
blocking on writeback when there is none.

This patch (of 3):

For O_DIRECT reads/writes, we check if we need to issue a call to
filemap_write_and_wait_range() to issue and/or wait for writeback for any
page in the given range.  The existing mechanism just checks for a page in
the range, which is suboptimal for IOCB_NOWAIT as we'll fallback to the
slow path (and needing retry) if there's just a clean page cache page in
the range.

Provide filemap_range_needs_writeback() which tries a little harder to
check if we actually need to issue and/or wait for writeback in the range.

Link: https://lkml.kernel.org/r/20210224164455.1096727-1-axboe@kernel.dk
Link: https://lkml.kernel.org/r/20210224164455.1096727-2-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:36 -07:00
Sergei Trofimovich
8e9b16c476 mm: page_owner: detect page_owner recursion via task_struct
Before the change page_owner recursion was detected via fetching
backtrace and inspecting it for current instruction pointer.
It has a few problems:

 - it is slightly slow as it requires extra backtrace and a linear stack
   scan of the result

 - it is too late to check if backtrace fetching required memory
   allocation itself (ia64's unwinder requires it).

To simplify recursion tracking let's use page_owner recursion flag in
'struct task_struct'.

The change make page_owner=on work on ia64 by avoiding infinite
recursion in:
  kmalloc()
  -> __set_page_owner()
  -> save_stack()
  -> unwind() [ia64-specific]
  -> build_script()
  -> kmalloc()
  -> __set_page_owner() [we short-circuit here]
  -> save_stack()
  -> unwind() [recursion]

Link: https://lkml.kernel.org/r/20210402115342.1463781-1-slyfox@gentoo.org
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:36 -07:00