Commit graph

83311 commits

Author SHA1 Message Date
Andrey Konovalov
0b7ccc70ee kasan, vmalloc: drop outdated VM_KASAN comment
The comment about VM_KASAN in include/linux/vmalloc.c is outdated.
VM_KASAN is currently only used to mark vm_areas allocated for kernel
modules when CONFIG_KASAN_VMALLOC is disabled.

Drop the comment.

Link: https://lkml.kernel.org/r/780395afea83a147b3b5acc36cf2e38f7f8479f9.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:47 -07:00
Andrey Konovalov
63840de296 kasan, x86, arm64, s390: rename functions for modules shadow
Rename kasan_free_shadow to kasan_free_module_shadow and
kasan_module_alloc to kasan_alloc_module_shadow.

These functions are used to allocate/free shadow memory for kernel modules
when KASAN_VMALLOC is not enabled.  The new names better reflect their
purpose.

Also reword the comment next to their declaration to improve clarity.

Link: https://lkml.kernel.org/r/36db32bde765d5d0b856f77d2d806e838513fe84.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:47 -07:00
Andrey Konovalov
b42090ae6f kasan, page_alloc: merge kasan_alloc_pages into post_alloc_hook
Currently, the code responsible for initializing and poisoning memory in
post_alloc_hook() is scattered across two locations: kasan_alloc_pages()
hook for HW_TAGS KASAN and post_alloc_hook() itself.  This is confusing.

This and a few following patches combine the code from these two
locations.  Along the way, these patches do a step-by-step restructure the
many performed checks to make them easier to follow.

Replace the only caller of kasan_alloc_pages() with its implementation.

As kasan_has_integrated_init() is only true when CONFIG_KASAN_HW_TAGS is
enabled, moving the code does no functional changes.

Also move init and init_tags variables definitions out of
kasan_has_integrated_init() clause in post_alloc_hook(), as they have the
same values regardless of what the if condition evaluates to.

This patch is not useful by itself but makes the simplifications in the
following patches easier to follow.

Link: https://lkml.kernel.org/r/5ac7e0b30f5cbb177ec363ddd7878a3141289592.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:46 -07:00
Andrey Konovalov
c82ce3195f mm: clarify __GFP_ZEROTAGS comment
__GFP_ZEROTAGS is intended as an optimization: if memory is zeroed during
allocation, it's possible to set memory tags at the same time with little
performance impact.

Clarify this intention of __GFP_ZEROTAGS in the comment.

Link: https://lkml.kernel.org/r/cdffde013973c5634a447513e10ec0d21e8eee29.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:46 -07:00
Andrey Konovalov
7c13c163e0 kasan, page_alloc: merge kasan_free_pages into free_pages_prepare
Currently, the code responsible for initializing and poisoning memory in
free_pages_prepare() is scattered across two locations: kasan_free_pages()
for HW_TAGS KASAN and free_pages_prepare() itself.  This is confusing.

This and a few following patches combine the code from these two
locations.  Along the way, these patches also simplify the performed
checks to make them easier to follow.

Replaces the only caller of kasan_free_pages() with its implementation.

As kasan_has_integrated_init() is only true when CONFIG_KASAN_HW_TAGS is
enabled, moving the code does no functional changes.

This patch is not useful by itself but makes the simplifications in the
following patches easier to follow.

Link: https://lkml.kernel.org/r/303498d15840bb71905852955c6e2390ecc87139.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:46 -07:00
Hugh Dickins
bb43b14b57 mm: delete __ClearPageWaiters()
The PG_waiters bit is not included in PAGE_FLAGS_CHECK_AT_FREE, and
vmscan.c's free_unref_page_list() callers rely on that not to generate
bad_page() alerts.  So __page_cache_release(), put_pages_list() and
release_pages() (and presumably copy-and-pasted free_zone_device_page())
are redundant and misleading to make a special point of clearing it (as
the "__" implies, it could only safely be used on the freeing path).

Delete __ClearPageWaiters().  Remark on this in one of the "possible"
comments in folio_wake_bit(), and delete the superfluous comments.

Link: https://lkml.kernel.org/r/3eafa969-5b1a-accf-88fe-318784c791a@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:45 -07:00
Linus Torvalds
85c7000fda The highlights are:
- several changes to how snap context and snap realms are tracked
   (Xiubo Li).  In particular, this should resolve a long-standing
   issue of high kworker CPU usage and various stalls caused by
   needless iteration over all inodes in the snap realm.
 
 - async create fixes to address hangs in some edge cases (Jeff Layton)
 
 - support for getvxattr MDS op for querying server-side xattrs, such
   as file/directory layouts and ephemeral pins (Milind Changire)
 
 - average latency is now maintained for all metrics (Venky Shankar)
 
 - some tweaks around handling inline data to make it fit better with
   netfs helper library (David Howells)
 
 Also a couple of memory leaks got plugged along with a few assorted
 fixups.  Last but not least, Xiubo has stepped up to serve as a CephFS
 co-maintainer.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmI8qE4THGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi9UvB/kBZt4mkyjqRC+KJ5rukw5q7lyAYGC1
 QYVbuTuIJydyDvqQp9pXYFndlj10pb7ULnUlQlcBfntVAr9s7xx7ZKrKciE48MPT
 vLiJmq3MpEedM4oE4FgcJbmHtltDgZWvOxXB7renpHNeHuPeezNpKzaKQXGUHUDo
 +7cX5XWBzZk+AYbEvxQUsjDozcgDp31qf015mAX3r0P7XFkBB7xwZA7sb7Cw1GEr
 S6ZdlNoFcWUq0ULUdh2C7l5a2mKQnVnpOO3TMjE6tSqJ74iozRy4tO9aFgj99NEn
 D1rQbCLr3JPfY//JFyqEOIDYf3hepOMmEkoGHOFukckFKUe3yfJJxa3r
 =ESr5
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-5.18-rc1' of https://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "The highlights are:

   - several changes to how snap context and snap realms are tracked
     (Xiubo Li). In particular, this should resolve a long-standing
     issue of high kworker CPU usage and various stalls caused by
     needless iteration over all inodes in the snap realm.

   - async create fixes to address hangs in some edge cases (Jeff
     Layton)

   - support for getvxattr MDS op for querying server-side xattrs, such
     as file/directory layouts and ephemeral pins (Milind Changire)

   - average latency is now maintained for all metrics (Venky Shankar)

   - some tweaks around handling inline data to make it fit better with
     netfs helper library (David Howells)

  Also a couple of memory leaks got plugged along with a few assorted
  fixups. Last but not least, Xiubo has stepped up to serve as a CephFS
  co-maintainer"

* tag 'ceph-for-5.18-rc1' of https://github.com/ceph/ceph-client: (27 commits)
  ceph: fix memory leak in ceph_readdir when note_last_dentry returns error
  ceph: uninitialized variable in debug output
  ceph: use tracked average r/w/m latencies to display metrics in debugfs
  ceph: include average/stdev r/w/m latency in mds metrics
  ceph: track average r/w/m latency
  ceph: use ktime_to_timespec64() rather than jiffies_to_timespec64()
  ceph: assign the ci only when the inode isn't NULL
  ceph: fix inode reference leakage in ceph_get_snapdir()
  ceph: misc fix for code style and logs
  ceph: allocate capsnap memory outside of ceph_queue_cap_snap()
  ceph: do not release the global snaprealm until unmounting
  ceph: remove incorrect and unused CEPH_INO_DOTDOT macro
  MAINTAINERS: add Xiubo Li as cephfs co-maintainer
  ceph: eliminate the recursion when rebuilding the snap context
  ceph: do not update snapshot context when there is no new snapshot
  ceph: zero the dir_entries memory when allocating it
  ceph: move to a dedicated slabcache for ceph_cap_snap
  ceph: add getvxattr op
  libceph: drop else branches in prepare_read_data{,_cont}
  ceph: fix comments mentioning i_mutex
  ...
2022-03-24 18:32:48 -07:00
Linus Torvalds
b14ffae378 drm for 5.18-rc1
dma-buf:
 - rename dma-buf-map to iosys-map
 
 core:
 - move buddy allocator to core
 - add pci/platform init macros
 - improve EDID parser deep color handling
 - EDID timing type 7 support
 - add GPD Win Max quirk
 - add yes/no helpers to string_helpers
 - flatten syncobj chains
 - add nomodeset support to lots of drivers
 - improve fb-helper clipping support
 - add default property value interface
 
 fbdev:
 - improve fbdev ops speed
 
 ttm:
 - add a backpointer from ttm bo->ttm resource
 
 dp:
 - move displayport headers
 - add a dp helper module
 
 bridge:
 - anx7625 atomic support, HDCP support
 
 panel:
 - split out panel-lvds and lvds bindings
 - find panels in OF subnodes
 
 privacy:
 - add chromeos privacy screen support
 
 fb:
 - hot unplug fw fb on forced removal
 
 simpledrm:
 - request region instead of marking ioresource busy
 - add panel oreintation property
 
 udmabuf:
 - fix oops with 0 pages
 
 amdgpu:
 - power management code cleanup
 - Enable freesync video mode by default
 - RAS code cleanup
 - Improve VRAM access for debug using SDMA
 - SR-IOV rework special register access and fixes
 - profiling power state request ioctl
 - expose IP discovery via sysfs
 - Cyan skillfish updates
 - GC 10.3.7, SDMA 5.2.7, DCN 3.1.6 updates
 - expose benchmark tests via debugfs
 - add module param to disable XGMI for testing
 - GPU reset debugfs register dumping support
 
 amdkfd:
 - CRIU support
 - SDMA queue fixes
 
 radeon:
 - UVD suspend fix
 - iMac backlight fix
 
 i915:
 - minimal parallel submission for execlists
 - DG2-G12 subplatform added
 - DG2 programming workarounds
 - DG2 accelerated migration support
 - flat CCS and CCS engine support for XeHP
 - initial small BAR support
 - drop fake LMEM support
 - ADL-N PCH support
 - bigjoiner updates
 - introduce VMA resources and async unbinding
 - register definitions cleanups
 - multi-FBC refactoring
 - DG1 OPROM over SPI support
 - ADL-N platform enabling
 - opregion mailbox #5 support
 - DP MST ESI improvements
 - drm device based logging
 - async flip optimisation for DG2
 - CPU arch abstraction fixes
 - improve GuC ADS init to work on aarch64
 - tweak TTM LRU priority hint
 - GuC 69.0.3 support
 - remove short term execbuf pins
 
 nouveau:
 - higher DP/eDP bitrates
 - backlight fixes
 
 msm:
 - dpu + dp support for sc8180x
 - dp support for sm8350
 - dpu + dsi support for qcm2290
 - 10nm dsi phy tuning support
 - bridge support for dp encoder
 - gpu support for additional 7c3 SKUs
 
 ingenic:
 - HDMI support for JZ4780
 - aux channel EDID support
 
 ast:
 - AST2600 support
 - add wide screen support
 - create DP/DVI connectors
 
 omapdrm:
 - fix implicit dma_buf fencing
 
 vc4:
 - add CSC + full range support
 - better display firmware handoff
 
 panfrost:
 - add initial dual-core GPU support
 
 stm:
 - new revision support
 - fb handover support
 
 mediatek:
 - transfer display binding document to yaml format.
 - add mt8195 display device binding.
 - allow commands to be sent during video mode.
 - add wait_for_event for crtc disable by cmdq.
 
 tegra:
 - YUV format support
 
 rcar-du:
 - LVDS support for M3-W+ (R8A77961)
 
 exynos:
 - BGR pixel format for FIMD device
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmI71h4ACgkQDHTzWXnE
 hr6wKg//SvKFiEOhptua8Ao8XYkhXpg1/tgdAs4D7bZ0YgJyF4Im0RuFOKMmF3mN
 0Y8AwguqrsmrOAFbK8B1WEysB66DmGlZN/V2Q75X7fui8xs4uGF2Fcxyr+265zhf
 vONPwAoxYr+KXqwOI1p1BP2QEL6bJTdu+nrXRsXIBIrWnw8ehXJlw3fDhgvG5QBn
 RPdbU7lQnd47hdYxkbe5SiZvWnPC46dJmpqsRJir0xjskR6juU36f34C4IKhTGwO
 NDPeWVgusVXtIC/F4X6RebCWG0f66h+CUFa9zeYIleI/2/5yZWXfcw6Obx8HgPkt
 gieiI0R4TpkVxeHCApCQ5UpxWgfSOXdoDoyw172bKQw7JCHVEkSwenyMEEwNet6r
 SCJrRmlB1PBI/iTWmhm9qgrU46ZZyAnQoTlCsXGzJncdP3hzGlA1embl00yfEl7f
 wzM35N20qd5T4VKUEF8QYF0fLZYmKw4cWVASu4hQ3qmGal6frilphz2J8JK8hQNq
 KhFqNbVTnZsQNr9LBCbrf0kOPaMzpmW+2vQG9ApdAb1N3gNPZT7ctti0Xq5N2OUR
 AipWFAsDPS2NPADKmBtDU55PgFH9MqUIsoHHXLV4Qi76dvCqYoN68qRQxrL7rpSu
 b0gr0YKU2QcIB/uytjOPHcgtI5Xvrh+q8JPz/dJ38/Esgjmk4wo=
 =uRsT
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2022-03-24' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Dave Airlie:
 "Lots of work all over, Intel improving DG2 support, amdkfd CRIU
  support, msm new hw support, and faster fbdev support.

  dma-buf:
   - rename dma-buf-map to iosys-map

  core:
   - move buddy allocator to core
   - add pci/platform init macros
   - improve EDID parser deep color handling
   - EDID timing type 7 support
   - add GPD Win Max quirk
   - add yes/no helpers to string_helpers
   - flatten syncobj chains
   - add nomodeset support to lots of drivers
   - improve fb-helper clipping support
   - add default property value interface

  fbdev:
   - improve fbdev ops speed

  ttm:
   - add a backpointer from ttm bo->ttm resource

  dp:
   - move displayport headers
   - add a dp helper module

  bridge:
   - anx7625 atomic support, HDCP support

  panel:
   - split out panel-lvds and lvds bindings
   - find panels in OF subnodes

  privacy:
   - add chromeos privacy screen support

  fb:
   - hot unplug fw fb on forced removal

  simpledrm:
   - request region instead of marking ioresource busy
   - add panel oreintation property

  udmabuf:
   - fix oops with 0 pages

  amdgpu:
   - power management code cleanup
   - Enable freesync video mode by default
   - RAS code cleanup
   - Improve VRAM access for debug using SDMA
   - SR-IOV rework special register access and fixes
   - profiling power state request ioctl
   - expose IP discovery via sysfs
   - Cyan skillfish updates
   - GC 10.3.7, SDMA 5.2.7, DCN 3.1.6 updates
   - expose benchmark tests via debugfs
   - add module param to disable XGMI for testing
   - GPU reset debugfs register dumping support

  amdkfd:
   - CRIU support
   - SDMA queue fixes

  radeon:
   - UVD suspend fix
   - iMac backlight fix

  i915:
   - minimal parallel submission for execlists
   - DG2-G12 subplatform added
   - DG2 programming workarounds
   - DG2 accelerated migration support
   - flat CCS and CCS engine support for XeHP
   - initial small BAR support
   - drop fake LMEM support
   - ADL-N PCH support
   - bigjoiner updates
   - introduce VMA resources and async unbinding
   - register definitions cleanups
   - multi-FBC refactoring
   - DG1 OPROM over SPI support
   - ADL-N platform enabling
   - opregion mailbox #5 support
   - DP MST ESI improvements
   - drm device based logging
   - async flip optimisation for DG2
   - CPU arch abstraction fixes
   - improve GuC ADS init to work on aarch64
   - tweak TTM LRU priority hint
   - GuC 69.0.3 support
   - remove short term execbuf pins

  nouveau:
   - higher DP/eDP bitrates
   - backlight fixes

  msm:
   - dpu + dp support for sc8180x
   - dp support for sm8350
   - dpu + dsi support for qcm2290
   - 10nm dsi phy tuning support
   - bridge support for dp encoder
   - gpu support for additional 7c3 SKUs

  ingenic:
   - HDMI support for JZ4780
   - aux channel EDID support

  ast:
   - AST2600 support
   - add wide screen support
   - create DP/DVI connectors

  omapdrm:
   - fix implicit dma_buf fencing

  vc4:
   - add CSC + full range support
   - better display firmware handoff

  panfrost:
   - add initial dual-core GPU support

  stm:
   - new revision support
   - fb handover support

  mediatek:
   - transfer display binding document to yaml format.
   - add mt8195 display device binding.
   - allow commands to be sent during video mode.
   - add wait_for_event for crtc disable by cmdq.

  tegra:
   - YUV format support

  rcar-du:
   - LVDS support for M3-W+ (R8A77961)

  exynos:
   - BGR pixel format for FIMD device"

* tag 'drm-next-2022-03-24' of git://anongit.freedesktop.org/drm/drm: (1529 commits)
  drm/i915/display: Do not re-enable PSR after it was marked as not reliable
  drm/i915/display: Fix HPD short pulse handling for eDP
  drm/amdgpu: Use drm_mode_copy()
  drm/radeon: Use drm_mode_copy()
  drm/amdgpu: Use ternary operator in `vcn_v1_0_start()`
  drm/amdgpu: Remove pointless on stack mode copies
  drm/amd/pm: fix indenting in __smu_cmn_reg_print_error()
  drm/amdgpu/dc: fix typos in comments
  drm/amdgpu: fix typos in comments
  drm/amd/pm: fix typos in comments
  drm/amdgpu: Add stolen reserved memory for MI25 SRIOV.
  drm/amdgpu: Merge get_reserved_allocation to get_vbios_allocations.
  drm/amdkfd: evict svm bo worker handle error
  drm/amdgpu/vcn: fix vcn ring test failure in igt reload test
  drm/amdgpu: only allow secure submission on rings which support that
  drm/amdgpu: fixed the warnings reported by kernel test robot
  drm/amd/display: 3.2.177
  drm/amd/display: [FW Promotion] Release 0.0.108.0
  drm/amd/display: Add save/restore PANEL_PWRSEQ_REF_DIV2
  drm/amd/display: Wait for hubp read line for Pollock
  ...
2022-03-24 16:19:43 -07:00
Linus Torvalds
52deda9551 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "Various misc subsystems, before getting into the post-linux-next
  material.

  41 patches.

  Subsystems affected by this patch series: procfs, misc, core-kernel,
  lib, checkpatch, init, pipe, minix, fat, cgroups, kexec, kdump,
  taskstats, panic, kcov, resource, and ubsan"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits)
  Revert "ubsan, kcsan: Don't combine sanitizer with kcov on clang"
  kernel/resource: fix kfree() of bootmem memory again
  kcov: properly handle subsequent mmap calls
  kcov: split ioctl handling into locked and unlocked parts
  panic: move panic_print before kmsg dumpers
  panic: add option to dump all CPUs backtraces in panic_print
  docs: sysctl/kernel: add missing bit to panic_print
  taskstats: remove unneeded dead assignment
  kasan: no need to unset panic_on_warn in end_report()
  ubsan: no need to unset panic_on_warn in ubsan_epilogue()
  panic: unset panic_on_warn inside panic()
  docs: kdump: add scp example to write out the dump file
  docs: kdump: update description about sysfs file system support
  arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
  x86/setup: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
  riscv: mm: init: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
  kexec: make crashk_res, crashk_low_res and crash_notes symbols always visible
  cgroup: use irqsave in cgroup_rstat_flush_locked().
  fat: use pointer to simple type in put_user()
  minix: fix bug when opening a file with O_DIRECT
  ...
2022-03-24 14:14:07 -07:00
Linus Torvalds
169e77764a Networking changes for 5.18.
Core
 ----
 
  - Introduce XDP multi-buffer support, allowing the use of XDP with
    jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
 
  - Speed up netns dismantling (5x) and lower the memory cost a little.
    Remove unnecessary per-netns sockets. Scope some lists to a netns.
    Cut down RCU syncing. Use batch methods. Allow netdev registration
    to complete out of order.
 
  - Support distinguishing timestamp types (ingress vs egress) and
    maintaining them across packet scrubbing points (e.g. redirect).
 
  - Continue the work of annotating packet drop reasons throughout
    the stack.
 
  - Switch netdev error counters from an atomic to dynamically
    allocated per-CPU counters.
 
  - Rework a few preempt_disable(), local_irq_save() and busy waiting
    sections problematic on PREEMPT_RT.
 
  - Extend the ref_tracker to allow catching use-after-free bugs.
 
 BPF
 ---
 
  - Introduce "packing allocator" for BPF JIT images. JITed code is
    marked read only, and used to be allocated at page granularity.
    Custom allocator allows for more efficient memory use, lower
    iTLB pressure and prevents identity mapping huge pages from
    getting split.
 
  - Make use of BTF type annotations (e.g. __user, __percpu) to enforce
    the correct probe read access method, add appropriate helpers.
 
  - Convert the BPF preload to use light skeleton and drop
    the user-mode-driver dependency.
 
  - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
    its use as a packet generator.
 
  - Allow local storage memory to be allocated with GFP_KERNEL if called
    from a hook allowed to sleep.
 
  - Introduce fprobe (multi kprobe) to speed up mass attachment (arch
    bits to come later).
 
  - Add unstable conntrack lookup helpers for BPF by using the BPF
    kfunc infra.
 
  - Allow cgroup BPF progs to return custom errors to user space.
 
  - Add support for AF_UNIX iterator batching.
 
  - Allow iterator programs to use sleepable helpers.
 
  - Support JIT of add, and, or, xor and xchg atomic ops on arm64.
 
  - Add BTFGen support to bpftool which allows to use CO-RE in kernels
    without BTF info.
 
  - Large number of libbpf API improvements, cleanups and deprecations.
 
 Protocols
 ---------
 
  - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
 
  - Adjust TSO packet sizes based on min_rtt, allowing very low latency
    links (data centers) to always send full-sized TSO super-frames.
 
  - Make IPv6 flow label changes (AKA hash rethink) more configurable,
    via sysctl and setsockopt. Distinguish between server and client
    behavior.
 
  - VxLAN support to "collect metadata" devices to terminate only
    configured VNIs. This is similar to VLAN filtering in the bridge.
 
  - Support inserting IPv6 IOAM information to a fraction of frames.
 
  - Add protocol attribute to IP addresses to allow identifying where
    given address comes from (kernel-generated, DHCP etc.)
 
  - Support setting socket and IPv6 options via cmsg on ping6 sockets.
 
  - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
    Define dscp_t and stop taking ECN bits into account in fib-rules.
 
  - Add support for locked bridge ports (for 802.1X).
 
  - tun: support NAPI for packets received from batched XDP buffs,
    doubling the performance in some scenarios.
 
  - IPv6 extension header handling in Open vSwitch.
 
  - Support IPv6 control message load balancing in bonding, prevent
    neighbor solicitation and advertisement from using the wrong port.
    Support NS/NA monitor selection similar to existing ARP monitor.
 
  - SMC
    - improve performance with TCP_CORK and sendfile()
    - support auto-corking
    - support TCP_NODELAY
 
  - MCTP (Management Component Transport Protocol)
    - add user space tag control interface
    - I2C binding driver (as specified by DMTF DSP0237)
 
  - Multi-BSSID beacon handling in AP mode for WiFi.
 
  - Bluetooth:
    - handle MSFT Monitor Device Event
    - add MGMT Adv Monitor Device Found/Lost events
 
  - Multi-Path TCP:
    - add support for the SO_SNDTIMEO socket option
    - lots of selftest cleanups and improvements
 
  - Increase the max PDU size in CAN ISOTP to 64 kB.
 
 Driver API
 ----------
 
  - Add HW counters for SW netdevs, a mechanism for devices which
    offload packet forwarding to report packet statistics back to
    software interfaces such as tunnels.
 
  - Select the default NIC queue count as a fraction of number of
    physical CPU cores, instead of hard-coding to 8.
 
  - Expose devlink instance locks to drivers. Allow device layer of
    drivers to use that lock directly instead of creating their own
    which always runs into ordering issues in devlink callbacks.
 
  - Add header/data split indication to guide user space enabling
    of TCP zero-copy Rx.
 
  - Allow configuring completion queue event size.
 
  - Refactor page_pool to enable fragmenting after allocation.
 
  - Add allocation and page reuse statistics to page_pool.
 
  - Improve Multiple Spanning Trees support in the bridge to allow
    reuse of topologies across VLANs, saving HW resources in switches.
 
  - DSA (Distributed Switch Architecture):
    - replay and offload of host VLAN entries
    - offload of static and local FDB entries on LAG interfaces
    - FDB isolation and unicast filtering
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - LAN937x T1 PHYs
    - Davicom DM9051 SPI NIC driver
    - Realtek RTL8367S, RTL8367RB-VB switch and MDIO
    - Microchip ksz8563 switches
    - Netronome NFP3800 SmartNICs
    - Fungible SmartNICs
    - MediaTek MT8195 switches
 
  - WiFi:
    - mt76: MediaTek mt7916
    - mt76: MediaTek mt7921u USB adapters
    - brcmfmac: Broadcom BCM43454/6
 
  - Mobile:
    - iosm: Intel M.2 7360 WWAN card
 
 Drivers
 -------
 
  - Convert many drivers to the new phylink API built for split PCS
    designs but also simplifying other cases.
 
  - Intel Ethernet NICs:
    - add TTY for GNSS module for E810T device
    - improve AF_XDP performance
    - GTP-C and GTP-U filter offload
    - QinQ VLAN support
 
  - Mellanox Ethernet NICs (mlx5):
    - support xdp->data_meta
    - multi-buffer XDP
    - offload tc push_eth and pop_eth actions
 
  - Netronome Ethernet NICs (nfp):
    - flow-independent tc action hardware offload (police / meter)
    - AF_XDP
 
  - Other Ethernet NICs:
    - at803x: fiber and SFP support
    - xgmac: mdio: preamble suppression and custom MDC frequencies
    - r8169: enable ASPM L1.2 if system vendor flags it as safe
    - macb/gem: ZynqMP SGMII
    - hns3: add TX push mode
    - dpaa2-eth: software TSO
    - lan743x: multi-queue, mdio, SGMII, PTP
    - axienet: NAPI and GRO support
 
  - Mellanox Ethernet switches (mlxsw):
    - source and dest IP address rewrites
    - RJ45 ports
 
  - Marvell Ethernet switches (prestera):
    - basic routing offload
    - multi-chain TC ACL offload
 
  - NXP embedded Ethernet switches (ocelot & felix):
    - PTP over UDP with the ocelot-8021q DSA tagging protocol
    - basic QoS classification on Felix DSA switch using dcbnl
    - port mirroring for ocelot switches
 
  - Microchip high-speed industrial Ethernet (sparx5):
    - offloading of bridge port flooding flags
    - PTP Hardware Clock
 
  - Other embedded switches:
    - lan966x: PTP Hardward Clock
    - qca8k: mdio read/write operations via crafted Ethernet packets
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - add LDPC FEC type and 802.11ax High Efficiency data in radiotap
    - enable RX PPDU stats in monitor co-exist mode
 
  - Intel WiFi (iwlwifi):
    - UHB TAS enablement via BIOS
    - band disablement via BIOS
    - channel switch offload
    - 32 Rx AMPDU sessions in newer devices
 
  - MediaTek WiFi (mt76):
    - background radar detection
    - thermal management improvements on mt7915
    - SAR support for more mt76 platforms
    - MBSSID and 6 GHz band on mt7915
 
  - RealTek WiFi:
    - rtw89: AP mode
    - rtw89: 160 MHz channels and 6 GHz band
    - rtw89: hardware scan
 
  - Bluetooth:
    - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
 
  - Microchip CAN (mcp251xfd):
    - multiple RX-FIFOs and runtime configurable RX/TX rings
    - internal PLL, runtime PM handling simplification
    - improve chip detection and error handling after wakeup
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmI7YBcACgkQMUZtbf5S
 IrveSBAAmSNJlUK6vPsnNzs7IhsZnfI/AUjm2TCLZnlhKttbpI4A/4Pohk33V7RS
 FGX7f8kjEfhUwrIiLDgeCnztNHRECrCmk6aZc/jLEvecmTauJ+f6kjShkDY/wix+
 AkPHmrZnQeLPAEVuljDdV+sL6ik08+zQL7PazIYHsaSKKC0MGQptRwcri8PLRAKE
 KPBAhVhleq2rAZ/ntprSN52F4Af6rpFTrPIWuN8Bqdbc9dy5094LT0mpOOWYvgr3
 /DLvvAPuLemwyIQkjWknVKBRUAQcmNPC+BY3J8K3LRaiNhekGqOFan46BfqP+k2J
 6DWu0Qrp2yWt4BMOeEToZR5rA6v5suUAMIBu8PRZIDkINXQMlIxHfGjZyNm0rVfw
 7edNri966yus9OdzwPa32MIG3oC6PnVAwYCJAjjBMNS8sSIkp7wgHLkgWN4UFe2H
 K/e6z8TLF4UQ+zFM0aGI5WZ+9QqWkTWEDF3R3OhdFpGrznna0gxmkOeV2YvtsgxY
 cbS0vV9Zj73o+bYzgBKJsw/dAjyLdXoHUGvus26VLQ78S/VGunVKtItwoxBAYmZo
 krW964qcC89YofzSi8RSKLHuEWtNWZbVm8YXr75u6jpr5GhMBu0CYefLs+BuZcxy
 dw8c69cGneVbGZmY2J3rBhDkchbuICl8vdUPatGrOJAoaFdYKuw=
 =ELpe
 -----END PGP SIGNATURE-----

Merge tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "The sprinkling of SPI drivers is because we added a new one and Mark
  sent us a SPI driver interface conversion pull request.

  Core
  ----

   - Introduce XDP multi-buffer support, allowing the use of XDP with
     jumbo frame MTUs and combination with Rx coalescing offloads (LRO).

   - Speed up netns dismantling (5x) and lower the memory cost a little.
     Remove unnecessary per-netns sockets. Scope some lists to a netns.
     Cut down RCU syncing. Use batch methods. Allow netdev registration
     to complete out of order.

   - Support distinguishing timestamp types (ingress vs egress) and
     maintaining them across packet scrubbing points (e.g. redirect).

   - Continue the work of annotating packet drop reasons throughout the
     stack.

   - Switch netdev error counters from an atomic to dynamically
     allocated per-CPU counters.

   - Rework a few preempt_disable(), local_irq_save() and busy waiting
     sections problematic on PREEMPT_RT.

   - Extend the ref_tracker to allow catching use-after-free bugs.

  BPF
  ---

   - Introduce "packing allocator" for BPF JIT images. JITed code is
     marked read only, and used to be allocated at page granularity.
     Custom allocator allows for more efficient memory use, lower iTLB
     pressure and prevents identity mapping huge pages from getting
     split.

   - Make use of BTF type annotations (e.g. __user, __percpu) to enforce
     the correct probe read access method, add appropriate helpers.

   - Convert the BPF preload to use light skeleton and drop the
     user-mode-driver dependency.

   - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
     its use as a packet generator.

   - Allow local storage memory to be allocated with GFP_KERNEL if
     called from a hook allowed to sleep.

   - Introduce fprobe (multi kprobe) to speed up mass attachment (arch
     bits to come later).

   - Add unstable conntrack lookup helpers for BPF by using the BPF
     kfunc infra.

   - Allow cgroup BPF progs to return custom errors to user space.

   - Add support for AF_UNIX iterator batching.

   - Allow iterator programs to use sleepable helpers.

   - Support JIT of add, and, or, xor and xchg atomic ops on arm64.

   - Add BTFGen support to bpftool which allows to use CO-RE in kernels
     without BTF info.

   - Large number of libbpf API improvements, cleanups and deprecations.

  Protocols
  ---------

   - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.

   - Adjust TSO packet sizes based on min_rtt, allowing very low latency
     links (data centers) to always send full-sized TSO super-frames.

   - Make IPv6 flow label changes (AKA hash rethink) more configurable,
     via sysctl and setsockopt. Distinguish between server and client
     behavior.

   - VxLAN support to "collect metadata" devices to terminate only
     configured VNIs. This is similar to VLAN filtering in the bridge.

   - Support inserting IPv6 IOAM information to a fraction of frames.

   - Add protocol attribute to IP addresses to allow identifying where
     given address comes from (kernel-generated, DHCP etc.)

   - Support setting socket and IPv6 options via cmsg on ping6 sockets.

   - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
     Define dscp_t and stop taking ECN bits into account in fib-rules.

   - Add support for locked bridge ports (for 802.1X).

   - tun: support NAPI for packets received from batched XDP buffs,
     doubling the performance in some scenarios.

   - IPv6 extension header handling in Open vSwitch.

   - Support IPv6 control message load balancing in bonding, prevent
     neighbor solicitation and advertisement from using the wrong port.
     Support NS/NA monitor selection similar to existing ARP monitor.

   - SMC
      - improve performance with TCP_CORK and sendfile()
      - support auto-corking
      - support TCP_NODELAY

   - MCTP (Management Component Transport Protocol)
      - add user space tag control interface
      - I2C binding driver (as specified by DMTF DSP0237)

   - Multi-BSSID beacon handling in AP mode for WiFi.

   - Bluetooth:
      - handle MSFT Monitor Device Event
      - add MGMT Adv Monitor Device Found/Lost events

   - Multi-Path TCP:
      - add support for the SO_SNDTIMEO socket option
      - lots of selftest cleanups and improvements

   - Increase the max PDU size in CAN ISOTP to 64 kB.

  Driver API
  ----------

   - Add HW counters for SW netdevs, a mechanism for devices which
     offload packet forwarding to report packet statistics back to
     software interfaces such as tunnels.

   - Select the default NIC queue count as a fraction of number of
     physical CPU cores, instead of hard-coding to 8.

   - Expose devlink instance locks to drivers. Allow device layer of
     drivers to use that lock directly instead of creating their own
     which always runs into ordering issues in devlink callbacks.

   - Add header/data split indication to guide user space enabling of
     TCP zero-copy Rx.

   - Allow configuring completion queue event size.

   - Refactor page_pool to enable fragmenting after allocation.

   - Add allocation and page reuse statistics to page_pool.

   - Improve Multiple Spanning Trees support in the bridge to allow
     reuse of topologies across VLANs, saving HW resources in switches.

   - DSA (Distributed Switch Architecture):
      - replay and offload of host VLAN entries
      - offload of static and local FDB entries on LAG interfaces
      - FDB isolation and unicast filtering

  New hardware / drivers
  ----------------------

   - Ethernet:
      - LAN937x T1 PHYs
      - Davicom DM9051 SPI NIC driver
      - Realtek RTL8367S, RTL8367RB-VB switch and MDIO
      - Microchip ksz8563 switches
      - Netronome NFP3800 SmartNICs
      - Fungible SmartNICs
      - MediaTek MT8195 switches

   - WiFi:
      - mt76: MediaTek mt7916
      - mt76: MediaTek mt7921u USB adapters
      - brcmfmac: Broadcom BCM43454/6

   - Mobile:
      - iosm: Intel M.2 7360 WWAN card

  Drivers
  -------

   - Convert many drivers to the new phylink API built for split PCS
     designs but also simplifying other cases.

   - Intel Ethernet NICs:
      - add TTY for GNSS module for E810T device
      - improve AF_XDP performance
      - GTP-C and GTP-U filter offload
      - QinQ VLAN support

   - Mellanox Ethernet NICs (mlx5):
      - support xdp->data_meta
      - multi-buffer XDP
      - offload tc push_eth and pop_eth actions

   - Netronome Ethernet NICs (nfp):
      - flow-independent tc action hardware offload (police / meter)
      - AF_XDP

   - Other Ethernet NICs:
      - at803x: fiber and SFP support
      - xgmac: mdio: preamble suppression and custom MDC frequencies
      - r8169: enable ASPM L1.2 if system vendor flags it as safe
      - macb/gem: ZynqMP SGMII
      - hns3: add TX push mode
      - dpaa2-eth: software TSO
      - lan743x: multi-queue, mdio, SGMII, PTP
      - axienet: NAPI and GRO support

   - Mellanox Ethernet switches (mlxsw):
      - source and dest IP address rewrites
      - RJ45 ports

   - Marvell Ethernet switches (prestera):
      - basic routing offload
      - multi-chain TC ACL offload

   - NXP embedded Ethernet switches (ocelot & felix):
      - PTP over UDP with the ocelot-8021q DSA tagging protocol
      - basic QoS classification on Felix DSA switch using dcbnl
      - port mirroring for ocelot switches

   - Microchip high-speed industrial Ethernet (sparx5):
      - offloading of bridge port flooding flags
      - PTP Hardware Clock

   - Other embedded switches:
      - lan966x: PTP Hardward Clock
      - qca8k: mdio read/write operations via crafted Ethernet packets

   - Qualcomm 802.11ax WiFi (ath11k):
      - add LDPC FEC type and 802.11ax High Efficiency data in radiotap
      - enable RX PPDU stats in monitor co-exist mode

   - Intel WiFi (iwlwifi):
      - UHB TAS enablement via BIOS
      - band disablement via BIOS
      - channel switch offload
      - 32 Rx AMPDU sessions in newer devices

   - MediaTek WiFi (mt76):
      - background radar detection
      - thermal management improvements on mt7915
      - SAR support for more mt76 platforms
      - MBSSID and 6 GHz band on mt7915

   - RealTek WiFi:
      - rtw89: AP mode
      - rtw89: 160 MHz channels and 6 GHz band
      - rtw89: hardware scan

   - Bluetooth:
      - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)

   - Microchip CAN (mcp251xfd):
      - multiple RX-FIFOs and runtime configurable RX/TX rings
      - internal PLL, runtime PM handling simplification
      - improve chip detection and error handling after wakeup"

* tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits)
  llc: fix netdevice reference leaks in llc_ui_bind()
  drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool
  ice: don't allow to run ice_send_event_to_aux() in atomic ctx
  ice: fix 'scheduling while atomic' on aux critical err interrupt
  net/sched: fix incorrect vlan_push_eth dest field
  net: bridge: mst: Restrict info size queries to bridge ports
  net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init()
  drivers: net: xgene: Fix regression in CRC stripping
  net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT
  net: dsa: fix missing host-filtered multicast addresses
  net/mlx5e: Fix build warning, detected write beyond size of field
  iwlwifi: mvm: Don't fail if PPAG isn't supported
  selftests/bpf: Fix kprobe_multi test.
  Revert "rethook: x86: Add rethook x86 implementation"
  Revert "arm64: rethook: Add arm64 rethook implementation"
  Revert "powerpc: Add rethook support"
  Revert "ARM: rethook: Add rethook arm implementation"
  netdevice: add missing dm_private kdoc
  net: bridge: mst: prevent NULL deref in br_mst_info_size()
  selftests: forwarding: Use same VRF for port and VLAN upper
  ...
2022-03-24 13:13:26 -07:00
Linus Torvalds
7403e6d826 VFIO updates for v5.18-rc1
- Introduce new device migration uAPI and implement device specific
    mlx5 vfio-pci variant driver supporting new protocol (Jason Gunthorpe,
    Yishai Hadas, Leon Romanovsky)
 
  - New HiSilicon acc vfio-pci variant driver, also supporting migration
    interface (Shameer Kolothum, Longfang Liu)
 
  - D3hot fixes for vfio-pci-core (Abhishek Sahu)
 
  - Document new vfio-pci variant driver acceptance criteria
    (Alex Williamson)
 
  - Fix UML build unresolved ioport_{un}map() functions
    (Alex Williamson)
 
  - Fix MAINTAINERS due to header movement (Lukas Bulwahn)
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmI6HGwbHGFsZXgud2ls
 bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsiyxcP/18Mh4eYJudvqU7ARH/H
 8E2V+5YhkdVG088KZcB/sTEfVKAbROZrJ3zxkZMXU/OU2cYELHG2pgaI8yCMzHJK
 krz+kZ2p+nA/AMKp8V0xB0MCspTpX/3/6zHV2wDals+gTTLH34N0r6swh0wCjoSa
 wN+3ahE+c6KkX41H8X2Dup5YVM4ohg8MbCd3jSIFBrRDj6SMRGr7zytezCdLhnVs
 TwadlReOYSqKsuvcVnHObWbsOj5WCmuld2u9j0kTPknRm6VtxkfNFQTpKk3sbAcO
 SaPwDP0485plwCVZkNJELZVaF+qYIFW5WZLD5wlJNoH/mZE68a5BKbYFKSLt1gs3
 ntYdktcmsBLVQxTNxcZ6/gwEV2/wuY6v7C3cm0jT0AqXgPIdOqrwlzafTwP+Z/KU
 TC9x4EzPPvdsnBCut0XJZg4QUNlJ7Cp+62vxXqhLGPA2cd4tjGO/8B1KOm05B7VQ
 2XiDtlsW7pwx4v6jRPPdvoqUMd5qqjKF9RepTktirUSXv8z6NIjSyzGn3HZLrk6f
 7AHnlltUg56y/c6hmLxe25PrXKpGqO1fFIcuPYpC+IbBHrE4NVqOhi3ieoonO5GZ
 nwe6IT/fLxsLOudUG/dJ3swuoE8o2Glf17rV9e53K8zF9J9LoFJQsqSFbUzR17pD
 NGN+nA8dWFmmLDS4uYiY9WBg
 =Sv96
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v5.18-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Introduce new device migration uAPI and implement device specific
   mlx5 vfio-pci variant driver supporting new protocol (Jason
   Gunthorpe, Yishai Hadas, Leon Romanovsky)

 - New HiSilicon acc vfio-pci variant driver, also supporting migration
   interface (Shameer Kolothum, Longfang Liu)

 - D3hot fixes for vfio-pci-core (Abhishek Sahu)

 - Document new vfio-pci variant driver acceptance criteria
   (Alex Williamson)

 - Fix UML build unresolved ioport_{un}map() functions
   (Alex Williamson)

 - Fix MAINTAINERS due to header movement (Lukas Bulwahn)

* tag 'vfio-v5.18-rc1' of https://github.com/awilliam/linux-vfio: (31 commits)
  vfio-pci: Provide reviewers and acceptance criteria for variant drivers
  MAINTAINERS: adjust entry for header movement in hisilicon qm driver
  hisi_acc_vfio_pci: Use its own PCI reset_done error handler
  hisi_acc_vfio_pci: Add support for VFIO live migration
  crypto: hisilicon/qm: Set the VF QM state register
  hisi_acc_vfio_pci: Add helper to retrieve the struct pci_driver
  hisi_acc_vfio_pci: Restrict access to VF dev BAR2 migration region
  hisi_acc_vfio_pci: add new vfio_pci driver for HiSilicon ACC devices
  hisi_acc_qm: Move VF PCI device IDs to common header
  crypto: hisilicon/qm: Move few definitions to common header
  crypto: hisilicon/qm: Move the QM header to include/linux
  vfio/mlx5: Fix to not use 0 as NULL pointer
  PCI/IOV: Fix wrong kernel-doc identifier
  vfio/mlx5: Use its own PCI reset_done error handler
  vfio/pci: Expose vfio_pci_core_aer_err_detected()
  vfio/mlx5: Implement vfio_pci driver for mlx5 devices
  vfio/mlx5: Expose migration commands over mlx5 device
  vfio: Remove migration protocol v1 documentation
  vfio: Extend the device migration protocol with RUNNING_P2P
  vfio: Define device migration protocol v2
  ...
2022-03-24 12:35:59 -07:00
Linus Torvalds
1ebdbeb03e ARM:
- Proper emulation of the OSLock feature of the debug architecture
 
 - Scalibility improvements for the MMU lock when dirty logging is on
 
 - New VMID allocator, which will eventually help with SVA in VMs
 
 - Better support for PMUs in heterogenous systems
 
 - PSCI 1.1 support, enabling support for SYSTEM_RESET2
 
 - Implement CONFIG_DEBUG_LIST at EL2
 
 - Make CONFIG_ARM64_ERRATUM_2077057 default y
 
 - Reduce the overhead of VM exit when no interrupt is pending
 
 - Remove traces of 32bit ARM host support from the documentation
 
 - Updated vgic selftests
 
 - Various cleanups, doc updates and spelling fixes
 
 RISC-V:
 
 - Prevent KVM_COMPAT from being selected
 
 - Optimize __kvm_riscv_switch_to() implementation
 
 - RISC-V SBI v0.3 support
 
 s390:
 
 - memop selftest
 
 - fix SCK locking
 
 - adapter interruptions virtualization for secure guests
 
 - add Claudio Imbrenda as maintainer
 
 - first step to do proper storage key checking
 
 x86:
 
 - Continue switching kvm_x86_ops to static_call(); introduce
   static_call_cond() and __static_call_ret0 when applicable.
 
 - Cleanup unused arguments in several functions
 
 - Synthesize AMD 0x80000021 leaf
 
 - Fixes and optimization for Hyper-V sparse-bank hypercalls
 
 - Implement Hyper-V's enlightened MSR bitmap for nested SVM
 
 - Remove MMU auditing
 
 - Eager splitting of page tables (new aka "TDP" MMU only) when dirty
   page tracking is enabled
 
 - Cleanup the implementation of the guest PGD cache
 
 - Preparation for the implementation of Intel IPI virtualization
 
 - Fix some segment descriptor checks in the emulator
 
 - Allow AMD AVIC support on systems with physical APIC ID above 255
 
 - Better API to disable virtualization quirks
 
 - Fixes and optimizations for the zapping of page tables:
 
   - Zap roots in two passes, avoiding RCU read-side critical sections
     that last too long for very large guests backed by 4 KiB SPTEs.
 
   - Zap invalid and defunct roots asynchronously via concurrency-managed
     work queue.
 
   - Allowing yielding when zapping TDP MMU roots in response to the root's
     last reference being put.
 
   - Batch more TLB flushes with an RCU trick.  Whoever frees the paging
     structure now holds RCU as a proxy for all vCPUs running in the guest,
     i.e. to prolongs the grace period on their behalf.  It then kicks the
     the vCPUs out of guest mode before doing rcu_read_unlock().
 
 Generic:
 
 - Introduce __vcalloc and use it for very large allocations that
   need memcg accounting
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmI4fdwUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMq8gf/WoeVHtw2QlL5Mmz6McvRRmPAYPLV
 wLUIFNrRqRvd8Tw4kivzZoh/xTpwmnojv0YdK5SjKAiMjgv094YI1LrNp1JSPvmL
 pitocMkA10RSJNWHeEMg9cMSKH0rKiqeYl6S1e2XsdB+UZZ2BINOCVtvglmjTAvJ
 dFBdKdBkqjAUZbdXAGIvz4JEEER3N/LkFDKGaUGX+0QIQOzGBPIyLTxynxIDG6mt
 RViCCFyXdy5NkVp5hZFm96vQ2qAlWL9B9+iKruQN++82+oqWbeTdSqPhdwF7GyFz
 BfOv3gobQ2c4ef/aMLO5LswZ9joI1t/4kQbbAn6dNybpOAz/NXfDnbNefg==
 =keox
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:
   - Proper emulation of the OSLock feature of the debug architecture

   - Scalibility improvements for the MMU lock when dirty logging is on

   - New VMID allocator, which will eventually help with SVA in VMs

   - Better support for PMUs in heterogenous systems

   - PSCI 1.1 support, enabling support for SYSTEM_RESET2

   - Implement CONFIG_DEBUG_LIST at EL2

   - Make CONFIG_ARM64_ERRATUM_2077057 default y

   - Reduce the overhead of VM exit when no interrupt is pending

   - Remove traces of 32bit ARM host support from the documentation

   - Updated vgic selftests

   - Various cleanups, doc updates and spelling fixes

  RISC-V:
   - Prevent KVM_COMPAT from being selected

   - Optimize __kvm_riscv_switch_to() implementation

   - RISC-V SBI v0.3 support

  s390:
   - memop selftest

   - fix SCK locking

   - adapter interruptions virtualization for secure guests

   - add Claudio Imbrenda as maintainer

   - first step to do proper storage key checking

  x86:
   - Continue switching kvm_x86_ops to static_call(); introduce
     static_call_cond() and __static_call_ret0 when applicable.

   - Cleanup unused arguments in several functions

   - Synthesize AMD 0x80000021 leaf

   - Fixes and optimization for Hyper-V sparse-bank hypercalls

   - Implement Hyper-V's enlightened MSR bitmap for nested SVM

   - Remove MMU auditing

   - Eager splitting of page tables (new aka "TDP" MMU only) when dirty
     page tracking is enabled

   - Cleanup the implementation of the guest PGD cache

   - Preparation for the implementation of Intel IPI virtualization

   - Fix some segment descriptor checks in the emulator

   - Allow AMD AVIC support on systems with physical APIC ID above 255

   - Better API to disable virtualization quirks

   - Fixes and optimizations for the zapping of page tables:

      - Zap roots in two passes, avoiding RCU read-side critical
        sections that last too long for very large guests backed by 4
        KiB SPTEs.

      - Zap invalid and defunct roots asynchronously via
        concurrency-managed work queue.

      - Allowing yielding when zapping TDP MMU roots in response to the
        root's last reference being put.

      - Batch more TLB flushes with an RCU trick. Whoever frees the
        paging structure now holds RCU as a proxy for all vCPUs running
        in the guest, i.e. to prolongs the grace period on their behalf.
        It then kicks the the vCPUs out of guest mode before doing
        rcu_read_unlock().

  Generic:
   - Introduce __vcalloc and use it for very large allocations that need
     memcg accounting"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
  KVM: use kvcalloc for array allocations
  KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2
  kvm: x86: Require const tsc for RT
  KVM: x86: synthesize CPUID leaf 0x80000021h if useful
  KVM: x86: add support for CPUID leaf 0x80000021
  KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask
  Revert "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()"
  kvm: x86/mmu: Flush TLB before zap_gfn_range releases RCU
  KVM: arm64: fix typos in comments
  KVM: arm64: Generalise VM features into a set of flags
  KVM: s390: selftests: Add error memop tests
  KVM: s390: selftests: Add more copy memop tests
  KVM: s390: selftests: Add named stages for memop test
  KVM: s390: selftests: Add macro as abstraction for MEM_OP
  KVM: s390: selftests: Split memop tests
  KVM: s390x: fix SCK locking
  RISC-V: KVM: Implement SBI HSM suspend call
  RISC-V: KVM: Add common kvm_riscv_vcpu_wfi() function
  RISC-V: Add SBI HSM suspend related defines
  RISC-V: KVM: Implement SBI v0.3 SRST extension
  ...
2022-03-24 11:58:57 -07:00
Linus Torvalds
3ce62cf4dc flexible-array transformations for 5.18-rc1
Hi Linus,
 
 Please, pull the following treewide patch that replaces zero-length arrays with
 flexible-array members. This patch has been baking in linux-next for a
 whole development cycle.
 
 Thanks
 --
 Gustavo
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAmI6GIUACgkQRwW0y0cG
 2zFLWw/+OB1gZeQD3boKpUMntWnn6wjhUxdrO8CYkpzG+B+8TFECXNjy8HV1CSiw
 GKKRndYELOyYaD5o/F2vtPe10iPHbrdIlMFRPBRoht0/cvSZgzHlfT8EjWQwerYY
 dieztUFKjeSj0MXivdNDnKOTm8o9cz8KmCrWFP+My37Fasn/9+nBX8iNVIvAX4xy
 T+IVmjtDifQUsTs298UGnBvDeuZOiGHhXXU5rq6lIX0Rl554OsWZW94d6jUPj/h7
 t1v6jdojNuyaMKn45/xnPj9VvmDiSu3K67m3fjRdzLPDOhISjr2fw4KEUOKdsebh
 yJ9t5u8IufyPbm9kyI+rZt+T8ZlV2/qt2+mt6QgtDMnWrs+4nU15JY0SHImMSBZQ
 rBEZcQlrIcGJ+CsNB8Y7jIGYO0SSkhodAvfl0LRA0AbTqLGqq0OkAQS5D52r3H2r
 uz6xdYb7kG43XaRyaAIPqhZsp/jk2NrXvEvin2tSaXZFR1cxp+oxcV2UajmnOU6i
 EIBS4PzJnYx2RZRa+h8YbBa/+D4N6+fj/tjmwBawiUBPjjaLAsGFNwUHqvBoD05S
 bk6oXi654NBwVjsknZ0grVz0TtSvdZ3uJL5FZApTOHITqH8vlxlNefmHri4vZRZO
 NN7NIQ0yaUCnorzMg+vP8ZtflhQwrMJbjwIS9YD0RHd7MBhYX8k=
 =xZD2
 -----END PGP SIGNATURE-----

Merge tag 'flexible-array-transformations-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux

Pull flexible-array transformations from Gustavo Silva:
 "Treewide patch that replaces zero-length arrays with flexible-array
  members.

  This has been baking in linux-next for a whole development cycle"

* tag 'flexible-array-transformations-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
  treewide: Replace zero-length arrays with flexible-array members
2022-03-24 11:39:32 -07:00
Linus Torvalds
cd4699c5fd prlimit and set/getpriority tasklist_lock optimizations
The tasklist_lock popped up as a scalability bottleneck on some testing
 workloads.  The readlocks in do_prlimit and set/getpriority are not
 necessary in all cases.
 
 Based on a cycles profile, it looked like ~87% of the time was spent in
 the kernel, ~42% of which was just trying to get *some* spinlock
 (queued_spin_lock_slowpath, not necessarily the tasklist_lock).
 
 The big offenders (with rough percentages in cycles of the overall trace):
 
 - do_wait 11%
 - setpriority 8% (this patchset)
 - kill 8%
 - do_exit 5%
 - clone 3%
 - prlimit64 2%   (this patchset)
 - getrlimit 1%   (this patchset)
 
 I can't easily test this patchset on the original workload for various
 reasons.  Instead, I used the microbenchmark below to at least verify
 there was some improvement.  This patchset had a 28% speedup (12% from
 baseline to set/getprio, then another 14% for prlimit).
 
 One interesting thing is that my libc's getrlimit() was calling
 prlimit64, so hoisting the read_lock(tasklist_lock) into sys_prlimit64
 had no effect - it essentially optimized the older syscalls only.  I
 didn't do that in this patchset, but figured I'd mention it since it was
 an option from the previous patch's discussion.
 
 v3: https://lkml.kernel.org/r/20220106172041.522167-1-brho@google.com
 v2: https://lore.kernel.org/lkml/20220105212828.197013-1-brho@google.com/
 - update_rlimit_cpu on the group_leader instead of for_each_thread.
 - update_rlimit_cpu still returns 0 or -ESRCH, even though we don't care
   about the error here.  it felt safer that way in case someone uses
   that function again.
 
 v1: https://lore.kernel.org/lkml/20211213220401.1039578-1-brho@google.com/
 
 int main(int argc, char **argv)
 {
         pid_t child;
         struct rlimit rlim[1];
 
         fork(); fork(); fork(); fork(); fork(); fork();
 
         for (int i = 0; i < 5000; i++) {
                 child = fork();
                 if (child < 0)
                         exit(1);
                 if (child > 0) {
                         usleep(1000);
                         kill(child, SIGTERM);
                         waitpid(child, NULL, 0);
                 } else {
                         for (;;) {
                                 setpriority(PRIO_PROCESS, 0,
                                             getpriority(PRIO_PROCESS, 0));
                                 getrlimit(RLIMIT_CPU, rlim);
                         }
                 }
         }
 
         return 0;
 }
 
 Barret Rhoden (3):
   setpriority: only grab the tasklist_lock for PRIO_PGRP
   prlimit: make do_prlimit() static
   prlimit: do not grab the tasklist_lock
 
  include/linux/posix-timers.h   |   2 +-
  include/linux/resource.h       |   2 -
  kernel/sys.c                   | 127 +++++++++++++++++----------------
  kernel/time/posix-cpu-timers.c |  12 +++-
  4 files changed, 76 insertions(+), 67 deletions(-)
 
 I have dropped the first change in this series as an almost identical
 change was merged as commit 7f8ca0edfe ("kernel/sys.c: only take
 tasklist_lock for get/setpriority(PRIO_PGRP)").
 
 Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmI7eCAACgkQC/v6Eiaj
 j0CN8w/+MEol1+sB/mDKgDgqbNE0sIXHTjQF37KPrsqB51aas9LSX7E7CBzvxF3M
 Y0MSk0VzSt4oGpmrNQOAEueeMeaMucPxI5JejGHEhtdHFBMqYXKpWuhqewIHx1pc
 lUcYpDeUOOBjwLO/VT5hfAKzIEMUl6tEDfzexl9IvpVwd661nVjDe+z12mDplJTi
 tjO8ZiSHkjkLE3cAYaTCajsaqpj7NLuIYB1d4CbbpU3vO5LYoffj/vtQ1e+7UxMB
 jhgaP/ylo0Ab8udYJ0PFIDmmQG/6s7csc3I1wtMgf8mqv88z4xspXNZBwYvf2hxa
 lBpSo+zD8Q88XipC+w63iBUa7YElLaai9xpLInO/Ir42G03/H/8TS9me1OLG+1Cz
 vloOid6CqH7KkNQ842txXeyj3xjW1DGR7U0QOrSxFQuWc6WZ2Q/l8KIZsuXuyt9G
 EwTjtoQvr1R+FNMtT/4g5WZ8sTYooIaHFvFQ745T6FzBp8mCVjINg4SUbVV3Wvck
 JRMxuHSFFBXj8IIJi9Bv6UE/j5APwa209KthvFCQayniNZU3XPKVa/bDWVoBk+SK
 Hch3M//QdAjKYmRf5gmDaBbRyqzaeiFjvX1MSnkbFryBX4/yIoEfo0/QsDRzSrJV
 vSSSU79h/XDI080gILOzNX4HiI4cpNcpOIB63Pmajyr6MxhrMqE=
 =VVGP
 -----END PGP SIGNATURE-----

Merge tag 'prlimit-tasklist_lock-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull tasklist_lock optimizations from Eric Biederman:
 "prlimit and getpriority tasklist_lock optimizations

  The tasklist_lock popped up as a scalability bottleneck on some
  testing workloads. The readlocks in do_prlimit and set/getpriority are
  not necessary in all cases.

  Based on a cycles profile, it looked like ~87% of the time was spent
  in the kernel, ~42% of which was just trying to get *some* spinlock
  (queued_spin_lock_slowpath, not necessarily the tasklist_lock).

  The big offenders (with rough percentages in cycles of the overall
  trace):
   - do_wait 11%
   - setpriority 8% (done previously in commit 7f8ca0edfe)
   - kill 8%
   - do_exit 5%
   - clone 3%
   - prlimit64 2%   (this patchset)
   - getrlimit 1%   (this patchset)

  I can't easily test this patchset on the original workload for various
  reasons. Instead, I used the microbenchmark below to at least verify
  there was some improvement. This patchset had a 28% speedup (12% from
  baseline to set/getprio, then another 14% for prlimit).

  This series used to do the setpriority case, but an almost identical
  change was merged as commit 7f8ca0edfe ("kernel/sys.c: only take
  tasklist_lock for get/setpriority(PRIO_PGRP)") so that has been
  dropped from here.

  One interesting thing is that my libc's getrlimit() was calling
  prlimit64, so hoisting the read_lock(tasklist_lock) into sys_prlimit64
  had no effect - it essentially optimized the older syscalls only. I
  didn't do that in this patchset, but figured I'd mention it since it
  was an option from the previous patch's discussion"

micobenchmark.c:
---------------
	int main(int argc, char **argv)
	{
		pid_t child;
		struct rlimit rlim[1];

		fork(); fork(); fork(); fork(); fork(); fork();

		for (int i = 0; i < 5000; i++) {
			child = fork();
			if (child < 0)
				exit(1);
			if (child > 0) {
				usleep(1000);
				kill(child, SIGTERM);
				waitpid(child, NULL, 0);
			} else {
				for (;;) {
					setpriority(PRIO_PROCESS, 0,
						    getpriority(PRIO_PROCESS, 0));
					getrlimit(RLIMIT_CPU, rlim);
				}
			}
		}

		return 0;
	}

Link: https://lore.kernel.org/lkml/20211213220401.1039578-1-brho@google.com/ [v1]
Link: https://lore.kernel.org/lkml/20220105212828.197013-1-brho@google.com/ [v2]
Link: https://lore.kernel.org/lkml/20220106172041.522167-1-brho@google.com/ [v3]

* tag 'prlimit-tasklist_lock-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  prlimit: do not grab the tasklist_lock
  prlimit: make do_prlimit() static
2022-03-24 10:16:00 -07:00
Phil Sutter
d645552e9b netfilter: egress: Report interface as outgoing
Otherwise packets in egress chains seem like they are being received by
the interface, not sent out via it.

Fixes: 42df6e1d22 ("netfilter: Introduce egress hook")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
2022-03-24 15:09:53 +01:00
Jisheng Zhang
f05fa10901 kexec: make crashk_res, crashk_low_res and crash_notes symbols always visible
Patch series "kexec: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef", v2.

Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE" by
a check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code and
increase compile coverage.

I only modified x86, arm, arm64 and riscv, other architectures such as
sh, powerpc and s390 are better to be kept kexec code as-is so they are
not touched.

This patch (of 5):

Make the forward declarations of crashk_res, crashk_low_res and
crash_notes always visible.  Code referring to these symbols can then just
check for IS_ENABLED(CONFIG_KEXEC_CORE), instead of requiring conditional
compilation using an #ifdef, thus preparing to increase compile coverage
and simplify the code.

Link: https://lkml.kernel.org/r/20211206160514.2000-1-jszhang@kernel.org
Link: https://lkml.kernel.org/r/20211206160514.2000-2-jszhang@kernel.org
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 19:00:34 -07:00
Randy Dunlap
abc7da58c4 init.h: improve __setup and early_param documentation
Igor noted in [1] that there are quite a few __setup() handling
functions that return incorrect values.  Doing this can be harmless, but
it can also cause strings to be added to init's argument or environment
list, polluting them.

Since __setup() handling and return values are not documented, first add
documentation for that.  Also add more documentation for early_param()
handling and return values.

For __setup() functions, returning 0 (not handled) has questionable
value if it is just a malformed option value, as in

  rodata=junk

since returning 0 would just cause "rodata=junk" to be added to init's
environment unnecessarily:

  Run /sbin/init as init process
    with arguments:
      /sbin/init
    with environment:
      HOME=/
      TERM=linux
      splash=native
      rodata=junk

Also, there are no recommendations on whether to print a warning when an
unknown parameter value is seen.  I am not addressing that here.

[1] lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru

Link: https://lkml.kernel.org/r/20220221050852.1147-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 19:00:34 -07:00
Andy Shevchenko
25cb5b7ac6 bitfield: add explicit inclusions to the example
It's not obvious that bitfield.h doesn't guarantee the bits.h inclusion
and the example in the former is confusing.  Some developers think that
it's okay to just include bitfield.h to get it working.  Change example
to explicitly include necessary headers in order to avoid confusion.

Link: https://lkml.kernel.org/r/20220207123341.47533-1-andriy.shevchenko@linux.intel.com
Fixes: 3e9b3112ec ("add basic register-field manipulation macros")
Depends-on: 8bd9cb51da ("locking/atomics, asm-generic: Move some macros from <linux/bitops.h> to a new <linux/bits.h> file")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reported-by: Jan Dąbroś <jsd@semihalf.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 19:00:33 -07:00
Christophe Leroy
f334f5668b ilog2: force inlining of __ilog2_u32() and __ilog2_u64()
Building a kernel with CONFIG_CC_OPTIMISE_FOR_SIZE leads to
__ilog2_u32() being duplicated 50 times and __ilog2_u64() 3 times in
vmlinux on a tiny powerpc32 config.

__ilog2_u32() being 2 instructions it is not worth being kept out of
line, so force inlining.  Allthough the u64 version is a bit bigger,
there is still a small benefit in keeping it inlined.  On a 64 bits
config there's a real benefit.

With this change the size of vmlinux text is reduced by 1 kbytes, which
is approx 50% more than the size of the removed functions.

Before the patch there is for instance:

	c00d2a94 <__ilog2_u32>:
	c00d2a94:	7c 63 00 34 	cntlzw  r3,r3
	c00d2a98:	20 63 00 1f 	subfic  r3,r3,31
	c00d2a9c:	4e 80 00 20 	blr

	c00d36d8 <__order_base_2>:
	c00d36d8:	28 03 00 01 	cmplwi  r3,1
	c00d36dc:	40 81 00 2c 	ble     c00d3708 <__order_base_2+0x30>
	c00d36e0:	94 21 ff f0 	stwu    r1,-16(r1)
	c00d36e4:	7c 08 02 a6 	mflr    r0
	c00d36e8:	38 63 ff ff 	addi    r3,r3,-1
	c00d36ec:	90 01 00 14 	stw     r0,20(r1)
	c00d36f0:	4b ff f3 a5 	bl      c00d2a94 <__ilog2_u32>
	c00d36f4:	80 01 00 14 	lwz     r0,20(r1)
	c00d36f8:	38 63 00 01 	addi    r3,r3,1
	c00d36fc:	7c 08 03 a6 	mtlr    r0
	c00d3700:	38 21 00 10 	addi    r1,r1,16
	c00d3704:	4e 80 00 20 	blr
	c00d3708:	38 60 00 00 	li      r3,0
	c00d370c:	4e 80 00 20 	blr

With the patch it has become:

	c00d356c <__order_base_2>:
	c00d356c:	28 03 00 01 	cmplwi  r3,1
	c00d3570:	40 81 00 14 	ble     c00d3584 <__order_base_2+0x18>
	c00d3574:	38 63 ff ff 	addi    r3,r3,-1
	c00d3578:	7c 63 00 34 	cntlzw  r3,r3
	c00d357c:	20 63 00 20 	subfic  r3,r3,32
	c00d3580:	4e 80 00 20 	blr
	c00d3584:	38 60 00 00 	li      r3,0
	c00d3588:	4e 80 00 20 	blr

No more need for __order_base_2() to setup a stack frame and
save/restore caller address. And the following 'add 1' is
merged in the subtract.

Another typical use of it:

	c080ff28 <hugepagesz_setup>:
	...
	c080fff8:	7f c3 f3 78 	mr      r3,r30
	c080fffc:	4b 8f 81 f1 	bl      c01081ec <__ilog2_u32>
	c0810000:	38 63 ff f2 	addi    r3,r3,-14
	...

Becomes

	c080ff1c <hugepagesz_setup>:
	...
	c080ffec:	7f c3 00 34 	cntlzw  r3,r30
	c080fff0:	20 63 00 11 	subfic  r3,r3,17
	...

Here no need to move r30 argument to r3 then substract 14 to result.  Just
work on r30 and merge the 'sub 14' with the 'sub from 31'.

Link: https://lkml.kernel.org/r/803a2ac3d923ebcfd0dd40f5886b05cae7bb0aba.1644243860.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 19:00:33 -07:00
Rasmus Villemoes
14e83077d5 include: drop pointless __compiler_offsetof indirection
(1) compiler_types.h is unconditionally included via an -include flag
    (see scripts/Makefile.lib), and it defines __compiler_offsetof
    unconditionally.  So testing for definedness of __compiler_offsetof is
    mostly pointless.

(2) Every relevant compiler provides __builtin_offsetof (even sparse
    has had that for 14 years), and if for whatever reason one would end
    up picking up the poor man's fallback definition (C file compiler with
    completely custom CFLAGS?), newer clang versions won't treat the
    result as an Integer Constant Expression, so if used in place where
    such is required (static initializer or static_assert), one would get
    errors like

      t.c:11:16: error: static_assert expression is not an integral constant expression
      t.c:11:16: note: cast that performs the conversions of a reinterpret_cast is not allowed in a constant expression
      t.c:4:33: note: expanded from macro 'offsetof'
      #define offsetof(TYPE, MEMBER)  ((size_t)&((TYPE *)0)->MEMBER)

So just define offsetof unconditionally and directly in terms of
__builtin_offsetof.

Link: https://lkml.kernel.org/r/20220202102147.326672-1-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 19:00:33 -07:00
Bjorn Helgaas
179fd6ba3b Documentation/sparse: add hints about __CHECKER__
Several attributes depend on __CHECKER__, but previously there was no
clue in the tree about when __CHECKER__ might be defined.  Add hints at
the most common places (__kernel, __user, __iomem, __bitwise) and in the
sparse documentation.

Link: https://lkml.kernel.org/r/20220310220927.245704-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: "Michael S . Tsirkin" <mst@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 19:00:33 -07:00
Linus Torvalds
b4bc93bd76 ARM driver updates for 5.18
There are a few separately maintained driver subsystems that we merge through
 the SoC tree, notable changes are:
 
  - Memory controller updates, mainly for Tegra and Mediatek SoCs,
    and clarifications for the memory controller DT bindings
 
  - SCMI firmware interface updates, in particular a new transport based
    on OPTEE and support for atomic operations.
 
  - Cleanups to the TEE subsystem, refactoring its memory management
 
 For SoC specific drivers without a separate subsystem, changes include
 
  - Smaller updates and fixes for TI, AT91/SAMA5, Qualcomm and NXP
    Layerscape SoCs.
 
  - Driver support for Microchip SAMA5D29, Tesla FSD, Renesas RZ/G2L,
    and Qualcomm SM8450.
 
  - Better power management on Mediatek MT81xx, NXP i.MX8MQ
    and older NVIDIA Tegra chips
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmI4nOUACgkQmmx57+YA
 GNlNNhAApPQw+FKQ6yVj2EZYcaAgik8PJAJoNQWYED52iQfm5uXgjt3aQewvrPNW
 nkKx5Mx+fPUfaKx5mkVOFMhME5Bw9tYbXHm2/RpRp+n8jOdUlQpAhzIPOyWPHOJS
 QX6qu4t+agrQzjbOCGouAJXgyxhTJFUMviM2EgVHbQHXPtdF8i2kyanfCP7Rw8cx
 sVtLwpvhbLm849+deYRXuv2Xw9I3M1Np7018s5QciimI2eLLEb+lJ/C5XWz5pMYn
 M1nZ7uwCLKPCewpMETTuhKOv0ioOXyY9C1ghyiGZFhHQfoCYTu94Hrx9t8x5gQmL
 qWDinXWXVk8LBegyrs8Bp4wcjtmvMMLnfWtsGSfT5uq24JOGg22OmtUNhNJbS9+p
 VjEvBgkXYD7UEl5npI9v9/KQWr3/UDir0zvkuV40gJyeBWNEZ/PB8olXAxgL7wZv
 cXRYSaUYYt3DKQf1k5I4GUyQtkP/4RaBy6AqvH5Sx0lCwuY6G6ISK+kCPaaSRKnX
 WR+nFw84dKCu7miehmW9qSzMQ4kiSCKIDqk7ilHcwv0J2oXDrlqVPKGGGTzZjUc8
 +feqM/eSoYvDDEDemuXNSnl3hc1Zlvm7Apd5AN6kdTaNgoACDYdyvGuJ3CvzcA+K
 1gBHUBvGS/ODA25KnYabr7wCMgxYqf7dXfkyKIBwFHwxOnRHtgs=
 =Cfbk
 -----END PGP SIGNATURE-----

Merge tag 'arm-drivers-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM driver updates from Arnd Bergmann:
 "There are a few separately maintained driver subsystems that we merge
  through the SoC tree, notable changes are:

   - Memory controller updates, mainly for Tegra and Mediatek SoCs, and
     clarifications for the memory controller DT bindings

   - SCMI firmware interface updates, in particular a new transport
     based on OPTEE and support for atomic operations.

   - Cleanups to the TEE subsystem, refactoring its memory management

  For SoC specific drivers without a separate subsystem, changes include

   - Smaller updates and fixes for TI, AT91/SAMA5, Qualcomm and NXP
     Layerscape SoCs.

   - Driver support for Microchip SAMA5D29, Tesla FSD, Renesas RZ/G2L,
     and Qualcomm SM8450.

   - Better power management on Mediatek MT81xx, NXP i.MX8MQ and older
     NVIDIA Tegra chips"

* tag 'arm-drivers-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (154 commits)
  ARM: spear: fix typos in comments
  soc/microchip: fix invalid free in mpfs_sys_controller_delete
  soc: s4: Add support for power domains controller
  dt-bindings: power: add Amlogic s4 power domains bindings
  ARM: at91: add support in soc driver for new SAMA5D29
  soc: mediatek: mmsys: add sw0_rst_offset in mmsys driver data
  dt-bindings: memory: renesas,rpc-if: Document RZ/V2L SoC
  memory: emif: check the pointer temp in get_device_details()
  memory: emif: Add check for setup_interrupts
  dt-bindings: arm: mediatek: mmsys: add support for MT8186
  dt-bindings: mediatek: add compatible for MT8186 pwrap
  soc: mediatek: pwrap: add pwrap driver for MT8186 SoC
  soc: mediatek: mmsys: add mmsys reset control for MT8186
  soc: mediatek: mtk-infracfg: Disable ACP on MT8192
  soc: ti: k3-socinfo: Add AM62x JTAG ID
  soc: mediatek: add MTK mutex support for MT8186
  soc: mediatek: mmsys: add mt8186 mmsys routing table
  soc: mediatek: pm-domains: Add support for mt8186
  dt-bindings: power: Add MT8186 power domains
  soc: mediatek: pm-domains: Add support for mt8195
  ...
2022-03-23 18:23:13 -07:00
Linus Torvalds
baaa68a979 ARM: SoC updates for 5.18
SoC specific code is generally used for older platforms that don't (yet)
 use device tree to do the same things.
 
  - Support is added for i.MXRT10xx, a Cortex-M7 based microcontroller
    from NXP. At the moment this is still incomplete as other portions
    are merged through different trees.
 
  - Long abandoned support for running NOMMU ARMv4 or ARMv5 platforms
    gets removed, now the Arm NOMMU platforms are limited to the
    Cortex-M family of microcontrollers
 
  - Two old PXA boards get removed, along with corresponding driver
    bits.
 
  - Continued cleanup of the Intel IXP4xx platforms, removing some
    remnants of the old board files.
 
  - Minor Cleanups and fixes for Orion, PXA, MMP, Mstar, Samsung
 
  - CPU idle support for AT91
 
  - A system controller driver for Polarfire
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmI7I0sACgkQmmx57+YA
 GNkfsQ/+KHy6byGCcPiB3T+be2/WFnc7ANnniYku4o27703BpROLCltNAr4VTiyM
 Ucin72wmuPx840RiP0o8st7D9Ms7fG3/j4hoxJDG6v1aHr8CazCSPZR2EgVAOVeD
 n4jGuLzICqP3RLw/qdfTT4lARKGqKBW1l5ss0D4PxFECyKq6kzqEOt9wCw29vAJy
 Vw8CmcDhGr9sI8voZYN1dMyIV4FujkmOm/mNSHNTKKN0vt+GFU0gVxDAG2i7Rh1g
 cO7593Vg/U4daw97231uoW0q+9vZ6OKajZt1Mm6LFe4AsGRpV+eN5UpQeZzkm7ET
 D6GFE8/NTkcJHm50OYYER7t69uHe1O/Sf5+MIax1l5pthuWRZGolb1xOBeWJ9Al7
 Qgym9XNCGf0AoaUeXIuxVbhxNp8GXqBzL35qMK1hV4WkdrJSRGq+2GQLBgtb6owi
 ZIpDYAFnUNFkYFdtX5qez8zXy4LHtUf5bO+qnLXPT2Sk0MtYWx9Gn0P4kgMqezkn
 HQg1inPRQS7PB40xE+7Ap3pzvE/1IWgYblsS8CFekJ4+Nm0X4IRx6/s9KEDHU1ZQ
 RADI6jwwVe/ioOSNen7S60GNrFKDyt9ZbLq/+x/GE3SkmdTeAmcd+RPmQvc5SHnl
 jvUnjN1nsyqhOICIGMwvdkFkW749/af713xoiXyCUedZKIxAgkc=
 =2fmA
 -----END PGP SIGNATURE-----

Merge tag 'arm-soc-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC updates from Arnd Bergmann:
 "SoC specific code is generally used for older platforms that don't
  (yet) use device tree to do the same things.

   - Support is added for i.MXRT10xx, a Cortex-M7 based microcontroller
     from NXP. At the moment this is still incomplete as other portions
     are merged through different trees.

   - Long abandoned support for running NOMMU ARMv4 or ARMv5 platforms
     gets removed, now the Arm NOMMU platforms are limited to the
     Cortex-M family of microcontrollers

   - Two old PXA boards get removed, along with corresponding driver
     bits.

   - Continued cleanup of the Intel IXP4xx platforms, removing some
     remnants of the old board files.

   - Minor Cleanups and fixes for Orion, PXA, MMP, Mstar, Samsung

   - CPU idle support for AT91

   - A system controller driver for Polarfire"

* tag 'arm-soc-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (29 commits)
  ARM: remove support for NOMMU ARMv4/v5
  ARM: PXA: fix up decompressor code
  soc: microchip: make mpfs_sys_controller_put static
  ARM: pxa: remove Intel Imote2 and Stargate 2 boards
  ARM: mmp: Fix failure to remove sram device
  ARM: mstar: Select ARM_ERRATA_814220
  soc: add microchip polarfire soc system controller
  ARM: at91: Kconfig: select PM_OPP
  ARM: at91: PM: add cpu idle support for sama7g5
  ARM: at91: ddr: fix typo to align with datasheet naming
  ARM: at91: ddr: align macro definitions
  ARM: at91: ddr: remove CONFIG_SOC_SAMA7 dependency
  ARM: ixp4xx: Convert to SPARSE_IRQ and P2V
  ARM: ixp4xx: Drop all common code
  ARM: ixp4xx: Drop custom DMA coherency and bouncing
  ARM: ixp4xx: Remove feature bit accessors
  net: ixp4xx_hss: Check features using syscon
  net: ixp4xx_eth: Drop platform data support
  soc: ixp4xx-npe: Access syscon regs using regmap
  soc: ixp4xx: Add features from regmap helper
  ...
2022-03-23 18:20:09 -07:00
Linus Torvalds
194dfe88d6 asm-generic updates for 5.18
There are three sets of updates for 5.18 in the asm-generic tree:
 
  - The set_fs()/get_fs() infrastructure gets removed for good. This
    was already gone from all major architectures, but now we can
    finally remove it everywhere, which loses some particularly
    tricky and error-prone code.
    There is a small merge conflict against a parisc cleanup, the
    solution is to use their new version.
 
  - The nds32 architecture ends its tenure in the Linux kernel. The
    hardware is still used and the code is in reasonable shape, but
    the mainline port is not actively maintained any more, as all
    remaining users are thought to run vendor kernels that would never
    be updated to a future release.
    There are some obvious conflicts against changes to the removed
    files.
 
  - A series from Masahiro Yamada cleans up some of the uapi header
    files to pass the compile-time checks.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmI69BsACgkQmmx57+YA
 GNn/zA//f4d5VTT0ThhRxRWTu9BdThGHoB8TUcY7iOhbsWu0X/913NItRC3UeWNl
 IdmisaXgVtirg1dcC2pWUmrcHdoWOCEGfK4+Zr2NhSWfuZDWvODHK9pGWk4WLnhe
 cQgUNBvIuuAMryGtrOBwHPO4TpfCyy2ioeVP36ZfcsWXdDxTrqfaq/56mk3sxIP6
 sUTk1UEjut9NG4C9xIIvcSU50R3l6LryQE/H9kyTLtaSvfvTOvprcVYCq0GPmSzo
 DtQ1Wwa9zbJ+4EqoMiP5RrgQwWvOTg2iRByLU8ytwlX3e/SEF0uihvMv1FQbL8zG
 G8RhGUOKQSEhaBfc3lIkm8GpOVPh0uHzB6zhn7daVmAWtazRD2Nu59BMjipa+ims
 a8Z58iHH7jRAnKeEkVZqXKb1CEiUxaQx/IeVPzN4QlwMhDtwrI76LY7ZJ1zCqTGY
 ENG0yRLav1XselYBslOYXGtOEWcY5EZPWqLyWbp4P9vz2g0Fe0gZxoIOvPmNQc89
 QnfXpCt7vm/DGkyO255myu08GOLeMkisVqUIzLDB9avlym5mri7T7vk9abBa2YyO
 CRpTL5gl1/qKPWuH1UI5mvhT+sbbBE2SUHSuy84btns39ZKKKynwCtdu+hSQkKLE
 h9pV30Gf1cLTD4JAE0RWlUgOmbBLVp34loTOexQj4MrLM1noOnw=
 =vtCN
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic updates from Arnd Bergmann:
 "There are three sets of updates for 5.18 in the asm-generic tree:

   - The set_fs()/get_fs() infrastructure gets removed for good.

     This was already gone from all major architectures, but now we can
     finally remove it everywhere, which loses some particularly tricky
     and error-prone code. There is a small merge conflict against a
     parisc cleanup, the solution is to use their new version.

   - The nds32 architecture ends its tenure in the Linux kernel.

     The hardware is still used and the code is in reasonable shape, but
     the mainline port is not actively maintained any more, as all
     remaining users are thought to run vendor kernels that would never
     be updated to a future release.

   - A series from Masahiro Yamada cleans up some of the uapi header
     files to pass the compile-time checks"

* tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (27 commits)
  nds32: Remove the architecture
  uaccess: remove CONFIG_SET_FS
  ia64: remove CONFIG_SET_FS support
  sh: remove CONFIG_SET_FS support
  sparc64: remove CONFIG_SET_FS support
  lib/test_lockup: fix kernel pointer check for separate address spaces
  uaccess: generalize access_ok()
  uaccess: fix type mismatch warnings from access_ok()
  arm64: simplify access_ok()
  m68k: fix access_ok for coldfire
  MIPS: use simpler access_ok()
  MIPS: Handle address errors for accesses above CPU max virtual user address
  uaccess: add generic __{get,put}_kernel_nofault
  nios2: drop access_ok() check from __put_user()
  x86: use more conventional access_ok() definition
  x86: remove __range_not_ok()
  sparc64: add __{get,put}_kernel_nofault()
  nds32: fix access_ok() checks in get/put_user
  uaccess: fix nios2 and microblaze get_user_8()
  sparc64: fix building assembly files
  ...
2022-03-23 18:03:08 -07:00
Linus Torvalds
c7d4b15372 ata changes for 5.18-rc1
For this cycle, no big change but many small fixes and code cleanup to
 libata, the ahci driver and various pata drivers. In more details:
 
 * Code simplification in pata_platform using platform_get_mem_or_io(),
   from Lad.
 
 * Fix read-only arrays declarations as const in pata_atiixp and
   pata_pdc202xx_old, from Colin.
 
 * Various cleanups and code simplification in libata-scsi, from me.
 
 * Remove dead code in libata-acpi, from Sergey.
 
 * Skip device scan deboune delay for Marvell 88SE9235 adapters (ahci) to
   speedup boot, from Paul.
 
 * Simplify functions declaration and use for functions always returning
   0 in libata-core, from Sergey.
 
 * Non-fatal error fixes and in the pata_hpt366 and pata_hpt3x2n drivers,
   from Sergey.
 
 * Various code cleanup in the pata_artop, pata_hpt37x, pata_hpt366,
   pata_hpt3x2n, pata_samsung_cf and sata_rcar drivers, from Sergey.
 
 * Some libata-sff and libata-scsi code cleanup (e.g. change functions
   to return "bool"), from Sergey.
 
 * Renae ahci_board_mobile to board_ahci_low_power to be more descriptive
   of the feature as that is also used on PC and server AHCI adapters,
   from Mario.
 
 * Cleanup of OF match tables, from Geert.
 
 * Simplify the pata_pxa driver initialization using platform_get_irq(),
   from Minghao.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCYjlsYwAKCRDdoc3SxdoY
 dn35AP43C5aPtM1JDd+uGZ6JC5QsFPsHYtsX3S7UsO5QhtFeXgD/d+XVYt+pD7wk
 WEaUpH9bB0jRuEFp9yISZeqJzxeuzw8=
 =nxBY
 -----END PGP SIGNATURE-----

Merge tag 'ata-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata

Pull ata updates from Damien Le Moal:
 "For this cycle, no big change but many small fixes and code cleanup to
  libata, the ahci driver and various pata drivers. In more details:

   - Code simplification in pata_platform using
     platform_get_mem_or_io(), from Lad.

   - Fix read-only arrays declarations as const in pata_atiixp and
     pata_pdc202xx_old, from Colin.

   - Various cleanups and code simplification in libata-scsi, from me.

   - Remove dead code in libata-acpi, from Sergey.

   - Skip device scan deboune delay for Marvell 88SE9235 adapters (ahci)
     to speedup boot, from Paul.

   - Simplify functions declaration and use for functions always
     returning 0 in libata-core, from Sergey.

   - Non-fatal error fixes and in the pata_hpt366 and pata_hpt3x2n
     drivers, from Sergey.

   - Various code cleanup in the pata_artop, pata_hpt37x, pata_hpt366,
     pata_hpt3x2n, pata_samsung_cf and sata_rcar drivers, from Sergey.

   - Some libata-sff and libata-scsi code cleanup (e.g. change functions
     to return "bool"), from Sergey.

   - Renae ahci_board_mobile to board_ahci_low_power to be more
     descriptive of the feature as that is also used on PC and server
     AHCI adapters, from Mario.

   - Cleanup of OF match tables, from Geert.

   - Simplify the pata_pxa driver initialization using
     platform_get_irq(), from Minghao"

* tag 'ata-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: (38 commits)
  ata: pata_pxa: Use platform_get_irq() to get the interrupt
  ata: Drop commas after OF match table sentinels
  ata: ahci: Rename CONFIG_SATA_LPM_MOBILE_POLICY configuration item
  ata: ahci: Rename `AHCI_HFLAG_IS_MOBILE`
  ata: ahci: Rename board_ahci_mobile
  ata: pata_hpt37x: merge transfer mode setting methods
  ata: libata-sff: use *switch* statement in ata_sff_dev_classify()
  ata: add/use ata_taskfile::{error|status} fields
  ata: Kconfig: fix sata gemini compile test condition
  ata: libata-scsi: use *switch* statements to check SCSI command codes
  ata: libata-sff: refactor ata_sff_altstatus()
  ata: libata-sff: refactor ata_sff_set_devctl()
  ata: libata-sff: make ata_resources_present() return 'bool'
  ata: pata_hpt3x2n: disable fast interrupts in prereset() method
  ata: pata_hpt37x: disable fast interrupts in prereset() method
  ata: pata_hpt366: disable fast interrupts in prereset() method
  ata: pata_mpc52xx: use GFP_KERNEL
  ata: sata_rcar: drop unused #define's
  ata: pata_hpt366: check channel enable bits
  ata: sata_rcar: make sata_rcar_ata_devchk() return 'bool'
  ...
2022-03-23 14:35:59 -07:00
Linus Torvalds
c5c009e250 slab updates for 5.18
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEjUuTAak14xi+SF7M4CHKc/GJqRAFAmI4yWsACgkQ4CHKc/GJ
 qRAUhQf8CE5dBIblFU6Nfbiv/GHSu2AoHw8DRseeC7uSLGUzP9h2l1c0zMBzLIrf
 ujKQGU/8n2AgUZBwxd59vbi0WHLD7K9qr3Yj6hH0QnTmHiv89GXEnYXrHHAJ506q
 OGzT7NUiz5pEHrFyE5yqKQ2axy+Q+JrAfyzHQPEdzNtl0DJurVuHATYz5b9Pk6zW
 27PrCIFjQyIkqw2s9oBCYZevAZkiAoxXntzhXicrqksZs4htArzn7LDTulpccnhy
 6hwrCSg91E1Tza8oTNOCNvXt/YUGa6Q1RBEeUDmLOLwHSuOIfTPb5zW25vCvpyMK
 008OYHcw7tspWnEfAEWqIDwMFWcjDg==
 =2g9n
 -----END PGP SIGNATURE-----

Merge tag 'slab-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab updates from Vlastimil Babka:

 - A few non-trivial SLUB code cleanups, most notably a refactoring of
   deactivate_slab().

 - A bunch of trivial changes, such as removal of unused parameters,
   making stuff static, and employing helper functions.

* tag 'slab-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm: slub: Delete useless parameter of alloc_slab_page()
  mm: slab: Delete unused SLAB_DEACTIVATED flag
  mm/slub: remove forced_order parameter in calculate_sizes
  mm/slub: refactor deactivate_slab()
  mm/slub: limit number of node partial slabs only in cache creation
  mm/slub: use helper macro __ATTR_XX_MODE for SLAB_ATTR(_RO)
  mm/slab_common: use helper function is_power_of_2()
  mm/slob: make kmem_cache_boot static
2022-03-23 12:33:21 -07:00
Kajol Jain
de7a9e949f drivers/nvdimm: Fix build failure when CONFIG_PERF_EVENTS is not set
The following build failure occurs when CONFIG_PERF_EVENTS is not set
as generic pmu functions are not visible in that scenario.

|-- s390-randconfig-r044-20220313
|   |-- nd_perf.c:(.text):undefined-reference-to-perf_pmu_migrate_context
|   |-- nd_perf.c:(.text):undefined-reference-to-perf_pmu_register
|   `-- nd_perf.c:(.text):undefined-reference-to-perf_pmu_unregister

Similar build failure in nds32 architecture:
nd_perf.c:(.text+0x21e): undefined reference to `perf_pmu_migrate_context'
nd_perf.c:(.text+0x434): undefined reference to `perf_pmu_register'
nd_perf.c:(.text+0x57c): undefined reference to `perf_pmu_unregister'

Fix this issue by adding check for CONFIG_PERF_EVENTS config option
and disabling the nvdimm perf interface incase this config is not set.

Also remove function declaration of perf_pmu_migrate_context,
perf_pmu_register, perf_pmu_unregister functions from nd.h as these are
common pmu functions which are part of perf_event.h and since we
are disabling nvdimm perf interface incase CONFIG_PERF_EVENTS option
is not set, we not need to declare them in nd.h

Also move the platform_device header file addition part from nd.h to
nd_perf.c and add stub functions for register_nvdimm_pmu and
unregister_nvdimm_pmu functions to handle CONFIG_PERF_EVENTS=n
case.

Fixes: 0fab1ba6ad ("drivers/nvdimm: Add perf interface to expose nvdimm performance stats") (Commit id based on libnvdimm-for-next tree)
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Link: https://lore.kernel.org/all/62317124.YBQFU33+s%2FwdvWGj%25lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/20220323164550.109768-1-kjain@linux.ibm.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-03-23 12:17:36 -07:00
Alexandre Belloni
1a31d63632 rtc: remove uie_unsupported
uie_unsupported is not used by any drivers anymore, remove it.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20220309162301.61679-29-alexandre.belloni@bootlin.com
2022-03-23 19:58:41 +01:00
Alexandre Belloni
5c0a04a663 rtc: ds1685: drop no_irq
No platforms are currently setting no_irq. Anyway, letting platform_get_irq
fail is fine as this means that there is no IRQ. In that case, clear
RTC_FEATURE_ALARM so the core knows there are no alarms.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20220309162301.61679-2-alexandre.belloni@bootlin.com
2022-03-23 19:58:38 +01:00
Samuel Holland
d91612d7f0 clk: sunxi-ng: Add support for the sun6i RTC clocks
The RTC power domain in sun6i and newer SoCs manages the 16 MHz RC
oscillator (called "IOSC" or "osc16M") and the optional 32 kHz crystal
oscillator (called "LOSC" or "osc32k"). Starting with the H6, this power
domain also handles the 24 MHz DCXO (called variously "HOSC", "dcxo24M",
or "osc24M") as well. The H6 also adds a calibration circuit for IOSC.

Later SoCs introduce further variations on the design:
 - H616 adds an additional mux for the 32 kHz fanout source.
 - R329 adds an additional mux for the RTC timekeeping clock, a clock
   for the SPI bus between power domains inside the RTC, and removes the
   IOSC calibration functionality.

Take advantage of the CCU framework to handle this increased complexity.
This driver is intended to be a drop-in replacement for the existing RTC
clock provider. So some runtime adjustment of the clock parents is
needed, both to handle hardware differences, and to support the old
binding which omitted some of the input clocks.

Signed-off-by: Samuel Holland <samuel@sholland.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20220203021736.13434-6-samuel@sholland.org
2022-03-23 19:58:38 +01:00
Linus Torvalds
1bc191051d Tracing updates for 5.18:
- New user_events interface. User space can register an event with the kernel
   describing the format of the event. Then it will receive a byte in a page
   mapping that it can check against. A privileged task can then enable that
   event like any other event, which will change the mapped byte to true,
   telling the user space application to start writing the event to the
   tracing buffer.
 
 - Add new "ftrace_boot_snapshot" kernel command line parameter. When set,
   the tracing buffer will be saved in the snapshot buffer at boot up when
   the kernel hands things over to user space. This will keep the traces that
   happened at boot up available even if user space boot up has tracing as
   well.
 
 - Have TRACE_EVENT_ENUM() also update trace event field type descriptions.
   Thus if a static array defines its size with an enum, the user space trace
   event parsers can still know how to parse that array.
 
 - Add new TRACE_CUSTOM_EVENT() macro. This acts the same as the
   TRACE_EVENT() macro, but will attach to an existing tracepoint. This will
   make one tracepoint be able to trace different content and not be stuck at
   only what the original TRACE_EVENT() macro exports.
 
 - Fixes to tracing error logging.
 
 - Better saving of cmdlines to PIDs when tracing (use the wakeup events for
   mapping).
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYjiO3RQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qhQzAQDtek5p80p/zkMGymm14wSH6qq0NdgN
 Kv7fTBwEewUa0gD/UCOVLw4Oj+JtHQhCa3sCGZopmRv0BT1+4UQANqosKQY=
 =Au08
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

 - New user_events interface. User space can register an event with the
   kernel describing the format of the event. Then it will receive a
   byte in a page mapping that it can check against. A privileged task
   can then enable that event like any other event, which will change
   the mapped byte to true, telling the user space application to start
   writing the event to the tracing buffer.

 - Add new "ftrace_boot_snapshot" kernel command line parameter. When
   set, the tracing buffer will be saved in the snapshot buffer at boot
   up when the kernel hands things over to user space. This will keep
   the traces that happened at boot up available even if user space boot
   up has tracing as well.

 - Have TRACE_EVENT_ENUM() also update trace event field type
   descriptions. Thus if a static array defines its size with an enum,
   the user space trace event parsers can still know how to parse that
   array.

 - Add new TRACE_CUSTOM_EVENT() macro. This acts the same as the
   TRACE_EVENT() macro, but will attach to an existing tracepoint. This
   will make one tracepoint be able to trace different content and not
   be stuck at only what the original TRACE_EVENT() macro exports.

 - Fixes to tracing error logging.

 - Better saving of cmdlines to PIDs when tracing (use the wakeup events
   for mapping).

* tag 'trace-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (30 commits)
  tracing: Have type enum modifications copy the strings
  user_events: Add trace event call as root for low permission cases
  tracing/user_events: Use alloc_pages instead of kzalloc() for register pages
  tracing: Add snapshot at end of kernel boot up
  tracing: Have TRACE_DEFINE_ENUM affect trace event types as well
  tracing: Fix strncpy warning in trace_events_synth.c
  user_events: Prevent dyn_event delete racing with ioctl add/delete
  tracing: Add TRACE_CUSTOM_EVENT() macro
  tracing: Move the defines to create TRACE_EVENTS into their own files
  tracing: Add sample code for custom trace events
  tracing: Allow custom events to be added to the tracefs directory
  tracing: Fix last_cmd_set() string management in histogram code
  user_events: Fix potential uninitialized pointer while parsing field
  tracing: Fix allocation of last_cmd in last_cmd_set()
  user_events: Add documentation file
  user_events: Add sample code for typical usage
  user_events: Add self-test for validator boundaries
  user_events: Add self-test for perf_event integration
  user_events: Add self-test for dynamic_events integration
  user_events: Add self-test for ftrace integration
  ...
2022-03-23 11:40:25 -07:00
Herbert Xu
30d024b505 cacheflush.h: Add forward declaration for struct folio
The struct folio is not declared in cacheflush.h so we need to provide
a forward declaration as otherwise users of this header file may get
warnings.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: 522a0032af ("Add linux/cacheflush.h")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 10:11:03 -07:00
Miquel Raynal
6cadd424ab Raw NAND core changes:
* Rework of_get_nand_bus_width()
 * Remove of_get_nand_on_flash_bbt() wrapper
 * Protect access to rawnand devices while in suspend
 * bindings: Document the wp-gpios property
 
 Rax NAND controller driver changes:
 * atmel: Fix refcount issue in atmel_nand_controller_init
 * nandsim:
   - Add NS_PAGE_BYTE_SHIFT macro to replace the repeat pattern
   - Merge repeat codes in ns_switch_state
   - Replace overflow check with kzalloc to single kcalloc
 * rockchip: Fix platform_get_irq.cocci warning
 * stm32_fmc2: Add NAND Write Protect support
 * pl353: Set the nand chip node as the flash node
 * brcmnand: Fix sparse warnings in bcma_nand
 * omap_elm: Remove redundant variable 'errors'
 * gpmi:
   - Support fast edo timings for mx28
   - Validate controller clock rate
   - Fix controller timings setting
 * brcmnand:
   - Add BCMA shim
   - BCMA controller uses command shift of 0
   - Allow platform data instantation
   - Add platform data structure for BCMA
   - Allow working without interrupts
   - Move OF operations out of brcmnand_init_cs()
   - Avoid pdev in brcmnand_init_cs()
   - Allow SoC to provide I/O operations
   - Assign soc as early as possible
 
 Onenand changes:
 * Check for error irq
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEE9HuaYnbmDhq/XIDIJWrqGEe9VoQFAmI021sACgkQJWrqGEe9
 VoTEcggAiFXD+oR0VXTsykiNDiopsAUZwLPGuqk8gvD4ozOYCoAEoYrtnBM6Uybz
 W6Hu6Eow/ri1H+uVygUw3RYa4TpxrZrmZnJ1YimXxZjbLYjgE3FS9vzh2l4Bu3yo
 fkkQH+nFvk9qVIK8qolAny+LWl37gkSnCd6mPPksYaG5Ds1n1ZgyTZVUz5TOWAjG
 QAWUQQfO1iu7+u4CXa9JRTkCf55bT6v6c9Ryq6MA+ok6jVRN6Cj9WhxHtCB5vmOH
 Ndmu4V8BqaNKg39ltolqSPuwt3GEh707LRr+YakfnaOM7Sf8E/evn/THSkHwY/yn
 bAjpU1gvS13nJ++s8nHwIhHKhoYjTg==
 =4Uke
 -----END PGP SIGNATURE-----

Merge tag 'nand/for-5.18' into mtd/next

Raw NAND core changes:
* Rework of_get_nand_bus_width()
* Remove of_get_nand_on_flash_bbt() wrapper
* Protect access to rawnand devices while in suspend
* bindings: Document the wp-gpios property

Rax NAND controller driver changes:
* atmel: Fix refcount issue in atmel_nand_controller_init
* nandsim:
  - Add NS_PAGE_BYTE_SHIFT macro to replace the repeat pattern
  - Merge repeat codes in ns_switch_state
  - Replace overflow check with kzalloc to single kcalloc
* rockchip: Fix platform_get_irq.cocci warning
* stm32_fmc2: Add NAND Write Protect support
* pl353: Set the nand chip node as the flash node
* brcmnand: Fix sparse warnings in bcma_nand
* omap_elm: Remove redundant variable 'errors'
* gpmi:
  - Support fast edo timings for mx28
  - Validate controller clock rate
  - Fix controller timings setting
* brcmnand:
  - Add BCMA shim
  - BCMA controller uses command shift of 0
  - Allow platform data instantation
  - Add platform data structure for BCMA
  - Allow working without interrupts
  - Move OF operations out of brcmnand_init_cs()
  - Avoid pdev in brcmnand_init_cs()
  - Allow SoC to provide I/O operations
  - Assign soc as early as possible

Onenand changes:
* Check for error irq

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2022-03-23 18:08:03 +01:00
YueHaibing
553f685ebf mfd: db8500-prcmu: Remove unused inline function
commit b0e846248d ("mfd: db8500-prcmu: Remove dead code for a non-existing config")
left behind this, remove it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20220311125518.31064-1-yuehaibing@huawei.com
2022-03-23 14:51:51 +00:00
Ming Lei
d578c770c8 block: avoid calling blkg_free() in atomic context
blkg_free() can currently be called in atomic context, either spin lock is
held, or run in rcu callback. Meantime either request queue's release
handler or ->pd_free_fn can sleep.

Fix the issue by scheduling a work function for freeing blkcg_gq the
instance.

[  148.553894] BUG: sleeping function called from invalid context at block/blk-sysfs.c:767
[  148.557381] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/13
[  148.560741] preempt_count: 101, expected: 0
[  148.562577] RCU nest depth: 0, expected: 0
[  148.564379] 1 lock held by swapper/13/0:
[  148.566127]  #0: ffffffff82615f80 (rcu_callback){....}-{0:0}, at: rcu_lock_acquire+0x0/0x1b
[  148.569640] Preemption disabled at:
[  148.569642] [<ffffffff8123f9c3>] ___slab_alloc+0x554/0x661
[  148.573559] CPU: 13 PID: 0 Comm: swapper/13 Kdump: loaded Not tainted 5.17.0_up+ #110
[  148.576834] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-1.fc33 04/01/2014
[  148.579768] Call Trace:
[  148.580567]  <IRQ>
[  148.581262]  dump_stack_lvl+0x56/0x7c
[  148.582367]  ? ___slab_alloc+0x554/0x661
[  148.583526]  __might_resched+0x1af/0x1c8
[  148.584678]  blk_release_queue+0x24/0x109
[  148.585861]  kobject_cleanup+0xc9/0xfe
[  148.586979]  blkg_free+0x46/0x63
[  148.587962]  rcu_do_batch+0x1c5/0x3db
[  148.589057]  rcu_core+0x14a/0x184
[  148.590065]  __do_softirq+0x14d/0x2c7
[  148.591167]  __irq_exit_rcu+0x7a/0xd4
[  148.592264]  sysvec_apic_timer_interrupt+0x82/0xa5
[  148.593649]  </IRQ>
[  148.594354]  <TASK>
[  148.595058]  asm_sysvec_apic_timer_interrupt+0x12/0x20

Cc: Tejun Heo <tj@kernel.org>
Fixes: 0a9a25ca78 ("block: let blkcg_gq grab request queue's refcnt")
Reported-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/linux-block/20220322093322.GA27283@lst.de/
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220323011308.2010380-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-22 19:52:23 -06:00
Linus Torvalds
6b1f86f8e9 Filesystem folio changes for 5.18
Primarily this series converts some of the address_space operations
 to take a folio instead of a page.
 
 ->is_partially_uptodate() takes a folio instead of a page and changes the
 type of the 'from' and 'count' arguments to make it obvious they're bytes.
 ->invalidatepage() becomes ->invalidate_folio() and has a similar type change.
 ->launder_page() becomes ->launder_folio()
 ->set_page_dirty() becomes ->dirty_folio() and adds the address_space as
 an argument.
 
 There are a couple of other misc changes up front that weren't worth
 separating into their own pull request.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmI4hqMACgkQDpNsjXcp
 gj7r7Af/fVJ7m8kKqjP/IayX3HiJRuIDQw+vM++BlRNXdjz+IyED6whdmFGxJeOY
 BMyT+8ApOAz7ErS4G+7fAv4ScJK/aEgFUsnSeAiCp0PliiEJ5NNJzElp6sVmQ7H5
 SX7+Ek444FZUGsQuy0qL7/ELpR3ditnD7x+5U2g0p5TeaHGUQn84crRyfR4xuhNG
 EBD9D71BOb7OxUcOHe93pTkK51QsQ0aCrcIsB1tkK5KR0BAthn1HqF7ehL90Rvrr
 omx5M7aDWGY4oj7IKrhlAs+55Ah2WaOzrZBp0FXNbr4UENDBKWKyUxErwa4xPkf6
 Gm1iQG/CspOHnxN3YWsd5WjtlL3A+A==
 =cOiq
 -----END PGP SIGNATURE-----

Merge tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache

Pull filesystem folio updates from Matthew Wilcox:
 "Primarily this series converts some of the address_space operations to
  take a folio instead of a page.

  Notably:

   - a_ops->is_partially_uptodate() takes a folio instead of a page and
     changes the type of the 'from' and 'count' arguments to make it
     obvious they're bytes.

   - a_ops->invalidatepage() becomes ->invalidate_folio() and has a
     similar type change.

   - a_ops->launder_page() becomes ->launder_folio()

   - a_ops->set_page_dirty() becomes ->dirty_folio() and adds the
     address_space as an argument.

  There are a couple of other misc changes up front that weren't worth
  separating into their own pull request"

* tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache: (53 commits)
  fs: Remove aops ->set_page_dirty
  fb_defio: Use noop_dirty_folio()
  fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio
  fs: Convert __set_page_dirty_buffers to block_dirty_folio
  nilfs: Convert nilfs_set_page_dirty() to nilfs_dirty_folio()
  mm: Convert swap_set_page_dirty() to swap_dirty_folio()
  ubifs: Convert ubifs_set_page_dirty to ubifs_dirty_folio
  f2fs: Convert f2fs_set_node_page_dirty to f2fs_dirty_node_folio
  f2fs: Convert f2fs_set_data_page_dirty to f2fs_dirty_data_folio
  f2fs: Convert f2fs_set_meta_page_dirty to f2fs_dirty_meta_folio
  afs: Convert afs_dir_set_page_dirty() to afs_dir_dirty_folio()
  btrfs: Convert extent_range_redirty_for_io() to use folios
  fs: Convert trivial uses of __set_page_dirty_nobuffers to filemap_dirty_folio
  btrfs: Convert from set_page_dirty to dirty_folio
  fscache: Convert fscache_set_page_dirty() to fscache_dirty_folio()
  fs: Add aops->dirty_folio
  fs: Remove aops->launder_page
  orangefs: Convert launder_page to launder_folio
  nfs: Convert from launder_page to launder_folio
  fuse: Convert from launder_page to launder_folio
  ...
2022-03-22 18:26:56 -07:00
Linus Torvalds
9030fb0bb9 Folio changes for 5.18
- Rewrite how munlock works to massively reduce the contention
    on i_mmap_rwsem (Hugh Dickins):
    https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/
  - Sort out the page refcount mess for ZONE_DEVICE pages (Christoph Hellwig):
    https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/
  - Convert GUP to use folios and make pincount available for order-1
    pages. (Matthew Wilcox)
  - Convert a few more truncation functions to use folios (Matthew Wilcox)
  - Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew Wilcox)
  - Convert rmap_walk to use folios (Matthew Wilcox)
  - Convert most of shrink_page_list() to use a folio (Matthew Wilcox)
  - Add support for creating large folios in readahead (Matthew Wilcox)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmI4ucgACgkQDpNsjXcp
 gj69Wgf6AwqwmO5Tmy+fLScDPqWxmXJofbocae1kyoGHf7Ui91OK4U2j6IpvAr+g
 P/vLIK+JAAcTQcrSCjymuEkf4HkGZOR03QQn7maPIEe4eLrZRQDEsmHC1L9gpeJp
 s/GMvDWiGE0Tnxu0EOzfVi/yT+qjIl/S8VvqtCoJv1HdzxitZ7+1RDuqImaMC5MM
 Qi3uHag78vLmCltLXpIOdpgZhdZexCdL2Y/1npf+b6FVkAJRRNUnA0gRbS7YpoVp
 CbxEJcmAl9cpJLuj5i5kIfS9trr+/QcvbUlzRxh4ggC58iqnmF2V09l2MJ7YU3XL
 v1O/Elq4lRhXninZFQEm9zjrri7LDQ==
 =n9Ad
 -----END PGP SIGNATURE-----

Merge tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache

Pull folio updates from Matthew Wilcox:

 - Rewrite how munlock works to massively reduce the contention on
   i_mmap_rwsem (Hugh Dickins):

     https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/

 - Sort out the page refcount mess for ZONE_DEVICE pages (Christoph
   Hellwig):

     https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/

 - Convert GUP to use folios and make pincount available for order-1
   pages. (Matthew Wilcox)

 - Convert a few more truncation functions to use folios (Matthew
   Wilcox)

 - Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew
   Wilcox)

 - Convert rmap_walk to use folios (Matthew Wilcox)

 - Convert most of shrink_page_list() to use a folio (Matthew Wilcox)

 - Add support for creating large folios in readahead (Matthew Wilcox)

* tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache: (114 commits)
  mm/damon: minor cleanup for damon_pa_young
  selftests/vm/transhuge-stress: Support file-backed PMD folios
  mm/filemap: Support VM_HUGEPAGE for file mappings
  mm/readahead: Switch to page_cache_ra_order
  mm/readahead: Align file mappings for non-DAX
  mm/readahead: Add large folio readahead
  mm: Support arbitrary THP sizes
  mm: Make large folios depend on THP
  mm: Fix READ_ONLY_THP warning
  mm/filemap: Allow large folios to be added to the page cache
  mm: Turn can_split_huge_page() into can_split_folio()
  mm/vmscan: Convert pageout() to take a folio
  mm/vmscan: Turn page_check_references() into folio_check_references()
  mm/vmscan: Account large folios correctly
  mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios
  mm/vmscan: Free non-shmem folios without splitting them
  mm/rmap: Constify the rmap_walk_control argument
  mm/rmap: Convert rmap_walk() to take a folio
  mm: Turn page_anon_vma() into folio_anon_vma()
  mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read()
  ...
2022-03-22 17:03:12 -07:00
Saeed Mahameed
2af7e566a8 net/mlx5e: Fix build warning, detected write beyond size of field
When merged with Linus tree, the cited patch below will cause the
following build warning:

In function 'fortify_memset_chk',
    inlined from 'mlx5e_xmit_xdp_frame' at drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c:438:3:
include/linux/fortify-string.h:242:25: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning]
  242 |                         __write_overflow_field(p_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix that by grouping the fields to memeset in struct_group() to avoid
the false alarm.

Fixes: 9ded70fa1d ("net/mlx5e: Don't prefill WQEs in XDP SQ in the multi buffer mode")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20220322172224.31849-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-22 16:48:56 -07:00
Linus Torvalds
3bf03b9a08 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs

 - Most the MM patches which precede the patches in Willy's tree: kasan,
   pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
   sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb,
   userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp,
   cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap,
   zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (227 commits)
  mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release()
  Docs/ABI/testing: add DAMON sysfs interface ABI document
  Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface
  selftests/damon: add a test for DAMON sysfs interface
  mm/damon/sysfs: support DAMOS stats
  mm/damon/sysfs: support DAMOS watermarks
  mm/damon/sysfs: support schemes prioritization
  mm/damon/sysfs: support DAMOS quotas
  mm/damon/sysfs: support DAMON-based Operation Schemes
  mm/damon/sysfs: support the physical address space monitoring
  mm/damon/sysfs: link DAMON for virtual address spaces monitoring
  mm/damon: implement a minimal stub for sysfs-based DAMON interface
  mm/damon/core: add number of each enum type values
  mm/damon/core: allow non-exclusive DAMON start/stop
  Docs/damon: update outdated term 'regions update interval'
  Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling
  Docs/vm/damon: call low level monitoring primitives the operations
  mm/damon: remove unnecessary CONFIG_DAMON option
  mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}()
  mm/damon/dbgfs-test: fix is_target_id() change
  ...
2022-03-22 16:11:53 -07:00
SeongJae Park
5257f36ec2 mm/damon/core: add number of each enum type values
This commit declares the number of legal values for each DAMON enum types
to make traversals of such DAMON enum types easy and safe.

Link: https://lkml.kernel.org/r/20220228081314.5770-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:13 -07:00
SeongJae Park
8b9b0d335a mm/damon/core: allow non-exclusive DAMON start/stop
Patch series "Introduce DAMON sysfs interface", v3.

Introduction
============

DAMON's debugfs-based user interface (DAMON_DBGFS) served very well, so
far.  However, it unnecessarily depends on debugfs, while DAMON is not
aimed to be used for only debugging.  Also, the interface receives
multiple values via one file.  For example, schemes file receives 18
values.  As a result, it is inefficient, hard to be used, and difficult to
be extended.  Especially, keeping backward compatibility of user space
tools is getting only challenging.  It would be better to implement
another reliable and flexible interface and deprecate DAMON_DBGFS in long
term.

For the reason, this patchset introduces a sysfs-based new user interface
of DAMON.  The idea of the new interface is, using directory hierarchies
and having one dedicated file for each value.  For a short example, users
can do the virtual address monitoring via the interface as below:

    # cd /sys/kernel/mm/damon/admin/
    # echo 1 > kdamonds/nr_kdamonds
    # echo 1 > kdamonds/0/contexts/nr_contexts
    # echo vaddr > kdamonds/0/contexts/0/operations
    # echo 1 > kdamonds/0/contexts/0/targets/nr_targets
    # echo $(pidof <workload>) > kdamonds/0/contexts/0/targets/0/pid_target
    # echo on > kdamonds/0/state

A brief representation of the files hierarchy of DAMON sysfs interface is
as below.  Childs are represented with indentation, directories are having
'/' suffix, and files in each directory are separated by comma.

    /sys/kernel/mm/damon/admin
    │ kdamonds/nr_kdamonds
    │ │ 0/state,pid
    │ │ │ contexts/nr_contexts
    │ │ │ │ 0/operations
    │ │ │ │ │ monitoring_attrs/
    │ │ │ │ │ │ intervals/sample_us,aggr_us,update_us
    │ │ │ │ │ │ nr_regions/min,max
    │ │ │ │ │ targets/nr_targets
    │ │ │ │ │ │ 0/pid_target
    │ │ │ │ │ │ │ regions/nr_regions
    │ │ │ │ │ │ │ │ 0/start,end
    │ │ │ │ │ │ │ │ ...
    │ │ │ │ │ │ ...
    │ │ │ │ │ schemes/nr_schemes
    │ │ │ │ │ │ 0/action
    │ │ │ │ │ │ │ access_pattern/
    │ │ │ │ │ │ │ │ sz/min,max
    │ │ │ │ │ │ │ │ nr_accesses/min,max
    │ │ │ │ │ │ │ │ age/min,max
    │ │ │ │ │ │ │ quotas/ms,bytes,reset_interval_ms
    │ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
    │ │ │ │ │ │ │ watermarks/metric,interval_us,high,mid,low
    │ │ │ │ │ │ │ stats/nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds
    │ │ │ │ │ │ ...
    │ │ │ │ ...
    │ │ ...

Detailed usage of the files will be described in the final Documentation
patch of this patchset.

Main Difference Between DAMON_DBGFS and DAMON_SYSFS
---------------------------------------------------

At the moment, DAMON_DBGFS and DAMON_SYSFS provides same features.  One
important difference between them is their exclusiveness.  DAMON_DBGFS
works in an exclusive manner, so that no DAMON worker thread (kdamond) in
the system can run concurrently and interfere somehow.  For the reason,
DAMON_DBGFS asks users to construct all monitoring contexts and start them
at once.  It's not a big problem but makes the operation a little bit
complex and unflexible.

For more flexible usage, DAMON_SYSFS moves the responsibility of
preventing any possible interference to the admins and work in a
non-exclusive manner.  That is, users can configure and start contexts one
by one.  Note that DAMON respects both exclusive groups and non-exclusive
groups of contexts, in a manner similar to that of reader-writer locks.
That is, if any exclusive monitoring contexts (e.g., contexts that started
via DAMON_DBGFS) are running, DAMON_SYSFS does not start new contexts, and
vice versa.

Future Plan of DAMON_DBGFS Deprecation
======================================

Once this patchset is merged, DAMON_DBGFS development will be frozen.
That is, we will maintain it to work as is now so that no users will be
break.  But, it will not be extended to provide any new feature of DAMON.
The support will be continued only until next LTS release.  After that, we
will drop DAMON_DBGFS.

User-space Tooling Compatibility
--------------------------------

As DAMON_SYSFS provides all features of DAMON_DBGFS, all user space
tooling can move to DAMON_SYSFS.  As we will continue supporting
DAMON_DBGFS until next LTS kernel release, user space tools would have
enough time to move to DAMON_SYSFS.

The official user space tool, damo[1], is already supporting both
DAMON_SYSFS and DAMON_DBGFS.  Both correctness tests[2] and performance
tests[3] of DAMON using DAMON_SYSFS also passed.

[1] https://github.com/awslabs/damo
[2] https://github.com/awslabs/damon-tests/tree/master/corr
[3] https://github.com/awslabs/damon-tests/tree/master/perf

Sequence of Patches
===================

First two patches (patches 1-2) make core changes for DAMON_SYSFS.  The
first one (patch 1) allows non-exclusive DAMON contexts so that
DAMON_SYSFS can work in non-exclusive mode, while the second one (patch 2)
adds size of DAMON enum types so that DAMON API users can safely iterate
the enums.

Third patch (patch 3) implements basic sysfs stub for virtual address
spaces monitoring.  Note that this implements only sysfs files and DAMON
is not linked.  Fourth patch (patch 4) links the DAMON_SYSFS to DAMON so
that users can control DAMON using the sysfs files.

Following six patches (patches 5-10) implements other DAMON features that
DAMON_DBGFS supports one by one (physical address space monitoring,
DAMON-based operation schemes, schemes quotas, schemes prioritization
weights, schemes watermarks, and schemes stats).

Following patch (patch 11) adds a simple selftest for DAMON_SYSFS, and the
final one (patch 12) documents DAMON_SYSFS.

This patch (of 13):

To avoid interference between DAMON contexts monitoring overlapping memory
regions, damon_start() works in an exclusive manner.  That is,
damon_start() does nothing bug fails if any context that started by
another instance of the function is still running.  This makes its usage a
little bit restrictive.  However, admins could aware each DAMON usage and
address such interferences on their own in some cases.

This commit hence implements non-exclusive mode of the function and allows
the callers to select the mode.  Note that the exclusive groups and
non-exclusive groups of contexts will respect each other in a manner
similar to that of reader-writer locks.  Therefore, this commit will not
cause any behavioral change to the exclusive groups.

Link: https://lkml.kernel.org/r/20220228081314.5770-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20220228081314.5770-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Xin Hao <xhao@linux.alibaba.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:13 -07:00
SeongJae Park
851040566a mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}()
Because DAMON debugfs interface and DAMON-based proactive reclaim are now
using monitoring operations via registration mechanism,
damon_{p,v}a_{target_valid,set_operations}() functions have no user.  This
commit clean them up.

Link: https://lkml.kernel.org/r/20220215184603.1479-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:12 -07:00
SeongJae Park
9f7b053a0f mm/damon: let monitoring operations can be registered and selected
In-kernel DAMON user code like DAMON debugfs interface should set 'struct
damon_operations' of its 'struct damon_ctx' on its own.  Therefore, the
client code should depend on all supporting monitoring operations
implementations that it could use.  For example, DAMON debugfs interface
depends on both vaddr and paddr, while some of the users are not always
interested in both.

To minimize such unnecessary dependencies, this commit makes the
monitoring operations can be registered by implementing code and then
dynamically selected by the user code without build-time dependency.

Link: https://lkml.kernel.org/r/20220215184603.1479-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:12 -07:00
SeongJae Park
f7d911c39c mm/damon: rename damon_primitives to damon_operations
Patch series "Allow DAMON user code independent of monitoring primitives".

In-kernel DAMON user code is required to configure the monitoring context
(struct damon_ctx) with proper monitoring primitives (struct
damon_primitive).  This makes the user code dependent to all supporting
monitoring primitives.  For example, DAMON debugfs interface depends on
both DAMON_VADDR and DAMON_PADDR, though some users have interest in only
one use case.  As more monitoring primitives are introduced, the problem
will be bigger.

To minimize such unnecessary dependency, this patchset makes monitoring
primitives can be registered by the implemnting code and later dynamically
searched and selected by the user code.

In addition to that, this patchset renames monitoring primitives to
monitoring operations, which is more easy to intuitively understand what
it means and how it would be structed.

This patch (of 8):

DAMON has a set of callback functions called monitoring primitives and let
it can be configured with various implementations for easy extension for
different address spaces and usages.  However, the word 'primitive' is not
so explicit.  Meanwhile, many other structs resembles similar purpose
calls themselves 'operations'.  To make the code easier to be understood,
this commit renames 'damon_primitives' to 'damon_operations' before it is
too late to rename.

Link: https://lkml.kernel.org/r/20220215184603.1479-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20220215184603.1479-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:12 -07:00
SeongJae Park
1971bd6304 mm/damon: remove the target id concept
DAMON asks each monitoring target ('struct damon_target') to have one
'unsigned long' integer called 'id', which should be unique among the
targets of same monitoring context.  Meaning of it is, however, totally up
to the monitoring primitives that registered to the monitoring context.
For example, the virtual address spaces monitoring primitives treats the
id as a 'struct pid' pointer.

This makes the code flexible, but ugly, not well-documented, and
type-unsafe[1].  Also, identification of each target can be done via its
index.  For the reason, this commit removes the concept and uses clear
type definition.  For now, only 'struct pid' pointer is used for the
virtual address spaces monitoring.  If DAMON is extended in future so that
we need to put another identifier field in the struct, we will use a union
for such primitives-dependent fields and document which primitives are
using which type.

[1] https://lore.kernel.org/linux-mm/20211013154535.4aaeaaf9d0182922e405dd1e@linux-foundation.org/

Link: https://lkml.kernel.org/r/20211230100723.2238-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:12 -07:00
SeongJae Park
436428255d mm/damon/core: move damon_set_targets() into dbgfs
damon_set_targets() function is defined in the core for general use cases,
but called from only dbgfs.  Also, because the function is for general use
cases, dbgfs does additional handling of pid type target id case.  To make
the situation simpler, this commit moves the function into dbgfs and makes
it to do the pid type case handling on its own.

Link: https://lkml.kernel.org/r/20211230100723.2238-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:12 -07:00
Ira Weiny
d7ca25c53e highmem: document kunmap_local()
Some users of kmap() add an offset to the kmap() address to be used
during the mapping.

When converting to kmap_local_page() the base address does not need to
be stored because any address within the page can be used in
kunmap_local().  However, this was not clear from the documentation and
cause some questions.[1]

Document that any address in the page can be used in kunmap_local() to
clarify this for future users.

[1] https://lore.kernel.org/lkml/20211213154543.GM3538886@iweiny-DESK2.sc.intel.com/

[ira.weiny@intel.com: updates per Christoph]
  Link: https://lkml.kernel.org/r/20220124182138.816693-1-ira.weiny@intel.com

Link: https://lkml.kernel.org/r/20220124013045.806718-1-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:11 -07:00
Christophe Leroy
ad7489d526 mm: uninline copy_overflow()
While building a small config with CONFIG_CC_OPTIMISE_FOR_SIZE, I ended
up with more than 50 times the following function in vmlinux because GCC
doesn't honor the 'inline' keyword:

	c00243bc <copy_overflow>:
	c00243bc:	94 21 ff f0 	stwu    r1,-16(r1)
	c00243c0:	7c 85 23 78 	mr      r5,r4
	c00243c4:	7c 64 1b 78 	mr      r4,r3
	c00243c8:	3c 60 c0 62 	lis     r3,-16286
	c00243cc:	7c 08 02 a6 	mflr    r0
	c00243d0:	38 63 5e e5 	addi    r3,r3,24293
	c00243d4:	90 01 00 14 	stw     r0,20(r1)
	c00243d8:	4b ff 82 45 	bl      c001c61c <__warn_printk>
	c00243dc:	0f e0 00 00 	twui    r0,0
	c00243e0:	80 01 00 14 	lwz     r0,20(r1)
	c00243e4:	38 21 00 10 	addi    r1,r1,16
	c00243e8:	7c 08 03 a6 	mtlr    r0
	c00243ec:	4e 80 00 20 	blr

With -Winline, GCC tells:

	/include/linux/thread_info.h:212:20: warning: inlining failed in call to 'copy_overflow': call is unlikely and code size would grow [-Winline]

copy_overflow() is a non conditional warning called by check_copy_size()
on an error path.

check_copy_size() have to remain inlined in order to benefit from
constant folding, but copy_overflow() is not worth inlining.

Uninline the warning when CONFIG_BUG is selected.

When CONFIG_BUG is not selected, WARN() does nothing so skip it.

This reduces the size of vmlinux by almost 4kbytes.

Link: https://lkml.kernel.org/r/e1723b9cfa924bcefcd41f69d0025b38e4c9364e.1644819985.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:11 -07:00
Christophe Leroy
6eada26ffc mm: remove usercopy_warn()
Users of usercopy_warn() were removed by commit 53944f171a ("mm:
remove HARDENED_USERCOPY_FALLBACK")

Remove it.

Link: https://lkml.kernel.org/r/5f26643fc70b05f8455b60b99c30c17d635fa640.1644231910.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Stephen Kitt <steve@sk2.org>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:11 -07:00
Oscar Salvador
734c15700c mm: only re-generate demotion targets when a numa node changes its N_CPU state
Abhishek reported that after patch [1], hotplug operations are taking
roughly double the expected time.  [2]

The reason behind is that the CPU callbacks that
migrate_on_reclaim_init() sets always call set_migration_target_nodes()
whenever a CPU is brought up/down.

But we only care about numa nodes going from having cpus to become
cpuless, and vice versa, as that influences the demotion_target order.

We do already have two CPU callbacks (vmstat_cpu_online() and
vmstat_cpu_dead()) that check exactly that, so get rid of the CPU
callbacks in migrate_on_reclaim_init() and only call
set_migration_target_nodes() from vmstat_cpu_{dead,online}() whenever a
numa node change its N_CPU state.

[1] https://lore.kernel.org/linux-mm/20210721063926.3024591-2-ying.huang@intel.com/
[2] https://lore.kernel.org/linux-mm/eb438ddd-2919-73d4-bd9f-b7eecdd9577a@linux.vnet.ibm.com/

[osalvador@suse.de: add feedback from Huang Ying]
  Link: https://lkml.kernel.org/r/20220314150945.12694-1-osalvador@suse.de

Link: https://lkml.kernel.org/r/20220310120749.23077-1-osalvador@suse.de
Fixes: 884a6e5d1f ("mm/migrate: update node demotion order on hotplug events")
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reported-by: Abhishek Goel <huntbag@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Abhishek Goel <huntbag@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:11 -07:00