Misc fixes all over the place.
Revert of virtio used length validation series: the approach taken does
not seem to work, breaking too many guests in the process. We'll need to
do length validation using some other approach.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmGe0sEPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRp8WEH/imDIq1iduDeAuvFnmrm5eEO9w3wzXCT4NiG
8Pla241FzQ1pEFEAne16KP0+SlLhj7P0oc5FR8vkYvxxuyneDbCzcS2M1kYMOpA1
ry28PuObAnekzE/WXxvC031ozB5Zb/FL54gmw+/1EdAOdMGL0CdQ1aJxREBHRTBo
p4ZHr83GA2D2C/IyKCsgQ8cB9ZrMqImTQQ4vRD89HoFBp+GH2u2Di1iyXEWuOqdI
n1+7M9jjbyW8A+N1bkOicpShS/6UcyJQOOcg8kvUQOV6srVkYhfaiWC/CbOP2g73
8PKK+/K2Htf92s6RdvDUPSKmvqGR/4KPZWPtWThXBYXGgWul0uI=
=q6tO
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull vhost,virtio,vdpa bugfixes from Michael Tsirkin:
"Misc fixes all over the place.
Revert of virtio used length validation series: the approach taken
does not seem to work, breaking too many guests in the process. We'll
need to do length validation using some other approach"
[ This merge also ends up reverting commit f7a36b03a7 ("vsock/virtio:
suppress used length validation"), which came in through the
networking tree in the meantime, and was part of that whole used
length validation series - Linus ]
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa_sim: avoid putting an uninitialized iova_domain
vhost-vdpa: clean irqs before reseting vdpa device
virtio-blk: modify the value type of num in virtio_queue_rq()
vhost/vsock: cleanup removing `len` variable
vhost/vsock: fix incorrect used length reported to the guest
Revert "virtio_ring: validate used buffer length"
Revert "virtio-net: don't let virtio core to validate used length"
Revert "virtio-blk: don't let virtio core to validate used length"
Revert "virtio-scsi: don't let virtio core to validate used buffer length"
Current release - regressions:
- r8169: fix incorrect mac address assignment
- vlan: fix underflow for the real_dev refcnt when vlan creation fails
- smc: avoid warning of possible recursive locking
Current release - new code bugs:
- vsock/virtio: suppress used length validation
- neigh: fix crash in v6 module initialization error path
Previous releases - regressions:
- af_unix: fix change in behavior in read after shutdown
- igb: fix netpoll exit with traffic, avoid warning
- tls: fix splice_read() when starting mid-record
- lan743x: fix deadlock in lan743x_phy_link_status_change()
- marvell: prestera: fix bridge port operation
Previous releases - always broken:
- tcp_cubic: fix spurious Hystart ACK train detections for
not-cwnd-limited flows
- nexthop: fix refcount issues when replacing IPv6 groups
- nexthop: fix null pointer dereference when IPv6 is not enabled
- phylink: force link down and retrigger resolve on interface change
- mptcp: fix delack timer length calculation and incorrect early
clearing
- ieee802154: handle iftypes as u32, prevent shift-out-of-bounds
- nfc: virtual_ncidev: change default device permissions
- netfilter: ctnetlink: fix error codes and flags used for kernel side
filtering of dumps
- netfilter: flowtable: fix IPv6 tunnel addr match
- ncsi: align payload to 32-bit to fix dropped packets
- iavf: fix deadlock and loss of config during VF interface reset
- ice: avoid bpf_prog refcount underflow
- ocelot: fix broken PTP over IP and PTP API violations
Misc:
- marvell: mvpp2: increase MTU limit when XDP enabled
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmGhSCwACgkQMUZtbf5S
IrvAdw//aR54mgc9rc0mkvS5sbDeKzDscmTzav5ANGjl+2ooKTOe8Qd07s59z6TJ
H9IJlTu0Uc9Psbb2RvRo1T1HDohSpWy7SEN/Qlo6N+z1WzDHWbuXyC/KTQDM+8I1
coMYBBTwBGkblBosuoMUi60GWLbBslLv9gR7HUZj7gbtxMfk36BrX5UYz1ONy+tx
HiVshtOmzOgumBi+/j0tkI4lpI/ajf9eYaG6Vvd0A6F3idcbhWKNKfLPgw9qQF36
sQrbz1SYwL5Ucgk47EG+Lpk7oSzbkdNoO6Ro9ncsebB8OMoLUhddclmG/fbgPG0o
SWJ4kK3kmaRSTvSi6q4e5BM89oIhtFWhGRB6vURokrAQU1Ds+sq5F+8IwCaMqEYb
GNyEZ8cdJhLc50RU+/Im3lN6IrRHvQiirE1BN+ZuCMjeSTrsqX18ZYMh1pSJhxkZ
wRC03sSd2ZcaooFrSNJ5Scr3ndacrWNtVr78IQYCNrTjqJn1QUK7ZegTjP04FUfD
JLB7+en8Hd6EKosJLKyoAPRwoFPZN6mDAPC6RfF45B3OoZAHbvXmJrOT6PatcqHe
i0YwDkAJKPRijfcepN1IQYlY2Za5HwNWzCV6v0bf4tUCluDsSkczTKS02dZ1hegR
oYW1Ra1BIyYK4cbG4H0lD7iBQGLGgwt38U1NlFpawbJa/fECUSs=
=LtZ7
-----END PGP SIGNATURE-----
Merge tag 'net-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes, including fixes from netfilter.
Current release - regressions:
- r8169: fix incorrect mac address assignment
- vlan: fix underflow for the real_dev refcnt when vlan creation
fails
- smc: avoid warning of possible recursive locking
Current release - new code bugs:
- vsock/virtio: suppress used length validation
- neigh: fix crash in v6 module initialization error path
Previous releases - regressions:
- af_unix: fix change in behavior in read after shutdown
- igb: fix netpoll exit with traffic, avoid warning
- tls: fix splice_read() when starting mid-record
- lan743x: fix deadlock in lan743x_phy_link_status_change()
- marvell: prestera: fix bridge port operation
Previous releases - always broken:
- tcp_cubic: fix spurious Hystart ACK train detections for
not-cwnd-limited flows
- nexthop: fix refcount issues when replacing IPv6 groups
- nexthop: fix null pointer dereference when IPv6 is not enabled
- phylink: force link down and retrigger resolve on interface change
- mptcp: fix delack timer length calculation and incorrect early
clearing
- ieee802154: handle iftypes as u32, prevent shift-out-of-bounds
- nfc: virtual_ncidev: change default device permissions
- netfilter: ctnetlink: fix error codes and flags used for kernel
side filtering of dumps
- netfilter: flowtable: fix IPv6 tunnel addr match
- ncsi: align payload to 32-bit to fix dropped packets
- iavf: fix deadlock and loss of config during VF interface reset
- ice: avoid bpf_prog refcount underflow
- ocelot: fix broken PTP over IP and PTP API violations
Misc:
- marvell: mvpp2: increase MTU limit when XDP enabled"
* tag 'net-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits)
net: dsa: microchip: implement multi-bridge support
net: mscc: ocelot: correctly report the timestamping RX filters in ethtool
net: mscc: ocelot: set up traps for PTP packets
net: ptp: add a definition for the UDP port for IEEE 1588 general messages
net: mscc: ocelot: create a function that replaces an existing VCAP filter
net: mscc: ocelot: don't downgrade timestamping RX filters in SIOCSHWTSTAMP
net: hns3: fix incorrect components info of ethtool --reset command
net: hns3: fix one incorrect value of page pool info when queried by debugfs
net: hns3: add check NULL address for page pool
net: hns3: fix VF RSS failed problem after PF enable multi-TCs
net: qed: fix the array may be out of bound
net/smc: Don't call clcsock shutdown twice when smc shutdown
net: vlan: fix underflow for the real_dev refcnt
ptp: fix filter names in the documentation
ethtool: ioctl: fix potential NULL deref in ethtool_set_coalesce()
nfc: virtual_ncidev: change default device permissions
net/sched: sch_ets: don't peek at classes beyond 'nbands'
net: stmmac: Disable Tx queues when reconfiguring the interface
selftests: tls: test for correct proto_ops
tls: fix replacing proto_ops
...
- Fix NULL pointer dereference in the CPPC library code occuring
on hybrid systems without CPPC support (Rafael Wysocki).
- Avoid attempts to acquire a semaphore with interrupts off when
printing the names of ACPI device nodes and clean up code on
top of that fix (Sakari Ailus).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmGhMxoSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxSUUP/RXsaC+O7KIMZTZm962GLZl8Wcb7LxdK
oJhcw4OnH5R53Pl1TJrU3iPTnM5CAwEwX0biIVQyqgpfx94Qdu0oJCFuXaMB1FiB
hY3Q5wIa+olFyc1KDPrdVCga4RVlaJBvQnuV3H30eoziSirt7d8dIlmjtGscOU3l
JSgVVIc6CHPnf7qpZ1EFfdi5xGBSfOpUSwVPkj1r5AZkDNXUk5kPu3MNT36F0weZ
zjz6Q7BWVjca4Ih6eB6H2IOjkizCXpnGYeZwZQqV+h4920AkCJhOHxOmx/Gwqs1J
bVAOg4s+DvnObh6D7JMsKm8VhT4FFcYwIVEDXl8z5oXXrINbCHj0WKBiKfKhdrRt
BPXJ2q+LbqCRKCVKB082nEkVB2KpM0QlHUdFUVTIXnsKDyYwPAXHO/fXAa8qjTGT
X73bnGuYQJ/Xob8mnj7/UJ2SSEgqwZRByJ9YZGdTbDk4B2yrSeylQha2Dr3p75fM
pcKC/jOzbV2yOsp1GH5mjexyytsmF68gvPFk/S4VYQYEHpC3zEm8UrV2yaj01yOL
a51aOFwhwuJ3LxsNpoX3qBEL6TKYg6ueqmunQ9Mffgjx/1gu/afQqs2KVX9duKD/
UpxW2qeVV6OnViyLQ0n5lgBEFZ1jIXlSm0wEsH4gJNaxgdZHa7ZNZnGPL1Ay9Op9
apFI9Gdk8X1e
=yXqv
-----END PGP SIGNATURE-----
Merge tag 'acpi-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix a NULL pointer dereference in the CPPC library code and a
locking issue related to printing the names of ACPI device nodes in
the device properties framework.
Specifics:
- Fix NULL pointer dereference in the CPPC library code occuring on
hybrid systems without CPPC support (Rafael Wysocki).
- Avoid attempts to acquire a semaphore with interrupts off when
printing the names of ACPI device nodes and clean up code on top of
that fix (Sakari Ailus)"
* tag 'acpi-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: CPPC: Add NULL pointer check to cppc_get_perf()
ACPI: Make acpi_node_get_parent() local
ACPI: Get acpi_device's parent from the parent field
As opposed to event messages (Sync, PdelayReq etc) which require
timestamping, general messages (Announce, FollowUp etc) do not.
In PTP they are part of different streams of data.
IEEE 1588-2008 Annex D.2 "UDP port numbers" states that the UDP
destination port assigned by IANA is 319 for event messages, and 320 for
general messages. Yet the kernel seems to be missing the definition for
general messages. This patch adds it.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Fix compilation warnings on csky and sparc
- Rename multipage folios to large folios
- Rename AS_THP_SUPPORT and FS_THP_SUPPORT
- Add functions to zero portions of a folio
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmGem6wACgkQDpNsjXcp
gj7uvwgAjNqDWOVgwYU98daN6nKQQf5Vv35f0bzeKcKcHIOEWZ2+MUeXkI55h8TD
ss5L3O86sPtQmpKUQJJChZC4AhpIPRyjPA0JW6vYqXQd912M331WpGgFFyX5eI+3
OxfKLRULmopeWP1RjWmkWqlhYQHL5OLgAMC4VaBSfDHd1UMRf+F9JNm9qR7GCp9Q
Vb0qcmBMaQYt/K5sWRQyPUACVTF+27RLKAs+Om37NGekv1UqgOPMzi9nAyi9RjCi
rRY6oGupNgC+Y41jzlpaNoL71RPS92H769FBh/Fe4qu55VSPjfcN77qAnVhX5Ykn
4RhzZcEUoqlx9xG9xynk0mmbx2Bf4g==
=kvqM
-----END PGP SIGNATURE-----
Merge tag 'folio-5.16b' of git://git.infradead.org/users/willy/pagecache
Pull folio fixes from Matthew Wilcox:
"In the course of preparing the folio changes for iomap for next merge
window, we discovered some problems that would be nice to address now:
- Renaming multi-page folios to large folios.
mapping_multi_page_folio_support() is just a little too long, so we
settled on mapping_large_folio_support(). That meant renaming, eg
folio_test_multi() to folio_test_large().
Rename AS_THP_SUPPORT to match
- I hadn't included folio wrappers for zero_user_segments(), etc.
Also, multi-page^W^W large folio support is now independent of
CONFIG_TRANSPARENT_HUGEPAGE, so machines with HIGHMEM always need
to fall back to the out-of-line zero_user_segments().
Remove FS_THP_SUPPORT to match
- The build bots finally got round to telling me that I missed a
couple of architectures when adding flush_dcache_folio(). Christoph
suggested that we just add linux/cacheflush.h and not rely on
asm-generic/cacheflush.h"
* tag 'folio-5.16b' of git://git.infradead.org/users/willy/pagecache:
mm: Add functions to zero portions of a folio
fs: Rename AS_THP_SUPPORT and mapping_thp_support
fs: Remove FS_THP_SUPPORT
mm: Remove folio_test_single
mm: Rename folio_test_multi to folio_test_large
Add linux/cacheflush.h
This reverts commit 939779f515.
Attempts to validate length in the core did not work out: there turn out
to exist multiple broken devices, and in particular legacy devices are
known to be broken in this respect.
We have ideas for handling this better in the next version but for now
let's revert to a known good state to make sure drivers work for people.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Pull HID fixes from Jiri Kosina:
- fix for Intel-ISH driver to make sure it gets aoutoloaded only on
matching devices and not universally (Thomas Weißschuh)
- fix for Wacom driver reporting invalid contact under certain
circumstances (Jason Gerecke)
- probing fix for ft260 dirver (Michael Zaidman)
- fix for generic keycode remapping (Thomas Weißschuh)
- fix for division by zero in hid-magicmouse (Claudia Pellegrino)
- other tiny assorted fixes and new device IDs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: multitouch: Fix Iiyama ProLite T1931SAW (0eef:0001 again!)
HID: nintendo: eliminate dead datastructures in !CONFIG_NINTENDO_FF case
HID: magicmouse: prevent division by 0 on scroll
HID: thrustmaster: fix sparse warnings
HID: Ignore battery for Elan touchscreen on HP Envy X360 15-eu0xxx
HID: input: set usage type to key on keycode remap
HID: input: Fix parsing of HID_CP_CONSUMER_CONTROL fields
HID: ft260: fix i2c probing for hwmon devices
Revert "HID: hid-asus.c: Maps key 0x35 (display off) to KEY_SCREENLOCK"
HID: intel-ish-hid: fix module device-id handling
mod_devicetable: fix kdocs for ishtp_device_id
HID: wacom: Use "Confidence" flag to prevent reporting invalid contacts
HID: nintendo: unlock on error in joycon_leds_create()
platform/x86: isthp_eclite: only load for matching devices
platform/chrome: chros_ec_ishtp: only load for matching devices
HID: intel-ish-hid: hid-client: only load for matching devices
HID: intel-ish-hid: fw-loader: only load for matching devices
HID: intel-ish-hid: use constants for modaliases
HID: intel-ish-hid: add support for MODULE_DEVICE_TABLE()
acpi_node_get_parent() isn't used outside drivers/acpi/property.c.
Make it local.
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Merge misc fixes from Andrew Morton:
"15 patches.
Subsystems affected by this patch series: ipc, hexagon, mm (swap,
slab-generic, kmemleak, hugetlb, kasan, damon, and highmem), and proc"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
proc/vmcore: fix clearing user buffer by properly using clear_user()
kmap_local: don't assume kmap PTEs are linear arrays in memory
mm/damon/dbgfs: fix missed use of damon_dbgfs_lock
mm/damon/dbgfs: use '__GFP_NOWARN' for user-specified size buffer allocation
kasan: test: silence intentional read overflow warnings
hugetlb, userfaultfd: fix reservation restore on userfaultfd error
hugetlb: fix hugetlb cgroup refcounting during mremap
mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag
hexagon: ignore vmlinux.lds
hexagon: clean up timer-regs.h
hexagon: export raw I/O routines for modules
mm: emit the "free" trace report before freeing memory in kmem_cache_free()
shm: extend forced shm destroy to support objects from several IPC nses
ipc: WARN if trying to remove ipc object which is absent
mm/swap.c:put_pages_list(): reinitialise the page list
- Fix some stubs causing compile issues for ACPI.
- Fix some wakeups on AMD IRQs shared between GPIO and SCI.
- Fix a build warning in the Tegra driver.
- Fix a Kconfig issue in the Qualcomm driver.
- Add a missing include the RALink driver.
- Return a valid type for the Apple pinctrl IRQs.
- Implement some Qualcomm SDM845 dual-edge errata.
- Remove the unused <linux/sdb.h> header. (The subsystem was
once deleted by the pinctrl maintainer...)
- Fix a duplicate initialized in the Tegra driver.
- Fix register offsets for UFS and SDC in the Qualcomm SM8350
driver.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAmGYW4MACgkQQRCzN7AZ
XXNI/BAAmbPnEdjOpa/qjQRae7VV9ycCVhFjs37+0HSOOiMFjQieTz3n4dUQ7JX9
guK7pqn9+ZPBqkya75X4pvDWVW7IuquifflVPg0c3V4yW/+tgt7ZR4JnZo18xt+L
OzW/SnR1O8wXvV7O+6ee8jH3NL7g1SB2bdLuvAwIM1uMdBse0F0nDvdxfSiaLcGk
zFdht2MVXOz4JT0Qq9HYujxw3cJ8Z8fBSS8Y7hdWaNRxYdQe3mVJzaSgCTnEXLj5
DTFuzx64g44DNor5D1KzU/WYkHe+MX2tPxwnfXjckrnQbw1TZzl8Zmk2mUxViesi
KaC1mTBYUjLDj++fiFW5MP3yK+sigcXZJ9COMAr2ue6zpdzc6ja097lIRZO0dreD
iV5YkYj9uZOxji5m18jfuaTvjGbDjfDH9ZHRNmARUOPPmn7xGF+dPqkcKaSIn3KW
gpP0L5oF1mP0iNuOU0bI9gi6J6UAjfJz9E3yukqrteObw+F4SMEulNPq+WQzxOYw
FeNaakufIF8SYii7yoWKK6qG30zHds+BMBxxdj3dB+Px23J1J1R2kDGD8Y13fNkN
bygFgK6z6A6Qw/4O4m8BcO99rrNet+0+dd1tA4mc8GNAqA4jXRCJgWeoy6eLB3y7
Cx6QecJ0YOHnsyBrrpxxFiPDkhWsFL2DeBY6iQOqjagQPJWKKcI=
=iBZ7
-----END PGP SIGNATURE-----
Merge tag 'pinctrl-v5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"There is an ACPI stubs fix which is ACKed by the ACPI maintainer for
merging through my tree.
One item stand out and that is that I delete the <linux/sdb.h> header
that is used by nothing. I deleted this subsystem (through the GPIO
tree) a while back so I feel responsible for tidying up the floor.
Other than that it is the usual mistakes, a bit noisy around build
issue and Kconfig then driver fixes.
Specifics:
- Fix some stubs causing compile issues for ACPI.
- Fix some wakeups on AMD IRQs shared between GPIO and SCI.
- Fix a build warning in the Tegra driver.
- Fix a Kconfig issue in the Qualcomm driver.
- Add a missing include the RALink driver.
- Return a valid type for the Apple pinctrl IRQs.
- Implement some Qualcomm SDM845 dual-edge errata.
- Remove the unused <linux/sdb.h> header. (The subsystem was once
deleted by the pinctrl maintainer...)
- Fix a duplicate initialized in the Tegra driver.
- Fix register offsets for UFS and SDC in the Qualcomm SM8350 driver"
* tag 'pinctrl-v5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: qcom: sm8350: Correct UFS and SDC offsets
pinctrl: tegra194: remove duplicate initializer again
Remove unused header <linux/sdb.h>
pinctrl: qcom: sdm845: Enable dual edge errata
pinctrl: apple: Always return valid type in apple_gpio_irq_type
pinctrl: ralink: include 'ralink_regs.h' in 'pinctrl-mt7620.c'
pinctrl: qcom: fix unmet dependencies on GPIOLIB for GPIOLIB_IRQCHIP
pinctrl: tegra: Return const pointer from tegra_pinctrl_get_group()
pinctrl: amd: Fix wakeups when IRQ is shared with SCI
ACPI: Add stubs for wakeup handler functions
When hugetlb_vm_op_open() is called during copy_vma(), we may take the
reference to resv_map->css. Later, when clearing the reservation
pointer of old_vma after transferring it to new_vma, we forget to drop
the reference to resv_map->css. This leads to a reference leak of css.
Fixes this by adding a check to drop reservation css reference in
clear_vma_resv_huge_pages()
Link: https://lkml.kernel.org/r/20211113154412.91134-1-minhquangbui99@gmail.com
Fixes: 550a7d60bd ("mm, hugepages: add mremap() support for hugepage backed vma")
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, the exit_shm() function not designed to work properly when
task->sysvshm.shm_clist holds shm objects from different IPC namespaces.
This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it
leads to use-after-free (reproducer exists).
This is an attempt to fix the problem by extending exit_shm mechanism to
handle shm's destroy from several IPC ns'es.
To achieve that we do several things:
1. add a namespace (non-refcounted) pointer to the struct shmid_kernel
2. during new shm object creation (newseg()/shmget syscall) we
initialize this pointer by current task IPC ns
3. exit_shm() fully reworked such that it traverses over all shp's in
task->sysvshm.shm_clist and gets IPC namespace not from current task
as it was before but from shp's object itself, then call
shm_destroy(shp, ns).
Note: We need to be really careful here, because as it was said before
(1), our pointer to IPC ns non-refcnt'ed. To be on the safe side we
using special helper get_ipc_ns_not_zero() which allows to get IPC ns
refcounter only if IPC ns not in the "state of destruction".
Q/A
Q: Why can we access shp->ns memory using non-refcounted pointer?
A: Because shp object lifetime is always shorther than IPC namespace
lifetime, so, if we get shp object from the task->sysvshm.shm_clist
while holding task_lock(task) nobody can steal our namespace.
Q: Does this patch change semantics of unshare/setns/clone syscalls?
A: No. It's just fixes non-covered case when process may leave IPC
namespace without getting task->sysvshm.shm_clist list cleaned up.
Link: https://lkml.kernel.org/r/67bb03e5-f79c-1815-e2bf-949c67047418@colorfullife.com
Link: https://lkml.kernel.org/r/20211109151501.4921-1-manfred@colorfullife.com
Fixes: ab602f7991 ("shm: make exit_shm work proportional to task activity")
Co-developed-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull exit-vs-signal handling fixes from Eric Biederman:
"This is a small set of changes where debuggers were no longer able to
intercept synchronous SIGTRAP and SIGSEGV, introduced by the exit
cleanups.
This is essentially the change you suggested with all of i's dotted
and the t's crossed so that ptrace can intercept all of the cases it
has been able to intercept the past, and all of the cases that made it
to exit without giving ptrace a chance still don't give ptrace a
chance"
* 'SA_IMMUTABLE-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
signal: Replace force_fatal_sig with force_exit_sig when in doubt
signal: Don't always set SA_IMMUTABLE for forced signals
Recently to prevent issues with SECCOMP_RET_KILL and similar signals
being changed before they are delivered SA_IMMUTABLE was added.
Unfortunately this broke debuggers[1][2] which reasonably expect
to be able to trap synchronous SIGTRAP and SIGSEGV even when
the target process is not configured to handle those signals.
Add force_exit_sig and use it instead of force_fatal_sig where
historically the code has directly called do_exit. This has the
implementation benefits of going through the signal exit path
(including generating core dumps) without the danger of allowing
userspace to ignore or change these signals.
This avoids userspace regressions as older kernels exited with do_exit
which debuggers also can not intercept.
In the future is should be possible to improve the quality of
implementation of the kernel by changing some of these force_exit_sig
calls to force_fatal_sig. That can be done where it matters on
a case-by-case basis with careful analysis.
Reported-by: Kyle Huey <me@kylehuey.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
[1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpjw@mail.gmail.com
[2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-9020
Fixes: 00b06da29c ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed")
Fixes: a3616a3c02 ("signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die")
Fixes: 83a1f27ad7 ("signal/powerpc: On swapcontext failure force SIGSEGV")
Fixes: 9bc508cf07 ("signal/s390: Use force_sigsegv in default_trap_handler")
Fixes: 086ec444f8 ("signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig")
Fixes: c317d306d5 ("signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails")
Fixes: 695dd0d634 ("signal/x86: In emulate_vsyscall force a signal instead of calling do_exit")
Fixes: 1fbd60df8a ("signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.")
Fixes: 941edc5bf1 ("exit/syscall_user_dispatch: Send ordinary signals on failure")
Link: https://lkml.kernel.org/r/871r3dqfv8.fsf_-_@email.froward.int.ebiederm.org
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Tested-by: Kyle Huey <khuey@kylehuey.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Current release - regressions:
- devlink: don't throw an error if flash notification sent before
devlink visible
- page_pool: Revert "page_pool: disable dma mapping support...",
turns out there are active arches who need it
Current release - new code bugs:
- amt: cancel delayed_work synchronously in amt_fini()
Previous releases - regressions:
- xsk: fix crash on double free in buffer pool
- bpf: fix inner map state pruning regression causing program
rejections
- mac80211: drop check for DONT_REORDER in __ieee80211_select_queue,
preventing mis-selecting the best effort queue
- mac80211: do not access the IV when it was stripped
- mac80211: fix radiotap header generation, off-by-one
- nl80211: fix getting radio statistics in survey dump
- e100: fix device suspend/resume
Previous releases - always broken:
- tcp: fix uninitialized access in skb frags array for Rx 0cp
- bpf: fix toctou on read-only map's constant scalar tracking
- bpf: forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs
- tipc: only accept encrypted MSG_CRYPTO msgs
- smc: transfer remaining wait queue entries during fallback,
fix missing wake ups
- udp: validate checksum in udp_read_sock() (when sockmap is used)
- sched: act_mirred: drop dst for the direction from egress to ingress
- virtio_net_hdr_to_skb: count transport header in UFO, prevent
allowing bad skbs into the stack
- nfc: reorder the logic in nfc_{un,}register_device, fix unregister
- ipsec: check return value of ipv6_skip_exthdr
- usb: r8152: add MAC passthrough support for more Lenovo Docks
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmGWf08ACgkQMUZtbf5S
Irt+lxAAj8FAoLoSmQKUK3LttLLh0ZQQXu8Riey+wrP8Z9Yp8xWXIaVRF1c0vCE6
clbrF+mLfk6Wvv/RzOgwyBMHvK+djr/oVDNSmjlRvss4MLDfOQZhUV8V4XpvF4Up
hI7wyKfHtd7niosNqil6wklJFpLU8WyIAWrPSIPE6JlPkJmcm3GUGsliwEPwdLY1
yl7z4zsxigjA+hKxYqNQX6tixF3xnbDUbAnWshrSPL89melwz4GMao45qmcxJEVr
EipPhKifk0hT067jG08FMXcKBFKt6rKk7SVNo4mtq8Tl6HleJBj8fdaJAjSdFahB
+rYJ0sDZwGoDL5CxZ5mD3fM1cDgh4WFEM0z//0b/bZhoPDRKEpLr9LPuv+N6+/rA
8D98EHsvyNjlFgdyd8celMstiGtBn4YLEoLNYYh9Qibgm0XsCuv0yox7g0AOLMmQ
QiBmh2EnaXNPQ8PRZNMK3VH5ol2KoYWL6yrpJYV+wOWVLfezwlSsjkPSfW5pF9FG
hU0iQBp/YTCdCadR9YLj8qfDWDUAkCN7WpqIu9EA9FXJcYjJVaix0MA/tAVlzXyR
xlB7cU6O5NABcs/+04zPkKLwKbVYNMqgvKE+FVDVm+BKxo0UMxcmz/Np/ZYxfhkF
bwKplaiPb2H4D6t0sdxqaeYirPwt1BcleLilae6vHG1jO90H9Vw=
=tlqV
-----END PGP SIGNATURE-----
Merge tag 'net-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, mac80211.
Current release - regressions:
- devlink: don't throw an error if flash notification sent before
devlink visible
- page_pool: Revert "page_pool: disable dma mapping support...",
turns out there are active arches who need it
Current release - new code bugs:
- amt: cancel delayed_work synchronously in amt_fini()
Previous releases - regressions:
- xsk: fix crash on double free in buffer pool
- bpf: fix inner map state pruning regression causing program
rejections
- mac80211: drop check for DONT_REORDER in __ieee80211_select_queue,
preventing mis-selecting the best effort queue
- mac80211: do not access the IV when it was stripped
- mac80211: fix radiotap header generation, off-by-one
- nl80211: fix getting radio statistics in survey dump
- e100: fix device suspend/resume
Previous releases - always broken:
- tcp: fix uninitialized access in skb frags array for Rx 0cp
- bpf: fix toctou on read-only map's constant scalar tracking
- bpf: forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing
progs
- tipc: only accept encrypted MSG_CRYPTO msgs
- smc: transfer remaining wait queue entries during fallback, fix
missing wake ups
- udp: validate checksum in udp_read_sock() (when sockmap is used)
- sched: act_mirred: drop dst for the direction from egress to
ingress
- virtio_net_hdr_to_skb: count transport header in UFO, prevent
allowing bad skbs into the stack
- nfc: reorder the logic in nfc_{un,}register_device, fix unregister
- ipsec: check return value of ipv6_skip_exthdr
- usb: r8152: add MAC passthrough support for more Lenovo Docks"
* tag 'net-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (96 commits)
ptp: ocp: Fix a couple NULL vs IS_ERR() checks
net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock()
net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound
ipv6: check return value of ipv6_skip_exthdr
e100: fix device suspend/resume
devlink: Don't throw an error if flash notification sent before devlink visible
page_pool: Revert "page_pool: disable dma mapping support..."
ethernet: hisilicon: hns: hns_dsaf_misc: fix a possible array overflow in hns_dsaf_ge_srst_by_port()
octeontx2-af: debugfs: don't corrupt user memory
NFC: add NCI_UNREG flag to eliminate the race
NFC: reorder the logic in nfc_{un,}register_device
NFC: reorganize the functions in nci_request
tipc: check for null after calling kmemdup
i40e: Fix display error code in dmesg
i40e: Fix creation of first queue by omitting it if is not power of two
i40e: Fix warning message and call stack during rmmod i40e driver
i40e: Fix ping is lost after configuring ADq on VF
i40e: Fix changing previously set num_queue_pairs for PFs
i40e: Fix NULL ptr dereference on VSI filter sync
i40e: Fix correct max_pkt_size on VF RX queue
...
These functions are wrappers around zero_user_segments(), which means
that zero_user_segments() can now be called for compound pages even when
CONFIG_TRANSPARENT_HUGEPAGE is disabled.
Use 'xend' as the name of the parameter to indicate that this is an
excluded end, not the more usual included end. Excluding the end makes
more sense to the callers, but can cause confusion to readers who are
more used to seeing included ends.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
* Cleanups for the perf test infrastructure and mapping hugepages
* Avoid contention on mmap_sem when the guests start to run
* Add event channel upcall support to xen_shinfo_test
x86 changes:
* Fixes for Xen emulation
* Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache
* Fixes for migration of 32-bit nested guests on 64-bit hypervisor
* Compilation fixes
* More SEV cleanups
Generic:
* Cap the return value of KVM_CAP_NR_VCPUS to both KVM_CAP_MAX_VCPUS
and num_online_cpus(). Most architectures were only using one of the two.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmGV/PAUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMrogf/eAyilGRQL7lLETn3DTVlgLVv82+z
giX11HlUhUmATHIDluj/wVQUjVcY6AO4SnvFaudX7B+mibndkw4L19IubP/koQZu
xnKSJTn+mVANdzz3UdsHl0ujbPdQJaFCIPW6iewbn2GRRZMwA5F3vMK/H09XRApL
I7kq8CPA6sC0I3TPzPN3ROxigexzYunZmGQ4qQe0GUdtxHrJOYQN++ddmWbQoEIC
gdFTyF7CUQ+lmJe0b/Y88yhISFAJCEBuKFlg9tOTuxSfwvPX6lUu+pi+utEx9M+O
ckTSQli/apZ4RVcSzxMIwX/BciYqhqOz5uMG+w4DRlJixtGSHtjiEVxGxw==
=Iij4
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Selftest changes:
- Cleanups for the perf test infrastructure and mapping hugepages
- Avoid contention on mmap_sem when the guests start to run
- Add event channel upcall support to xen_shinfo_test
x86 changes:
- Fixes for Xen emulation
- Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache
- Fixes for migration of 32-bit nested guests on 64-bit hypervisor
- Compilation fixes
- More SEV cleanups
Generic:
- Cap the return value of KVM_CAP_NR_VCPUS to both KVM_CAP_MAX_VCPUS
and num_online_cpus(). Most architectures were only using one of
the two"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
KVM: x86: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
KVM: s390: Cap KVM_CAP_NR_VCPUS by num_online_cpus()
KVM: RISC-V: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
KVM: PPC: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
KVM: MIPS: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
KVM: arm64: Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus()
KVM: x86: Assume a 64-bit hypercall for guests with protected state
selftests: KVM: Add /x86_64/sev_migrate_tests to .gitignore
riscv: kvm: fix non-kernel-doc comment block
KVM: SEV: Fix typo in and tweak name of cmd_allowed_from_miror()
KVM: SEV: Drop a redundant setting of sev->asid during initialization
KVM: SEV: WARN if SEV-ES is marked active but SEV is not
KVM: SEV: Set sev_info.active after initial checks in sev_guest_init()
KVM: SEV: Disallow COPY_ENC_CONTEXT_FROM if target has created vCPUs
KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache
KVM: nVMX: Use a gfn_to_hva_cache for vmptrld
KVM: nVMX: Use kvm_read_guest_offset_cached() for nested VMCS check
KVM: x86/xen: Use sizeof_field() instead of open-coding it
KVM: nVMX: Use kvm_{read,write}_guest_cached() for shadow_vmcs12
KVM: x86/xen: Fix get_attr of KVM_XEN_ATTR_TYPE_SHARED_INFO
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmGWF6YACgkQUqAMR0iA
lPJJ+RAAm9pi/EElKKl+lOlBl+ehJlKuNLnPQWFmmaRc9xd0ruUipp1nsaktLJ8f
R/PkSwR/YWpBWlF8P4o+x9sOFyTNyLasoHtqsinEcAJI4lb7d1KOrPliTXyr15Ai
A303djwJmwCw5KxAPOjkG/nMBlpMIAQRee9GDWs1ykfSlIsI4jp7isVbCFNCQNVF
auHYq1bfJ5MJYPjxIDZUt+NF7kg7dD4k4g+VCVjaH1u8pGeaCUCtnNjMFOk1XfU8
yFQnaDcrAu4zJPq3d74z4eN9Bk+su8+DhnfrAEFjuFxGTgYc2MyRt0gGFeiUtNs4
rvST/eHBO4zeuL18S8G+fLcig/9ZYE73xzjdOCzRzLDjn0VQr9t06ez1QqJOb4D6
A4SSufwek5NIqYKMlhV/az2EceQYK8Wv3KAz8w98KDfwvVVhUSgE23MbTCO0hvQU
PWF35d3hQ+9oH0ZGYRumb8OpXtKJ+2KmzyN8Z0xhivHFBIKlcW6IBGhWRANclJO8
jNAM3jiwi8fRDVM2wI1fmgeEmMhG+WuTI3dJVu3tu4vI923FW5GdY6ev5EvH0Ts0
khTwIjtmCHUJGSeWajy3Gi9irdyhPyPNRMqgal4GS+gGpVU2mMMKTG+NzxxtCRKR
BUgfCjFDoDJWrNWIzzOwNqgF0Y+V9GgCZOkb73u/y+xVx0Rmc6U=
=wbBy
-----END PGP SIGNATURE-----
Merge tag 'printk-for-5.16-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk fixes from Petr Mladek:
- Try to flush backtraces from other CPUs also on the local one. This
was a regression caused by printk_safe buffers removal.
- Remove header dependency warning.
* tag 'printk-for-5.16-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: Remove printk.h inclusion in percpu.h
printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces
This reverts commit d00e60ee54.
As reported by Guillaume in [1]:
Enabling LPAE always enables CONFIG_ARCH_DMA_ADDR_T_64BIT
in 32-bit systems, which breaks the bootup proceess when a
ethernet driver is using page pool with PP_FLAG_DMA_MAP flag.
As we were hoping we had no active consumers for such system
when we removed the dma mapping support, and LPAE seems like
a common feature for 32 bits system, so revert it.
1. https://www.spinics.net/lists/netdev/msg779890.html
Fixes: d00e60ee54 ("page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Tested-by: "kernelci.org bot" <bot@kernelci.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Fixes for Xen emulation
* Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache
* Fixes for migration of 32-bit nested guests on 64-bit hypervisor
* Compilation fixes
* More SEV cleanups
In commit 7e2175ebd6 ("KVM: x86: Fix recording of guest steal time /
preempted status") I removed the only user of these functions because
it was basically impossible to use them safely.
There are two stages to the GFN->PFN mapping; first through the KVM
memslots to a userspace HVA and then through the page tables to
translate that HVA to an underlying PFN. Invalidations of the former
were being handled correctly, but no attempt was made to use the MMU
notifiers to invalidate the cache when the HVA->GFN mapping changed.
As a prelude to reinventing the gfn_to_pfn_cache with more usable
semantics, rip it out entirely and untangle the implementation of
the unsafe kvm_vcpu_map()/kvm_vcpu_unmap() functions from it.
All current users of kvm_vcpu_map() also look broken right now, and
will be dealt with separately. They broadly fall into two classes:
* Those which map, access the data and immediately unmap. This is
mostly gratuitous and could just as well use the existing user
HVA, and could probably benefit from a gfn_to_hva_cache as they
do so.
* Those which keep the mapping around for a longer time, perhaps
even using the PFN directly from the guest. These will need to
be converted to the new gfn_to_pfn_cache and then kvm_vcpu_map()
can be removed too.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211115165030.7422-8-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Instead of setting a bit in the fs_flags to set a bit in the
address_space, set the bit in the address_space directly.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Many architectures do not include asm-generic/cacheflush.h, so turn
the includes on their head and add linux/cacheflush.h which includes
asm/cacheflush.h.
Move the flush_dcache_folio() declaration from asm-generic/cacheflush.h
to linux/cacheflush.h and change linux/highmem.h to include
linux/cacheflush.h instead of asm/cacheflush.h so that all necessary
places will see flush_dcache_folio().
More functions should have their default implementations moved in the
future, but those are for follow-on patches. This fixes csky, sparc and
sparc64 which were missed in the commit which added flush_dcache_folio().
Fixes: 08b0b0059b ("mm: Add flush_dcache_folio()")
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
virtio_net_hdr_to_skb does not set the skb's gso_size and gso_type
correctly for UFO packets received via virtio-net that are a little over
the GSO size. This can lead to problems elsewhere in the networking
stack, e.g. ovs_vport_send dropping over-sized packets if gso_size is
not set.
This is due to the comparison
if (skb->len - p_off > gso_size)
not properly accounting for the transport layer header.
p_off includes the size of the transport layer header (thlen), so
skb->len - p_off is the size of the TCP/UDP payload.
gso_size is read from the virtio-net header. For UFO, fragmentation
happens at the IP level so does not need to include the UDP header.
Hence the calculation could be comparing a TCP/UDP payload length with
an IP payload length, causing legitimate virtio-net packets to have
lack gso_type/gso_size information.
Example: a UDP packet with payload size 1473 has IP payload size 1481.
If the guest used UFO, it is not fragmented and the virtio-net header's
flags indicate that it is a GSO frame (VIRTIO_NET_HDR_GSO_UDP), with
gso_size = 1480 for an MTU of 1500. skb->len will be 1515 and p_off
will be 42, so skb->len - p_off = 1473. Hence the comparison fails, and
shinfo->gso_size and gso_type are not set as they should be.
Instead, add the UDP header length before comparing to gso_size when
using UFO. In this way, it is the size of the IP payload that is
compared to gso_size.
Fixes: 6dd912f826 ("net: check untrusted gso_size at kernel entry")
Signed-off-by: Jonathan Davies <jonathan.davies@nutanix.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmGUEocACgkQSD+KveBX
+j6ynwf/UKQ4I+ZexNfgDmINn4qNgGjnOOq8EfaPPBUzUODObmgPHV7fPT8ADX3b
sLx7CgVGRHnk36XqDhptYrIDEtbfa26epRvr3PGU4FcUmz+PoJprQtWBK2B5PDiT
nwSmxerEauiji7LxlXx9YmtfpnbeHDFanjJyYbVnuZkqREMP7S/B73KLFnW7WEyU
EyEKu+oahGx3C4CiF7Fg20w9mP2fujC7pw+og+g6T9dDiF/ynTCf1KZMXq1EwA1M
+/GS37Lm4VEoCrumiF6HGYmr9gRIFhMXBTaTRYzuz/d7FPxFrtF/88IF0xfmFmLr
7SNBWjmNnjnCaXNIlY1IbyHh9/P83Q==
=a7ES
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2021-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-fixes-2021-11-16
Please pull this mlx5 fixes series, or let me know in case of any problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2021-11-16
We've added 12 non-merge commits during the last 5 day(s) which contain
a total of 23 files changed, 573 insertions(+), 73 deletions(-).
The main changes are:
1) Fix pruning regression where verifier went overly conservative rejecting
previsouly accepted programs, from Alexei Starovoitov and Lorenz Bauer.
2) Fix verifier TOCTOU bug when using read-only map's values as constant
scalars during verification, from Daniel Borkmann.
3) Fix a crash due to a double free in XSK's buffer pool, from Magnus Karlsson.
4) Fix libbpf regression when cross-building runqslower, from Jean-Philippe Brucker.
5) Forbid use of bpf_ktime_get_coarse_ns() and bpf_timer_*() helpers in tracing
programs due to deadlock possibilities, from Dmitrii Banshchikov.
6) Fix checksum validation in sockmap's udp_read_sock() callback, from Cong Wang.
7) Various BPF sample fixes such as XDP stats in xdp_sample_user, from Alexander Lobakin.
8) Fix libbpf gen_loader error handling wrt fd cleanup, from Kumar Kartikeya Dwivedi.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
udp: Validate checksum in udp_read_sock()
bpf: Fix toctou on read-only map's constant scalar tracking
samples/bpf: Fix build error due to -isystem removal
selftests/bpf: Add tests for restricted helpers
bpf: Forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs
libbpf: Perform map fd cleanup for gen_loader in case of error
samples/bpf: Fix incorrect use of strlen in xdp_redirect_cpu
tools/runqslower: Fix cross-build
samples/bpf: Fix summary per-sec stats in xdp_sample_user
selftests/bpf: Check map in map pruning
bpf: Fix inner map state pruning regression.
xsk: Fix crash on double free in buffer pool
====================
Link: https://lore.kernel.org/r/20211116141134.6490-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
E-Switch encap mode is relevant only when in switchdev mode.
The RDMA driver can query the encap configuration via
mlx5_eswitch_get_encap_mode(). Make sure it returns the currently
used mode and not the set one.
This reverts the cited commit which reset the encap mode
on entering switchdev and fixes the original issue properly.
Fixes: 9a64144d68 ("net/mlx5: E-Switch, Fix default encap mode")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Commit a23740ec43 ("bpf: Track contents of read-only maps as scalars") is
checking whether maps are read-only both from BPF program side and user space
side, and then, given their content is constant, reading out their data via
map->ops->map_direct_value_addr() which is then subsequently used as known
scalar value for the register, that is, it is marked as __mark_reg_known()
with the read value at verification time. Before a23740ec43, the register
content was marked as an unknown scalar so the verifier could not make any
assumptions about the map content.
The current implementation however is prone to a TOCTOU race, meaning, the
value read as known scalar for the register is not guaranteed to be exactly
the same at a later point when the program is executed, and as such, the
prior made assumptions of the verifier with regards to the program will be
invalid which can cause issues such as OOB access, etc.
While the BPF_F_RDONLY_PROG map flag is always fixed and required to be
specified at map creation time, the map->frozen property is initially set to
false for the map given the map value needs to be populated, e.g. for global
data sections. Once complete, the loader "freezes" the map from user space
such that no subsequent updates/deletes are possible anymore. For the rest
of the lifetime of the map, this freeze one-time trigger cannot be undone
anymore after a successful BPF_MAP_FREEZE cmd return. Meaning, any new BPF_*
cmd calls which would update/delete map entries will be rejected with -EPERM
since map_get_sys_perms() removes the FMODE_CAN_WRITE permission. This also
means that pending update/delete map entries must still complete before this
guarantee is given. This corner case is not an issue for loaders since they
create and prepare such program private map in successive steps.
However, a malicious user is able to trigger this TOCTOU race in two different
ways: i) via userfaultfd, and ii) via batched updates. For i) userfaultfd is
used to expand the competition interval, so that map_update_elem() can modify
the contents of the map after map_freeze() and bpf_prog_load() were executed.
This works, because userfaultfd halts the parallel thread which triggered a
map_update_elem() at the time where we copy key/value from the user buffer and
this already passed the FMODE_CAN_WRITE capability test given at that time the
map was not "frozen". Then, the main thread performs the map_freeze() and
bpf_prog_load(), and once that had completed successfully, the other thread
is woken up to complete the pending map_update_elem() which then changes the
map content. For ii) the idea of the batched update is similar, meaning, when
there are a large number of updates to be processed, it can increase the
competition interval between the two. It is therefore possible in practice to
modify the contents of the map after executing map_freeze() and bpf_prog_load().
One way to fix both i) and ii) at the same time is to expand the use of the
map's map->writecnt. The latter was introduced in fc9702273e ("bpf: Add mmap()
support for BPF_MAP_TYPE_ARRAY") and further refined in 1f6cb19be2 ("bpf:
Prevent re-mmap()'ing BPF map as writable for initially r/o mapping") with
the rationale to make a writable mmap()'ing of a map mutually exclusive with
read-only freezing. The counter indicates writable mmap() mappings and then
prevents/fails the freeze operation. Its semantics can be expanded beyond
just mmap() by generally indicating ongoing write phases. This would essentially
span any parallel regular and batched flavor of update/delete operation and
then also have map_freeze() fail with -EBUSY. For the check_mem_access() in
the verifier we expand upon the bpf_map_is_rdonly() check ensuring that all
last pending writes have completed via bpf_map_write_active() test. Once the
map->frozen is set and bpf_map_write_active() indicates a map->writecnt of 0
only then we are really guaranteed to use the map's data as known constants.
For map->frozen being set and pending writes in process of still being completed
we fall back to marking that register as unknown scalar so we don't end up
making assumptions about it. With this, both TOCTOU reproducers from i) and
ii) are fixed.
Note that the map->writecnt has been converted into a atomic64 in the fix in
order to avoid a double freeze_mutex mutex_{un,}lock() pair when updating
map->writecnt in the various map update/delete BPF_* cmd flavors. Spanning
the freeze_mutex over entire map update/delete operations in syscall side
would not be possible due to then causing everything to be serialized.
Similarly, something like synchronize_rcu() after setting map->frozen to wait
for update/deletes to complete is not possible either since it would also
have to span the user copy which can sleep. On the libbpf side, this won't
break d66562fba1 ("libbpf: Add BPF object skeleton support") as the
anonymous mmap()-ed "map initialization image" is remapped as a BPF map-backed
mmap()-ed memory where for .rodata it's non-writable.
Fixes: a23740ec43 ("bpf: Track contents of read-only maps as scalars")
Reported-by: w1tcher.bupt@gmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commit 6a80b30086 ("fmc: Delete the FMC subsystem") removed the last user
of <linux/sdb.h>, but left the header file behind. Nothing uses this file,
delete it now.
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Alessandro Rubini <rubini@gnudd.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Acked-by: Alessandro Rubini <rubini@gnudd.com>
Link: https://lore.kernel.org/r/20211102220203.940290-5-corbet@lwn.net
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The commit ddfd9dcf27 ("ACPI: PM: Add acpi_[un]register_wakeup_handler()")
added new functions for drivers to use during the s2idle wakeup path, but
didn't add stubs for when CONFIG_ACPI wasn't set.
Add those stubs in for other drivers to be able to use.
Fixes: ddfd9dcf27 ("ACPI: PM: Add acpi_[un]register_wakeup_handler()")
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20211101014853.6177-1-mario.limonciello@amd.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
After the commit 42a0bb3f71 ("printk/nmi: generic solution for safe
printk in NMI") the printk.h is not needed anymore in percpu.h.
Moreover `make headerdep` complains (an excerpt)
In file included from linux/printk.h,
from linux/dynamic_debug.h:188
from linux/printk.h:559 <-- here
from linux/percpu.h:9
from linux/idr.h:17
include/net/9p/client.h:13: warning: recursive header inclusion
Yeah, it's not a root cause of this, but removing will help to reduce
the noise.
Fixes: 42a0bb3f71 ("printk/nmi: generic solution for safe printk in NMI")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20211112140749.80042-1-andriy.shevchenko@linux.intel.com
A fix to only copy the size of the field to the histogram string
did not take into account that the size can be larger than the
storage.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYZHGYBQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qi4RAP9Lr7RqTRQQ3C9BHZfCmIgwZtAqT+Z4
U+nHva6FcI9ufQEAtWAAHleVHUcfVB90mahMFSEnJ7yESKC3k1ZKXsTsYwo=
=X961
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Update to tracing histogram variable string copy
A fix to only copy the size of the field to the histogram string did
not take into account that the size can be larger than the storage"
* tag 'trace-v5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Add length protection to histogram string copies
The string copies to the histogram storage has a max size of 256 bytes
(defined by MAX_FILTER_STR_VAL). Only the string size of the event field
needs to be copied to the event storage, but no more than what is in the
event storage. Although nothing should be bigger than 256 bytes, there's
no protection against overwriting of the storage if one day there is.
Copy no more than the destination size, and enforce it.
Also had to turn MAX_FILTER_STR_VAL into an unsigned int, to keep the
min() comparison of the string sizes of comparable types.
Link: https://lore.kernel.org/all/CAHk-=wjREUihCGrtRBwfX47y_KrLCGjiq3t6QtoNJpmVrAEb1w@mail.gmail.com/
Link: https://lkml.kernel.org/r/20211114132834.183429a4@rorschach.local.home
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 63f84ae6b8 ("tracing/histogram: Do not copy the fixed-size char array field over the field size")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
timer delivery stops working for a new child task because copy_process()
copies state information which is only valid for the parent task.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmGRDVUTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYocOFD/42NOdli73N+Jdq7APHUIHXzu+6DVT6
CI5toLQw+0KPoF0s1wg4+J0YCDt2k0Pu4lOabF3Ze/+c6RlR5zfCXESqsXdHaCjh
E91Vs57u0ataRMEHo6KB6eBIutuF8hyxfY6vVXfkTRNAreUIWiwWYrlB0G64JVOG
+/l1W7adovjLcLwcW+ArrnLJwkBKtXunK6PVv2IrdRHwpMHbwoNRCCCFvzkqnWmQ
4Yy2/NaB/PEBK5kezP1/j9EMcGCTWk1JJIm+l/PEwCCcbIgIdUahpW3XHAaqms6R
oukqCvE5ukfmVzBFYBhCamhF8heyEeBVRqGU+Yyk48+I+DQFBCqaqa1NKSuEUdNL
Nycy6Rp1yn7CHVSB461shMS6NJGOSNDBjv7vxer3WjV3HPJu7y0RrN7jXbkSfQnm
hVKjkmbDEYwylgzFE5+T857NqD5MEXeuIBtTO08hNRnpd61aB3x+qq+8ElE6ST8Y
pm6rMzw0AZ5buPK8QdGVDk0dD4WKObj1LzmRZvBtYeWynO6sxyKUl6B2CgAxrvn5
D1Li2/arkJMCVeIuIL5uE6DPoxSh8J7OuEC4KeWX8M8xQSEDImqfZ+tDL2Esv6jv
xDmymq584hiCBc1CJjCOA9kZYe6KNXC7lkVOns6GaKKzLhkrcvUR3dUGhMyzxAMO
t9QIAinR6JwRRA==
=EBbc
-----END PGP SIGNATURE-----
Merge tag 'timers-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single fix for POSIX CPU timers to address a problem where POSIX CPU
timer delivery stops working for a new child task because
copy_process() copies state information which is only valid for the
parent task"
* tag 'timers-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-cpu-timers: Clear task::posix_cputimers_work in copy_process()
- Core code:
A regression fix for the Open Firmware interrupt mapping code where a
interrupt controller property in a node caused a map property in the
same node to be ignored.
- Interrupt chip drivers:
- Workaround a limitation in SiFive PLIC interrupt chip which silently
ignores an EOI when the interrupt line is masked.
- Provide the missing mask/unmask implementation for the CSKY MP
interrupt controller.
- PCI/MSI:
- Prevent a use after free when PCI/MSI interrupts are released by
destroying the sysfs entries before freeing the memory which is
accessed in the sysfs show() function.
- Implement a mask quirk for the Nvidia ION AHCI chip which does not
advertise masking capability despite implementing it. Even worse the
chip comes out of reset with all MSI entries masked, which due to the
missing masking capability never get unmasked.
- Move the check which prevents accessing the MSI[X] masking for XEN
back into the low level accessors. The recent consolidation missed
that these accessors can be invoked from places which do not have
that check which broke XEN. Move them back to he original place
instead of sprinkling tons of these checks all over the code.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmGRDCsTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoTL5D/4n7CUudohHPckr0Rl3LbnSUfyY9g3H
irTKur71AT392YerJtQp+WBp3AKYMDD8wPTgydfpWe95ouIjx5jhb/co7uSifG6k
ZssXYS10bkvjqyS8E2s5FnA5xbnagunK/R981qju14Ec39xqx1JzlUnO/Pra0Kcr
5rBV7br9jJMBleBI4OFuS9fS8dVL1MH/yushkuDNfIKEnaElnaxaYUk/ZdzkMMAW
lt1B+dPhK24t1hXQvZKp/iVQUGrJWdzzy9aDiUYPv1IZP+V5nbLMgmFvEv8jNdNa
6kkfp0l30nXM9rgvcp2KkasVUPVhurVEwitzz9+tT6LRA+/kSwi2yx8/FwCVUcL6
xD0AgKQgxOj/WwGJTZswvPu3afsLuw3rGmx5uH1IV40P9mPX0AiHWgvoaInHjzlJ
QKFQ7mJEuUcC6cJ36RGqX9njhKvPIcUENGCTjGSffcXsWltPrOCg2mQFcsDa9fSH
qPfXDVv4YINI+0MAlOULh6TLWQ07xy37HiskJu/AgILOfipoDi8pXdqNJRfvxB1S
D3O8vB+SH3lPj69w4dtj7539SdNZn8CCyN3RbNlstl2vHV5Bus3cVk0CcOhG8qNW
KwK/tSH8O0ZYHAsUu8OqBipXy6qOPi/10MJQn3NOpvvOmS4oDd+82bq+jp5qJpsG
42WNuzEoBdaUiA==
=LBQL
-----END PGP SIGNATURE-----
Merge tag 'irq-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A set of fixes for the interrupt subsystem
Core code:
- A regression fix for the Open Firmware interrupt mapping code where
a interrupt controller property in a node caused a map property in
the same node to be ignored.
Interrupt chip drivers:
- Workaround a limitation in SiFive PLIC interrupt chip which
silently ignores an EOI when the interrupt line is masked.
- Provide the missing mask/unmask implementation for the CSKY MP
interrupt controller.
PCI/MSI:
- Prevent a use after free when PCI/MSI interrupts are released by
destroying the sysfs entries before freeing the memory which is
accessed in the sysfs show() function.
- Implement a mask quirk for the Nvidia ION AHCI chip which does not
advertise masking capability despite implementing it. Even worse
the chip comes out of reset with all MSI entries masked, which due
to the missing masking capability never get unmasked.
- Move the check which prevents accessing the MSI[X] masking for XEN
back into the low level accessors. The recent consolidation missed
that these accessors can be invoked from places which do not have
that check which broke XEN. Move them back to he original place
instead of sprinkling tons of these checks all over the code"
* tag 'irq-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
of/irq: Don't ignore interrupt-controller when interrupt-map failed
irqchip/sifive-plic: Fixup EOI failed when masked
irqchip/csky-mpintc: Fixup mask/unmask implementation
PCI/MSI: Destroy sysfs before freeing entries
PCI: Add MSI masking quirk for Nvidia ION AHCI
PCI/MSI: Deal with devices lying about their MSI mask capability
PCI/MSI: Move non-mask check back into low level accessors
the preemption model
- clear cluster CPU masks too, on the CPU unplug path
- prevent use-after-free in cfs
- Prevent a race condition when updating CPU cache domains
- Factor out common shared part of smp_prepare_cpus() into a common
helper which can be called by both baremetal and Xen, in order to fix a
booting of Xen PV guests
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmGQ8HcACgkQEsHwGGHe
VUouoA//WAZ/dZu7IiM06JhZWswa2yNsdU8qQHys81lEqstaBqiWuZdg1qJTVIir
2d0aN0keiPcsLyAsp1UJ2g/K/7D5vSJWDzsHKfEAToiAm8Tntai2LlSocWWfeSQm
10grDHWpEHbj0hTHTA6HYOr2WbY4/LnR4cdL0WobIzivIrRTx49d0XUOUfWLP5KX
60uM6dSjwpJrQUnvzk+bhGiHVmutFrEJy+UU/0o+nxkdhwraNiSbLi0007BGRCof
6dokRRvLLR09dl1LMG51gVjQch4j/lCx6EWWUhYOFeV3I3gibSCNkmu7dpmMCBTR
QWO01cR9gyFN4xQ2is4I36M5L0/8T+sbGvvXIXNDT/XWr0/p+g6p2mx0cd2XiYIr
ZthGRcxxV/KGmxfPaygKS9tpQseMEIrdd6VjAnGfZ3OS6CtUvYt8d0B2Soj8FALQ
N9fMXDIEP3uUZim8UvCT6HBKlj9LR5uI5n+dAQ6uzsenO9WqeGeldc/N26/+osdN
vo4lNYTqiXJPhJvunYW5t4j5JnUa3grDHioAPWaQRJlWtEZBGKs9SXTcweg/KURb
mNfe1RfSlGJt28RD3E18gXeSS7xWdKgpcVX1rmW/9tUjX04NNDWjq4sAzOj7c+Ir
4sr78XgCY0pUxFaFYxvQWFUy7wcm0zAczo1RGUhcDTf1edDEvjo=
=s2MX
-----END PGP SIGNATURE-----
Merge tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:
- Avoid touching ~100 config files in order to be able to select the
preemption model
- clear cluster CPU masks too, on the CPU unplug path
- prevent use-after-free in cfs
- Prevent a race condition when updating CPU cache domains
- Factor out common shared part of smp_prepare_cpus() into a common
helper which can be called by both baremetal and Xen, in order to fix
a booting of Xen PV guests
* tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
preempt: Restore preemption model selection configs
arch_topology: Fix missing clear cluster_cpumask in remove_cpu_topology()
sched/fair: Prevent dead task groups from regaining cfs_rq's
sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()
x86/smp: Factor out parts of native_smp_prepare_cpus()
This patch reverts two prior patches, e7310c9402
("security: implement sctp_assoc_established hook in selinux") and
7c2ef0240e ("security: add sctp_assoc_established hook"), which
create the security_sctp_assoc_established() LSM hook and provide a
SELinux implementation. Unfortunately these two patches were merged
without proper review (the Reviewed-by and Tested-by tags from
Richard Haines were for previous revisions of these patches that
were significantly different) and there are outstanding objections
from the SELinux maintainers regarding these patches.
Work is currently ongoing to correct the problems identified in the
reverted patches, as well as others that have come up during review,
but it is unclear at this point in time when that work will be ready
for inclusion in the mainline kernel. In the interest of not keeping
objectionable code in the kernel for multiple weeks, and potentially
a kernel release, we are reverting the two problematic patches.
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This PR includes 5 commits that update the zstd library version:
1. Adds a new kernel-style wrapper around zstd. This wrapper API
is functionally equivalent to the subset of the current zstd API that is
currently used. The wrapper API changes to be kernel style so that the symbols
don't collide with zstd's symbols. The update to zstd-1.4.10 maintains the same
API and preserves the semantics, so that none of the callers need to be
updated. All callers are updated in the commit, because there are zero
functional changes.
2. Adds an indirection for `lib/decompress_unzstd.c` so it
doesn't depend on the layout of `lib/zstd/` to include every source file.
This allows the next patch to be automatically generated.
3. Imports the zstd-1.4.10 source code. This commit is automatically generated
from upstream zstd (https://github.com/facebook/zstd).
4. Adds me (terrelln@fb.com) as the maintainer of `lib/zstd`.
5. Fixes a newly added build warning for clang.
The discussion around this patchset has been pretty long, so I've included a
FAQ-style summary of the history of the patchset, and why we are taking this
approach.
Why do we need to update?
-------------------------
The zstd version in the kernel is based off of zstd-1.3.1, which is was released
August 20, 2017. Since then zstd has seen many bug fixes and performance
improvements. And, importantly, upstream zstd is continuously fuzzed by OSS-Fuzz,
and bug fixes aren't backported to older versions. So the only way to sanely get
these fixes is to keep up to date with upstream zstd. There are no known security
issues that affect the kernel, but we need to be able to update in case there
are. And while there are no known security issues, there are relevant bug fixes.
For example the problem with large kernel decompression has been fixed upstream
for over 2 years https://lkml.org/lkml/2020/9/29/27.
Additionally the performance improvements for kernel use cases are significant.
Measured for x86_64 on my Intel i9-9900k @ 3.6 GHz:
- BtrFS zstd compression at levels 1 and 3 is 5% faster
- BtrFS zstd decompression+read is 15% faster
- SquashFS zstd decompression+read is 15% faster
- F2FS zstd compression+write at level 3 is 8% faster
- F2FS zstd decompression+read is 20% faster
- ZRAM decompression+read is 30% faster
- Kernel zstd decompression is 35% faster
- Initramfs zstd decompression+build is 5% faster
On top of this, there are significant performance improvements coming down the
line in the next zstd release, and the new automated update patch generation
will allow us to pull them easily.
How is the update patch generated?
----------------------------------
The first two patches are preparation for updating the zstd version. Then the
3rd patch in the series imports upstream zstd into the kernel. This patch is
automatically generated from upstream. A script makes the necessary changes and
imports it into the kernel. The changes are:
- Replace all libc dependencies with kernel replacements and rewrite includes.
- Remove unncessary portability macros like: #if defined(_MSC_VER).
- Use the kernel xxhash instead of bundling it.
This automation gets tested every commit by upstream's continuous integration.
When we cut a new zstd release, we will submit a patch to the kernel to update
the zstd version in the kernel.
The automated process makes it easy to keep the kernel version of zstd up to
date. The current zstd in the kernel shares the guts of the code, but has a lot
of API and minor changes to work in the kernel. This is because at the time
upstream zstd was not ready to be used in the kernel envrionment as-is. But,
since then upstream zstd has evolved to support being used in the kernel as-is.
Why are we updating in one big patch?
-------------------------------------
The 3rd patch in the series is very large. This is because it is restructuring
the code, so it both deletes the existing zstd, and re-adds the new structure.
Future updates will be directly proportional to the changes in upstream zstd
since the last import. They will admittidly be large, as zstd is an actively
developed project, and has hundreds of commits between every release. However,
there is no other great alternative.
One option ruled out is to replay every upstream zstd commit. This is not feasible
for several reasons:
- There are over 3500 upstream commits since the zstd version in the kernel.
- The automation to automatically generate the kernel update was only added recently,
so older commits cannot easily be imported.
- Not every upstream zstd commit builds.
- Only zstd releases are "supported", and individual commits may have bugs that were
fixed before a release.
Another option to reduce the patch size would be to first reorganize to the new
file structure, and then apply the patch. However, the current kernel zstd is formatted
with clang-format to be more "kernel-like". But, the new method imports zstd as-is,
without additional formatting, to allow for closer correlation with upstream, and
easier debugging. So the patch wouldn't be any smaller.
It also doesn't make sense to import upstream zstd commit by commit going
forward. Upstream zstd doesn't support production use cases running of the
development branch. We have a lot of post-commit fuzzing that catches many bugs,
so indiviudal commits may be buggy, but fixed before a release. So going forward,
I intend to import every (important) zstd release into the Kernel.
So, while it isn't ideal, updating in one big patch is the only patch I see forward.
Who is responsible for this code?
---------------------------------
I am. This patchset adds me as the maintainer for zstd. Previously, there was no tree
for zstd patches. Because of that, there were several patches that either got ignored,
or took a long time to merge, since it wasn't clear which tree should pick them up.
I'm officially stepping up as maintainer, and setting up my tree as the path through
which zstd patches get merged. I'll make sure that patches to the kernel zstd get
ported upstream, so they aren't erased when the next version update happens.
How is this code tested?
------------------------
I tested every caller of zstd on x86_64 (BtrFS, ZRAM, SquashFS, F2FS, Kernel,
InitRAMFS). I also tested Kernel & InitRAMFS on i386 and aarch64. I checked both
performance and correctness.
Also, thanks to many people in the community who have tested these patches locally.
If you have tested the patches, please reply with a Tested-By so I can collect them
for the PR I will send to Linus.
Lastly, this code will bake in linux-next before being merged into v5.16.
Why update to zstd-1.4.10 when zstd-1.5.0 has been released?
------------------------------------------------------------
This patchset has been outstanding since 2020, and zstd-1.4.10 was the latest
release when it was created. Since the update patch is automatically generated
from upstream, I could generate it from zstd-1.5.0. However, there were some
large stack usage regressions in zstd-1.5.0, and are only fixed in the latest
development branch. And the latest development branch contains some new code that
needs to bake in the fuzzer before I would feel comfortable releasing to the
kernel.
Once this patchset has been merged, and we've released zstd-1.5.1, we can update
the kernel to zstd-1.5.1, and exercise the update process.
You may notice that zstd-1.4.10 doesn't exist upstream. This release is an
artifical release based off of zstd-1.4.9, with some fixes for the kernel
backported from the development branch. I will tag the zstd-1.4.10 release after
this patchset is merged, so the Linux Kernel is running a known version of zstd
that can be debugged upstream.
Why was a wrapper API added?
----------------------------
The first versions of this patchset migrated the kernel to the upstream zstd
API. It first added a shim API that supported the new upstream API with the old
code, then updated callers to use the new shim API, then transitioned to the
new code and deleted the shim API. However, Cristoph Hellwig suggested that we
transition to a kernel style API, and hide zstd's upstream API behind that.
This is because zstd's upstream API is supports many other use cases, and does
not follow the kernel style guide, while the kernel API is focused on the
kernel's use cases, and follows the kernel style guide.
Where is the previous discussion?
---------------------------------
Links for the discussions of the previous versions of the patch set.
The largest changes in the design of the patchset are driven by the discussions
in V11, V5, and V1. Sorry for the mix of links, I couldn't find most of the the
threads on lkml.org.
V12: https://www.spinics.net/lists/linux-crypto/msg58189.html
V11: https://lore.kernel.org/linux-btrfs/20210430013157.747152-1-nickrterrell@gmail.com/
V10: https://lore.kernel.org/lkml/20210426234621.870684-2-nickrterrell@gmail.com/
V9: https://lore.kernel.org/linux-btrfs/20210330225112.496213-1-nickrterrell@gmail.com/
V8: https://lore.kernel.org/linux-f2fs-devel/20210326191859.1542272-1-nickrterrell@gmail.com/
V7: https://lkml.org/lkml/2020/12/3/1195
V6: https://lkml.org/lkml/2020/12/2/1245
V5: https://lore.kernel.org/linux-btrfs/20200916034307.2092020-1-nickrterrell@gmail.com/
V4: https://www.spinics.net/lists/linux-btrfs/msg105783.html
V3: https://lkml.org/lkml/2020/9/23/1074
V2: https://www.spinics.net/lists/linux-btrfs/msg105505.html
V1: https://lore.kernel.org/linux-btrfs/20200916034307.2092020-1-nickrterrell@gmail.com/
Signed-off-by: Nick Terrell <terrelln@fb.com>
Tested By: Paul Jones <paul@pauljones.id.au>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v13.0.0 on x86-64
Tested-by: Jean-Denis Girard <jd.girard@sysnux.pf>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEmIwAqlFIzbQodPwyuzRpqaNEqPUFAmGJyKIACgkQuzRpqaNE
qPXnmw/+PKyCn6LvRQqNfdpF5f59j/B1Fab15tkpVyz3UWnCw+EKaPZOoTfIsjRf
7TMUVm4iGsm+6xBO/YrGdRl4IxocNgXzsgnJ1lTGDbvfRC1tG+YNwuv+EEXwKYq5
Yz3DRwDotgsrV0Kg05b+VIgkmAuY3ukmu2n09LnAdKkxoIgmHw3MIDCdVZW2Br4c
sjJmYI+fiJd7nAlbDa42VOrdTiLzkl/2BsjWBqTv6zbiQ5uuJGsKb7P3kpcybWzD
5C118pyE3qlVyvFz+UFu8WbN0NSf47DP22KV/3IrhNX7CVQxYBe+9/oVuPWTgRx0
4Vl0G6u7rzh4wDZuGqTC3LYWwH9GfycI0fnVC0URP2XMOcGfPlGd3L0PEmmAeTmR
fEbaGAN4dr0jNO3lmbyAGe/G8tvtXQx/4ZjS9Pa3TlQP24GARU/f78/blbKR87Vz
BGMndmSi92AscgXb9buO3bCwAY1YtH5WiFaZT1XVk42cj4MiOLvPTvP4UMzDDxcZ
56ahmAP/84kd6H+cv9LmgEMqcIBmxdUcO1nuAItJ4wdrMUgw3+lrbxwFkH9xPV7I
okC1K0TIVEobADbxbdMylxClAylbuW+37Pko97NmAlnzNCPNE38f3s3gtXRrUTaR
IP8jv5UQ7q3dFiWnNLLodx5KM6s32GVBKRLRnn/6SJB7QzlyHXU=
=Xb18
-----END PGP SIGNATURE-----
Merge tag 'zstd-for-linus-v5.16' of git://github.com/terrelln/linux
Pull zstd update from Nick Terrell:
"Update to zstd-1.4.10.
Add myself as the maintainer of zstd and update the zstd version in
the kernel, which is now 4 years out of date, to a much more recent
zstd release. This includes bug fixes, much more extensive fuzzing,
and performance improvements. And generates the kernel zstd
automatically from upstream zstd, so it is easier to keep the zstd
verison up to date, and we don't fall so far out of date again.
This includes 5 commits that update the zstd library version:
- Adds a new kernel-style wrapper around zstd.
This wrapper API is functionally equivalent to the subset of the
current zstd API that is currently used. The wrapper API changes to
be kernel style so that the symbols don't collide with zstd's
symbols. The update to zstd-1.4.10 maintains the same API and
preserves the semantics, so that none of the callers need to be
updated. All callers are updated in the commit, because there are
zero functional changes.
- Adds an indirection for `lib/decompress_unzstd.c` so it doesn't
depend on the layout of `lib/zstd/` to include every source file.
This allows the next patch to be automatically generated.
- Imports the zstd-1.4.10 source code. This commit is automatically
generated from upstream zstd (https://github.com/facebook/zstd).
- Adds me (terrelln@fb.com) as the maintainer of `lib/zstd`.
- Fixes a newly added build warning for clang.
The discussion around this patchset has been pretty long, so I've
included a FAQ-style summary of the history of the patchset, and why
we are taking this approach.
Why do we need to update?
-------------------------
The zstd version in the kernel is based off of zstd-1.3.1, which is
was released August 20, 2017. Since then zstd has seen many bug fixes
and performance improvements. And, importantly, upstream zstd is
continuously fuzzed by OSS-Fuzz, and bug fixes aren't backported to
older versions. So the only way to sanely get these fixes is to keep
up to date with upstream zstd.
There are no known security issues that affect the kernel, but we need
to be able to update in case there are. And while there are no known
security issues, there are relevant bug fixes. For example the problem
with large kernel decompression has been fixed upstream for over 2
years [1]
Additionally the performance improvements for kernel use cases are
significant. Measured for x86_64 on my Intel i9-9900k @ 3.6 GHz:
- BtrFS zstd compression at levels 1 and 3 is 5% faster
- BtrFS zstd decompression+read is 15% faster
- SquashFS zstd decompression+read is 15% faster
- F2FS zstd compression+write at level 3 is 8% faster
- F2FS zstd decompression+read is 20% faster
- ZRAM decompression+read is 30% faster
- Kernel zstd decompression is 35% faster
- Initramfs zstd decompression+build is 5% faster
On top of this, there are significant performance improvements coming
down the line in the next zstd release, and the new automated update
patch generation will allow us to pull them easily.
How is the update patch generated?
----------------------------------
The first two patches are preparation for updating the zstd version.
Then the 3rd patch in the series imports upstream zstd into the
kernel. This patch is automatically generated from upstream. A script
makes the necessary changes and imports it into the kernel. The
changes are:
- Replace all libc dependencies with kernel replacements and rewrite
includes.
- Remove unncessary portability macros like: #if defined(_MSC_VER).
- Use the kernel xxhash instead of bundling it.
This automation gets tested every commit by upstream's continuous
integration. When we cut a new zstd release, we will submit a patch to
the kernel to update the zstd version in the kernel.
The automated process makes it easy to keep the kernel version of zstd
up to date. The current zstd in the kernel shares the guts of the
code, but has a lot of API and minor changes to work in the kernel.
This is because at the time upstream zstd was not ready to be used in
the kernel envrionment as-is. But, since then upstream zstd has
evolved to support being used in the kernel as-is.
Why are we updating in one big patch?
-------------------------------------
The 3rd patch in the series is very large. This is because it is
restructuring the code, so it both deletes the existing zstd, and
re-adds the new structure. Future updates will be directly
proportional to the changes in upstream zstd since the last import.
They will admittidly be large, as zstd is an actively developed
project, and has hundreds of commits between every release. However,
there is no other great alternative.
One option ruled out is to replay every upstream zstd commit. This is
not feasible for several reasons:
- There are over 3500 upstream commits since the zstd version in the
kernel.
- The automation to automatically generate the kernel update was only
added recently, so older commits cannot easily be imported.
- Not every upstream zstd commit builds.
- Only zstd releases are "supported", and individual commits may have
bugs that were fixed before a release.
Another option to reduce the patch size would be to first reorganize
to the new file structure, and then apply the patch. However, the
current kernel zstd is formatted with clang-format to be more
"kernel-like". But, the new method imports zstd as-is, without
additional formatting, to allow for closer correlation with upstream,
and easier debugging. So the patch wouldn't be any smaller.
It also doesn't make sense to import upstream zstd commit by commit
going forward. Upstream zstd doesn't support production use cases
running of the development branch. We have a lot of post-commit
fuzzing that catches many bugs, so indiviudal commits may be buggy,
but fixed before a release. So going forward, I intend to import every
(important) zstd release into the Kernel.
So, while it isn't ideal, updating in one big patch is the only patch
I see forward.
Who is responsible for this code?
---------------------------------
I am. This patchset adds me as the maintainer for zstd. Previously,
there was no tree for zstd patches. Because of that, there were
several patches that either got ignored, or took a long time to merge,
since it wasn't clear which tree should pick them up. I'm officially
stepping up as maintainer, and setting up my tree as the path through
which zstd patches get merged. I'll make sure that patches to the
kernel zstd get ported upstream, so they aren't erased when the next
version update happens.
How is this code tested?
------------------------
I tested every caller of zstd on x86_64 (BtrFS, ZRAM, SquashFS, F2FS,
Kernel, InitRAMFS). I also tested Kernel & InitRAMFS on i386 and
aarch64. I checked both performance and correctness.
Also, thanks to many people in the community who have tested these
patches locally.
Lastly, this code will bake in linux-next before being merged into
v5.16.
Why update to zstd-1.4.10 when zstd-1.5.0 has been released?
------------------------------------------------------------
This patchset has been outstanding since 2020, and zstd-1.4.10 was the
latest release when it was created. Since the update patch is
automatically generated from upstream, I could generate it from
zstd-1.5.0.
However, there were some large stack usage regressions in zstd-1.5.0,
and are only fixed in the latest development branch. And the latest
development branch contains some new code that needs to bake in the
fuzzer before I would feel comfortable releasing to the kernel.
Once this patchset has been merged, and we've released zstd-1.5.1, we
can update the kernel to zstd-1.5.1, and exercise the update process.
You may notice that zstd-1.4.10 doesn't exist upstream. This release
is an artifical release based off of zstd-1.4.9, with some fixes for
the kernel backported from the development branch. I will tag the
zstd-1.4.10 release after this patchset is merged, so the Linux Kernel
is running a known version of zstd that can be debugged upstream.
Why was a wrapper API added?
----------------------------
The first versions of this patchset migrated the kernel to the
upstream zstd API. It first added a shim API that supported the new
upstream API with the old code, then updated callers to use the new
shim API, then transitioned to the new code and deleted the shim API.
However, Cristoph Hellwig suggested that we transition to a kernel
style API, and hide zstd's upstream API behind that. This is because
zstd's upstream API is supports many other use cases, and does not
follow the kernel style guide, while the kernel API is focused on the
kernel's use cases, and follows the kernel style guide.
Where is the previous discussion?
---------------------------------
Links for the discussions of the previous versions of the patch set
below. The largest changes in the design of the patchset are driven by
the discussions in v11, v5, and v1. Sorry for the mix of links, I
couldn't find most of the the threads on lkml.org"
Link: https://lkml.org/lkml/2020/9/29/27 [1]
Link: https://www.spinics.net/lists/linux-crypto/msg58189.html [v12]
Link: https://lore.kernel.org/linux-btrfs/20210430013157.747152-1-nickrterrell@gmail.com/ [v11]
Link: https://lore.kernel.org/lkml/20210426234621.870684-2-nickrterrell@gmail.com/ [v10]
Link: https://lore.kernel.org/linux-btrfs/20210330225112.496213-1-nickrterrell@gmail.com/ [v9]
Link: https://lore.kernel.org/linux-f2fs-devel/20210326191859.1542272-1-nickrterrell@gmail.com/ [v8]
Link: https://lkml.org/lkml/2020/12/3/1195 [v7]
Link: https://lkml.org/lkml/2020/12/2/1245 [v6]
Link: https://lore.kernel.org/linux-btrfs/20200916034307.2092020-1-nickrterrell@gmail.com/ [v5]
Link: https://www.spinics.net/lists/linux-btrfs/msg105783.html [v4]
Link: https://lkml.org/lkml/2020/9/23/1074 [v3]
Link: https://www.spinics.net/lists/linux-btrfs/msg105505.html [v2]
Link: https://lore.kernel.org/linux-btrfs/20200916034307.2092020-1-nickrterrell@gmail.com/ [v1]
Signed-off-by: Nick Terrell <terrelln@fb.com>
Tested By: Paul Jones <paul@pauljones.id.au>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v13.0.0 on x86-64
Tested-by: Jean-Denis Girard <jd.girard@sysnux.pf>
* tag 'zstd-for-linus-v5.16' of git://github.com/terrelln/linux:
lib: zstd: Add cast to silence clang's -Wbitwise-instead-of-logical
MAINTAINERS: Add maintainer entry for zstd
lib: zstd: Upgrade to latest upstream zstd version 1.4.10
lib: zstd: Add decompress_sources.h for decompress_unzstd
lib: zstd: Add kernel-specific API
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmGP7FYQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpiZ2D/4rpInoHla32L3LcmLO+strpoBrRHCrVeVG
KUmSKcd1H0PWVxNrcAP/ilaOZ32A3iRInN7WcMd7CQhfFPaIGD1f8XPZuXPPhjXA
rj+NnvNXWciGYjPOROs5j5i/ZxuYC8MiMJp55D5cYiCc25Pi1dW6upRuHBnvNli0
YdFBBQGYQcBXjqxImyCTME61T2HxC+6899ah+s57+meDhzEnKeaEka5ADGb8wpf7
D5PSSTGROOEsEtoGaKqCJIq0mpiswFkTvsa9aD/3kBQOEGVQJ8aEQenCObCM9GDr
WgT+N4w89oIFKqzwLBJ4k8rlXzfAV91+1X95dhLBYsxPfuE5qVDNmWwNGhpRSe01
IUG0fG4t9QjEGCY8eYNDKzsyZ8xxg4auWPf/lkcMlf3zyaCxWv1FFegPjz99k5tg
wVDsQhrLb2tknXpE5ZCVjvv9w9gvx1AH2DUJJQYQBx/++HLRBw57Ua+ULwEs+fwc
lsqaE7yYSj85+MfUiV6yPrAGUnsqPOP5kNAXWgH2mSo8C+XKo8Lb1CPV1qdV/oN7
WBc1JMD/AFRJLR1X9y2XMWeONYf7/g/lIRrjHN+ykHSvOaVkgQ3Fk9NGpRxWdSRn
kS6IvdY6VFcf1db2YD47UbrZLMrz38MzGVtVmIMU7l8ZfOqRP82M54wrTFohxJgL
Zw3cAYWz+g==
=DrMK
-----END PGP SIGNATURE-----
Merge tag 'block-5.16-2021-11-13' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Set of fixes that should go into this merge window:
- ioctl vs read data race fixes (Shin'ichiro)
- blkcg use-after-free fix (Laibin)
- Last piece of the puzzle for add_disk() error handling, enable
__must_check for (Luis)
- Request allocation fixes (Ming)
- Misc fixes (me)"
* tag 'block-5.16-2021-11-13' of git://git.kernel.dk/linux-block:
blk-mq: fix filesystem I/O request allocation
blkcg: Remove extra blkcg_bio_issue_init
block: Hold invalidate_lock in BLKRESETZONE ioctl
blk-mq: rename blk_attempt_bio_merge
blk-mq: don't grab ->q_usage_counter in blk_mq_sched_bio_merge
block: fix kerneldoc for disk_register_independent_access__ranges()
block: add __must_check for *add_disk*() callers
block: use enum type for blk_mq_alloc_data->rq_flags
block: Hold invalidate_lock in BLKZEROOUT ioctl
block: Hold invalidate_lock in BLKDISCARD ioctl
in 5.7 are now enabled by default. This should greatly speed up things
like rm, tar and rsync. To opt out, wsync mount option can be used.
Other than that we have a pile of bug fixes all across the filesystem
from Jeff, Xiubo and Kotresh and a metrics infrastructure rework from
Luis.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmGORY8THGlkcnlvbW92
QGdtYWlsLmNvbQAKCRBKf944AhHzi5XmB/0SRQTW+BRAhYvSD/Ib2/c6uVZGnmJU
MUCO/uuD4fZvfxyMVb3qnAzHIGh3YlFWa/dgSZNStvLmY0L9P0MIvyaYolVuC4Tu
7llX1I+yckTns9VmiULBNy9D812eRY282nMbRzikMGPO1eb6Yqo3r50AdTvkam/R
Qs5pfwRqLerbP7VUv4vSsrBflwVyHOrCFZaUUVTu4f2kKz/FzZd/FCw5VV51KYzq
ygN+8eFotf5zluMX0tl0FXVgJ13N9vbg+YNxyOzHmOAV0AhSmmtV8vJ+j+m7+sOj
b7wNl5AuI5Ogeg8PHSGsXOHBVn6IbhdGDcougEVL+1gHkxxOQ5hfBHWQ
=22Vd
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-5.16-rc1' of git://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"One notable change here is that async creates and unlinks introduced
in 5.7 are now enabled by default. This should greatly speed up things
like rm, tar and rsync. To opt out, wsync mount option can be used.
Other than that we have a pile of bug fixes all across the filesystem
from Jeff, Xiubo and Kotresh and a metrics infrastructure rework from
Luis"
* tag 'ceph-for-5.16-rc1' of git://github.com/ceph/ceph-client:
ceph: add a new metric to keep track of remote object copies
libceph, ceph: move ceph_osdc_copy_from() into cephfs code
ceph: clean-up metrics data structures to reduce code duplication
ceph: split 'metric' debugfs file into several files
ceph: return the real size read when it hits EOF
ceph: properly handle statfs on multifs setups
ceph: shut down mount on bad mdsmap or fsmap decode
ceph: fix mdsmap decode when there are MDS's beyond max_mds
ceph: ignore the truncate when size won't change with Fx caps issued
ceph: don't rely on error_string to validate blocklisted session.
ceph: just use ci->i_version for fscache aux info
ceph: shut down access to inode when async create fails
ceph: refactor remove_session_caps_cb
ceph: fix auth cap handling logic in remove_session_caps_cb
ceph: drop private list from remove_session_caps_cb
ceph: don't use -ESTALE as special return code in try_get_cap_refs
ceph: print inode numbers instead of pointer values
ceph: enable async dirops by default
libceph: drop ->monmap and err initialization
ceph: convert to noop_direct_IO
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmGNO7sACgkQ+7dXa6fL
C2sDTA//SLJBMoY719Z/RvZcZb3PUkuwYtqu8j/F6C1n231yg5TrlkchslV635Ph
4lUuy/+pEVtsWot4JBxBsziBsi5WCjucn6opFAO2UOyT0aiE9ucY2MG+fNo6/b5k
RRGqbEODKHScC7pm00AJlqd8gJUvHz6Zy08KetHvkSI4bqBz5VpKxDSNxcWvsbx1
T6FMY+E61vSd0bEOp1/sZ1gRKK5nG9BJrba9V/MuOzj94MqVd9Ajdarr2vfQ7IyS
Qe16rOBM8u6oCRJqRcwdz2Ma0Zf/0Bm6JpoP7LsEdbzXLKPiHPWM6kNd9WZFmMt1
KFGGC3xG3Yxufasdpf5KCa6wlw1U5hWVITqRubHGxg49IdnrwxNK2zLqpJNr/lZC
vsOg31PkAWmJiMCAxhwL+u++Qar27jlXcdiO6tyqIDYHWyzwGWqF5zlzZ46NR26W
SX7oB36drIzH+UMDqxKGti2hRYTPKKwjCJ6p0EfhEYQ0oGa/bNFJA3bPxupJCkJe
PD0pnXdEmhKwvY0fLH6Ghr/PAckQttstTFpaHZ40XFzglgd3Sm5DEcg2Xm38LQ+n
4elMUA+c807ZTDLSDkGTL9QKPmgoM3AFoAsVxV9eqtEYAzYdnYM1GJ0CGOWM1lvs
vDrB7rqn/CFB6ks8k1BOalq2DTmPVU1I1f1GwnkyOiVtlxyrslk=
=K23c
-----END PGP SIGNATURE-----
Merge tag 'netfs-folio-20211111' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull netfs, 9p, afs and ceph (partial) foliation from David Howells:
"This converts netfslib, 9p and afs to use folios. It also partially
converts ceph so that it uses folios on the boundaries with netfslib.
To help with this, a couple of folio helper functions are added in the
first two patches.
These patches don't touch fscache and cachefiles as I intend to remove
all the code that deals with pages directly from there. Only nfs and
cifs are using the old fscache I/O API now. The new API uses iov_iter
instead.
Thanks to Jeff Layton, Dominique Martinet and AuriStor for testing and
retesting the patches"
* tag 'netfs-folio-20211111' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Use folios in directory handling
netfs, 9p, afs, ceph: Use folios
folio: Add a function to get the host inode for a folio
folio: Add a function to change the private data attached to a folio
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmGO6qEUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOBLhAAlcQQnj1Rqkle+c3qnJLshq7/ZU12
xr4R7OxAsGlQfAsQzJZLBrZhgu9LtZeLx5wSCYyJKJrWvpFQ5nhwzFAybq3qiWNf
jh0hVTij9WNoDqFP58nUTAmhNH4qqkVvT4qu0Vulnfx7nuyDFF+nJKtpAnVC+Jgo
84uHx2ehjU/FNq+UCnIOstgu62zjJ8YalKKZmyREQn80tWR73sRz47NO+vGqXoER
/Pub67jKKc7CnbxCJODX8mhWf3f+usa7myl+z2xPDj32e17eR/QXU3ZmTR7w+SpH
49TNYIeOyrapD+9hRosdMVDskNz5yuyySk8ZBm+v2EpgmFMXhnYpqVLX5P2ilm1z
etzv8APCqJTOn05Alissx6bUH0D3OfKlmNgpRHaj2Iuc9S7fJlrvrlB5+1aWF2DS
O5NIu6ctmNBYAeFe2/Af1qRnoPBX099XDlGon4PPVROQkrKyz/OIrQHo2W2+Jppg
HysLRjAXuwOw1ODnvfkI2fNAYgRdcv0C9kU7Nqxxhafk87cO7v22grK0zFRjUutp
k9CfXFkXA1Eg32moxHOzijl1sohL70I2Z7wQj/xeuU7gYqayG1HoZXZVBJmfpMmP
pCeqe+pGaVBGoLF9ipENLT38eV6wbi5hG3ZFFlUGkTiKb6qokzj4gPQVyDXHLoQe
TIbdTIOfw8udATo=
=vK/E
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20211112' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux fixes from Paul Moore:
"Unfortunately I need to request a revert for two LSM/SELinux patches
that came in via the network tree. The two patches in question add a
new SCTP/LSM hook as well as an SELinux implementation of that LSM
hook. The short version of "why?" is in the commit description of the
revert patch, but I'll copy-n-paste the important bits below to save
some time for the curious:
... Unfortunately these two patches were merged without proper
review (the Reviewed-by and Tested-by tags from Richard Haines
were for previous revisions of these patches that were
significantly different) and there are outstanding objections from
the SELinux maintainers regarding these patches.
Work is currently ongoing to correct the problems identified in
the reverted patches, as well as others that have come up during
review, but it is unclear at this point in time when that work
will be ready for inclusion in the mainline kernel. In the
interest of not keeping objectionable code in the kernel for
multiple weeks, and potentially a kernel release, we are reverting
the two problematic patches.
As usual with these things there is plenty of context to go with this
and I'll try to do my best to provide that now. This effort started
with a report of SCTP client side peel-offs not working correctly with
SELinux, Ondrej Mosnacek put forth a patch which he believed properly
addressed the problem but upon review by the netdev folks Xin Long
described some additional issues and submitted an improved patchset
for review. The SELinux folks reviewed Xin Long's initial patchset and
suggested some changes which resulted in a second patchset (v2) from
Xin Long; this is the patchset that is currently in your tree.
Unfortunately this v2 patchset from Xin Long was merged before it had
spent even just 24 hours on the mailing lists during the early days of
the merge window, a time when many of us were busy doing verification
of the newly released v5.15 kernel as well final review and testing of
our v5.16 pull requests. Making matters worse, upon reviewing the v2
patchset there were both changes which were found objectionable by
SELinux standards as well as additional outstanding SCTP/SELinux
interaction problems. At this point we did two things: resumed working
on a better fix for the SCTP/SELinux issue(s) - thank you Ondrej - and
we asked the networking folks to revert the v2 patchset.
The revert request was obviously rejected, but at the time I believed
it was just going to be an issue for linux-next; I wasn't expecting
something this significant that was merged into the networking tree
during the merge window to make it into your tree in the same window,
yet as of last night that is exactly what happened. While we continue
to try and resolve the SCTP/SELinux problem I am asking once again to
revert the v2 patches and not ship the current
security_sctp_assoc_established() hook in a v5.16-rcX kernel. If I was
confident that we could solve these issues in a week, maybe two, I
would refrain from asking for the revert but our current estimate is
for a minimum of two weeks for the next patch revision. With the
likelihood of additional delays due to normal patch review follow-up
and/or holidays it seems to me that the safest course of action is to
revert the patch both to try and keep some objectionable code out of a
release kernel and limit the chances of any new breakages from such a
change. While the SCTP/SELinux code in v5.15 and earlier has problems,
they are known problems, and I'd like to try and avoid creating new
and different problems while we work to fix things properly.
One final thing to mention: Xin Long's v2 patchset consisted of four
patches, yet this revert is for only the last two. We see the first
two patches as good, reasonable, and not likely to cause an issue. In
an attempt to create a cleaner revert patch we suggest leaving the
first two patches in the tree as they are currently"
* tag 'selinux-pr-20211112' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
net,lsm,selinux: revert the security_sctp_assoc_established() hook
* Guest API and guest kernel support for SEV live migration
* SEV and SEV-ES intra-host migration
Bugfixes and cleanups for x86:
* Fix misuse of gfn-to-pfn cache when recording guest steal time / preempted status
* Fix selftests on APICv machines
* Fix sparse warnings
* Fix detection of KVM features in CPUID
* Cleanups for bogus writes to MSR_KVM_PV_EOI_EN
* Fixes and cleanups for MSR bitmap handling
* Cleanups for INVPCID
* Make x86 KVM_SOFT_MAX_VCPUS consistent with other architectures
Bugfixes for ARM:
* Fix finalization of host stage2 mappings
* Tighten the return value of kvm_vcpu_preferred_target()
* Make sure the extraction of ESR_ELx.EC is limited to architected bits
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmGO1usUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOsXgf/d2CgZmTxZii/pOt0JKGSDKwRdoti
pfbmtXwTkagqg2tIJtEsxwvcc26Q2VLyg9mlEelX1I8KVlexSa8mDtbGWHLFilsc
JIZY8lE96wtlr9AyuN06K44QingDMIbXzlxcwO+muS3zTlSPsNkPcaVDk5PL35sN
Wy2GA6GCLWv6iTLCtlX5EzbcgkrR+Mypj4lGSdXGRqfBKVFOQjeVt+Q3YzOPuacS
03NBlYOmRxTnAGXfeAVs0/zEa69u/dYjBmwDPe6ERma4u8thHv0U/fDAh5C/ES7q
42Thes/JOYnEZiBQ1xZLsHqRII0achbJKZhmxqPwjRf/u6eXsfz0KY9FBg==
=kN8U
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more kvm updates from Paolo Bonzini:
"New x86 features:
- Guest API and guest kernel support for SEV live migration
- SEV and SEV-ES intra-host migration
Bugfixes and cleanups for x86:
- Fix misuse of gfn-to-pfn cache when recording guest steal time /
preempted status
- Fix selftests on APICv machines
- Fix sparse warnings
- Fix detection of KVM features in CPUID
- Cleanups for bogus writes to MSR_KVM_PV_EOI_EN
- Fixes and cleanups for MSR bitmap handling
- Cleanups for INVPCID
- Make x86 KVM_SOFT_MAX_VCPUS consistent with other architectures
Bugfixes for ARM:
- Fix finalization of host stage2 mappings
- Tighten the return value of kvm_vcpu_preferred_target()
- Make sure the extraction of ESR_ELx.EC is limited to architected
bits"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (34 commits)
KVM: SEV: unify cgroup cleanup code for svm_vm_migrate_from
KVM: x86: move guest_pv_has out of user_access section
KVM: x86: Drop arbitrary KVM_SOFT_MAX_VCPUS
KVM: Move INVPCID type check from vmx and svm to the common kvm_handle_invpcid()
KVM: VMX: Add a helper function to retrieve the GPR index for INVPCID, INVVPID, and INVEPT
KVM: nVMX: Clean up x2APIC MSR handling for L2
KVM: VMX: Macrofy the MSR bitmap getters and setters
KVM: nVMX: Handle dynamic MSR intercept toggling
KVM: nVMX: Query current VMCS when determining if MSR bitmaps are in use
KVM: x86: Don't update vcpu->arch.pv_eoi.msr_val when a bogus value was written to MSR_KVM_PV_EOI_EN
KVM: x86: Rename kvm_lapic_enable_pv_eoi()
KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURES
KVM: x86: Add helper to consolidate core logic of SET_CPUID{2} flows
kvm: mmu: Use fast PF path for access tracking of huge pages when possible
KVM: x86/mmu: Properly dereference rcu-protected TDP MMU sptep iterator
KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active
kvm: x86: Convert return type of *is_valid_rdpmc_ecx() to bool
KVM: x86: Fix recording of guest steal time / preempted status
selftest: KVM: Add intra host migration tests
selftest: KVM: Add open sev dev helper
...
- Add PCI automatic error recovery.
- Fix tape driver timer initialization broken during timers api cleanup.
- Fix bogus CPU measurement counters values on CPUs offlining.
- Check the validity of subchanel before reading other fields in
the schib in cio code.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmGPwDMACgkQjYWKoQLX
FBgCoQf/VBel0vDex9NNVo59OmGTNh9NPPT2cUU8vYEmwHfaBeInZVEx5WOxXijl
8MIbEgi6Wt3EwnIghjouC50nk8jCiNOJ8Z/wG+01zZpVpLk2GvKGjxoYxKg+5E6T
sOSr7TMeKOtOp23xKAXGVIzzkrDTSyr3qruTKg/m6TFhQ0XSm/ld2k6tR5AARbuB
UtsxBOtPWyHm1xPyuhjr+c6riK2vGQwJwYya4vGtIW8ix9uZoPabdqqzWsD3meBc
B6fe96YQGxA8Tt80FtyJ6pHEhNDr8CE656aJZNJCnd7q1RmLWC1R/aUme+9wqQtO
i9YmMvc+uzgQonpo+YgWqu9fIQqcXA==
=OjW1
-----END PGP SIGNATURE-----
Merge tag 's390-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Vasily Gorbik:
- Add PCI automatic error recovery.
- Fix tape driver timer initialization broken during timers api
cleanup.
- Fix bogus CPU measurement counters values on CPUs offlining.
- Check the validity of subchanel before reading other fields in the
schib in cio code.
* tag 's390-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/cio: check the subchannel validity for dev_busid
s390/cpumf: cpum_cf PMU displays invalid value after hotplug remove
s390/tape: fix timer initialization in tape_std_assign()
s390/pci: implement minimal PCI error recovery
PCI: Export pci_dev_lock()
s390/pci: implement reset_slot for hotplug slot
s390/pci: refresh function handle in iomap
This set is mostly small fixes and cleanups, so more of a janitorial
update for this cycle.
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEEiOrDCAFJzPfAjcif3SOs138+s6EFAmGOQhEZHHRoaWVycnku
cmVkaW5nQGdtYWlsLmNvbQAKCRDdI6zXfz6zoaQGD/9CHCzKlhFn60fE1BI7FADK
tLIe4LdnjLju06CrCRE4jZVyJdHlo2iNUCTvsBv6BwrqyNrxGP33ZoCzJpmvGI69
aiBFhm4bJeUe8USUosXsgPaXPuzjQ+G3WMZmGdHs8zgCxw2Kpve5f3JXgpmgtYfM
uRGZDpYZxboeO2rx8xnyhhMk8BJ4rMAJInMu7pRhnV93MLPIieui0xbVRkboNxG+
FAElquk6eN2LlLYoSeFUqx7Na3GI+SGdfCkqX8SLe4AymD2Dm4Aj0+CsqM+Rwlff
w8TRO0bMXN5yrnol1ARD9TlNsmh/tTRDGSq0amxiXaPHPfCSPCAQHu274tAdQqOP
LVNxkm/JJFeanz3FJJot4saOwnTyRnOOXHxtbmze1kiZNDqs2fJhffKTJNptQoGm
QPn5PC0hLQhMMNJ5l+ENWAMVpRCtU4w5B28fmQyo3Np4VLbZcz20RAxl55O0roIv
NIDMVmIyDyxa2+VUOVEmeuRsGcAzGQwYpwjVzf2vmWxc1CMxmKw35TcSYvWIAwRp
LY/xeStEkNeGWf4YfsDeWIeUjveVDXzkMHbfrnQxt0gK3TLdvf3qoB6elFjd8EN+
unvfulvQ5RU4dXef0zduJAQQ4HMXbJvbaq8cZhZYxyPA0o9pZJcILxCXOdHqjSw9
q+uk+I7CtDOCZVqy0REAAA==
=KmUV
-----END PGP SIGNATURE-----
Merge tag 'pwm/for-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
Pull pwm updates from Thierry Reding:
"This set is mostly small fixes and cleanups, so more of a janitorial
update for this cycle"
* tag 'pwm/for-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
pwm: vt8500: Rename pwm_busy_wait() to make it obviously driver-specific
dt-bindings: pwm: tpu: Add R-Car M3-W+ device tree bindings
dt-bindings: pwm: tpu: Add R-Car V3U device tree bindings
pwm: pwm-samsung: Trigger manual update when disabling PWM
pwm: visconti: Simplify using devm_pwmchip_add()
pwm: samsung: Describe driver in Kconfig
pwm: Make it explicit that pwm_apply_state() might sleep
pwm: Add might_sleep() annotations for !CONFIG_PWM API functions
pwm: atmel: Drop unused header