Commit graph

1107748 commits

Author SHA1 Message Date
Zongmin Zhou
c853246539 Input: vmmouse - disable vmmouse before entering suspend mode
Currently, when trying to suspend and resume with VirtualPS/2 VMMouse
there is an error message after resuming:

	psmouse serio1: vmmouse: Unable to re-enable mouse when reconnecting, err: -6

and the mouse will no longer be operable, requiring full rescan to find a
another driver to use for the port.

This error is due to QEMU still generating PS2 events which the kernel is
not consuming until resume time, where they interfere with mouse
identification and ultimately resulting in an error getting
VMMOUSE_VERSION_ID.

Test scenario:

1) start virtual machine with qemu command "vmport=on"
2) click suspend botton to enter suspend mode
3) resume and observe the error message in the kernel logs

Let's fix this by disabling the vmmouse in its reset handler. This will
notify qemu to stop vmmouse and remove the handler.

Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn>
Reviewed-by: Zack Rusin <zackr@vmware.com>
Link: https://lore.kernel.org/r/20220322021046.1087954-1-zhouzongmin@kylinos.cn
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-05-18 15:02:13 -07:00
Stephen Boyd
d95bca4fbd dt-bindings: google,cros-ec-keyb: Fixup bad compatible match
This uses anyOf which is wrong. Use oneOf and move the items under the
description. Also drop allOf for $ref.

Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/CAE-0n50KE9bkqZvCOLtCGiq3g1jYhK7zpVcVFBzinaguNhNaPw@mail.gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-05-18 15:01:06 -07:00
Marek Vasut
b26ff91371 Input: ili210x - use one common reset implementation
Rename ili251x_hardware_reset() to ili210x_hardware_reset(), change its
parameter from struct device * to struct gpio_desc *, and use it as one
single consistent reset implementation all over the driver. Also increase
the minimum reset duration to 12ms, to make sure the reset is really
within the spec.

Signed-off-by: Marek Vasut <marex@denx.de>
Link: https://lore.kernel.org/r/20220518210423.106555-1-marex@denx.de
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-05-18 14:31:31 -07:00
Marek Vasut
e4920d42ce Input: ili210x - fix reset timing
According to Ilitek "231x & ILI251x Programming Guide" Version: 2.30
"2.1. Power Sequence", "T4 Chip Reset and discharge time" is minimum
10ms and "T2 Chip initial time" is maximum 150ms. Adjust the reset
timings such that T4 is 12ms and T2 is 160ms to fit those figures.

This prevents sporadic touch controller start up failures when some
systems with at least ILI251x controller boot, without this patch
the systems sometimes fail to communicate with the touch controller.

Fixes: 201f3c8035 ("Input: ili210x - add reset GPIO support")
Signed-off-by: Marek Vasut <marex@denx.de>
Link: https://lore.kernel.org/r/20220518204901.93534-1-marex@denx.de
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-05-18 14:31:30 -07:00
Aidan MacDonald
2b0f3d70ce mips: ingenic: Do not manually reference the CPU clock
It isn't necessary to manually walk the device tree and enable
the CPU clock anymore. The CPU and other necessary clocks are
now flagged as critical in the clock driver, which accomplishes
the same thing in a more declarative fashion.

Signed-off-by: Aidan MacDonald <aidanmacdonald.0x0@gmail.com>
Reviewed-by: Paul Cercueil <paul@crapouillou.net>
Link: https://lore.kernel.org/r/20220428164454.17908-4-aidanmacdonald.0x0@gmail.com
Tested-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com> # On X1000 and X1830
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2022-05-18 13:56:26 -07:00
Aidan MacDonald
ca54d06fca clk: ingenic: Mark critical clocks in Ingenic SoCs
Consider CPU, L2 cache, and memory clocks as critical to prevent
them -- and the parent clocks -- from being automatically gated,
since nothing calls clk_get() on these clocks.

Gating the CPU clock hangs the processor, and gating memory makes
external DRAM inaccessible. Normal kernel code can't hope to deal
with either situation so those clocks have to be critical.

The L2 cache is required only if caches are running, and could be
gated if the kernel takes care to flush and disable caches before
gating the clock. There's no mechanism to do this, and probably no
reason to do it, so it's simpler to mark the L2 cache as critical.

Signed-off-by: Aidan MacDonald <aidanmacdonald.0x0@gmail.com>
Reviewed-by: Paul Cercueil <paul@crapouillou.net>
Link: https://lore.kernel.org/r/20220428164454.17908-3-aidanmacdonald.0x0@gmail.com
Tested-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com> # On X1000 and X1830
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2022-05-18 13:56:22 -07:00
Aidan MacDonald
bacf743e92 clk: ingenic: Allow specifying common clock flags
Provide a flags field for clocks under the ingenic-cgu driver,
which can be used to set generic common clock framework flags
on the created clocks. For example, the CLK_IS_CRITICAL flag
is needed for some clocks (such as CPU or memory) to stop them
being automatically disabled.

Signed-off-by: Aidan MacDonald <aidanmacdonald.0x0@gmail.com>
Reviewed-by: Paul Cercueil <paul@crapouillou.net>
Link: https://lore.kernel.org/r/20220428164454.17908-2-aidanmacdonald.0x0@gmail.com
Tested-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com> # On X1000 and X1830
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2022-05-18 13:56:16 -07:00
Hangyu Hua
bea0b66efa clk: ux500: fix a possible off-by-one in u8500_prcc_reset_base()
Off-by-one will happen when index == ARRAY_SIZE(ur->base).

Fixes: b14cbdfd46 ("clk: ux500: Add driver for the reset portions of PRCC")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Link: https://lore.kernel.org/r/20220518062537.17933-1-hbh25y@gmail.com
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2022-05-18 13:34:03 -07:00
Mario Limonciello
7123d39dc2 drm/amd: Don't reset dGPUs if the system is going to s2idle
An A+A configuration on ASUS ROG Strix G513QY proves that the ASIC
reset for handling aborted suspend can't work with s2idle.

This functionality was introduced in commit daf8de0874 ("drm/amdgpu:
always reset the asic in suspend (v2)").  A few other commits have
gone on top of the ASIC reset, but this still doesn't work on the A+A
configuration in s2idle.

Avoid doing the reset on dGPUs specifically when using s2idle.

Fixes: daf8de0874 ("drm/amdgpu: always reset the asic in suspend (v2)")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2008
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-05-18 15:24:35 -04:00
Ilya Dryomov
d0bb883c63 libceph: fix misleading ceph_osdc_cancel_request() comment
cancel_request() never guaranteed that after its return the OSD
client would be completely done with the OSD request.  The callback
(if specified) can still be invoked and a ref can still be held.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
2022-05-18 21:21:29 +02:00
Ilya Dryomov
75dbb685f4 libceph: fix potential use-after-free on linger ping and resends
request_reinit() is not only ugly as the comment rightfully suggests,
but also unsafe.  Even though it is called with osdc->lock held for
write in all cases, resetting the OSD request refcount can still race
with handle_reply() and result in use-after-free.  Taking linger ping
as an example:

    handle_timeout thread                     handle_reply thread

                                              down_read(&osdc->lock)
                                              req = lookup_request(...)
                                              ...
                                              finish_request(req)  # unregisters
                                              up_read(&osdc->lock)
                                              __complete_request(req)
                                                linger_ping_cb(req)

      # req->r_kref == 2 because handle_reply still holds its ref

    down_write(&osdc->lock)
    send_linger_ping(lreq)
      req = lreq->ping_req  # same req
      # cancel_linger_request is NOT
      # called - handle_reply already
      # unregistered
      request_reinit(req)
        WARN_ON(req->r_kref != 1)  # fires
        request_init(req)
          kref_init(req->r_kref)

                   # req->r_kref == 1 after kref_init

                                              ceph_osdc_put_request(req)
                                                kref_put(req->r_kref)

            # req->r_kref == 0 after kref_put, req is freed

        <further req initialization/use> !!!

This happens because send_linger_ping() always (re)uses the same OSD
request for watch ping requests, relying on cancel_linger_request() to
unregister it from the OSD client and rip its messages out from the
messenger.  send_linger() does the same for watch/notify registration
and watch reconnect requests.  Unfortunately cancel_request() doesn't
guarantee that after it returns the OSD client would be completely done
with the OSD request -- a ref could still be held and the callback (if
specified) could still be invoked too.

The original motivation for request_reinit() was inability to deal with
allocation failures in send_linger() and send_linger_ping().  Switching
to using osdc->req_mempool (currently only used by CephFS) respects that
and allows us to get rid of request_reinit().

Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
2022-05-18 21:21:05 +02:00
Mario Limonciello
0223e51647 drm/amd: Don't reset dGPUs if the system is going to s2idle
An A+A configuration on ASUS ROG Strix G513QY proves that the ASIC
reset for handling aborted suspend can't work with s2idle.

This functionality was introduced in commit daf8de0874 ("drm/amdgpu:
always reset the asic in suspend (v2)").  A few other commits have
gone on top of the ASIC reset, but this still doesn't work on the A+A
configuration in s2idle.

Avoid doing the reset on dGPUs specifically when using s2idle.

Fixes: daf8de0874 ("drm/amdgpu: always reset the asic in suspend (v2)")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2008
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-18 15:20:18 -04:00
Luben Tuikov
5ad25ace7c drm/amdgpu: Unmap legacy queue when MES is enabled
This fixes a kernel oops when MES is not enabled.

Reported-by: Kenny Ho <Kenny.Ho@amd.com>
Suggested-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Fixes: 18ee4ce63e ("drm/amdgpu: add mes unmap legacy queue routine")
Fixes: 3d879e81f0 ("drm/amdgpu: add init support for GFX11 (v2)")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-18 15:20:18 -04:00
Rafael J. Wysocki
d44d6c4a3a Update devfreq next for v5.19
Detailed description for this pull request:
 1. Update devfreq core
 - Add cpu based scaling support to passive governor. Some device like
 cache might require the dynamic frequency scaling. But, it has very
 tightly to cpu frequency. So that use passive governor to scale
 the frequency according to current cpu frequency.
 
 To decide the frequency of the device, the governor does one of the following:
 : Derives the optimal devfreq device opp from required-opps property of
   the parent cpu opp_table.
 
 : Scales the device frequency in proportion to the CPU frequency. So, if
   the CPUs are running at their max frequency, the device runs at its
   max frequency. If the CPUs are running at their min frequency, the
   device runs at its min frequency. It is interpolated for frequencies
   in between.
 
 2. Update devfreq driver
 - Update rk3399_dmc.c as following:
 : Convert dt-binding document to YAML and deprecate unused properties.
 
 : Use Hz units for the device-tree properties of rk3399_dmc.
 
 : rk3399_dmc is able to set the idle time before changing the dmc clock.
   Specify idle time parameters by using nano-second unit on dt bidning.
 
 : Add new disable-freq properties to optimize the power-saving feature
   of rk3399_dmc.
 
 : Disable devfreq-event device on remove() to fix unbalanced
   enable-disable count.
 
 : Use devm_pm_opp_of_add_table()
 
 : Block PMU (Power-Management Unit) transitions when scaling frequency
   by ARM Trust Firmware in order to fix the conflict between PMU and DMC
   (Dynamic Memory Controller).
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEsSpuqBtbWtRe4rLGnM3fLN7rz1MFAmKDddYWHGN3MDAuY2hv
 aUBzYW1zdW5nLmNvbQAKCRCczd8s3uvPU1o0EACGE7FV/pA/sbOFhhLXQlI1XMQf
 C2XVvjigP5M7Y9qgM6qLBKCSmRPbBjuO4VLKkIycmo2GRcUs+Sk5CqALjWZF0tDR
 aiAV+P3wIC2pCZNtAXJG6BhP5GNzGYYdv2zJfNmVUsYUYVC3ezppuE1Yp2Sscq5t
 K43eAOq0c9+TTeLfdDKdi07QtevMHwbOxjNUhwnzN5+yD9cknK1Ht5FVjDc8eYTG
 yBJPgMy9qhNqEZ4gl9zpKJkpAJMOhquT4ZtytNioIfnmKCrpyiOy+yApvW5WiU1d
 /8KMiG42C47rUBYCiEGEdn6PaTOpuOK0hnuoxlxE07dqJP349SbozFTqZxfbCJhI
 OjZSdLTolPFq82zuAafBJmijmeAiYdI0FKrhXmQo4doeleIQG9z0h8YpkQSnF47L
 xsj7QswD0LxtMQbYv5zds1NoGnUn/U+TpoJ8mbpg5gPWKaBTNdTZFJQj+HXtrDyy
 1ouR41u6kOoLLDHu2I5f9dTMptJRg3jS+kQfDv17K8u5mAmcwFidpOiyac6m2xad
 Nnn85NAE9JaxpzVe3+S3uQdscEe2FBWEZCkbrCcPn2T88BZwrfcby02YRBw22WkS
 p4xPAOncw4lhZ6z3pedcH/1+kk6uoTube5QK8ddSRxYrRavtOsqchAaec+hYRNif
 CbjJOqsdbDLa7ANLxw==
 =/3W8
 -----END PGP SIGNATURE-----

Merge tag 'devfreq-next-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux

Pull devfreq changes for 5.19-rc1 from Chanwoo Choi:

"1. Update devfreq core

 - Add cpu based scaling support to passive governor. Some device like
   cache might require the dynamic frequency scaling. But, it has very
   tightly to cpu frequency. So that use passive governor to scale
   the frequency according to current cpu frequency.

  To decide the frequency of the device, the governor does one of the following:
  : Derives the optimal devfreq device opp from required-opps property of
    the parent cpu opp_table.

  : Scales the device frequency in proportion to the CPU frequency. So, if
    the CPUs are running at their max frequency, the device runs at its
    max frequency. If the CPUs are running at their min frequency, the
    device runs at its min frequency. It is interpolated for frequencies
    in between.

 2. Update devfreq drivers

  - Update rk3399_dmc.c as following:

   : Convert dt-binding document to YAML and deprecate unused properties.

   : Use Hz units for the device-tree properties of rk3399_dmc.

   : rk3399_dmc is able to set the idle time before changing the dmc clock.
     Specify idle time parameters by using nano-second unit on dt bidning.

   : Add new disable-freq properties to optimize the power-saving feature
     of rk3399_dmc.

   : Disable devfreq-event device on remove() to fix unbalanced
     enable-disable count.

   : Use devm_pm_opp_of_add_table()

   : Block PMU (Power-Management Unit) transitions when scaling frequency
     by ARM Trust Firmware in order to fix the conflict between PMU and DMC
     (Dynamic Memory Controller)."

* tag 'devfreq-next-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux:
  PM / devfreq: passive: Keep cpufreq_policy for possible cpus
  PM / devfreq: passive: Reduce duplicate code when passive_devfreq case
  PM / devfreq: Add cpu based scaling support to passive governor
  PM / devfreq: Export devfreq_get_freq_range symbol within devfreq
  PM / devfreq: rk3399_dmc: Block PMU during transitions
  soc: rockchip: power-domain: Manage resource conflicts with firmware
  PM / devfreq: rk3399_dmc: Avoid static (reused) profile
  PM / devfreq: rk3399_dmc: Use devm_pm_opp_of_add_table()
  PM / devfreq: rk3399_dmc: Disable edev on remove()
  PM / devfreq: rk3399_dmc: Support new *-ns properties
  PM / devfreq: rk3399_dmc: Support new disable-freq properties
  PM / devfreq: rk3399_dmc: Use bitfield macro definitions for ODT_PD
  PM / devfreq: rk3399_dmc: Drop excess timing properties
  PM / devfreq: rk3399_dmc: Drop undocumented ondemand DT props
  dt-bindings: devfreq: rk3399_dmc: Add more disable-freq properties
  dt-bindings: devfreq: rk3399_dmc: Specify idle params in nanoseconds
  dt-bindings: devfreq: rk3399_dmc: Fix Hz units
  dt-bindings: devfreq: rk3399_dmc: Deprecate unused/redundant properties
  dt-bindings: devfreq: rk3399_dmc: Convert to YAML
2022-05-18 20:55:34 +02:00
Haowen Bai
5a66bfb277 thermal: intel: hfi: remove NULL check after container_of() call
container_of() will never return NULL, so remove useless code.

Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-18 20:53:05 +02:00
Pavel Begunkov
0184f08e65 io_uring: add fully sparse buffer registration
Honour IORING_RSRC_REGISTER_SPARSE not only for direct files but fixed
buffers as well. It makes the rsrc API more consistent.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/66f429e4912fe39fb3318217ff33a2853d4544be.1652879898.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-18 12:53:04 -06:00
Zhang Rui
f125bdbdd6 powercap: intel_rapl: add support for ALDERLAKE_N
Add ALDERLAKE_N to the list of supported processor models in the Intel
RAPL power capping driver.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-18 20:40:50 +02:00
Lai Jiangshan
c42b145181 x86/sev: Annotate stack change in the #VC handler
In idtentry_vc(), vc_switch_off_ist() determines a safe stack to
switch to, off of the IST stack. Annotate the new stack switch with
ENCODE_FRAME_POINTER in case UNWINDER_FRAME_POINTER is used.

A stack walk before looks like this:

  CPU: 0 PID: 0 Comm: swapper Not tainted 5.18.0-rc7+ #2
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  Call Trace:
   <TASK>
   dump_stack_lvl
   dump_stack
   kernel_exc_vmm_communication
   asm_exc_vmm_communication
   ? native_read_msr
   ? __x2apic_disable.part.0
   ? x2apic_setup
   ? cpu_init
   ? trap_init
   ? start_kernel
   ? x86_64_start_reservations
   ? x86_64_start_kernel
   ? secondary_startup_64_no_verify
   </TASK>

and with the fix, the stack dump is exact:

  CPU: 0 PID: 0 Comm: swapper Not tainted 5.18.0-rc7+ #3
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  Call Trace:
   <TASK>
   dump_stack_lvl
   dump_stack
   kernel_exc_vmm_communication
   asm_exc_vmm_communication
  RIP: 0010:native_read_msr
  Code: ...
  < snipped regs >
   ? __x2apic_disable.part.0
   x2apic_setup
   cpu_init
   trap_init
   start_kernel
   x86_64_start_reservations
   x86_64_start_kernel
   secondary_startup_64_no_verify
   </TASK>

  [ bp: Test in a SEV-ES guest and rewrite the commit message to
    explain what exactly this does. ]

Fixes: a13644f3a5 ("x86/entry/64: Add entry code for #VC handler")
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220316041612.71357-1-jiangshanlai@gmail.com
2022-05-18 20:36:03 +02:00
Hangyu Hua
947a844bb3 drm: msm: fix possible memory leak in mdp5_crtc_cursor_set()
drm_gem_object_lookup will call drm_gem_object_get inside. So cursor_bo
needs to be put when msm_gem_get_and_pin_iova fails.

Fixes: e172d10a9c ("drm/msm/mdp5: Add hardware cursor support")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Link: https://lore.kernel.org/r/20220509061125.18585-1-hbh25y@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-05-18 11:05:21 -07:00
Zhang Jianhua
b0487ede1f fs-verity: remove unused parameter desc_size in fsverity_create_info()
The parameter desc_size in fsverity_create_info() is useless and it is
not referenced anywhere. The greatest meaning of desc_size here is to
indecate the size of struct fsverity_descriptor and futher calculate the
size of signature. However, the desc->sig_size can do it also and it is
indeed, so remove it.

Therefore, it is no need to acquire desc_size by fsverity_get_descriptor()
in ensure_verity_info(), so remove the parameter desc_ret in
fsverity_get_descriptor() too.

Signed-off-by: Zhang Jianhua <chris.zjh@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20220518132256.2297655-1-chris.zjh@huawei.com
2022-05-18 11:01:31 -07:00
Rob Clark
cec4e5cbb9 drm/msm: Fix fb plane offset calculation
The offset got dropped by accident.

Fixes: d413e6f971 ("drm/msm: Drop msm_gem_iova()")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Tested-by: Stephen Boyd <swboyd@chromium.org> # CoachZ
Link: https://lore.kernel.org/r/20220510165216.3577068-1-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-05-18 10:55:47 -07:00
Miaoqian Lin
c56de48309 drm/msm/a6xx: Fix refcount leak in a6xx_gpu_init
of_parse_phandle() returns a node pointer with refcount
incremented, we should use of_node_put() on it when not need anymore.

a6xx_gmu_init() passes the node to of_find_device_by_node()
and of_dma_configure(), of_find_device_by_node() will takes its
reference, of_dma_configure() doesn't need the node after usage.

Add missing of_node_put() to avoid refcount leak.

Fixes: 4b565ca5a2 ("drm/msm: Add A6XX device support")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
Link: https://lore.kernel.org/r/20220512121955.56937-1-linmq006@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-05-18 10:54:39 -07:00
Douglas Anderson
ec7981e6c6 drm/msm/dsi: don't powerup at modeset time for parade-ps8640
Commit 7d8e9a9050 ("drm/msm/dsi: move DSI host powerup to modeset
time") caused sc7180 Chromebooks that use the parade-ps8640 bridge
chip to fail to turn the display back on after it turns off.

Unfortunately, it doesn't look easy to fix the parade-ps8640 driver to
handle the new power sequence. The Linux driver has almost nothing in
it and most of the logic for this bridge chip is in black-box firmware
that the bridge chip uses.

Also unfortunately, reverting the patch will break "tc358762".

The long term solution here is probably Dave Stevenson's series [1]
that would give more flexibility. However, that is likely not a quick
fix.

For the short term, we'll look at the compatible of the next bridge in
the chain and go back to the old way for the Parade PS8640 bridge
chip. If it's found that other bridge chips also need this workaround
then we can add them to the list or consider inverting the
condition. However, the hope is that the framework will not take too
much longer to land and we won't have to add anything other than
ps8640 here.

[1] https://lore.kernel.org/r/cover.1646406653.git.dave.stevenson@raspberrypi.com

Fixes: 7d8e9a9050 ("drm/msm/dsi: move DSI host powerup to modeset time")
Suggested-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
Link: https://lore.kernel.org/r/20220513131504.v5.1.Ia196e35ad985059e77b038a41662faae9e26f411@changeid
Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-05-18 10:51:42 -07:00
Xiu Jianfeng
29ed17389c cgroup: Make cgroup_debug static
Make cgroup_debug static since it's only used in cgroup.c

Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2022-05-18 06:59:20 -10:00
Christoph Hellwig
354201c53e nvme: add support for TP4084 - Time-to-Ready Enhancements
Add support for using longer timeouts during controller initialization
and letting the controller come up with namespaces that are not ready
for I/O yet.  We skip these not ready namespaces during scanning and
only bring them online once anoter scan is kicked off by the AEN that
is set when the NRDY bit gets set in the  I/O Command Set Independent
Identify Namespace Data Structure.   This asynchronous probing avoids
blocking the kernel boot when controllers take a very long time to
recover after unclean shutdowns (up to minutes).

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2022-05-18 18:54:17 +02:00
Al Viro
fb4554c223 Fix double fget() in vhost_net_set_backend()
Descriptor table is a shared resource; two fget() on the same descriptor
may return different struct file references.  get_tap_ptr_ring() is
called after we'd found (and pinned) the socket we'll be using and it
tries to find the private tun/tap data structures associated with it.
Redoing the lookup by the same file descriptor we'd used to get the
socket is racy - we need to same struct file.

Thanks to Jason for spotting a braino in the original variant of patch -
I'd missed the use of fd == -1 for disabling backend, and in that case
we can end up with sock == NULL and sock != oldsock.

Cc: stable@kernel.org
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-05-18 12:33:51 -04:00
Eli Cohen
acde392949 vdpa/mlx5: Use consistent RQT size
The current code evaluates RQT size based on the configured number of
virtqueues. This can raise an issue in the following scenario:

Assume MQ was negotiated.
1. mlx5_vdpa_set_map() gets called.
2. handle_ctrl_mq() is called setting cur_num_vqs to some value, lower
   than the configured max VQs.
3. A second set_map gets called, but now a smaller number of VQs is used
   to evaluate the size of the RQT.
4. handle_ctrl_mq() is called with a value larger than what the RQT can
   hold. This will emit errors and the driver state is compromised.

To fix this, we use a new field in struct mlx5_vdpa_net to hold the
required number of entries in the RQT. This value is evaluated in
mlx5_vdpa_set_driver_features() where we have the negotiated features
all set up.

In addition to that, we take into consideration the max capability of RQT
entries early when the device is added so we don't need to take consider
it when creating the RQT.

Last, we remove the use of mlx5_vdpa_max_qps() which just returns the
max_vas / 2 and make the code clearer.

Fixes: 52893733f2 ("vdpa/mlx5: Add multiqueue support")
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-18 12:31:31 -04:00
Daire McNamara
7013654af6 PCI: microchip: Fix potential race in interrupt handling
Clear the MSI bit in ISTATUS_LOCAL register after reading it, but
before reading and handling individual MSI bits from the ISTATUS_MSI
register. This avoids a potential race where new MSI bits may be set
on the ISTATUS_MSI register after it was read and be missed when the
MSI bit in the ISTATUS_LOCAL register is cleared.

ISTATUS_LOCAL is a read/write/clear register; the register's bits
are set when the corresponding interrupt source is activated. Each
source is independent and thus multiple sources may be active
simultaneously. The processor can monitor and clear status
bits. If one or more ISTATUS_LOCAL interrupt sources are active,
the RootPort issues an interrupt towards the processor (on
the AXI domain). Bit 28 of this register reports an MSI has been
received by the RootPort.

ISTATUS_MSI is a read/write/clear register. Bits 31-0 are asserted
when an MSI with message number 31-0 is received by the RootPort.
The processor must monitor and clear these bits.

Effectively, Bit 28 of ISTATUS_LOCAL informs the processor that
an MSI has arrived at the RootPort and ISTATUS_MSI informs the
processor which MSI (in the range 0 - 31) needs handling.

Reported by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/linux-pci/20220127202000.GA126335@bhelgaas/

Link: https://lore.kernel.org/r/20220517141622.145581-1-daire.mcnamara@microchip.com
Fixes: 6f15a9c9f9 ("PCI: microchip: Add Microchip PolarFire PCIe controller driver")
Signed-off-by: Daire McNamara <daire.mcnamara@microchip.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2022-05-18 17:14:21 +01:00
Abhishek Sahu
7ab5e10eda vfio/pci: Move the unused device into low power state with runtime PM
Currently, there is very limited power management support
available in the upstream vfio_pci_core based drivers. If there
are no users of the device, then the PCI device will be moved into
D3hot state by writing directly into PCI PM registers. This D3hot
state help in saving power but we can achieve zero power consumption
if we go into the D3cold state. The D3cold state cannot be possible
with native PCI PM. It requires interaction with platform firmware
which is system-specific. To go into low power states (including D3cold),
the runtime PM framework can be used which internally interacts with PCI
and platform firmware and puts the device into the lowest possible
D-States.

This patch registers vfio_pci_core based drivers with the
runtime PM framework.

1. The PCI core framework takes care of most of the runtime PM
   related things. For enabling the runtime PM, the PCI driver needs to
   decrement the usage count and needs to provide 'struct dev_pm_ops'
   at least. The runtime suspend/resume callbacks are optional and needed
   only if we need to do any extra handling. Now there are multiple
   vfio_pci_core based drivers. Instead of assigning the
   'struct dev_pm_ops' in individual parent driver, the vfio_pci_core
   itself assigns the 'struct dev_pm_ops'. There are other drivers where
   the 'struct dev_pm_ops' is being assigned inside core layer
   (For example, wlcore_probe() and some sound based driver, etc.).

2. This patch provides the stub implementation of 'struct dev_pm_ops'.
   The subsequent patch will provide the runtime suspend/resume
   callbacks. All the config state saving, and PCI power management
   related things will be done by PCI core framework itself inside its
   runtime suspend/resume callbacks (pci_pm_runtime_suspend() and
   pci_pm_runtime_resume()).

3. Inside pci_reset_bus(), all the devices in dev_set needs to be
   runtime resumed. vfio_pci_dev_set_pm_runtime_get() will take
   care of the runtime resume and its error handling.

4. Inside vfio_pci_core_disable(), the device usage count always needs
   to be decremented which was incremented in vfio_pci_core_enable().

5. Since the runtime PM framework will provide the same functionality,
   so directly writing into PCI PM config register can be replaced with
   the use of runtime PM routines. Also, the use of runtime PM can help
   us in more power saving.

   In the systems which do not support D3cold,

   With the existing implementation:

   // PCI device
   # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
   D3hot
   // upstream bridge
   # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
   D0

   With runtime PM:

   // PCI device
   # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
   D3hot
   // upstream bridge
   # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
   D3hot

   So, with runtime PM, the upstream bridge or root port will also go
   into lower power state which is not possible with existing
   implementation.

   In the systems which support D3cold,

   // PCI device
   # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
   D3hot
   // upstream bridge
   # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
   D0

   With runtime PM:

   // PCI device
   # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
   D3cold
   // upstream bridge
   # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
   D3cold

   So, with runtime PM, both the PCI device and upstream bridge will
   go into D3cold state.

6. If 'disable_idle_d3' module parameter is set, then also the runtime
   PM will be enabled, but in this case, the usage count should not be
   decremented.

7. vfio_pci_dev_set_try_reset() return value is unused now, so this
   function return type can be changed to void.

8. Use the runtime PM API's in vfio_pci_core_sriov_configure().
   The device can be in low power state either with runtime
   power management (when there is no user) or PCI_PM_CTRL register
   write by the user. In both the cases, the PF should be moved to
   D0 state. For preventing any runtime usage mismatch, pci_num_vf()
   has been called explicitly during disable.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Link: https://lore.kernel.org/r/20220518111612.16985-5-abhsahu@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-18 10:00:48 -06:00
Abhishek Sahu
54918c2874 vfio/pci: Virtualize PME related registers bits and initialize to zero
If any PME event will be generated by PCI, then it will be mostly
handled in the host by the root port PME code. For example, in the case
of PCIe, the PME event will be sent to the root port and then the PME
interrupt will be generated. This will be handled in
drivers/pci/pcie/pme.c at the host side. Inside this, the
pci_check_pme_status() will be called where PME_Status and PME_En bits
will be cleared. So, the guest OS which is using vfio-pci device will
not come to know about this PME event.

To handle these PME events inside guests, we need some framework so
that if any PME events will happen, then it needs to be forwarded to
virtual machine monitor. We can virtualize PME related registers bits
and initialize these bits to zero so vfio-pci device user will assume
that it is not capable of asserting the PME# signal from any power state.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Link: https://lore.kernel.org/r/20220518111612.16985-4-abhsahu@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-18 10:00:48 -06:00
Abhishek Sahu
f4162eb1e2 vfio/pci: Change the PF power state to D0 before enabling VFs
According to [PCIe v5 9.6.2] for PF Device Power Management States

 "The PF's power management state (D-state) has global impact on its
  associated VFs. If a VF does not implement the Power Management
  Capability, then it behaves as if it is in an equivalent
  power state of its associated PF.

  If a VF implements the Power Management Capability, the Device behavior
  is undefined if the PF is placed in a lower power state than the VF.
  Software should avoid this situation by placing all VFs in lower power
  state before lowering their associated PF's power state."

From the vfio driver side, user can enable SR-IOV when the PF is in D3hot
state. If VF does not implement the Power Management Capability, then
the VF will be actually in D3hot state and then the VF BAR access will
fail. If VF implements the Power Management Capability, then VF will
assume that its current power state is D0 when the PF is D3hot and
in this case, the behavior is undefined.

To support PF power management, we need to create power management
dependency between PF and its VF's. The runtime power management support
may help with this where power management dependencies are supported
through device links. But till we have such support in place, we can
disallow the PF to go into low power state, if PF has VF enabled.
There can be a case, where user first enables the VF's and then
disables the VF's. If there is no user of PF, then the PF can put into
D3hot state again. But with this patch, the PF will still be in D0
state after disabling VF's since detecting this case inside
vfio_pci_core_sriov_configure() requires access to
struct vfio_device::open_count along with its locks. But the subsequent
patches related to runtime PM will handle this case since runtime PM
maintains its own usage count.

Also, vfio_pci_core_sriov_configure() can be called at any time
(with and without vfio pci device user), so the power state change
and SR-IOV enablement need to be protected with the required locks.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Link: https://lore.kernel.org/r/20220518111612.16985-3-abhsahu@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-18 10:00:48 -06:00
Abhishek Sahu
2b2c651baf vfio/pci: Invalidate mmaps and block the access in D3hot power state
According to [PCIe v5 5.3.1.4.1] for D3hot state

 "Configuration and Message requests are the only TLPs accepted by a
  Function in the D3Hot state. All other received Requests must be
  handled as Unsupported Requests, and all received Completions may
  optionally be handled as Unexpected Completions."

Currently, if the vfio PCI device has been put into D3hot state and if
user makes non-config related read/write request in D3hot state, these
requests will be forwarded to the host and this access may cause
issues on a few systems.

This patch leverages the memory-disable support added in commit
'abafbc551f ("vfio-pci: Invalidate mmaps and block MMIO access on
disabled memory")' to generate page fault on mmap access and
return error for the direct read/write. If the device is D3hot state,
then the error will be returned for MMIO access. The IO access generally
does not make the system unresponsive so the IO access can still happen
in D3hot state. The default value should be returned in this case
without bringing down the complete system.

Also, the power related structure fields need to be protected so
we can use the same 'memory_lock' to protect these fields also.
This protection is mainly needed when user changes the PCI
power state by writing into PCI_PM_CTRL register.
vfio_lock_and_set_power_state() wrapper function will take the
required locks and then it will invoke the vfio_pci_set_power_state().

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Link: https://lore.kernel.org/r/20220518111612.16985-2-abhsahu@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-18 09:59:18 -06:00
Linus Torvalds
ef1302160b sound fixes for 5.18
A collection of last-minute HD- an USB-audio quirks in addition to
 a fix for the legacy ISA wavefront driver.
 
 All look small and easy.
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmKDXUMOHHRpd2FpQHN1
 c2UuZGUACgkQLtJE4w1nLE+PRw/+Pmfan+rymr86zTZ7kVmCzeVMB3r/1W6dPsMk
 rBYXOpdBrok7ovpgCvjx0WDLCCics/gSZCXg7RVzY/KypKrthPBpiUshe6on82zR
 Bmv9UMDKELuMeUPBHSCXxFDlTD0nzWk5Oz+WQN3hujM8HKBhT6v2jyuiGLMzBYkC
 pjnauvEVf9uNZq33oHyKtTaI43OIkrReG4q7GCpDpqMS4rKlt6hI5BGtUXpTHQsC
 CaR1d6XQcfjGr1peCcz0YZddxqhVOwujDCSyNSAted39HolsQUcIXVITsKYB4WXl
 rrLvUy+zxf8bKYO4bGfS04sXP9t7x0BOWUWeX0ApO97cQ1v95j4d363YFMDw3VE1
 /PIQxt/F+cq5cnhhl0goDTTN6pVA7eWcIck3XKmpqbduUGmWhhTm4WSuTeceK/7s
 Q0gWrX6vecLAFJjj2SBU4Se5rehDwoHbQ1xA4Pp2C1FgQf/47zg09K30UYEC3+iO
 Io8ohJZonQEf3+Y+NjFy72gaJ8mxMTA/kWCUIAm+DTr/4zGo/uC7I/5i0PbmeqVb
 OW26pfMYFcQ0e8MrqolQ9B0EBF+7/JEv/89w7F9CU/aF1VklWqUu0fWbXpHd9W0v
 JzhDzByc5NuUrdlfvwZl40Xd8Mf+LsnTO0nbdB+KqYQFUErmWYZTdzf+8zaC0Vka
 /6hpbMY=
 =9TJZ
 -----END PGP SIGNATURE-----

Merge tag 'sound-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A collection of last-minute HD- an USB-audio quirks in addition to a
  fix for the legacy ISA wavefront driver.

  All look small and easy"

* tag 'sound-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: usb-audio: Restore Rane SL-1 quirk
  ALSA: hda/realtek: fix right sounds and mute/micmute LEDs for HP machine
  ALSA: hda/realtek: Add quirk for TongFang devices with pop noise
  ALSA: hda/realtek: Add quirk for the Framework Laptop
  ALSA: wavefront: Proper check of get_user() error
  ALSA: hda/realtek: Add quirk for Dell Latitude 7520
  ALSA: hda - fix unused Realtek function when PM is not enabled
  ALSA: usb-audio: Don't get sample rate for MCT Trigger 5 USB-to-HDMI
2022-05-18 05:53:43 -10:00
Pablo Neira Ayuso
9e539c5b6d netfilter: nf_tables: disable expression reduction infra
Either userspace or kernelspace need to pre-fetch keys inconditionally
before comparisons for this to work. Otherwise, register tracking data
is misleading and it might result in reducing expressions which are not
yet registers.

First expression is also guaranteed to be evaluated always, however,
certain expressions break before writing data to registers, before
comparing the data, leaving the register in undetermined state.

This patch disables this infrastructure by now.

Fixes: b2d306542f ("netfilter: nf_tables: do not reduce read-only expressions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-18 17:34:26 +02:00
Ritaro Takenaka
2738d9d963 netfilter: flowtable: move dst_check to packet path
Fixes sporadic IPv6 packet loss when flow offloading is enabled.

IPv6 route GC and flowtable GC are not synchronized.
When dst_cache becomes stale and a packet passes through the flow before
the flowtable GC teardowns it, the packet can be dropped.
So, it is necessary to check dst every time in packet path.

Fixes: 227e1e4d0d ("netfilter: nf_flowtable: skip device lookup from interface index")
Signed-off-by: Ritaro Takenaka <ritarot634@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-18 17:34:26 +02:00
Pablo Neira Ayuso
e5eaac2beb netfilter: flowtable: fix TCP flow teardown
This patch addresses three possible problems:

1. ct gc may race to undo the timeout adjustment of the packet path, leaving
   the conntrack entry in place with the internal offload timeout (one day).

2. ct gc removes the ct because the IPS_OFFLOAD_BIT is not set and the CLOSE
   timeout is reached before the flow offload del.

3. tcp ct is always set to ESTABLISHED with a very long timeout
   in flow offload teardown/delete even though the state might be already
   CLOSED. Also as a remark we cannot assume that the FIN or RST packet
   is hitting flow table teardown as the packet might get bumped to the
   slow path in nftables.

This patch resets IPS_OFFLOAD_BIT from flow_offload_teardown(), so
conntrack handles the tcp rst/fin packet which triggers the CLOSE/FIN
state transition.

Moreover, teturn the connection's ownership to conntrack upon teardown
by clearing the offload flag and fixing the established timeout value.
The flow table GC thread will asynchonrnously free the flow table and
hardware offload entries.

Before this patch, the IPS_OFFLOAD_BIT remained set for expired flows on
which is also misleading since the flow is back to classic conntrack
path.

If nf_ct_delete() removes the entry from the conntrack table, then it
calls nf_ct_put() which decrements the refcnt. This is not a problem
because the flowtable holds a reference to the conntrack object from
flow_offload_alloc() path which is released via flow_offload_free().

This patch also updates nft_flow_offload to skip packets in SYN_RECV
state. Since we might miss or bump packets to slow path, we do not know
what will happen there while we are still in SYN_RECV, this patch
postpones offload up to the next packet which also aligns to the
existing behaviour in tc-ct.

flow_offload_teardown() does not reset the existing tcp state from
flow_offload_fixup_tcp() to ESTABLISHED anymore, packets bump to slow
path might have already update the state to CLOSE/FIN.

Joint work with Oz and Sven.

Fixes: 1e5b2471bc ("netfilter: nf_flow_table: teardown flow timeout race")
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-18 17:34:26 +02:00
Eric Biggers
c069db76ed ext4: fix memory leak in parse_apply_sb_mount_options()
If processing the on-disk mount options fails after any memory was
allocated in the ext4_fs_context, e.g. s_qf_names, then this memory is
leaked.  Fix this by calling ext4_fc_free() instead of kfree() directly.

Reproducer:

    mkfs.ext4 -F /dev/vdc
    tune2fs /dev/vdc -E mount_opts=usrjquota=file
    echo clear > /sys/kernel/debug/kmemleak
    mount /dev/vdc /vdc
    echo scan > /sys/kernel/debug/kmemleak
    sleep 5
    echo scan > /sys/kernel/debug/kmemleak
    cat /sys/kernel/debug/kmemleak

Fixes: 7edfd85b1f ("ext4: Completely separate options parsing and sb setup")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220513231605.175121-2-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-05-18 11:24:22 -04:00
Eric Biggers
cb8435dc8b ext4: reject the 'commit' option on ext2 filesystems
The 'commit' option is only applicable for ext3 and ext4 filesystems,
and has never been accepted by the ext2 filesystem driver, so the ext4
driver shouldn't allow it on ext2 filesystems.

This fixes a failure in xfstest ext4/053.

Fixes: 8dc0aa8cf0 ("ext4: check incompatible mount options while mounting ext2/3")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Link: https://lore.kernel.org/r/20220510183232.172615-1-ebiggers@kernel.org
2022-05-18 11:24:22 -04:00
Yang Li
b10b6278ae ext4: remove duplicated #include of dax.h in inode.c
Fix following includecheck warning:
./fs/ext4/inode.c: linux/dax.h is included more than once.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220504225025.44753-1-yang.lee@linux.alibaba.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-05-18 11:24:22 -04:00
Fabiano Rosas
ad55bae7dc KVM: PPC: Book3S HV: Fix vcore_blocked tracepoint
We removed most of the vcore logic from the P9 path but there's still
a tracepoint that tried to dereference vc->runner.

Fixes: ecb6a7207f ("KVM: PPC: Book3S HV P9: Remove most of the vcore logic")
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220328215831.320409-1-farosas@linux.ibm.com
2022-05-19 00:44:50 +10:00
Alexey Kardashevskiy
b22af90419 KVM: PPC: Book3s: Remove real mode interrupt controller hcalls handlers
Currently we have 2 sets of interrupt controller hypercalls handlers
for real and virtual modes, this is from POWER8 times when switching
MMU on was considered an expensive operation.

POWER9 however does not have dependent threads and MMU is enabled for
handling hcalls so the XIVE native or XICS-on-XIVE real mode handlers
never execute on real P9 and later CPUs.

This untemplate the handlers and only keeps the real mode handlers for
XICS native (up to POWER8) and remove the rest of dead code. Changes
in functions are mechanical except few missing empty lines to make
checkpatch.pl happy.

The default implemented hcalls list already contains XICS hcalls so
no change there.

This should not cause any behavioral change.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220509071150.181250-1-aik@ozlabs.ru
2022-05-19 00:44:28 +10:00
Alexey Kardashevskiy
29592181c5 KVM: PPC: Book3s: PR: Enable default TCE hypercalls
When KVM_CAP_PPC_ENABLE_HCALL was introduced, H_GET_TCE and H_PUT_TCE
were already implemented and enabled by default; however H_GET_TCE
was missed out on PR KVM (probably because the handler was in
the real mode code at the time).

This enables H_GET_TCE by default. While at this, this wraps
the checks in ifdef CONFIG_SPAPR_TCE_IOMMU just like HV KVM.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220506073737.3823347-1-aik@ozlabs.ru
2022-05-19 00:44:15 +10:00
Alexey Kardashevskiy
cad32d9d42 KVM: PPC: Book3s: Retire H_PUT_TCE/etc real mode handlers
LoPAPR defines guest visible IOMMU with hypercalls to use it -
H_PUT_TCE/etc. Implemented first on POWER7 where hypercalls would trap
in the KVM in the real mode (with MMU off). The problem with the real mode
is some memory is not available and some API usage crashed the host but
enabling MMU was an expensive operation.

The problems with the real mode handlers are:
1. Occasionally these cannot complete the request so the code is
copied+modified to work in the virtual mode, very little is shared;
2. The real mode handlers have to be linked into vmlinux to work;
3. An exception in real mode immediately reboots the machine.

If the small DMA window is used, the real mode handlers bring better
performance. However since POWER8, there has always been a bigger DMA
window which VMs use to map the entire VM memory to avoid calling
H_PUT_TCE. Such 1:1 mapping happens once and uses H_PUT_TCE_INDIRECT
(a bulk version of H_PUT_TCE) which virtual mode handler is even closer
to its real mode version.

On POWER9 hypercalls trap straight to the virtual mode so the real mode
handlers never execute on POWER9 and later CPUs.

So with the current use of the DMA windows and MMU improvements in
POWER9 and later, there is no point in duplicating the code.
The 32bit passed through devices may slow down but we do not have many
of these in practice. For example, with this applied, a 1Gbit ethernet
adapter still demostrates above 800Mbit/s of actual throughput.

This removes the real mode handlers from KVM and related code from
the powernv platform.

This updates the list of implemented hcalls in KVM-HV as the realmode
handlers are removed.

This changes ABI - kvmppc_h_get_tce() moves to the KVM module and
kvmppc_find_table() is static now.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220506053755.3820702-1-aik@ozlabs.ru
2022-05-19 00:44:01 +10:00
Michael Ellerman
750137ec6c Merge branch 'fixes' into topic/ppc-kvm
Merge our fixes branch. In parciular this brings in the KVM TCE handling
fix, which is a prerequisite for a subsequent patch.
2022-05-19 00:43:04 +10:00
Fabiano Rosas
1d1cd0f12a KVM: PPC: Book3S HV: Initialize AMOR in nested entry
The hypervisor always sets AMOR to ~0, but let's ensure we're not
passing stale values around.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220425142151.1495142-1-farosas@linux.ibm.com
2022-05-19 00:28:49 +10:00
Michael Ellerman
a5fc286f69 Merge branch 'fixes' into next
Merge our fixes branch from this cycle. In particular this brings in a
papr_scm.c change which a subsequent patch has a dependency on.
2022-05-19 00:11:51 +10:00
Jason A. Donenfeld
12e45a2a63 random: credit architectural init the exact amount
RDRAND and RDSEED can fail sometimes, which is fine. We currently
initialize the RNG with 512 bits of RDRAND/RDSEED. We only need 256 bits
of those to succeed in order to initialize the RNG. Instead of the
current "all or nothing" approach, actually credit these contributions
the amount that is actually contributed.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-18 15:53:53 +02:00
Jason A. Donenfeld
2f14062bb1 random: handle latent entropy and command line from random_init()
Currently, start_kernel() adds latent entropy and the command line to
the entropy bool *after* the RNG has been initialized, deferring when
it's actually used by things like stack canaries until the next time
the pool is seeded. This surely is not intended.

Rather than splitting up which entropy gets added where and when between
start_kernel() and random_init(), just do everything in random_init(),
which should eliminate these kinds of bugs in the future.

While we're at it, rename the awkwardly titled "rand_initialize()" to
the more standard "random_init()" nomenclature.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-18 15:53:53 +02:00
Jason A. Donenfeld
8a5b8a4a4c random: use proper jiffies comparison macro
This expands to exactly the same code that it replaces, but makes things
consistent by using the same macro for jiffy comparisons throughout.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-18 15:53:53 +02:00
Jason A. Donenfeld
cc1e127bfa random: remove ratelimiting for in-kernel unseeded randomness
The CONFIG_WARN_ALL_UNSEEDED_RANDOM debug option controls whether the
kernel warns about all unseeded randomness or just the first instance.
There's some complicated rate limiting and comparison to the previous
caller, such that even with CONFIG_WARN_ALL_UNSEEDED_RANDOM enabled,
developers still don't see all the messages or even an accurate count of
how many were missed. This is the result of basically parallel
mechanisms aimed at accomplishing more or less the same thing, added at
different points in random.c history, which sort of compete with the
first-instance-only limiting we have now.

It turns out, however, that nobody cares about the first unseeded
randomness instance of in-kernel users. The same first user has been
there for ages now, and nobody is doing anything about it. It isn't even
clear that anybody _can_ do anything about it. Most places that can do
something about it have switched over to using get_random_bytes_wait()
or wait_for_random_bytes(), which is the right thing to do, but there is
still much code that needs randomness sometimes during init, and as a
geeneral rule, if you're not using one of the _wait functions or the
readiness notifier callback, you're bound to be doing it wrong just
based on that fact alone.

So warning about this same first user that can't easily change is simply
not an effective mechanism for anything at all. Users can't do anything
about it, as the Kconfig text points out -- the problem isn't in
userspace code -- and kernel developers don't or more often can't react
to it.

Instead, show the warning for all instances when CONFIG_WARN_ALL_UNSEEDED_RANDOM
is set, so that developers can debug things need be, or if it isn't set,
don't show a warning at all.

At the same time, CONFIG_WARN_ALL_UNSEEDED_RANDOM now implies setting
random.ratelimit_disable=1 on by default, since if you care about one
you probably care about the other too. And we can clean up usage around
the related urandom_warning ratelimiter as well (whose behavior isn't
changing), so that it properly counts missed messages after the 10
message threshold is reached.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-18 15:53:53 +02:00