linux-xiaomi-chiron

Author	SHA1	Message	Date
Rafael J. Wysocki	fddd8f86df	cpufreq: Split cpufreq_offline() Split the "core" part running under the policy rwsem out of cpufreq_offline() to allow the locking in cpufreq_remove_dev() to be rearranged more easily. As a side-effect this eliminates the unlock label that's not needed any more. No expected functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2022-05-12 17:11:57 +02:00
Rafael J. Wysocki	e1e962c5b9	cpufreq: Reorganize checks in cpufreq_offline() Notice that cpufreq_offline() only needs to check policy_is_inactive() once and rearrange the code in there to make that happen. No expected functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2022-05-12 17:11:57 +02:00
Miaoqian Lin	a508e33956	ipmi:ipmb: Fix refcount leak in ipmi_ipmb_probe of_parse_phandle() returns a node pointer with refcount incremented, we should use of_node_put() on it when done. Add missing of_node_put() to avoid refcount leak. Fixes: `00d93611f0` ("ipmi:ipmb: Add the ability to have a separate slave and master device") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Message-Id: <20220512044445.3102-1-linmq006@gmail.com> Cc: stable@vger.kernel.org # v5.17+ Signed-off-by: Corey Minyard <cminyard@mvista.com>	2022-05-12 10:00:04 -05:00
Yu Zhe	5396ccbd79	ipmi: remove unnecessary type castings remove unnecessary void* type castings. Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Message-Id: <20220421150941.7659-1-yuzhe@nfschina.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>	2022-05-12 10:00:04 -05:00
Corey Minyard	1016daf218	ipmi: Make two logs unique There were two identical logs in two different places, so you couldn't tell which one was being logged. Make them unique. Signed-off-by: Corey Minyard <cminyard@mvista.com>	2022-05-12 10:00:04 -05:00
Corey Minyard	be8503597c	ipmi:si: Convert pr_debug() to dev_dbg() A device is available, use it. Signed-off-by: Corey Minyard <cminyard@mvista.com>	2022-05-12 10:00:04 -05:00
Corey Minyard	b2c6941a5c	ipmi: Convert pr_debug() to dev_dbg() A device is available at all debug points, use the right interface. Signed-off-by: Corey Minyard <cminyard@mvista.com>	2022-05-12 10:00:04 -05:00
Corey Minyard	2ebaf18a0b	ipmi: Fix pr_fmt to avoid compilation issues The was it was wouldn't work in some situations, simplify it. What was there was unnecessary complexity. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>	2022-05-12 10:00:04 -05:00
Corey Minyard	f214549d71	ipmi: Add an intializer for ipmi_recv_msg struct Don't hand-initialize the struct here, create a macro to initialize it so new fields added don't get forgotten in places. Signed-off-by: Corey Minyard <cminyard@mvista.com>	2022-05-12 10:00:03 -05:00
Corey Minyard	9824117dd9	ipmi: Add an intializer for ipmi_smi_msg struct There was a "type" element added to this structure, but some static values were missed. The default value will be zero, which is correct, but create an initializer for the type and initialize the type properly in the initializer to avoid future issues. Reported-by: Joe Wiese <jwiese@rackspace.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>	2022-05-12 10:00:03 -05:00
Corey Minyard	7602b957e2	ipmi:ssif: Check for NULL msg when handling events and messages Even though it's not possible to get into the SSIF_GETTING_MESSAGES and SSIF_GETTING_EVENTS states without a valid message in the msg field, it's probably best to be defensive here and check and print a log, since that means something else went wrong. Also add a default clause to that switch statement to release the lock and print a log, in case the state variable gets messed up somehow. Reported-by: Haowen Bai <baihaowen@meizu.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>	2022-05-12 10:00:03 -05:00
Stephen Kitt	0924c5a0cb	ipmi: use simple i2c probe function The i2c probe functions here don't use the id information provided in their second argument, so the single-parameter i2c probe function ("probe_new") can be used instead. This avoids scanning the identifier tables during probes. Signed-off-by: Stephen Kitt <steve@sk2.org> Message-Id: <20220324171159.544565-1-steve@sk2.org> Signed-off-by: Corey Minyard <cminyard@mvista.com> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>	2022-05-12 10:00:03 -05:00
Corey Minyard	d5d91586be	ipmi: Add a sysfs count of total outstanding messages for an interface Go through each user and add its message count to a total and print the total. It would be nice to have a per-user file, but there's no user sysfs entity at this point to hang it off of. Probably not worth the effort. Based on work by Chen Guanqiao <chen.chenchacha@foxmail.com> Cc: Chen Guanqiao <chen.chenchacha@foxmail.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>	2022-05-12 10:00:03 -05:00
Corey Minyard	f60231885f	ipmi: Add a sysfs interface to view the number of users A count of users is kept for each interface, allow it to be viewed. Based on work by Chen Guanqiao <chen.chenchacha@foxmail.com> Cc: Chen Guanqiao <chen.chenchacha@foxmail.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>	2022-05-12 10:00:03 -05:00
Corey Minyard	333730e456	ipmi: Limit the number of message a user may have outstanding This way a rogue application can't use up a bunch of memory. Based on work by Chen Guanqiao <chen.chenchacha@foxmail.com> Cc: Chen Guanqiao <chen.chenchacha@foxmail.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>	2022-05-12 10:00:02 -05:00
Corey Minyard	8e76741c3d	ipmi: Add a limit on the number of users that may use IPMI Each user uses memory, we need limits to avoid a rogue program from running the system out of memory. Based on work by Chen Guanqiao <chen.chenchacha@foxmail.com> Cc: Chen Guanqiao <chen.chenchacha@foxmail.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>	2022-05-12 10:00:02 -05:00
Mario Limonciello	d52848620d	ACPI: PM: Block ASUS B1400CEAE from suspend to idle by default ASUS B1400CEAE fails to resume from suspend to idle by default. This was bisected back to commit `df4f9bc4fb` ("nvme-pci: add support for ACPI StorageD3Enable property") but this is a red herring to the problem. Before this commit the system wasn't getting into deepest sleep state. Presumably this commit is allowing entry into deepest sleep state as advertised by firmware, but there are some other problems related to the wakeup. As it is confirmed the system works properly with S3, set the default for this system to S3. Reported-by: Jian-Hong Pan <jhp@endlessos.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215742 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Jian-Hong Pan <jhp@endlessos.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2022-05-12 16:56:58 +02:00
Ian Cowan	4c19851c70	ACPI: clean up white space in a few places for consistency This cleans up a few line spaces so that it is consistent with the rest of the file. There are a few places where a space was added before a return and two spots where a double line space was made into one line space. Signed-off-by: Ian Cowan <ian@linux.cowan.aero> [ rjw: Subject adjustment ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2022-05-12 16:54:49 +02:00
Nirmal Patel	c94f732e80	PCI: vmd: Revert `2565e5b69c` ("PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is enabled by IOMMU.") Revert `2565e5b69c` ("PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is enabled by IOMMU.") The commit `2565e5b69c` was added as a workaround to keep MSI-X remapping enabled if IOMMU enables interrupt remapping. VMD would keep running in low performance mode. There is no dependency between MSI-X remapping by VMD and interrupt remapping by IOMMU. Link: https://lore.kernel.org/r/20220511095707.25403-3-nirmal.patel@linux.intel.com Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>	2022-05-12 15:54:14 +01:00
Nirmal Patel	886e67100b	PCI: vmd: Assign VMD IRQ domain before enumeration During the boot process all the PCI devices are assigned default PCI-MSI IRQ domain including VMD endpoint devices. If interrupt-remapping is enabled by IOMMU, the PCI devices except VMD get new INTEL-IR-MSI IRQ domain. And VMD is supposed to create and assign a separate VMD-MSI IRQ domain for its child devices in order to support MSI-X remapping capabilities. Now when MSI-X remapping in VMD is disabled in order to improve performance, VMD skips VMD-MSI IRQ domain assignment process to its child devices. Thus the devices behind VMD get default PCI-MSI IRQ domain instead of INTEL-IR-MSI IRQ domain when VMD creates root bus and configures child devices. As a result host OS fails to boot and DMAR errors were observed when interrupt remapping was enabled on Intel Icelake CPUs. For instance: DMAR: DRHD: handling fault status reg 2 DMAR: [INTR-REMAP] Request device [0xe2:0x00.0] fault index 0xa00 [fault reason 0x25] Blocked a compatibility format interrupt request To fix this issue, dev_msi_info struct in dev struct maintains correct value of IRQ domain. VMD will use this information to assign proper IRQ domain to its child devices when it doesn't create a separate IRQ domain. Link: https://lore.kernel.org/r/20220511095707.25403-2-nirmal.patel@linux.intel.com Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>	2022-05-12 15:54:14 +01:00
Rafael J. Wysocki	b7fbf4cebd	ACPI: glue: Rearrange find_child_checks() Notice that it is not necessary to evaluate _STA in find_child_checks() if the device is expected to have children, but there are none, so move the children check to the front of the function. Also notice that FIND_CHILD_MIN_SCORE can be returned right away if _STA is missing, so make the function do so. Finally, replace the ternary operator in the return statement argument with an if () and a standalone return which is somewhat easier to follow. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2022-05-12 16:52:32 +02:00
Yang Li	516edb456f	nilfs2: Fix some kernel-doc comments The description of @flags in nilfs_dirty_inode() kernel-doc comment is missing, and some functions had kernel-doc that used a hash instead of a colon to separate the parameter name from the one line description. Fix them to remove some warnings found by running scripts/kernel-doc, which is caused by using 'make W=1'. fs/nilfs2/inode.c:73: warning: Function parameter or member 'inode' not described in 'nilfs_get_block' fs/nilfs2/inode.c:73: warning: Function parameter or member 'blkoff' not described in 'nilfs_get_block' fs/nilfs2/inode.c:73: warning: Function parameter or member 'bh_result' not described in 'nilfs_get_block' fs/nilfs2/inode.c:73: warning: Function parameter or member 'create' not described in 'nilfs_get_block' fs/nilfs2/inode.c:145: warning: Function parameter or member 'file' not described in 'nilfs_readpage' fs/nilfs2/inode.c:145: warning: Function parameter or member 'page' not described in 'nilfs_readpage' fs/nilfs2/inode.c:968: warning: Function parameter or member 'flags' not described in 'nilfs_dirty_inode' Link: https://lkml.kernel.org/r/20220324024215.63479-1-yang.lee@linux.alibaba.com Link: https://lkml.kernel.org/r/1652276316-7791-1-git-send-email-konishi.ryusuke@gmail.com Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-05-12 10:49:23 -04:00
Matthew Wilcox (Oracle)	08104fb0b1	Appoint myself page cache maintainer This feels like a sufficiently distinct area of responsibility to be worth separating out from both MM and VFS. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz>	2022-05-12 10:48:49 -04:00
Ming Lei	a327c341dc	blk-mq: fix passthrough plugging First we can't add request into plug list in blk_mq_request_bypass_insert which may be called when flushing plug list, so nested plug is caused. Second if polled passthrough request is inserted via blk_execute_rq(), it can't be added to plug list too since io polling needs the request to be issued to driver. Fixes the two by moving plugging into blk_execute_rq_no_wait(). Cc: Christoph Hellwig <hch@lst.de> Fixes: `1c2d2fff6d` ("block: wire-up support for passthrough plugging") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220512140010.1458645-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-12 08:45:39 -06:00
Geert Uytterhoeven	66d7a40beb	mtd: nand: MTD_NAND_ECC_MEDIATEK should depend on ARCH_MEDIATEK The MediaTek Hardware ECC Engine is only present on MediaTek MT27xx and MT76xx SoCs. The driver for this engine is a dependency for the MediaTek NAND controller (MTD_NAND_MTK) and the MediaTek SPI NAND Flash Interface (SPI_MTK_SNFI) drivers, both of which already depend on ARCH_MEDIATEK. Hence add a dependency on ARCH_MEDIATEK to the Hardware ECC Engine driver, too, to prevent asking the user about this driver when configuring a kernel without MediaTek SoC support. Fixes: `4fd62f15af` ("mtd: nand: make mtk_ecc.c a separated module") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/bb9568e825d4bc7506870b03836baa91bcc4b725.1652104136.git.geert+renesas@glider.be	2022-05-12 16:43:04 +02:00
Siddh Raman Pant	75d6fe48a2	spi: Doc fix - Describe add_lock and dma_map_dev in spi_controller This fixes the corresponding warnings during building the docs. Signed-off-by: Siddh Raman Pant <siddhpant.gh@gmail.com> Link: https://lore.kernel.org/r/4e6187a4-d0f8-4750-e407-e09cc1c91789@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>	2022-05-12 15:43:04 +01:00
Minghao Chi	c96f824af0	mtd: rawnand: cs553x: simplify the return expression of cs553x_write_ctrl_byte() Simplify the return expression. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20220505022354.61458-1-chi.minghao@zte.com.cn	2022-05-12 16:43:03 +02:00
Vaishnav Achath	606e5d4081	spi: cadence-quadspi: Handle spi_unregister_master() in remove() Currently devres managed removal of the spi_controller happens after removing the power domain of the host platform_device.While this does not affect the clean removal of the controller, but affects graceful removal of the child devices if the child device removal requires issuing commands over SPI. Eg. flash device being soft reset to 1S-1S-1S mode before removal so that on next probe operations in 1S-1S-1S mode is successful. Failure is seen when `rmmod spi-cadence-quadspi` is performed: root@j7-evm:~# rmmod spi_cadence_quadspi [ 49.230996] cadence-qspi 47050000.spi: QSPI is still busy after 500ms timeout. [ 49.238209] spi-nor spi1.0: operation failed with -110 [ 49.244457] spi-nor spi1.0: Software reset failed: -110 and on subsequent modprobe the OSPI flash probe fails as it is in 8D-8D-8D mode since the previous soft reset did not happen. root@j7-evm:~# modprobe spi_cadence_quadspi [ 73.253536] spi-nor spi0.0: unrecognized JEDEC id bytes: ff ff ff ff ff ff [ 73.260476] spi-nor: probe of spi0.0 failed with error -2 This commit adds necessary changes to perform spi_unregister_master() in the host device remove() so that the child devices are gracefully removed before the power domain is removed. changes tested on J721E with mt35xu512aba flash. Signed-off-by: Vaishnav Achath <vaishnav.a@ti.com> Link: https://lore.kernel.org/r/20220511115516.14894-1-vaishnav.a@ti.com Signed-off-by: Mark Brown <broonie@kernel.org>	2022-05-12 15:43:03 +01:00
Rickard x Andersson	773898127e	mtd: rawnand: kioxia: Add support for TH58NVG3S0HBAI4 Add timings for Kioxia/Toshiba TH58NVG3S0HBAI4. Timings for this memory matches the timings selected for TH58NVG2S3HBAI4. This patch increases eraseblock write speed from 5248 KiB/s to 6864 KiB/s and erase block read speed from 8542 KiB/s to 18360 KiB/s Tested on i.MX6SX. Signed-off-by: Rickard x Andersson <rickaran@axis.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20220429083931.26795-1-rickaran@axis.com	2022-05-12 16:43:01 +02:00
Zucheng Zheng	bc469ddf67	perf/x86/amd: Remove unused variable 'hwc' 'hwc' is never used in amd_pmu_enable_all(), so remove it. Signed-off-by: Zucheng Zheng <zhengzucheng@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220421111031.174698-1-zhengzucheng@huawei.com	2022-05-12 16:28:46 +02:00
Tiezhu Yang	4bc7800588	objtool: Remove libsubcmd.a when make clean The file libsubcmd.a still exists after make clean, remove it. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/1652258270-6278-3-git-send-email-yangtiezhu@loongson.cn	2022-05-12 07:28:35 -07:00
Tiezhu Yang	f193c32cad	objtool: Remove inat-tables.c when make clean When build objtool on x86, the generated file inat-tables.c is in arch/x86/lib instead of arch/x86, use the correct dir to remove it when make clean. $ cd tools/objtool $ make [...] GEN arch/x86/lib/inat-tables.c [...] Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/1652258270-6278-2-git-send-email-yangtiezhu@loongson.cn	2022-05-12 07:28:05 -07:00
Dinh Nguyen	3a21c3ac93	dt-bindings: gpio: altera: correct interrupt-cells update documentation to correctly state the interrupt-cells to be 2. Cc: stable@vger.kernel.org Fixes: `4fd9bbc6e0` ("drivers/gpio: Altera soft IP GPIO driver devicetree binding") Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>	2022-05-12 09:27:07 -05:00
Krzysztof Kozlowski	f9ccf752ed	ARM: dts: socfpga: align SPI NOR node name with dtschema The node names should be generic and SPI NOR dtschema expects "flash". Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>	2022-05-12 09:23:30 -05:00
Mark Brown	fd4b80044b	ASoC: SOF: Add IPC4 FW loader support Merge series from Ranjani Sridharan <ranjani.sridharan@linux.intel.com>: The patches in this series add support for FW loading for IPC4 in the SOF driver.	2022-05-12 15:18:23 +01:00
Sean Christopherson	b28cb0cd2c	KVM: x86/mmu: Update number of zapped pages even if page list is stable When zapping obsolete pages, update the running count of zapped pages regardless of whether or not the list has become unstable due to zapping a shadow page with its own child shadow pages. If the VM is backed by mostly 4kb pages, KVM can zap an absurd number of SPTEs without bumping the batch count and thus without yielding. In the worst case scenario, this can cause a soft lokcup. watchdog: BUG: soft lockup - CPU#12 stuck for 22s! [dirty_log_perf_:13020] RIP: 0010:workingset_activation+0x19/0x130 mark_page_accessed+0x266/0x2e0 kvm_set_pfn_accessed+0x31/0x40 mmu_spte_clear_track_bits+0x136/0x1c0 drop_spte+0x1a/0xc0 mmu_page_zap_pte+0xef/0x120 __kvm_mmu_prepare_zap_page+0x205/0x5e0 kvm_mmu_zap_all_fast+0xd7/0x190 kvm_mmu_invalidate_zap_pages_in_memslot+0xe/0x10 kvm_page_track_flush_slot+0x5c/0x80 kvm_arch_flush_shadow_memslot+0xe/0x10 kvm_set_memslot+0x1a8/0x5d0 __kvm_set_memory_region+0x337/0x590 kvm_vm_ioctl+0xb08/0x1040 Fixes: `fbb158cb88` ("KVM: x86/mmu: Revert "Revert "KVM: MMU: zap pages in batch""") Reported-by: David Matlack <dmatlack@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220511145122.3133334-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-05-12 10:09:51 -04:00
Vipin Sharma	6ba1e04fa6	KVM: x86/mmu: Speed up slot_rmap_walk_next for sparsely populated rmaps Avoid calling handlers on empty rmap entries and skip to the next non empty rmap entry. Empty rmap entries are noop in handlers. Signed-off-by: Vipin Sharma <vipinsh@google.com> Suggested-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220502220347.174664-1-vipinsh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-05-12 09:51:45 -04:00
Kai Huang	3c5c32457d	KVM: VMX: Include MKTME KeyID bits in shadow_zero_check Intel MKTME KeyID bits (including Intel TDX private KeyID bits) should never be set to SPTE. Set shadow_me_value to 0 and shadow_me_mask to include all MKTME KeyID bits to include them to shadow_zero_check. Signed-off-by: Kai Huang <kai.huang@intel.com> Message-Id: <27bc10e97a3c0b58a4105ff9107448c190328239.1650363789.git.kai.huang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-05-12 09:51:45 -04:00
Kai Huang	e54f1ff244	KVM: x86/mmu: Add shadow_me_value and repurpose shadow_me_mask Intel Multi-Key Total Memory Encryption (MKTME) repurposes couple of high bits of physical address bits as 'KeyID' bits. Intel Trust Domain Extentions (TDX) further steals part of MKTME KeyID bits as TDX private KeyID bits. TDX private KeyID bits cannot be set in any mapping in the host kernel since they can only be accessed by software running inside a new CPU isolated mode. And unlike to AMD's SME, host kernel doesn't set any legacy MKTME KeyID bits to any mapping either. Therefore, it's not legitimate for KVM to set any KeyID bits in SPTE which maps guest memory. KVM maintains shadow_zero_check bits to represent which bits must be zero for SPTE which maps guest memory. MKTME KeyID bits should be set to shadow_zero_check. Currently, shadow_me_mask is used by AMD to set the sme_me_mask to SPTE, and shadow_me_shadow is excluded from shadow_zero_check. So initializing shadow_me_mask to represent all MKTME keyID bits doesn't work for VMX (as oppositely, they must be set to shadow_zero_check). Introduce a new 'shadow_me_value' to replace existing shadow_me_mask, and repurpose shadow_me_mask as 'all possible memory encryption bits'. The new schematic of them will be: - shadow_me_value: the memory encryption bit(s) that will be set to the SPTE (the original shadow_me_mask). - shadow_me_mask: all possible memory encryption bits (which is a super set of shadow_me_value). - For now, shadow_me_value is supposed to be set by SVM and VMX respectively, and it is a constant during KVM's life time. This perhaps doesn't fit MKTME but for now host kernel doesn't support it (and perhaps will never do). - Bits in shadow_me_mask are set to shadow_zero_check, except the bits in shadow_me_value. Introduce a new helper kvm_mmu_set_me_spte_mask() to initialize them. Replace shadow_me_mask with shadow_me_value in almost all code paths, except the one in PT64_PERM_MASK, which is used by need_remote_flush() to determine whether remote TLB flush is needed. This should still use shadow_me_mask as any encryption bit change should need a TLB flush. And for AMD, move initializing shadow_me_value/shadow_me_mask from kvm_mmu_reset_all_pte_masks() to svm_hardware_setup(). Signed-off-by: Kai Huang <kai.huang@intel.com> Message-Id: <f90964b93a3398b1cf1c56f510f3281e0709e2ab.1650363789.git.kai.huang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-05-12 09:51:44 -04:00
Kai Huang	c919e881ba	KVM: x86/mmu: Rename reset_rsvds_bits_mask() Rename reset_rsvds_bits_mask() to reset_guest_rsvds_bits_mask() to make it clearer that it resets the reserved bits check for guest's page table entries. Signed-off-by: Kai Huang <kai.huang@intel.com> Message-Id: <efdc174b85d55598880064b8bf09245d3791031d.1650363789.git.kai.huang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-05-12 09:51:44 -04:00
Paolo Bonzini	c9f3d9fbcd	KVM: x86: a vCPU with a pending triple fault is runnable Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-05-12 09:51:44 -04:00
Sean Christopherson	1075d41efd	KVM: x86/mmu: Expand and clean up page fault stats Expand and clean up the page fault stats. The current stats are at best incomplete, and at worst misleading. Differentiate between faults that are actually fixed vs those that result in an MMIO SPTE being created, track faults that are spurious, faults that trigger emulation, faults that that are fixed in the fast path, and last but not least, track the number of faults that are taken. Note, the number of faults that require emulation for write-protected shadow pages can roughly be calculated by subtracting the number of MMIO SPTEs created from the overall number of faults that trigger emulation. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220423034752.1161007-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-05-12 09:51:43 -04:00
Sean Christopherson	8d5265b101	KVM: x86/mmu: Use IS_ENABLED() to avoid RETPOLINE for TDP page faults Use IS_ENABLED() instead of an #ifdef to activate the anti-RETPOLINE fast path for TDP page faults. The generated code is identical, and the #ifdef makes it dangerously difficult to extend the logic (guess who forgot to add an "else" inside the #ifdef and ran through the page fault handler twice). No functional or binary change intented. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220423034752.1161007-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-05-12 09:51:43 -04:00
Sean Christopherson	8a009d5bca	KVM: x86/mmu: Make all page fault handlers internal to the MMU Move kvm_arch_async_page_ready() to mmu.c where it belongs, and move all of the page fault handling collateral that was in mmu.h purely for the async #PF handler into mmu_internal.h, where it belongs. This will allow kvm_mmu_do_page_fault() to act on the RET_PF_* return without having to expose those enums outside of the MMU. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220423034752.1161007-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-05-12 09:51:42 -04:00
Sean Christopherson	5276c616ab	KVM: x86/mmu: Add RET_PF_CONTINUE to eliminate bool+int* "returns" Add RET_PF_CONTINUE and use it in handle_abnormal_pfn() and kvm_faultin_pfn() to signal that the page fault handler should continue doing its thing. Aside from being gross and inefficient, using a boolean return to signal continue vs. stop makes it extremely difficult to add more helpers and/or move existing code to a helper. E.g. hypothetically, if nested MMUs were to gain a separate page fault handler in the future, everything up to the "is self-modifying PTE" check can be shared by all shadow MMUs, but communicating up the stack whether to continue on or stop becomes a nightmare. More concretely, proposed support for private guest memory ran into a similar issue, where it'll be forced to forego a helper in order to yield sane code: https://lore.kernel.org/all/YkJbxiL%2FAz7olWlq@google.com. No functional change intended. Cc: David Matlack <dmatlack@google.com> Cc: Chao Peng <chao.p.peng@linux.intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220423034752.1161007-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-05-12 09:51:42 -04:00
Sean Christopherson	5c64aba517	KVM: x86/mmu: Drop exec/NX check from "page fault can be fast" Tweak the "page fault can be fast" logic to explicitly check for !PRESENT faults in the access tracking case, and drop the exec/NX check that becomes redundant as a result. No sane hardware will generate an access that is both an instruct fetch and a write, i.e. it's a waste of cycles. If hardware goes off the rails, or KVM runs under a misguided hypervisor, spuriously running throught fast path is benign (KVM has been uknowingly being doing exactly that for years). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220423034752.1161007-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-05-12 09:51:41 -04:00
Sean Christopherson	54275f74cf	KVM: x86/mmu: Don't attempt fast page fault just because EPT is in use Check for A/D bits being disabled instead of the access tracking mask being non-zero when deciding whether or not to attempt to fix a page fault vian the fast path. Originally, the access tracking mask was non-zero if and only if A/D bits were disabled by _KVM_ (including not being supported by hardware), but that hasn't been true since nVMX was fixed to honor EPTP12's A/D enabling, i.e. since KVM allowed L1 to cause KVM to not use A/D bits while running L2 despite KVM using them while running L1. In other words, don't attempt the fast path just because EPT is enabled. Note, attempting the fast path for all !PRESENT faults can "fix" a very, _VERY_ tiny percentage of faults out of mmu_lock by detecting that the fault is spurious, i.e. has been fixed by a different vCPU, but again the odds of that happening are vanishingly small. E.g. booting an 8-vCPU VM gets less than 10 successes out of 30k+ faults, and that's likely one of the more favorable scenarios. Disabling dirty logging can likely lead to a rash of collisions between vCPUs for some workloads that operate on a common set of pages, but penalizing _all_ !PRESENT faults for that one case is unlikely to be a net positive, not to mention that that problem is best solved by not zapping in the first place. The number of spurious faults does scale with the number of vCPUs, e.g. a 255-vCPU VM using TDP "jumps" to ~60 spurious faults detected in the fast path (again out of 30k), but that's all of 0.2% of faults. Using legacy shadow paging does get more spurious faults, and a few more detected out of mmu_lock, but the percentage goes _down_ to 0.08% (and that's ignoring faults that are reflected into the guest), i.e. the extra detections are purely due to the sheer number of faults observed. On the other hand, getting a "negative" in the fast path takes in the neighborhood of 150-250 cycles. So while it is tempting to keep/extend the current behavior, such a change needs to come with hard numbers showing that it's actually a win in the grand scheme, or any scheme for that matter. Fixes: `995f00a619` ("x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220423034752.1161007-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-05-12 09:51:41 -04:00
Li RongQing	91ab933f75	KVM: VMX: clean up pi_wakeup_handler Passing per_cpu() to list_for_each_entry() causes the macro to be evaluated N+1 times for N sleeping vCPUs. This is a very small inefficiency, and the code is cleaner if the address of the per-CPU variable is loaded earlier. Do this for both the list and the spinlock. Signed-off-by: Li RongQing <lirongqing@baidu.com> Message-Id: <1649244302-6777-1-git-send-email-lirongqing@baidu.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-05-12 09:51:40 -04:00
Maxim Levitsky	33fbe6befa	KVM: x86: fix typo in __try_cmpxchg_user causing non-atomicness This shows up as a TDP MMU leak when running nested. Non-working cmpxchg on L0 relies makes L1 install two different shadow pages under same spte, and one of them is leaked. Fixes: `1c2361f667` ("KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220512101420.306759-1-mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-05-12 09:51:26 -04:00
Minghao Chi	46ecf720f3	platform/x86: toshiba_acpi: use kobj_to_dev() Use kobj_to_dev() instead of open-coding it. Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/20220511021638.1488650-1-chi.minghao@zte.com.cn Signed-off-by: Hans de Goede <hdegoede@redhat.com>	2022-05-12 15:37:53 +02:00

... 144 145 146 147 148 ...

1107633 commits