linux-xiaomi-chiron

Author	SHA1	Message	Date
Maciej W. Rozycki	68dec269ee	MIPS: memset: Limit excessive `noreorder' assembly mode use Rewrite to use the `reorder' assembly mode and remove manually scheduled delay slots except where GAS cannot schedule a delay-slot instruction due to a data dependency or a section switch (as is the case with the EX macro). No change in machine code produced. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> [paul.burton@mips.com: Fix conflict with commit `932afdeec1` ("MIPS: Add Kconfig variable for CPUs with unaligned load/store instructions")] Signed-off-by: Paul Burton <paul.burton@mips.com> Patchwork: https://patchwork.linux-mips.org/patch/20834/ Cc: Ralf Baechle <ralf@linux-mips.org>	2018-10-09 10:31:03 -07:00
Qiuxu Zhuo	8f18973877	EDAC, skx_edac: Fix logical channel intermediate decoding The code "lchan = (lchan << 1) \| ~lchan" for logical channel intermediate decoding is wrong. The wrong intermediate decoding result is {0xffffffff, 0xfffffffe}. Fix it by replacing '~' with '!'. The correct intermediate decoding result is {0x1, 0x2}. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> CC: Aristeu Rozanski <aris@redhat.com> CC: Mauro Carvalho Chehab <mchehab@kernel.org> CC: linux-edac <linux-edac@vger.kernel.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20181009172025.18594-1-tony.luck@intel.com	2018-10-09 19:30:04 +02:00
Maciej W. Rozycki	2f7619ae90	MIPS: memset: Fix CPU_DADDI_WORKAROUNDS `small_fixup' regression Fix a commit `8a8158c85e` ("MIPS: memset.S: EVA & fault support for small_memset") regression and remove assembly warnings: arch/mips/lib/memset.S: Assembler messages: arch/mips/lib/memset.S:243: Warning: Macro instruction expanded into multiple instructions in a branch delay slot triggering with the CPU_DADDI_WORKAROUNDS option set and this code: PTR_SUBU a2, t1, a0 jr ra PTR_ADDIU a2, 1 This is because with that option in place the DADDIU instruction, which the PTR_ADDIU CPP macro expands to, becomes a GAS macro, which in turn expands to an LI/DADDU (or actually ADDIU/DADDU) sequence: 13c: 01a4302f dsubu a2,t1,a0 140: 03e00008 jr ra 144: 24010001 li at,1 148: 00c1302d daddu a2,a2,at ... Correct this by switching off the `noreorder' assembly mode and letting GAS schedule this jump's delay slot, as there is nothing special about it that would require manual scheduling. With this change in place correct code is produced: 13c: 01a4302f dsubu a2,t1,a0 140: 24010001 li at,1 144: 03e00008 jr ra 148: 00c1302d daddu a2,a2,at ... Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Paul Burton <paul.burton@mips.com> Fixes: `8a8158c85e` ("MIPS: memset.S: EVA & fault support for small_memset") Patchwork: https://patchwork.linux-mips.org/patch/20833/ Cc: Ralf Baechle <ralf@linux-mips.org> Cc: stable@vger.kernel.org # 4.17+	2018-10-09 10:24:04 -07:00
Paolo Bonzini	4cebf459b6	KVM/arm fixes for 4.19, take #2 - Correctly order GICv3 SGI registers in the cp15 array -----BEGIN PGP SIGNATURE----- iQJJBAABCgAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAlu8nHAVHG1hcmMuenlu Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDpoUP+QEUeu73gSJkteFlAN59rXPM6Lm1 P6RbXV/v48Qcvoi/YnCNUW1fPzaKK7/un3elzprHBdyO/n/t+0WL6Ss18CoFD9NE ggpYzZhSp8t8nQb4auIrRcF/Sq2BDVTp7/r4cPtMPQZvLB1w+xs0KUBW9jC4BmGA xyiIT8QPWPp+vIyi6eGFyNbTPHEUNujQk4GWgUURn8To9nuRKRLAyDO/oKkOHre4 +Yk4o9b9+bhpxOKUzYRgct8nfnYIH88yHVGG2IBOulD/CODc+FwvtPa3FFwzUaqj UfsRaHj2cetyck0GAzb63sEF0vFQqvnnnf0PGwXwIatr2qt/bOVhkTDDyewTmA7S JIor5mCTNbzHbvdne8GQhRMh8Dt4p6Sj+DEkOW3g9kCgJAlBg0aG3wtrXSaTYd5V E6aOki+Y7XjHCIeYB833BOgFP8ALLgtVSggEPAnMngYkP4VKo2AdGQzq65sAlQ/s nf4frliY6nfCtg8HexPI7bJITP/qJ/82JmaiFrtOdRioT7PUlEfO+QGATUUpufmG ocyFQ3TVLKT9rVLcVmfQ3Y7mPu7FmfkDev7kbqBNu+Epu6pHpPc0bq32tXVc4Xwx BV+X/NpKkG5GbbELdVx6cxypHnIbEq7lnMYUK7fYkXvbEezskiMQBeb4bLtV/WRR ohHZnN9oaJMyCIuo =YYx4 -----END PGP SIGNATURE----- Merge tag 'kvmarm-fixes-for-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/arm fixes for 4.19, take #2 - Correctly order GICv3 SGI registers in the cp15 array	2018-10-09 18:49:32 +02:00
Paolo Bonzini	853c110982	KVM: x86: support CONFIG_KVM_AMD=y with CONFIG_CRYPTO_DEV_CCP_DD=m SEV requires access to the AMD cryptographic device APIs, and this does not work when KVM is builtin and the crypto driver is a module. Actually the Kconfig conditions for CONFIG_KVM_AMD_SEV try to disable SEV in that case, but it does not work because the actual crypto calls are not culled, only sev_hardware_setup() is. This patch adds two CONFIG_KVM_AMD_SEV checks that gate all the remaining SEV code; it fixes this particular configuration, and drops 5 KiB of code when CONFIG_KVM_AMD_SEV=n. Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-09 18:38:42 +02:00
Andreas Gruenbacher	dc480feb45	gfs2: Fix iomap buffered write support for journaled files Commit `64bc06bb32` broke buffered writes to journaled files (chattr +j): we'll try to journal the buffer heads of the page being written to in gfs2_iomap_journaled_page_done. However, the iomap code no longer creates buffer heads, so we'll BUG() in gfs2_page_add_databufs. Fix that by creating buffer heads ourself when needed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2018-10-09 18:20:13 +02:00
Shaokun Zhang	742fafa50b	arm64: mm: Drop the unused cpu parameter Cpu parameter is never used in flush_context, remove it. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2018-10-09 17:17:23 +01:00
Borislav Petkov	b69c2e20f6	resource: Clean it up a bit - Drop BUG_ON()s and do normal error handling instead, in find_next_iomem_res(). - Align function arguments on opening braces. - Get rid of local var sibling_only in find_next_iomem_res(). - Shorten unnecessarily long first_level_children_only arg name. Signed-off-by: Borislav Petkov <bp@suse.de> CC: Andrew Morton <akpm@linux-foundation.org> CC: Bjorn Helgaas <bhelgaas@google.com> CC: Brijesh Singh <brijesh.singh@amd.com> CC: Dan Williams <dan.j.williams@intel.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Lianbo Jiang <lijiang@redhat.com> CC: Takashi Iwai <tiwai@suse.de> CC: Thomas Gleixner <tglx@linutronix.de> CC: Tom Lendacky <thomas.lendacky@amd.com> CC: Vivek Goyal <vgoyal@redhat.com> CC: Yaowei Bai <baiyaowei@cmss.chinamobile.com> CC: bhe@redhat.com CC: dan.j.williams@intel.com CC: dyoung@redhat.com CC: kexec@lists.infradead.org CC: mingo@redhat.com Link: <new submission>	2018-10-09 17:25:58 +02:00
Bjorn Helgaas	010a93bf97	resource: Fix find_next_iomem_res() iteration issue Previously find_next_iomem_res() used "res" as both an input parameter for the range to search and the type of resource to search for, and an output parameter for the resource we found, which makes the interface confusing. The current callers use find_next_iomem_res() incorrectly because they allocate a single struct resource and use it for repeated calls to find_next_iomem_res(). When find_next_iomem_res() returns a resource, it overwrites the start, end, flags, and desc members of the struct. If we call find_next_iomem_res() again, we must update or restore these fields. The previous code restored res.start and res.end, but not res.flags or res.desc. Since the callers did not restore res.flags, if they searched for flags IORESOURCE_MEM \| IORESOURCE_BUSY and found a resource with flags IORESOURCE_MEM \| IORESOURCE_BUSY \| IORESOURCE_SYSRAM, the next search would incorrectly skip resources unless they were also marked as IORESOURCE_SYSRAM. Fix this by restructuring the interface so it takes explicit "start, end, flags" parameters and uses "res" only as an output parameter. Based on a patch by Lianbo Jiang <lijiang@redhat.com>. [ bp: While at it: - make comments kernel-doc style. - Originally-by: http://lore.kernel.org/lkml/20180921073211.20097-2-lijiang@redhat.com Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> CC: Andrew Morton <akpm@linux-foundation.org> CC: Brijesh Singh <brijesh.singh@amd.com> CC: Dan Williams <dan.j.williams@intel.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Lianbo Jiang <lijiang@redhat.com> CC: Takashi Iwai <tiwai@suse.de> CC: Thomas Gleixner <tglx@linutronix.de> CC: Tom Lendacky <thomas.lendacky@amd.com> CC: Vivek Goyal <vgoyal@redhat.com> CC: Yaowei Bai <baiyaowei@cmss.chinamobile.com> CC: bhe@redhat.com CC: dan.j.williams@intel.com CC: dyoung@redhat.com CC: kexec@lists.infradead.org CC: mingo@redhat.com CC: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/153805812916.1157.177580438135143788.stgit@bhelgaas-glaptop.roam.corp.google.com	2018-10-09 17:18:36 +02:00
Bjorn Helgaas	a98959fdbd	resource: Include resource end in walk_() interfaces find_next_iomem_res() finds an iomem resource that covers part of a range described by "start, end". All callers expect that range to be inclusive, i.e., both start and end are included, but find_next_iomem_res() doesn't handle the end address correctly. If it finds an iomem resource that contains exactly the end address, it skips it, e.g., if "start, end" is [0x0-0x10000] and there happens to be an iomem resource [mem 0x10000-0x10000] (the single byte at 0x10000), we skip it: find_next_iomem_res(...) { start = 0x0; end = 0x10000; for (p = next_resource(...)) { # p->start = 0x10000; # p->end = 0x10000; # we should* return this resource, but this condition is false: if ((p->end >= start) && (p->start < end)) break; Adjust find_next_iomem_res() so it allows a resource that includes the single byte at the end of the range. This is a corner case that we probably don't see in practice. Fixes: `58c1b5b079` ("[PATCH] memory hotadd fixes: find_next_system_ram catch range fix") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> CC: Andrew Morton <akpm@linux-foundation.org> CC: Brijesh Singh <brijesh.singh@amd.com> CC: Dan Williams <dan.j.williams@intel.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Lianbo Jiang <lijiang@redhat.com> CC: Takashi Iwai <tiwai@suse.de> CC: Thomas Gleixner <tglx@linutronix.de> CC: Tom Lendacky <thomas.lendacky@amd.com> CC: Vivek Goyal <vgoyal@redhat.com> CC: Yaowei Bai <baiyaowei@cmss.chinamobile.com> CC: bhe@redhat.com CC: dan.j.williams@intel.com CC: dyoung@redhat.com CC: kexec@lists.infradead.org CC: mingo@redhat.com CC: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/153805812254.1157.16736368485811773752.stgit@bhelgaas-glaptop.roam.corp.google.com	2018-10-09 17:18:34 +02:00
Bjorn Helgaas	51fbf14f25	x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one error The only use of KEXEC_BACKUP_SRC_END is as an argument to walk_system_ram_res(): int crash_load_segments(struct kimage image) { ... walk_system_ram_res(KEXEC_BACKUP_SRC_START, KEXEC_BACKUP_SRC_END, image, determine_backup_region); walk_system_ram_res() expects "start, end" arguments that are inclusive, i.e., the range to be walked includes both the start and end addresses. KEXEC_BACKUP_SRC_END was previously defined as (640 1024UL), which is the first address past the desired 0-640KB range. Define KEXEC_BACKUP_SRC_END as (640 * 1024UL - 1) so the KEXEC_BACKUP_SRC region is [0-0x9ffff], not [0-0xa0000]. Fixes: `dd5f726076` ("kexec: support for kexec on panic using new system call") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> CC: "H. Peter Anvin" <hpa@zytor.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Brijesh Singh <brijesh.singh@amd.com> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> CC: Ingo Molnar <mingo@redhat.com> CC: Lianbo Jiang <lijiang@redhat.com> CC: Takashi Iwai <tiwai@suse.de> CC: Thomas Gleixner <tglx@linutronix.de> CC: Tom Lendacky <thomas.lendacky@amd.com> CC: Vivek Goyal <vgoyal@redhat.com> CC: baiyaowei@cmss.chinamobile.com CC: bhe@redhat.com CC: dan.j.williams@intel.com CC: dyoung@redhat.com CC: kexec@lists.infradead.org Link: http://lkml.kernel.org/r/153805811578.1157.6948388946904655969.stgit@bhelgaas-glaptop.roam.corp.google.com	2018-10-09 17:18:31 +02:00
Dave Hansen	367e3f1d3f	x86/mm: Remove spurious fault pkey check Spurious faults only ever occur in the kernel's address space. They are also constrained specifically to faults with one of these error codes: X86_PF_WRITE \| X86_PF_PROT X86_PF_INSTR \| X86_PF_PROT So, it's never even possible to reach spurious_kernel_fault_check() with X86_PF_PK set. In addition, the kernel's address space never has pages with user-mode protections. Protection Keys are only enforced on pages with user-mode protection. This gives us lots of reasons to not check for protection keys in our sprurious kernel fault handling. But, let's also add some warnings to ensure that these assumptions about protection keys hold true. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160231.243A0D6A@viggo.jf.intel.com	2018-10-09 16:51:16 +02:00
Dave Hansen	3ae0ad92f5	x86/mm/vsyscall: Consider vsyscall page part of user address space The vsyscall page is weird. It is in what is traditionally part of the kernel address space. But, it has user permissions and we handle faults on it like we would on a user page: interrupts on. Right now, we handle vsyscall emulation in the "bad_area" code, which is used for both user-address-space and kernel-address-space faults. Move the handling to the user-address-space code only and ensure we get there by "excluding" the vsyscall page from the kernel address space via a check in fault_in_kernel_space(). Since the fault_in_kernel_space() check is used on 32-bit, also add a 64-bit check to make it clear we only use this path on 64-bit. Also move the unlikely() to be in is_vsyscall_vaddr() itself. This helps clean up the kernel fault handling path by removing a case that can happen in normal[1] operation. (Yeah, yeah, we can argue about the vsyscall page being "normal" or not.) This also makes sanity checks easier, like the "we never take pkey faults in the kernel address space" check in the next patch. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160230.6E9336EE@viggo.jf.intel.com	2018-10-09 16:51:16 +02:00
Dave Hansen	02e983b760	x86/mm: Add vsyscall address helper We will shortly be using this check in two locations. Put it in a helper before we do so. Let's also insert PAGE_MASK instead of the open-coded ~0xfff. It is easier to read and also more obviously correct considering the implicit type conversion that has to happen when it is not an implicit 'unsigned long'. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160228.C593509B@viggo.jf.intel.com	2018-10-09 16:51:16 +02:00
Dave Hansen	88259744e2	x86/mm: Fix exception table comments The comments here are wrong. They are too absolute about where faults can occur when running in the kernel. The comments are also a bit hard to match up with the code. Trim down the comments, and make them more precise. Also add a comment explaining why we are doing the bad_area_nosemaphore() path here. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160227.077DDD7A@viggo.jf.intel.com	2018-10-09 16:51:16 +02:00
Dave Hansen	5b0c2cac54	x86/mm: Add clarifying comments for user addr space The SMAP and Reserved checking do not have nice comments. Add some to clarify and make it match everything else. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160225.FFD44B8D@viggo.jf.intel.com	2018-10-09 16:51:16 +02:00
Dave Hansen	aa37c51b94	x86/mm: Break out user address space handling The last patch broke out kernel address space handing into its own helper. Now, do the same for user address space handling. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160223.9C4F6440@viggo.jf.intel.com	2018-10-09 16:51:15 +02:00
Dave Hansen	8fed620000	x86/mm: Break out kernel address space handling The page fault handler (__do_page_fault()) basically has two sections: one for handling faults in the kernel portion of the address space and another for faults in the user portion of the address space. But, these two parts don't stick out that well. Let's make that more clear from code separation and naming. Pull kernel fault handling into its own helper, and reflect that naming by renaming spurious_fault() -> spurious_kernel_fault(). Also, rewrite the vmalloc() handling comment a bit. It was a bit stale and also glossed over the reserved bit handling. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160222.401F4E10@viggo.jf.intel.com	2018-10-09 16:51:15 +02:00
Dave Hansen	164477c233	x86/mm: Clarify hardware vs. software "error_code" We pass around a variable called "error_code" all around the page fault code. Sounds simple enough, especially since "error_code" looks like it exactly matches the values that the hardware gives us on the stack to report the page fault error code (PFEC in SDM parlance). But, that's not how it works. For part of the page fault handler, "error_code" does exactly match PFEC. But, during later parts, it diverges and starts to mean something a bit different. Give it two names for its two jobs. The place it diverges is also really screwy. It's only in a spot where the hardware tells us we have kernel-mode access that occurred while we were in usermode accessing user-controlled address space. Add a warning in there. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160220.4A2272C9@viggo.jf.intel.com	2018-10-09 16:51:15 +02:00
Rik van Riel	145f573b89	x86/mm/tlb: Make lazy TLB mode lazier Lazy TLB mode can result in an idle CPU being woken up by a TLB flush, when all it really needs to do is reload %CR3 at the next context switch, assuming no page table pages got freed. Memory ordering is used to prevent race conditions between switch_mm_irqs_off, which checks whether .tlb_gen changed, and the TLB invalidation code, which increments .tlb_gen whenever page table entries get invalidated. The atomic increment in inc_mm_tlb_gen is its own barrier; the context switch code adds an explicit barrier between reading tlbstate.is_lazy and next->context.tlb_gen. CPUs in lazy TLB mode remain part of the mm_cpumask(mm), both because that allows TLB flush IPIs to be sent at page table freeing time, and because the cache line bouncing on the mm_cpumask(mm) was responsible for about half the CPU use in switch_mm_irqs_off(). We can change native_flush_tlb_others() without touching other (paravirt) implementations of flush_tlb_others() because we'll be flushing less. The existing implementations flush more and are therefore still correct. Cc: npiggin@gmail.com Cc: mingo@kernel.org Cc: will.deacon@arm.com Cc: kernel-team@fb.com Cc: luto@kernel.org Cc: hpa@zytor.com Tested-by: Song Liu <songliubraving@fb.com> Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180926035844.1420-8-riel@surriel.com	2018-10-09 16:51:12 +02:00
Rik van Riel	97807813fe	x86/mm/tlb: Add freed_tables element to flush_tlb_info Pass the information on to native_flush_tlb_others. No functional changes. Cc: npiggin@gmail.com Cc: mingo@kernel.org Cc: will.deacon@arm.com Cc: songliubraving@fb.com Cc: kernel-team@fb.com Cc: hpa@zytor.com Cc: luto@kernel.org Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180926035844.1420-7-riel@surriel.com	2018-10-09 16:51:12 +02:00
Rik van Riel	016c4d92cd	x86/mm/tlb: Add freed_tables argument to flush_tlb_mm_range Add an argument to flush_tlb_mm_range to indicate whether page tables are about to be freed after this TLB flush. This allows for an optimization of flush_tlb_mm_range to skip CPUs in lazy TLB mode. No functional changes. Cc: npiggin@gmail.com Cc: mingo@kernel.org Cc: will.deacon@arm.com Cc: songliubraving@fb.com Cc: kernel-team@fb.com Cc: luto@kernel.org Cc: hpa@zytor.com Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180926035844.1420-6-riel@surriel.com	2018-10-09 16:51:12 +02:00
Rik van Riel	7d49b28a80	smp,cpumask: introduce on_each_cpu_cond_mask Introduce a variant of on_each_cpu_cond that iterates only over the CPUs in a cpumask, in order to avoid making callbacks for every single CPU in the system when we only need to test a subset. Cc: npiggin@gmail.com Cc: mingo@kernel.org Cc: will.deacon@arm.com Cc: songliubraving@fb.com Cc: kernel-team@fb.com Cc: hpa@zytor.com Cc: luto@kernel.org Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180926035844.1420-5-riel@surriel.com	2018-10-09 16:51:11 +02:00
Rik van Riel	c3f7f2c7eb	smp: use __cpumask_set_cpu in on_each_cpu_cond The code in on_each_cpu_cond sets CPUs in a locally allocated bitmask, which should never be used by other CPUs simultaneously. There is no need to use locked memory accesses to set the bits in this bitmap. Switch to __cpumask_set_cpu. Cc: npiggin@gmail.com Cc: mingo@kernel.org Cc: will.deacon@arm.com Cc: songliubraving@fb.com Cc: kernel-team@fb.com Cc: hpa@zytor.com Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Rik van Riel <riel@surriel.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180926035844.1420-4-riel@surriel.com	2018-10-09 16:51:11 +02:00
Rik van Riel	12c4d978fd	x86/mm/tlb: Restructure switch_mm_irqs_off() Move some code that will be needed for the lazy -> !lazy state transition when a lazy TLB CPU has gotten out of date. No functional changes, since the if (real_prev == next) branch always returns. (cherry picked from commit `61d0beb579`) Cc: npiggin@gmail.com Cc: efault@gmx.de Cc: will.deacon@arm.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: songliubraving@fb.com Cc: kernel-team@fb.com Cc: hpa@zytor.com Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Rik van Riel <riel@surriel.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180716190337.26133-4-riel@surriel.com	2018-10-09 16:51:11 +02:00
Rik van Riel	5462bc3a9a	x86/mm/tlb: Always use lazy TLB mode On most workloads, the number of context switches far exceeds the number of TLB flushes sent. Optimizing the context switches, by always using lazy TLB mode, speeds up those workloads. This patch results in about a 1% reduction in CPU use on a two socket Broadwell system running a memcache like workload. Cc: npiggin@gmail.com Cc: efault@gmx.de Cc: will.deacon@arm.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-team@fb.com Cc: hpa@zytor.com Cc: luto@kernel.org Tested-by: Song Liu <songliubraving@fb.com> Signed-off-by: Rik van Riel <riel@surriel.com> (cherry picked from commit `95b0e6357d`) Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180716190337.26133-7-riel@surriel.com	2018-10-09 16:51:11 +02:00
Peter Zijlstra	a31acd3ee8	x86/mm: Page size aware flush_tlb_mm_range() Use the new tlb_get_unmap_shift() to determine the stride of the INVLPG loop. Cc: Nick Piggin <npiggin@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>	2018-10-09 16:51:11 +02:00
Peter Zijlstra	a5b966ae42	Merge branch 'tlb/asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux into x86/mm Pull in the generic mmu_gather changes from the ARM64 tree such that we can put x86 specific things on top as well.	2018-10-09 16:50:43 +02:00
Javier González	766c8ceb16	lightnvm: pblk: guarantee that backpointer is respected on writer stall pblk's write buffer must guarantee that it respects the device's constrains for reads (i.e., mw_cunits). This is done by maintaining a backpointer that updates the L2P table as entries wrap up, making them point to the media instead of pointing to the write buffer. This mechanism can race in case that the write thread stalls, as the write pointer will protect the last written entry, thus disregarding the read constrains. This patch adds an extra check on wrap up, making sure that the threshold is respected at all times, preventing new entries to overwrite committed data, also in case of write thread stall. Reported-by: Heiner Litz <hlitz@ucsc.edu> Signed-off-by: Javier González <javier@cnexlabs.com> Reviewed-by: Heiner Litz <hlitz@ucsc.edu> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-09 08:25:08 -06:00
Zhoujie Wu	8a57fc3823	lightnvm: pblk: consider max hw sectors supported for max_write_pgs When do GC, the number of read/write sectors are determined by max_write_pgs(see gc_rq preparation in pblk_gc_line_prepare_ws). Due to max_write_pgs doesn't consider max hw sectors supported by nvme controller(128K), which leads to GC tries to read 64 * 4K in one command, and see below error caused by pblk_bio_map_addr in function pblk_submit_read_gc. [ 2923.005376] pblk: could not add page to bio [ 2923.005377] pblk: could not allocate GC bio (18446744073709551604) Signed-off-by: Zhoujie Wu <zjwu@marvell.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-09 08:25:08 -06:00
Wei Yongjun	a70985f83c	lightnvm: pblk: fix error handling of pblk_lines_init() In the too many bad blocks error handling case, we should release all the allocated resources, otherwise it will cause memory leak. Fixes: `2deeefc02d` ("lightnvm: pblk: fail gracefully on line alloc. failure") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-09 08:25:08 -06:00
Javier González	6fd05cad5e	lightnvm: do no update csecs and sos on 1.2 1.2 devices exposes their data and metadata size through the separate identify command. Make sure that the NVMe LBA format does not override these values. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-09 08:25:08 -06:00
Javier González	d672d92d9c	lightnvm: pblk: guarantee mw_cunits on read buffer OCSSD 2.0 defines the amount of data that the host must buffer per chunk to guarantee reads through the geometry field mw_cunits. This value is the base that pblk uses to determine the size of its read buffer. Currently, this size is set to be the closes power-of-2 to mw_cunits times the number of parallel units available to the pblk instance for each open line (currently one). When an entry (4KB) is put in the buffer, the L2P table points to it. As the buffer wraps up, the L2P is updated to point to addresses on the device, thus guaranteeing mw_cunits at a chunk level. However, given that pblk cannot write to the device under ws_min (normally ws_opt), there might be a window in which the buffer starts wrapping up and updating L2P entries before the mw_cunits value in a chunk has been surpassed. In order not to violate the mw_cunits constrain in this case, account for ws_opt on the read buffer creation. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-09 08:25:08 -06:00
Javier González	9bd1f875c0	lightnvm: pblk: move ring buffer alloc/free rb init pblk's read/write buffer currently takes a buffer and its size and uses it to create the metadata around it to use it as a ring buffer. This puts the responsibility of allocating/freeing ring buffer memory on the ring buffer user. Instead, move it inside of the ring buffer helpers (pblk-rb.c). This simplifies creation/destruction routines. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-09 08:25:08 -06:00
Javier González	40b8657dcc	lightnvm: pblk: encapsulate rb pointer operations pblk's read/write buffer is always a power-of-2, thus wrapping up the buffer can be done with a bit mask. Since this is an implementation detail internal to the write buffer, make a helper that hides pointer increment + wrap, and allows to transparently relax this assumption in the future. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-09 08:25:08 -06:00
Javier González	dde4aac20b	lightnvm: pblk: remove unused function Removed unused function in pblk-rb.c Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-09 08:25:08 -06:00
Javier González	44cdbdc657	lightnvm: pblk: fix race on sysfs line state pblk exposes a sysfs interface that represents its internal state. Part of this state is the map bitmap for the current open line, which should be protected by the line lock to avoid a race when freeing the line metadata. Currently, it is not. This patch makes sure that the line state is consistent and NULL bitmap pointers are not dereferenced. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-09 08:25:08 -06:00
Javier González	02a1520d56	lightnvm: pblk: add SPDX license tag Add GLP-2.0 SPDX license tag to all pblk files Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-09 08:25:08 -06:00
Javier González	6ad2f619b2	lightnvm: pblk: recover open lines on 2.0 devices In the OCSSD 2.0 spec, each chunk reports its write pointer. This means that pblk does not need to scan open lines to find the write pointer, but instead, it can retrieve it directly (and verify it). This patch uses the write pointer on open lines to (i) recover the line up until the last written lba and (ii) reconstruct the map bitmap and rest of line metadata so that the line can be used for new data. Since the 1.2 path in lightnvm core has been re-implemented to populate the chunk structure and thus recover the write pointer on initialization, this patch removes 1.2 specific recovery, as the 2.0 path can be reused. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-09 08:25:08 -06:00
Javier González	253babc3f6	lightnvm: pblk: take write semaphore on metadata pblk guarantees write ordering at a chunk level through a per open chunk semaphore. At this point, since we only have an open I/O stream for both user and GC data, the semaphore is per parallel unit. For the metadata I/O that is synchronous, the semaphore is not needed as ordering is guaranteed. However, if the metadata scheme changes or multiple streams are open, this guarantee might not be preserved. This patch makes sure that all writes go through the semaphore, even for synchronous I/O. This is consistent with pblk's write I/O model. It also simplifies maintenance since changes in the metadata scheme could cause ordering issues. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-09 08:25:08 -06:00
Javier González	af3fac1664	lightnvm: pblk: refactor metadata paths pblk maintains two different metadata paths for smeta and emeta, which store metadata at the start of the line and at the end of the line, respectively. Until now, these path has been common for writing and retrieving metadata, however, as these paths diverge, the common code becomes less clear and unnecessary complicated. In preparation for further changes to the metadata write path, this patch separates the write and read paths for smeta and emeta and removes the synchronous emeta path as it not used anymore (emeta is scheduled asynchronously to prevent jittering due to internal I/Os). Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-09 08:25:07 -06:00
Javier González	45dcf29b98	lightnvm: pblk: encapsulate rqd dma allocations dma allocations for ppa_list and meta_list in rqd are replicated in several places across the pblk codebase. Make helpers to encapsulate creation and deletion to simplify the code. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-09 08:25:07 -06:00
Javier González	090ee26fd5	lightnvm: use internal allocation for chunk log page The lightnvm subsystem provides helpers to retrieve chunk metadata, where the target needs to provide a buffer to store the metadata. An implicit assumption is that this buffer is contiguous and can be used to retrieve the data from the device. If the device exposes too many chunks, then kmalloc might fail, thus failing instance creation. This patch removes this assumption by implementing an internal buffer in the lightnvm subsystem to retrieve chunk metadata. Targets can then use virtual memory allocations. Since this is a target API change, adapt pblk accordingly. Signed-off-by: Javier González <javier@cnexlabs.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-09 08:25:07 -06:00
Jia-Ju Bai	7325b4bbe5	lightnvm: pblk: fix two sleep-in-atomic-context bugs The driver may sleep with holding a spinlock. The function call paths (from bottom to top) in Linux-4.16 are: [FUNC] nvm_dev_dma_alloc(GFP_KERNEL) drivers/lightnvm/pblk-core.c, 754: nvm_dev_dma_alloc in pblk_line_submit_smeta_io drivers/lightnvm/pblk-core.c, 1048: pblk_line_submit_smeta_io in pblk_line_init_bb drivers/lightnvm/pblk-core.c, 1434: pblk_line_init_bb in pblk_line_replace_data drivers/lightnvm/pblk-recovery.c, 980: pblk_line_replace_data in pblk_recov_l2p drivers/lightnvm/pblk-recovery.c, 976: spin_lock in pblk_recov_l2p [FUNC] bio_map_kern(GFP_KERNEL) drivers/lightnvm/pblk-core.c, 762: bio_map_kern in pblk_line_submit_smeta_io drivers/lightnvm/pblk-core.c, 1048: pblk_line_submit_smeta_io in pblk_line_init_bb drivers/lightnvm/pblk-core.c, 1434: pblk_line_init_bb in pblk_line_replace_data drivers/lightnvm/pblk-recovery.c, 980: pblk_line_replace_data in pblk_recov_l2p drivers/lightnvm/pblk-recovery.c, 976: spin_lock in pblk_recov_l2p To fix these bugs, the call to pblk_line_replace_data() is moved out of the spinlock protection. These bugs are found by my static analysis tool DSAC. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-09 08:25:07 -06:00
Hans Holmberg	bf82fa2f58	lightnvm: pblk: fix mapping issue on failed writes On 1.2-devices, the mapping-out of remaning sectors in the failed-write's block can result in an infinite loop, stalling the write pipeline, fix this. Fixes: `6a3abf5bee` ("lightnvm: pblk: rework write error recovery path") Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-09 08:25:07 -06:00
Hans Holmberg	1864de94ec	lightnvm: pblk: stop recreating global caches Pblk should not create a set of global caches every time a pblk instance is created. The global caches should be made available only when there is one or more pblk instances. This patch bundles the global caches together with a kref keeping track of whether the caches should be available or not. Also, turn the global pblk lock into a mutex that explicitly protects the caches (as this was the only purpose of the lock). Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-09 08:25:07 -06:00
Javier González	63dee3a6c3	lightnvm: pblk: calculate line pad distance in helper If a line is padded, calculate the pad distance directly on the helper being used for this purpose. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-09 08:25:07 -06:00
Javier González	7f985f9a69	lightnvm: move ppa transformations to core Continuing the effort of moving 1.2 and 2.0 specific code to core, move 64_to_32 and 32_to_64 ppa helpers from pblk to core. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-09 08:25:07 -06:00
Hans Holmberg	4209c31c0c	lightnvm: pblk: add tracing for chunk resets Trace state of chunk resets. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-09 08:25:07 -06:00
Hans Holmberg	1b0dd0bf3d	lightnvm: pblk: add trace events for pblk state changes Add trace events for tracking pblk state changes. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-09 08:25:07 -06:00

... 52 53 54 55 56 ...

791450 commits