Commit graph

915365 commits

Author SHA1 Message Date
Marc Zyngier
d01fd161e8 irqchip/gic-v3: Workaround Cavium erratum 38539 when reading GICD_TYPER2
Despite the architecture spec requiring that reserved registers in the GIC
distributor memory map are RES0 (and thus are not allowed to generate
an exception), the Cavium ThunderX (aka TX1) SoC explodes as such:

[    0.000000] GICv3: GIC: Using split EOI/Deactivate mode
[    0.000000] GICv3: 128 SPIs implemented
[    0.000000] GICv3: 0 Extended SPIs implemented
[    0.000000] Internal error: synchronous external abort: 96000210 [#1] SMP
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc4-00035-g3cf6a3d5725f #7956
[    0.000000] Hardware name: cavium,thunder-88xx (DT)
[    0.000000] pstate: 60000085 (nZCv daIf -PAN -UAO)
[    0.000000] pc : __raw_readl+0x0/0x8
[    0.000000] lr : gic_init_bases+0x110/0x560
[    0.000000] sp : ffff800011243d90
[    0.000000] x29: ffff800011243d90 x28: 0000000000000000
[    0.000000] x27: 0000000000000018 x26: 0000000000000002
[    0.000000] x25: ffff8000116f0000 x24: ffff000fbe6a2c80
[    0.000000] x23: 0000000000000000 x22: ffff010fdc322b68
[    0.000000] x21: ffff800010a7a208 x20: 00000000009b0404
[    0.000000] x19: ffff80001124dad0 x18: 0000000000000010
[    0.000000] x17: 000000004d8d492b x16: 00000000f67eb9af
[    0.000000] x15: ffffffffffffffff x14: ffff800011249908
[    0.000000] x13: ffff800091243ae7 x12: ffff800011243af4
[    0.000000] x11: ffff80001126e000 x10: ffff800011243a70
[    0.000000] x9 : 00000000ffffffd0 x8 : ffff80001069c828
[    0.000000] x7 : 0000000000000059 x6 : ffff8000113fb4d1
[    0.000000] x5 : 0000000000000001 x4 : 0000000000000000
[    0.000000] x3 : 0000000000000000 x2 : 0000000000000000
[    0.000000] x1 : 0000000000000000 x0 : ffff8000116f000c
[    0.000000] Call trace:
[    0.000000]  __raw_readl+0x0/0x8
[    0.000000]  gic_of_init+0x188/0x224
[    0.000000]  of_irq_init+0x200/0x3cc
[    0.000000]  irqchip_init+0x1c/0x40
[    0.000000]  init_IRQ+0x160/0x1d0
[    0.000000]  start_kernel+0x2ec/0x4b8
[    0.000000] Code: a8c47bfd d65f03c0 d538d080 d65f03c0 (b9400000)

when reading the GICv4.1 GICD_TYPER2 register, which is unexpected...

Work around it by adding a new quirk for the following variants:

 ThunderX: CN88xx
 OCTEON TX: CN83xx, CN81xx
 OCTEON TX2: CN93xx, CN96xx, CN98xx, CNF95xx*

and use this flag to avoid accessing GICD_TYPER2. Note that all
reserved registers (including redistributors and ITS) are impacted
by this erratum, but that only GICD_TYPER2 has to be worked around
so far.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Robert Richter <rrichter@marvell.com>
Tested-by: Mark Salter <msalter@redhat.com>
Tested-by: Tim Harvey <tharvey@gateworks.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Robert Richter <rrichter@marvell.com>
Link: https://lore.kernel.org/r/20191027144234.8395-11-maz@kernel.org
Link: https://lore.kernel.org/r/20200311115649.26060-1-maz@kernel.org
2020-03-14 10:15:19 +00:00
Takashi Iwai
edd6608644 ACPI: PCI: Use scnprintf() for avoiding potential buffer overflow
Since snprintf() returns the would-be-output size instead of the
actual output size, the succeeding calls may go beyond the given
buffer limit.  Fix it by replacing with scnprintf().

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-03-14 11:15:10 +01:00
Takashi Iwai
949fe25f2a ACPI: fan: Use scnprintf() for avoiding potential buffer overflow
Since snprintf() returns the would-be-output size instead of the
actual output size, the succeeding calls may go beyond the given
buffer limit.  Fix it by replacing with scnprintf().

Also adjust the argument to really match with the actually remaining
buffer size.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-03-14 11:11:58 +01:00
Rafael J. Wysocki
b1e14999a4 ACPI: EC: Eliminate EC_FLAGS_QUERY_HANDSHAKE
The EC_FLAGS_QUERY_HANDSHAKE switch is never set in the current
code (the only function setting it is defined under #if 0) and
has no effect whatever, so eliminate it and drop the code
depending on it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-03-14 11:04:17 +01:00
Rafael J. Wysocki
65a691f5f8 ACPI: EC: Do not clear boot_ec_is_ecdt in acpi_ec_add()
The reason for clearing boot_ec_is_ecdt in acpi_ec_add() (if a
PNP0C09 device object matching the ECDT boot EC had been found in
the namespace) was to cause acpi_ec_ecdt_start() to return early,
but since the latter does not look at boot_ec_is_ecdt any more,
acpi_ec_add() need not clear it.

Moreover, doing that may be confusing as it may cause "DSDT" to be
printed instead of "ECDT" in the EC initialization completion
message, so stop doing it.

While at it, split the EC initialization completion message into
two messages, one regarding the boot EC and another one printed
regardless of whether or not the EC at hand is the boot one.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-03-14 11:04:17 +01:00
Rafael J. Wysocki
98ada3c59d ACPI: EC: Simplify acpi_ec_ecdt_start() and acpi_ec_init()
Notice that the return value of acpi_ec_init() is discarded anyway,
so it can be void and it doesn't need to check the return values of
acpi_bus_register_driver() and acpi_ec_ecdt_start() called by it.
Thus the latter can be void too and it really has nothing to do
if acpi_ec_add() has already found an EC matching the boot one in the
namespace.  Also, acpi_ec_ecdt_get_handle() can be folded into it.

Modify the code accordingly and while at it create a propoer kerneldoc
comment to document acpi_ec_ecdt_start() and move the remark regarding
ASUS X550ZE along with the related bug URL from acpi_ec_init() into
that comment.

Additionally, fix up a stale comment in acpi_ec_init().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-03-14 11:04:17 +01:00
Rafael J. Wysocki
03e9a0e057 ACPI: EC: Consolidate event handler installation code
Commit 406857f773 ("ACPI: EC: add support for hardware-reduced
systems") made ec_install_handlers() return an error on failures
to configure a GPIO IRQ for the EC, but that is inconsistent with
the handling of the GPE event handler installation failures even
though it is exactly the same issue and the driver can respond to
it in the same way in both cases (the EC can be actively polled
for events through its registers if the event handler installation
fails).

Moreover, it requires acpi_ec_add() to take that special case into
account and disagrees with the ec_install_handlers() header comment.

For this reason, rework the event handler installation code in
ec_install_handlers() to explicitly take deferred probing (that
may be needed in the GPIO IRQ case) into account and to avoid
failing the EC initialization in any other case.

Among other things, reduce code duplication between
install_gpe_event_handler() and install_gpio_irq_event_handler() by
moving some code from there into ec_install_handlers() itself and
simplify the error code path in acpi_ec_add().

While at it, turn the ec_install_handlers() header comment into
a proper kerneldoc one and add some general control flow information
to it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Jian-Hong Pan <jian-hong@endlessm.com>
2020-03-14 10:59:44 +01:00
Nitesh Narayan Lal
0c22056f8c KVM: x86: Initializing all kvm_lapic_irq fields in ioapic_write_indirect
Previously all fields of structure kvm_lapic_irq were not initialized
before it was passed to kvm_bitmap_or_dest_vcpus(). Which will cause
an issue when any of those fields are used for processing a request.
For example not initializing the msi_redir_hint field before passing
to the kvm_bitmap_or_dest_vcpus(), may lead to a misbehavior of
kvm_apic_map_get_dest_lapic(). This will specifically happen when the
kvm_lowest_prio_delivery() returns TRUE due to a non-zero garbage
value of msi_redir_hint, which should not happen as the request belongs
to APIC fixed delivery mode and we do not want to deliver the
interrupt only to the lowest priority candidate.

This patch initializes all the fields of kvm_lapic_irq based on the
values of ioapic redirect_entry object before passing it on to
kvm_bitmap_or_dest_vcpus().

Fixes: 7ee30bc132 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs")
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
[Set level to false since the value doesn't really matter. Suggested
 by Vitaly Kuznetsov. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-14 10:46:01 +01:00
Jan Engelhardt
ecb9c79099 acpi/x86: ignore unspecified bit positions in the ACPI global lock field
The value in "new" is constructed from "old" such that all bits defined
as reserved by the ACPI spec[1] are left untouched. But if those bits
do not happen to be all zero, "new < 3" will not evaluate to true.

The firmware of the laptop(s) Medion MD63490 / Akoya P15648 comes with
garbage inside the "FACS" ACPI table. The starting value is
old=0x4944454d, therefore new=0x4944454e, which is >= 3. Mask off
the reserved bits.

[1] https://uefi.org/sites/default/files/resources/ACPI_6_2.pdf

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206553
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-03-14 10:41:56 +01:00
Alex Hung
1ffb8d032d acpi/x86: add a kernel parameter to disable ACPI BGRT
BGRT is for displaying seamless OEM logo from booting to login screen;
however, this mechanism does not always work well on all configurations
and the OEM logo can be displayed multiple times. This looks worse than
without BGRT enabled.

This patch adds a kernel parameter to disable BGRT in boot time. This is
easier than re-compiling a kernel with CONFIG_ACPI_BGRT disabled.

Signed-off-by: Alex Hung <alex.hung@canonical.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-03-14 10:36:49 +01:00
Sean Christopherson
7a57c09bb1 KVM: VMX: Condition ENCLS-exiting enabling on CPU support for SGX1
Enable ENCLS-exiting (and thus set vmcs.ENCLS_EXITING_BITMAP) only if
the CPU supports SGX1.  Per Intel's SDM, all ENCLS leafs #UD if SGX1
is not supported[*], i.e. intercepting ENCLS to inject a #UD is
unnecessary.

Avoiding ENCLS-exiting even when it is reported as supported by the CPU
works around a reported issue where SGX is "hard" disabled after an S3
suspend/resume cycle, i.e. CPUID.0x7.SGX=0 and the VMCS field/control
are enumerated as unsupported.  While the root cause of the S3 issue is
unknown, it's definitely _not_ a KVM (or kernel) bug, i.e. this is a
workaround for what is most likely a hardware or firmware issue.  As a
bonus side effect, KVM saves a VMWRITE when first preparing vmcs01 and
vmcs02.

Note, SGX must be disabled in BIOS to take advantage of this workaround

[*] The additional ENCLS CPUID check on SGX1 exists so that SGX can be
    globally "soft" disabled post-reset, e.g. if #MC bits in MCi_CTL are
    cleared.  Soft disabled meaning disabling SGX without clearing the
    primary CPUID bit (in leaf 0x7) and without poking into non-SGX
    CPU paths, e.g. for the VMCS controls.

Fixes: 0b665d3040 ("KVM: vmx: Inject #UD for SGX ENCLS instruction in guest")
Reported-by: Toni Spets <toni.spets@iki.fi>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-14 10:34:51 +01:00
Alexey Dobriyan
fa0fca68e1 x86/acpi: make "asmlinkage" part first thing in the function definition
g++ insists that function declaration must start with extern "C"
(which asmlinkage expands to).

gcc doesn't care.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-03-14 10:29:07 +01:00
Douglas Anderson
d49e7953f9 tty: serial: qcom_geni_serial: Don't try to manually disable the console
The geni serial driver's shutdown code had a special case to call
console_stop().  Grepping through the code, it was the only serial
driver doing something like this (the only other caller of
console_stop() was in serial_core.c).

As far as I can tell there's no reason to call console_stop() in the
geni code.  ...and a good reason _not_ to call it.  Specifically if
you have an agetty running on the same serial port as the console then
killing the agetty kills your console and if you start the agetty
again the console doesn't come back.

Fixes: c4f528795d ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20200313134635.2.I3648fac6c98b887742934146ac2729ecb7232eb1@changeid
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-14 09:40:27 +01:00
Douglas Anderson
e83766334f tty: serial: qcom_geni_serial: No need to stop tx/rx on UART shutdown
On a board using qcom_geni_serial I found that I could no longer
interact with kdb if I got a crash after the "agetty" running on the
same serial port was killed.  This meant that various classes of
crashes that happened at reboot time were undebuggable.

Reading through the code, I couldn't figure out why qcom_geni_serial
felt the need to run so much code at port shutdown time.  All we need
to do is disable the interrupt.

After I make this change then a hardcoded kgdb_breakpoint in some late
shutdown code now allows me to interact with the debugger.  I also
could freely close / re-open the port without problems.

Fixes: c4f528795d ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20200313134635.1.Icf54c533065306b02b880c46dfd401d8db34e213@changeid
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-14 09:40:27 +01:00
Suravee Suthikulpanit
730ad0ede1 iommu/amd: Fix IOMMU AVIC not properly update the is_run bit in IRTE
Commit b9c6ff94e4 ("iommu/amd: Re-factor guest virtual APIC
(de-)activation code") accidentally left out the ir_data pointer when
calling modity_irte_ga(), which causes the function amd_iommu_update_ga()
to return prematurely due to struct amd_ir_data.ref is NULL and
the "is_run" bit of IRTE does not get updated properly.

This results in bad I/O performance since IOMMU AVIC always generate GA Log
entry and notify IOMMU driver and KVM when it receives interrupt from the
PCI pass-through device instead of directly inject interrupt to the vCPU.

Fixes by passing ir_data when calling modify_irte_ga() as done previously.

Fixes: b9c6ff94e4 ("iommu/amd: Re-factor guest virtual APIC (de-)activation code")
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-03-14 09:39:11 +01:00
Daniel Drake
da72a379b2 iommu/vt-d: Ignore devices with out-of-spec domain number
VMD subdevices are created with a PCI domain ID of 0x10000 or
higher.

These subdevices are also handled like all other PCI devices by
dmar_pci_bus_notifier().

However, when dmar_alloc_pci_notify_info() take records of such devices,
it will truncate the domain ID to a u16 value (in info->seg).
The device at (e.g.) 10000:00:02.0 is then treated by the DMAR code as if
it is 0000:00:02.0.

In the unlucky event that a real device also exists at 0000:00:02.0 and
also has a device-specific entry in the DMAR table,
dmar_insert_dev_scope() will crash on:
   BUG_ON(i >= devices_cnt);

That's basically a sanity check that only one PCI device matches a
single DMAR entry; in this case we seem to have two matching devices.

Fix this by ignoring devices that have a domain number higher than
what can be looked up in the DMAR table.

This problem was carefully diagnosed by Jian-Hong Pan.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Daniel Drake <drake@endlessm.com>
Fixes: 59ce0515cd ("iommu/vt-d: Update DRHD/RMRR/ATSR device scope caches when PCI hotplug happens")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-03-14 09:38:39 +01:00
Zhenzhong Duan
b0bb0c22c4 iommu/vt-d: Fix the wrong printing in RHSA parsing
When base address in RHSA structure doesn't match base address in
each DRHD structure, the base address in last DRHD is printed out.

This doesn't make sense when there are multiple DRHD units, fix it
by printing the buggy RHSA's base address.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com>
Fixes: fd0c889489 ("intel-iommu: Set a more specific taint flag for invalid BIOS DMAR tables")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-03-14 09:37:58 +01:00
Fabrizio Castro
d260871628 dt-bindings: display: Add idk-1110wr binding
Add binding for the idk-1110wr LVDS panel from Advantech.

Some panel-specific documentation can be found here:
https://buy.advantech.eu/Displays/Embedded-LCD-Kits-LCD-Kit-Modules/model-IDK-1110WR-55WSA1E.htm

Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/1583957020-16359-2-git-send-email-prabhakar.mahadev-lad.rj@bp.renesas.com
2020-03-14 08:36:10 +01:00
Kamlesh Gurudasani
fdcf7bb69b drm/tiny: fix sparse warning: incorrect type in assignment (different base types)
This fixes the following sparse warning:

drivers/gpu/drm/tiny/ili9486.c:61:16: sparse: sparse: incorrect type in assignment (different base types)
drivers/gpu/drm/tiny/ili9486.c:61:16: sparse:    expected unsigned short [usertype]
drivers/gpu/drm/tiny/ili9486.c:61:16: sparse:    got restricted __be16 [usertype]
drivers/gpu/drm/tiny/ili9486.c:71:32: sparse: sparse: incorrect type in assignment (different base types)
drivers/gpu/drm/tiny/ili9486.c:71:32: sparse:    expected unsigned short [usertype]
drivers/gpu/drm/tiny/ili9486.c:71:32: sparse:    got restricted __be16 [usertype]

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Kamlesh Gurudasani <kamlesh.gurudasani@gmail.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/1583684084-4694-1-git-send-email-kamlesh.gurudasani@gmail.com
2020-03-14 08:31:30 +01:00
David S. Miller
94229d4523 mlx5-updates-2020-03-13
Misc update to mlx5 core and E-Switch driver:
 
 1) Blue-Field, Update VF vports config when num of VFs changed
 
 From Bodon, Various misc cleanups and refactoring
 for vport enabling/disabling routines to allow them to be called
 dynamically and not only on E-Switch load.
 
 This will allow ECPF (ConnectX BlueField Smartnic) support for dynamic
 num vf changes and dynamic vport creation and configuration as introduced
 in "Update VF vports config when num of VFs changed" patch.
 
 2) From Parav and Mark, trivial clean-ups.
 
 3) Software steering support for flow table id as destination
 and a clean-up patch to remove unnecessary function stubs, from Alex.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl5sFqcACgkQSD+KveBX
 +j6KxAf+INpm/mpG4+Vk7loUw0C3xnchOMwIZhKuanWiqsbO+tWC8g1AYgpFRKdE
 gkE18EN0verQEqnol7N7+HhnHboqvmqT1/FOvOF9pRVpQWDEoE73oJzT6vF42u5M
 Hg/Rsh0Q+R6pjDr62/MOJrysCFip87GG6TDerWhQ3Fol6ZL8vHMaQkTdyxcVTIAP
 TidHQWs4Cc82tpfsIirBKIylcbBxbgj4kQE4Ov81hm6FE3h4ZV3rJ+dXp4WCj9TL
 PbSgtnZ1kc+GOryF8AfRU197bIrXFHfraog7Qi6hVspf8QK+w4Iz9wy3Z2SCAxVA
 dCc/DC29o0dOcUn72Xrqc2+6M2Jnyw==
 =ui+6
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2020-03-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2020-03-13

Misc update to mlx5 core and E-Switch driver:

1) Blue-Field, Update VF vports config when num of VFs changed

From Bodon, Various misc cleanups and refactoring
for vport enabling/disabling routines to allow them to be called
dynamically and not only on E-Switch load.

This will allow ECPF (ConnectX BlueField Smartnic) support for dynamic
num vf changes and dynamic vport creation and configuration as introduced
in "Update VF vports config when num of VFs changed" patch.

2) From Parav and Mark, trivial clean-ups.

3) Software steering support for flow table id as destination
and a clean-up patch to remove unnecessary function stubs, from Alex.
====================

Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-13 21:04:30 -07:00
David S. Miller
44ef976ab3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2020-03-13

The following pull-request contains BPF updates for your *net-next* tree.

We've added 86 non-merge commits during the last 12 day(s) which contain
a total of 107 files changed, 5771 insertions(+), 1700 deletions(-).

The main changes are:

1) Add modify_return attach type which allows to attach to a function via
   BPF trampoline and is run after the fentry and before the fexit programs
   and can pass a return code to the original caller, from KP Singh.

2) Generalize BPF's kallsyms handling and add BPF trampoline and dispatcher
   objects to be visible in /proc/kallsyms so they can be annotated in
   stack traces, from Jiri Olsa.

3) Extend BPF sockmap to allow for UDP next to existing TCP support in order
   in order to enable this for BPF based socket dispatch, from Lorenz Bauer.

4) Introduce a new bpftool 'prog profile' command which attaches to existing
   BPF programs via fentry and fexit hooks and reads out hardware counters
   during that period, from Song Liu. Example usage:

   bpftool prog profile id 337 duration 3 cycles instructions llc_misses

        4228 run_cnt
     3403698 cycles                                              (84.08%)
     3525294 instructions   #  1.04 insn per cycle               (84.05%)
          13 llc_misses     #  3.69 LLC misses per million isns  (83.50%)

5) Batch of improvements to libbpf, bpftool and BPF selftests. Also addition
   of a new bpf_link abstraction to keep in particular BPF tracing programs
   attached even when the applicaion owning them exits, from Andrii Nakryiko.

6) New bpf_get_current_pid_tgid() helper for tracing to perform PID filtering
   and which returns the PID as seen by the init namespace, from Carlos Neira.

7) Refactor of RISC-V JIT code to move out common pieces and addition of a
   new RV32G BPF JIT compiler, from Luke Nelson.

8) Add gso_size context member to __sk_buff in order to be able to know whether
   a given skb is GSO or not, from Willem de Bruijn.

9) Add a new bpf_xdp_output() helper which reuses XDP's existing perf RB output
   implementation but can be called from tracepoint programs, from Eelco Chaudron.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-13 20:52:03 -07:00
Nathan Chancellor
82f2bc2fcc kbuild: Disable -Wpointer-to-enum-cast
Clang's -Wpointer-to-int-cast deviates from GCC in that it warns when
casting to enums. The kernel does this in certain places, such as device
tree matches to set the version of the device being used, which allows
the kernel to avoid using a gigantic union.

https://elixir.bootlin.com/linux/v5.5.8/source/drivers/ata/ahci_brcm.c#L428
https://elixir.bootlin.com/linux/v5.5.8/source/drivers/ata/ahci_brcm.c#L402
https://elixir.bootlin.com/linux/v5.5.8/source/include/linux/mod_devicetable.h#L264

To avoid a ton of false positive warnings, disable this particular part
of the warning, which has been split off into a separate diagnostic so
that the entire warning does not need to be turned off for clang. It
will be visible under W=1 in case people want to go about fixing these
easily and enabling the warning treewide.

Cc: stable@vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/887
Link: 2a41b31fcd
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-03-14 10:31:08 +09:00
Al Viro
6dfd9fe54d follow_dotdot{,_rcu}(): switch to use of step_into()
gets the regular mount crossing on result of ..

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:09:13 -04:00
Al Viro
7521f22b3c handle_dots(), follow_dotdot{,_rcu}(): preparation to switch to step_into()
Right now the tail ends of follow_dotdot{,_rcu}() are pretty
much the open-coded analogues of step_into().  The differences:
	* the lack of proper LOOKUP_NO_XDEV handling in non-RCU case
(arguably a bug)
	* the lack of ->d_manage() handling (again, arguably a bug)

Adjust the calling conventions so that on the next step with could
just switch those functions to returning step_into().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:09:13 -04:00
Al Viro
957dd41d88 move handle_dots(), follow_dotdot() and follow_dotdot_rcu() past step_into()
pure move; we are going to have step_into() called by that bunch.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:09:13 -04:00
Al Viro
c9a0f75d81 follow_dotdot{,_rcu}(): lift LOOKUP_BENEATH checks out of loop
Behaviour change: LOOKUP_BENEATH lookup of .. in absolute root
yields an error even if it's not the process' root.  That's
possible only if you'd managed to escape chroot jail by way of
procfs symlinks, but IMO the resulting behaviour is not worse -
more consistent and easier to describe:
	".." in root is "stay where you are", uness LOOKUP_BENEATH
	has been given, in which case it's "fail with EXDEV".

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:09:13 -04:00
Al Viro
abc2c632e0 follow_dotdot{,_rcu}(): lift switching nd->path to parent out of loop
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:09:13 -04:00
Al Viro
a6a7eb7628 expand path_parent_directory() in its callers
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:09:13 -04:00
Al Viro
63b27720a4 path_parent_directory(): leave changing path->dentry to callers
Instead of returning 0, return new dentry; instead of returning
-ENOENT, return NULL.  Adjust the callers accordingly.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:09:13 -04:00
Al Viro
6b03f7edf4 path_connected(): pass mount and dentry separately
eventually we'll want to do that check *before* mangling
nd->path.dentry...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:09:13 -04:00
Al Viro
c981a48281 split the lookup-related parts of do_last() into a separate helper
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:09:13 -04:00
Al Viro
973d4b73fb do_last(): rejoin the common path even earlier in FMODE_{OPENED,CREATED} case
... getting may_create_in_sticky() checks in FMODE_OPENED case as well.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:09:13 -04:00
Al Viro
8795e7d482 do_last(): simplify the liveness analysis past finish_open_created
Don't mess with got_write there - it is guaranteed to be false on
entry and it will be set true if and only if we decide to go for
truncation and manage to get write access for that.

Don't carry acc_mode through the entire thing - it's only used
in that part.  And don't bother with gotos in there - compiler is
quite capable of optimizing that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:09:13 -04:00
Al Viro
5a2d3edd8d do_last(): rejoing the common path earlier in FMODE_{OPENED,CREATED} case
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:09:12 -04:00
Al Viro
59e96e6583 do_last(): don't bother with keeping got_write in FMODE_OPENED case
it's easier to drop it right after lookup_open() and regain if
needed (i.e. if we will need to truncate).  On the non-FMODE_OPENED
path we do that anyway.  In case of FMODE_CREATED we won't be
needing it.  And it's easier to prove correctness that way,
especially since the initial failure to get write access is not
always fatal; proving that we'll never end up truncating in that
case is rather convoluted.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:09:12 -04:00
Al Viro
3ad5615a07 do_last(): merge the may_open() calls
have FMODE_OPENED case rejoin the main path at earlier point

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:09:12 -04:00
Al Viro
7be219b4dc atomic_open(): lift the call of may_open() into do_last()
there we'll be able to merge it with its counterparts in other
cases, and there's no reason to do it before the parent has
been unlocked

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:09:12 -04:00
Al Viro
6fb968cdf9 atomic_open(): return the right dentry in FMODE_OPENED case
->atomic_open() might have used a different alias than the one we'd
passed to it; in "not opened" case we take care of that, in "opened"
one we don't.  Currently we don't care downstream of "opened" case
which alias to return; however, that will change shortly when we
get to unifying may_open() calls.

It's not hard to get right in all cases, anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:09:12 -04:00
Al Viro
9deed3ebca new helper: traverse_mounts()
common guts of follow_down() and follow_managed() taken to a new
helper - traverse_mounts().  The remnants of follow_managed()
are folded into its sole remaining caller (handle_mounts()).
Calling conventions of handle_mounts() slightly sanitized -
instead of the weird "1 for success, -E... for failure" that used
to be imposed by the calling conventions of walk_component() et.al.
we can use the normal "0 for success, -E... for failure".

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:09:12 -04:00
Al Viro
ea936aeb3e massage __follow_mount_rcu() a bit
make the loop more similar to that in follow_managed(), with
explicit tracking of flags, etc.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:09:12 -04:00
Al Viro
c108837e06 namei: have link_path_walk() maintain LOOKUP_PARENT
set on entry, clear when we get to the last component.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:09:12 -04:00
Al Viro
d8d4611a4f link_path_walk(): simplify stack handling
We use nd->stack to store two things: pinning down the symlinks
we are resolving and resuming the name traversal when a nested
symlink is finished.

Currently, nd->depth is used to keep track of both.  It's 0 when
we call link_path_walk() for the first time (for the pathname
itself) and 1 on all subsequent calls (for trailing symlinks,
if any).  That's fine, as far as pinning symlinks goes - when
handling a trailing symlink, the string we are interpreting
is the body of symlink pinned down in nd->stack[0].  It's
rather inconvenient with respect to handling nested symlinks,
though - when we run out of a string we are currently interpreting,
we need to decide whether it's a nested symlink (in which case
we need to pick the string saved back when we started to interpret
that nested symlink and resume its traversal) or not (in which
case we are done with link_path_walk()).

Current solution is a bit of a kludge - in handling of trailing symlink
(in lookup_last() and open_last_lookups() we clear nd->stack[0].name.
That allows link_path_walk() to use the following rules when
running out of a string to interpret:
	* if nd->depth is zero, we are at the end of pathname itself.
	* if nd->depth is positive, check the saved string; for
nested symlink it will be non-NULL, for trailing symlink - NULL.

It works, but it's rather non-obvious.  Note that we have two sets:
the set of symlinks currently being traversed and the set of postponed
pathname tails.  The former is stored in nd->stack[0..nd->depth-1].link
and it's valid throught the pathname resolution; the latter is valid only
during an individual call of link_path_walk() and it occupies
nd->stack[0..nd->depth-1].name for the first call of link_path_walk() and
nd->stack[1..nd->depth-1].name for subsequent ones.  The kludge is basically
a way to recognize the second set becoming empty.

The things get simpler if we keep track of the second set's size
explicitly and always store it in nd->stack[0..depth-1].name.
We access the second set only inside link_path_walk(), so its
size can live in a local variable; that way the check becomes
trivial without the need of that kludge.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:09:12 -04:00
Al Viro
b1a8197240 pick_link(): check for WALK_TRAILING, not LOOKUP_PARENT
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:09:12 -04:00
Al Viro
8c4efe22e7 namei: invert the meaning of WALK_FOLLOW
old flags & WALK_FOLLOW <=> new !(flags & WALK_TRAILING)
That's what that flag had really been used for.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:09:09 -04:00
Al Viro
b4c0353693 sanitize handling of nd->last_type, kill LAST_BIND
->last_type values are set in 3 places: path_init() (sets to LAST_ROOT),
link_path_walk (LAST_NORM/DOT/DOTDOT) and pick_link (LAST_BIND).

The are checked in walk_component(), lookup_last() and do_last().
They also get copied to the caller by filename_parentat().  In the last
3 cases the value is what we had at the return from link_path_walk().
In case of walk_component() it's either directly downstream from
assignment in link_path_walk() or, when called by lookup_last(), the
value we have at the return from link_path_walk().

The value at the entry into link_path_walk() can survive to return only
if the pathname contains nothing but slashes.  Note that pick_link()
never returns such - pure jumps are handled directly.  So for the calls
of link_path_walk() for trailing symlinks it does not matter what value
had been there at the entry; the value at the return won't depend upon it.

There are 3 call chains that might have pick_link() storing LAST_BIND:

1) pick_link() from step_into() from walk_component() from
link_path_walk().  In that case we will either be parsing the next
component immediately after return into link_path_walk(), which will
overwrite the ->last_type before anyone has a chance to look at it,
or we'll fail, in which case nobody will be looking at ->last_type at all.

2) pick_link() from step_into() from walk_component() from lookup_last().
The value is never looked at due to the above; it won't affect the value
seen at return from any link_path_walk().

3) pick_link() from step_into() from do_last().  Ditto.

In other words, assignemnt in pick_link() is pointless, and so is
LAST_BIND itself; nothing ever looks at that value.  Kill it off.
And make link_path_walk() _always_ assign ->last_type - in the only
case when the value at the entry might survive to the return that value
is always LAST_ROOT, inherited from path_init().  Move that assignment
from path_init() into the beginning of link_path_walk(), to consolidate
the things.

Historical note: LAST_BIND used to be used for the kludge with trailing
pure jump symlinks (extra iteration through the top-level loop).
No point keeping it anymore...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:08:19 -04:00
Al Viro
ad6cc4c338 finally fold get_link() into pick_link()
kill nd->link_inode, while we are at it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:08:19 -04:00
Al Viro
06708adb99 merging pick_link() with get_link(), part 6
move the only remaining call of get_link() into pick_link()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:08:18 -04:00
Al Viro
b0417d2c72 merging pick_link() with get_link(), part 5
move get_link() call into step_into().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:08:18 -04:00
Al Viro
92d270165c merging pick_link() with get_link(), part 4
Move the call of get_link() into walk_component().  Change the
calling conventions for walk_component() to returning the link
body to follow (if any).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:08:18 -04:00
Al Viro
40fcf5a931 merging pick_link() with get_link(), part 3
After a pure jump ("/" or procfs-style symlink) we don't need to
hold the link anymore.  link_path_walk() dropped it if such case
had been detected, lookup_last/do_last() (i.e. old trailing_symlink())
left it on the stack - it ended up calling terminate_walk() shortly
anyway, which would've purged the entire stack.

Do it in get_link() itself instead.  Simpler logics that way...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 21:08:18 -04:00