Commit graph

781637 commits

Author SHA1 Message Date
Keith Busch
1e687220ef libnvdimm: Export max available extent
The 'available_size' attribute showing the combined total of all
unallocated space isn't always useful to know how large of a namespace
a user may be able to allocate if the region is fragmented. This patch
will export the largest extent of unallocated space that may be allocated
to create a new namespace.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2018-07-25 14:11:43 -07:00
Keith Busch
12e3129e29 libnvdimm: Use max contiguous area for namespace size
This patch will find the max contiguous area to determine the largest
pmem namespace size that can be created. If the requested size exceeds
the largest available, ENOSPC error will be returned.

This fixes the allocation underrun error and wrong error return code
that have otherwise been observed as the following kernel warning:

  WARNING: CPU: <CPU> PID: <PID> at drivers/nvdimm/namespace_devs.c:913 size_store

Fixes: a1f3e4d6a0 ("libnvdimm, region: update nd_region_available_dpa() for multi-pmem support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2018-07-25 14:11:09 -07:00
Andreas Gruenbacher
776125785a gfs2: Special-case rindex for gfs2_grow
To speed up the common case of appending to a file,
gfs2_write_alloc_required presumes that writing beyond the end of a file
will always require additional blocks to be allocated.  This assumption
is incorrect for preallocates files, but there are no negative
consequences as long as *some* space is still left on the filesystem.

One special file that always has some space preallocated beyond the end
of the file is the rindex: when growing a filesystem, gfs2_grow adds one
or more new resource groups and appends records describing those
resource groups to the rindex; the preallocated space ensures that this
is always possible.

However, when a filesystem is completely full, gfs2_write_alloc_required
will indicate that an additional allocation is required, and appending
the next record to the rindex will fail even though space for that
record has already been preallocated.  To fix that, skip the incorrect
optimization in gfs2_write_alloc_required, but for the rindex only.
Other writes to preallocated space beyond the end of the file are still
allowed to fail on completely full filesystems.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
2018-07-25 22:56:14 +02:00
dann frazier
7856e86162 hinic: Link the logical network device to the pci device in sysfs
Otherwise interfaces get exposed under /sys/devices/virtual, which
doesn't give udev the context it needs for PCI-based predictable
interface names.

Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25 13:54:52 -07:00
Bjorn Helgaas
bc510d59cf vxge: Remove unnecessary include of <linux/pci_hotplug.h>
The vxge driver doesn't need anything provided by pci_hotplug.h, so remove
the unnecessary include of it.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25 13:52:38 -07:00
Linus Walleij
8c17dee170 Samsung pinctrl drivers changes for v4.19
1. Add handling of external wakeup interrupts mask inside the pin
    controller driver.
 
    Existing solution is spread between the driver and machine code.  The
    machine code writes the mask but its value is taken from pin
    controller driver.
 
    This moves everything into pin controller driver allowing later to
    remove the cross-subsystem interaction.  Also this is a necessary
    step for implementing later Suspend to RAM on ARMv8 Exynos5433.
 
 2. Bring necessary suspend/resume callbacks for Exynos542x and
    Exynos5260.
 3. Document hidden requirement about one external wakeup interrupts
    device node.
 4. Minor documentation cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQItBAABCAAXBQJbWJ6WEBxrcnprQGtlcm5lbC5vcmcACgkQwTdm5oaLg9dvJBAA
 kgLUDDgkY+lKOQ/dRA7HJf2OTHSPZ3hhXpD8BnaRnqs8HRTWW8xVX+ts/wbXFnw2
 s4Q45hYNqqSfH3lzQ67fKokjoQMf2TtmidXaHVfnVlNHa7gcFW0yj1Kc/4qRyRql
 xo7FptZXM1bFZ/su3VMbSnBH+2n9gn4RDC5Zk5Vzgr6jC7Pu2kSgM0Q4dpk7sJg/
 TwRT+HZV2RDN3APByGWHEZ5gbOtxj6L8+gHsvtgbf8STHVIlAAUS/dDAFAISusEI
 SIkewPCZFvT9FWYqFQjuS7JTAVPgXeU+JisZaRyhR2sQFgu5qyvEZzRDV87f4X2b
 Zf0dRMrrcScsNcIMerkKki0r1FVxPEaSjgAuK+8x5u+RopAgbmmiFBSt2pYUayVb
 7M8rcfqLRgb6V8/Um/aGajirQaAj5DWtZGPbtCGF01uKCPPaEgE5ch2dXPjKeNbw
 HSoj2dOKDnPApzWcsVbRpQfo9E9VszT3JZ9CEV64dxWLebYvQDx9f2vZxxkJnuYS
 EcnPXY2E+QkwpSavwmqsbJhrHGVO8scveKQSS3TCmMyOJp2feerBlsyWXCjHGpeA
 JLdYXPnj23jIQrcAtRu5a3DskjpT3CgsEheaRikds03vT706qEISLpTDHLbfhvGy
 6oetWL6Cwzqq2tRpyOdX5Z/DWthRXJmhKyHhmGKI+ec=
 =a9XD
 -----END PGP SIGNATURE-----

Merge tag 'samsung-pinctrl-4.19' of https://git.kernel.org/pub/scm/linux/kernel/git/pinctrl/samsung into devel

Samsung pinctrl drivers changes for v4.19

1. Add handling of external wakeup interrupts mask inside the pin
   controller driver.

   Existing solution is spread between the driver and machine code.  The
   machine code writes the mask but its value is taken from pin
   controller driver.

   This moves everything into pin controller driver allowing later to
   remove the cross-subsystem interaction.  Also this is a necessary
   step for implementing later Suspend to RAM on ARMv8 Exynos5433.

2. Bring necessary suspend/resume callbacks for Exynos542x and
   Exynos5260.

3. Document hidden requirement about one external wakeup interrupts
   device node.

4. Minor documentation cleanups.
2018-07-25 22:47:03 +02:00
Peter De-Schrijver
633e79650b clk: tegra: Add sdmmc mux divider clock
Add a clock type to model the sdmmc switch divider clocks which have paths
to source clocks bypassing the divider (Low Jitter paths). These
are handled by selecting the lj path when the divider is 1 (ie the
rate is the parent rate), otherwise the normal path with divider
will be selected. Otherwise this clock behaves as a normal peripheral
clock.

Signed-off-by: Peter De-Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2018-07-25 13:45:09 -07:00
Peter De Schrijver
cb3ac5947a clk: tegra: Refactor fractional divider calculation
Move this to a separate file so it can be used to calculate the sdmmc
clock dividers.

Signed-off-by: Peter De-Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2018-07-25 13:43:34 -07:00
Aapo Vienamo
0cbb61a313 clk: tegra: Fix includes required by fence_udelay()
Add the missing linux/delay.h include statement for udelay() used by
fence_udelay() macro.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2018-07-25 13:43:13 -07:00
Heiner Kallweit
3c507b8af6 net: phy: add helper phy_polling_mode
Add a helper for checking whether polling is used to detect PHY status
changes.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25 13:41:22 -07:00
Krzysztof Kozlowski
d805f6a868 net: ethernet: fs-enet: Use generic CRC32 implementation
Use generic kernel CRC32 implementation because it:
1. Should be faster (uses lookup tables),
2. Removes duplicated CRC generation code,
3. Uses well-proven algorithm instead of coding it one more time.

Suggested-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25 13:40:39 -07:00
Krzysztof Kozlowski
16f6e9835b net: ethernet: freescale: Use generic CRC32 implementation
Use generic kernel CRC32 implementation because it:
1. Should be faster (uses lookup tables),
2. Removes duplicated CRC generation code,
3. Uses well-proven algorithm instead of coding it one more time.

Suggested-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25 13:40:25 -07:00
Varsha Rao
076dd53be5 IB/core: Remove extra parentheses
Remove unnecessary parentheses to fix the clang warning of extraneous
parentheses.

Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:36:29 -06:00
Bart Van Assche
9491a1edba RDMA/ocrdma: Suppress a compiler warning
This patch avoids that the following compiler warning is reported when
building with gcc 8 and W=1:

In function 'ocrdma_mbx_get_ctrl_attribs',
    inlined from 'ocrdma_init_hw' at drivers/infiniband/hw/ocrdma/ocrdma_hw.c:3224:11:
drivers/infiniband/hw/ocrdma/ocrdma_hw.c:1368:3: warning: 'strncpy' output may be truncated copying 31 bytes from a string of length 31 [-Wstringop-truncation]
   strncpy(dev->model_number,
   ^~~~~~~~~~~~~~~~~~~~~~~~~~
    hba_attribs->controller_model_number, 31);
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:35:11 -06:00
Nicholas Mc Guire
7f5eac5934 clk: imx6sll: fix missing of_node_put()
of_find_compatible_node() is returning a device node with refcount
incremented and must be explicitly decremented after the last use
which is right after the us in of_iomap() here.

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Fixes: 4a5f720b65 ("clk: imx: add clock driver for imx6sll")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2018-07-25 13:35:00 -07:00
Nicholas Mc Guire
11177e7a7a clk: imx6ul: fix missing of_node_put()
of_find_compatible_node() is returning a device node with refcount
incremented and must be explicitly decremented after the last use
which is right after the us in of_iomap() here.

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Fixes: 787b4271a6 ("clk: imx: add imx6ul clk tree support")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2018-07-25 13:34:57 -07:00
Ingo Molnar
25a00ac7dc perf/cores fixes and improvements:
Tools:
 
 top:
 
 - Fix 'struct comm_str' removal crash race, detected with refcount_t
   debugging (Jiri Olsa)
 
 - Use last_match threads cache only in single threaded mode, fixing
   a crash (Jiri Olsa)
 
 record:
 
 - Synthesize GROUP_DESC feature in pipe mode fixing display of
   event groups (Jiri Olsa)
 
 stat:
 
 - Get rid of extra clock display function (Jiri Olsa)
 
 perf script:
 
 - Show correct offsets for DWARF-based unwinding (Sandipan Das)
 
 test:
 
 - Check that complex event name is parsed correctly (Alexey Budankov)
 
 - Fix subtest number when showing results (Thomas Richter)
 
 Arch specific:
 
 arm64:
 
 - Generate syscall table from the kernel sources (asm/unistd.h) like
   other arches do, speeding up the support for new system calls in
   tools such as 'perf trace' (Kim Phillips)
 
 arm:
 
 - Bail out immediatelly on CoreSight hardware tracing instruction sample failure (Leo Yan)
 
 PowerPC:
 
 - Fix record+probe_libc_inet_pton.sh 'perf test' entry (Sandipan Das)
 
 - Callchain IP filtering fixes (Sandipan Das)
 
 S/390:
 
 - Add support for detailed S/390 PMU event description in 'perf list' (Thomas Richter)
 
 - Add transaction flag (-T) support in 'perf stat' for S/390 (Thomas Richter)
 
 - Fix 'perf kvm' S/390 subcommands (Thomas Richter)
 
 Infrastructure:
 
 hists:
 
 - Clarify callchain disabling when available (Arnaldo Carvalho de Melo)
 
 evsel:
 
 - Use perf_evsel__match instead of open coded equivalent (Jiri Olsa)
 
 Documentation:
 
 - Add missing documentation for 'perf list' --desc and --debug options (Sangwon Hong)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEELb9bqkb7Te0zijNb1lAW81NSqkAFAltYrXUACgkQ1lAW81NS
 qkD8jQ//STGb87h4aFaXSHucRFfSqBTj1/6nOlk9PafJ1wLmOZhWVccUJzvPENJq
 wRCffnQi0yMaFJBq4omfVf0zlnCvg78jb+4oYhExy66x+6edvYPg6UF1O68EhjZC
 dbe7ejkq6uiG7kpJpwaZQNBa4UiJ1DtlPmKUukpJnZD45JWWU6l0b5j0xKs/pybI
 phApii5iRB6KEWL/NvqdQJBJ796dJJKKRHMLCjU7GQfkmpMT++/qqecBwlrQdhos
 BLsNYoqhktUlm3GYJEWmzn/rJNNiQPiJcF5cy+I8/+a7v0dQgqIGrkq9fqoW060f
 GlIGs75ZT5GnWqv6JpmYWoUUYw0qeLvjPgh2vTwfWcidO1jWXF6BV9vo20LNukmB
 rQgMa4N/ypPk0aGPb1n/VraUPrGGdSC/p4XheJ3SiwEDwb2xbhRUCJJIA/X/Xn9G
 COJrGeypKbHVq2r4r81cp0yEtO6qrkjfoDlRXamCpiNVtU0nPM1A1alutNAeDk0s
 2rQnqidku8DxtKaTBJIuA+YZAiM1bSPFEbY2p+NR/op4tAxAXAtkJgUi1x39J8uW
 9TmbYkHRBBk59S+nOJQJXHz0KBecuN4HS9KZTT2Qw9sgGu+lLPtg7+rLUbwV87d+
 5o41ZxAEV/000n0lU17chDvbQxCYl5jxmNRfHTiRc4bWI+ltM1o=
 =+kyy
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo-4.19-20180725' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/cores fixes and improvements from Arnaldo Carvalho de Melo:

Tools:

top:

- Fix 'struct comm_str' removal crash race, detected with refcount_t
  debugging (Jiri Olsa)

- Use last_match threads cache only in single threaded mode, fixing
  a crash (Jiri Olsa)

record:

- Synthesize GROUP_DESC feature in pipe mode fixing display of
  event groups (Jiri Olsa)

stat:

- Get rid of extra clock display function (Jiri Olsa)

perf script:

- Show correct offsets for DWARF-based unwinding (Sandipan Das)

test:

- Check that complex event name is parsed correctly (Alexey Budankov)

- Fix subtest number when showing results (Thomas Richter)

Arch specific:

arm64:

- Generate syscall table from the kernel sources (asm/unistd.h) like
  other arches do, speeding up the support for new system calls in
  tools such as 'perf trace' (Kim Phillips)

arm:

- Bail out immediatelly on CoreSight hardware tracing instruction sample failure (Leo Yan)

PowerPC:

- Fix record+probe_libc_inet_pton.sh 'perf test' entry (Sandipan Das)

- Callchain IP filtering fixes (Sandipan Das)

S/390:

- Add support for detailed S/390 PMU event description in 'perf list' (Thomas Richter)

- Add transaction flag (-T) support in 'perf stat' for S/390 (Thomas Richter)

- Fix 'perf kvm' S/390 subcommands (Thomas Richter)

Infrastructure:

hists:

- Clarify callchain disabling when available (Arnaldo Carvalho de Melo)

evsel:

- Use perf_evsel__match instead of open coded equivalent (Jiri Olsa)

Documentation:

- Add missing documentation for 'perf list' --desc and --debug options (Sangwon Hong)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25 22:31:00 +02:00
Jason Gunthorpe
22fa27fbc6 IB/uverbs: Fix locking around struct ib_uverbs_file ucontext
We have a parallel unlocked reader and writer with ib_uverbs_get_context()
vs everything else, and nothing guarantees this works properly.

Audit and fix all of the places that access ucontext to use one of the
following locking schemes:
- Call ib_uverbs_get_ucontext() under SRCU and check for failure
- Access the ucontext through an struct ib_uobject context member
  while holding a READ or WRITE lock on the uobject.
  This value cannot be NULL and has no race.
- Hold the ucontext_lock and check for ufile->ucontext !NULL

This also re-implements ib_uverbs_get_ucontext() in a way that is safe
against concurrent ib_uverbs_get_context() and disassociation.

As a side effect, every access to ucontext in the commands is via
ib_uverbs_get_context() with an error check, or via the uobject, so there
is no longer any need for the core code to check ucontext on every command
call. These checks are also removed.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:46 -06:00
Jason Gunthorpe
c36ee46daf IB/mlx5: Use the ucontext from the uobj, not the file
This approach matches the standard flow of the typical write method that
relies on the HW object to store the device and the uobject to access the
ucontext.  Avoids the use of the devx_ufile2uctx in several places will
make revising the semantics of ib_uverbs_get_ucontext() in the next patch
simpler.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:22 -06:00
Jason Gunthorpe
aba94548c9 IB/uverbs: Move the FD uobj type struct file allocation to alloc_commit
Allocating the struct file during alloc_begin creates this strange
asymmetry with IDR, where the FD has two krefs pointing at it during the
pre-commit phase. In particular this makes the abort process for FD very
strange and confusing.

For instance abort currently calls the type's destroy_object twice, and
the fops release once if abort is done. This is very counter intuitive. No
fops should be called until alloc_commit succeeds, and destroy_object
should only ever be called once.

Moving the struct file allocation to the alloc_commit is now simple, as we
already support failure of rdma_alloc_commit_uobject, with all the
required rollback pieces.

This creates an understandable symmetry with IDR and simplifies/fixes the
abort handling for FD types.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:22 -06:00
Jason Gunthorpe
2c96eb7d62 IB/uverbs: Always propagate errors from rdma_alloc_commit_uobject()
The ioctl framework already does this correctly, but the write path did
not. This is trivially fixed by simply using a standard pattern to return
uobj_alloc_commit() as the last statement in every function.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:22 -06:00
Jason Gunthorpe
e951747a08 IB/uverbs: Rework the locking for cleaning up the ucontext
The locking here has always been a bit crazy and spread out, upon some
careful analysis we can simplify things.

Create a single function uverbs_destroy_ufile_hw() that internally handles
all locking. This pulls together pieces of this process that were
sprinkled all over the places into one place, and covers them with one
lock.

This eliminates several duplicate/confusing locks and makes the control
flow in ib_uverbs_close() and ib_uverbs_free_hw_resources() extremely
simple.

Unfortunately we have to keep an extra mutex, ucontext_lock.  This lock is
logically part of the rwsem and provides the 'down write, fail if write
locked, wait if read locked' semantic we require.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:22 -06:00
Jason Gunthorpe
87064277c4 IB/uverbs: Revise and clarify the rwsem and uobjects_lock
Rename 'cleanup_rwsem' to 'hw_destroy_rwsem' which is held across any call
to the type destroy function (aka 'hw' destroy). The main purpose of this
lock is to prevent normal add and destroy from running concurrently with
uverbs_cleanup_ufile()

Since the uobjects list is always manipulated under the 'hw_destroy_rwsem'
we can eliminate the uobjects_lock in the cleanup function. This allows
converting that lock to a very simple spinlock with a narrow critical
section.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:22 -06:00
Jason Gunthorpe
e6d5d5ddd0 IB/uverbs: Clarify and revise uverbs_close_fd
The locking requirements here have changed slightly now that we can rely
on the ib_uverbs_file always existing and containing all the necessary
locking infrastructure.

That means we can get rid of the cleanup_mutex usage (this was protecting
the check on !uboj->context).

Otherwise, follow the same pattern that IDR uses for destroy, acquire
exclusive write access, then call destroy and the undo the 'lookup'.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:22 -06:00
Jason Gunthorpe
5671f79b42 IB/uverbs: Revise the placement of get/puts on uobject
This wasn't wrong, but the placement of two krefs didn't make any
sense. Follow some simple rules.

- A kref is held inside uobjects_list
- A kref is held inside the IDR
- A kref is held inside file->private
- A stack based kref is passed bettwen alloc_begin and
  alloc_abort/alloc_commit

Any place we destroy one of the above pointers, we stick a put,
or 'move' the kref into another pointer.

The key functions have sensible semantics:
- alloc_uobj fully initializes the common members in uobj, including
  the list
- Get rid of the uverbs_idr_remove_uobj helper since IDR remove
  does require put, but it depends on the situation. Later
  patches will re-consolidate this differently.
- alloc_abort always consumes the passed kref, done in the type
- alloc_commit always consumes the passed kref, done in the type
- rdma_remove_commit_uobject always pairs with a lookup_get

After it is all done the only control flow change is to:
- move a get from alloc_commit_fd_uobject to rdma_alloc_commit_uobject
- add a put to remove_commit_idr_uobject
- Consistenly use rdma_lookup_put in rdma_remove_commit_uobject at
  the right place

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:22 -06:00
Jason Gunthorpe
c561c28846 IB/uverbs: Clarify the kref'ing ordering for alloc_commit
The alloc_commit callback makes the uobj visible to other threads,
and it does so using a 'move' semantic of the uobj kref on the stack
into the public storage (eg the IDR, uobject list and file_private_data)

Once this is done another thread could start up and trigger deletion
of the kref. Fortunately cleanup_rwsem happens to prevent this from
being a bug, but that is a fantastically unclear side effect.

Re-organize things so that alloc_commit is that last thing to touch
the uobj, get rid of the sneaky implicit dependency on cleanup_rwsem,
and add a comment reminding that uobj is no longer kref'd after
alloc_commit.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:21 -06:00
Jason Gunthorpe
1250c3048c IB/uverbs: Handle IDR and FD types without truncation
Our ABI for write() uses a s32 for FDs and a u32 for IDRs, but internally
we ended up implicitly casting these ABI values into an 'int'. For ioctl()
we use a s64 for FDs and a u64 for IDRs, again casting to an int.

The various casts to int are all missing range checks which can cause
userspace values that should be considered invalid to be accepted.

Fix this by making the generic lookup routine accept a s64, which does not
truncate the write API's u32/s32 or the ioctl API's s64. Then push the
detailed range checking down to the actual type implementations to be
shared by both interfaces.

Finally, change the copy of the uobj->id to sign extend into a s64, so eg,
if we ever wish to return a negative value for a FD it is carried
properly.

This ensures that userspace values are never weirdly interpreted due to
the various trunctations and everything that is really out of range gets
an EINVAL.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:21 -06:00
Jason Gunthorpe
3df593bfe6 IB/uverbs: Get rid of null_obj_type
If the method fails after calling rdma_explicit_destroy (eg if
copy_to_user faults) then it will trigger a kernel oops:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
PGD 800000000548d067 P4D 800000000548d067 PUD 54a0067 PMD 0
SMP PTI
CPU: 0 PID: 359 Comm: ibv_rc_pingpong Not tainted 4.18.0-rc1+ #28
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
RIP: 0010:          (null)
Code: Bad RIP value.
RSP: 0018:ffffc900001a3bf0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88000603bd00 RCX: 0000000000000003
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88000603bd00
RBP: 0000000000000001 R08: ffffc900001a3cf8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffc900001a3cf0
R13: 0000000000000000 R14: ffffc900001a3cf0 R15: 0000000000000000
FS:  00007fb00dda8700(0000) GS:ffff880007c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 000000000548e004 CR4: 00000000003606b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ? rdma_lookup_put_uobject+0x22/0x50 [ib_uverbs]
 ? uverbs_finalize_object+0x3b/0x60 [ib_uverbs]
 ? uverbs_finalize_attrs+0x128/0x140 [ib_uverbs]
 ? ib_uverbs_cmd_verbs+0x698/0x7c0 [ib_uverbs]
 ? find_held_lock+0x2d/0x90
 ? __might_fault+0x39/0x90
 ? ib_uverbs_ioctl+0x111/0x1f0 [ib_uverbs]
 ? do_vfs_ioctl+0xa0/0x6d0
 ? trace_hardirqs_on_caller+0xed/0x180
 ? _raw_spin_unlock_irq+0x24/0x40
 ? syscall_trace_enter+0x138/0x1d0
 ? ksys_ioctl+0x35/0x60
 ? __x64_sys_ioctl+0x11/0x20
 ? do_syscall_64+0x5b/0x1c0
 ? entry_SYSCALL_64_after_hwframe+0x49/0xbe

This is because the type was replaced with the null_type during explicit
destroy that cannot complete the destruction.

One of the side effects of replacing the type is to make the object
handle totally unreachable - so no other command could attempt to use
it, even though it remains on the uboject list.

We can get the same end result by just fully destroying the object inside
rdma_explicit_destroy and leaving the caller the residual kref for the
uobj with no attached HW object, and no presence in the ubojects list.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-25 14:21:21 -06:00
Rob Herring
791d3ef2e1 dt-bindings: remove 'interrupt-parent' from bindings
'interrupt-parent' is often documented as part of define bindings, but
it is really outside the scope of a device binding. It's never required
in a given node as it is often inherited from a parent node. Or it can
be implicit if a parent node is an 'interrupt-controller' node. So
remove it from all the binding files.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: devicetree@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
2018-07-25 14:09:39 -06:00
Marcel Ziswiler
13d6753f1d pinctrl: tegra: fix spelling in devicetree binding document
This fixes a spelling mistake.

Signed-off-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Rob Herring <robh@kernel.org>
2018-07-25 14:09:39 -06:00
Jia-Ju Bai
586092ab4b gpu: drm: amdgpu: Replace mdelay with msleep in cik_pcie_gen3_enable()
cik_pcie_gen3_enable() is only called by cik_common_hw_init(), which is
never called in atomic context.
cik_pcie_gen3_enable() calls mdelay() to busily wait, which is not
necessary.
mdelay() can be replaced with msleep().

This is found by a static analysis tool named DCNS written by myself.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-07-25 15:06:40 -05:00
Alex Deucher
6cdf4e87b4 drm/amdgpu/gmc9: clarify GPUVM fault error message
The address printed is the actual address, not the page.

Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-07-25 15:06:40 -05:00
Nayan Deshmukh
068c330419 drm/scheduler: remove sched field from the entity
The scheduler of the entity is decided by the run queue on which
it is queued. This patch avoids us the effort required to maintain
a sync between rq and sched field when we start shifting entites
among different rqs.

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-07-25 15:06:26 -05:00
Nayan Deshmukh
cdc5017659 drm/scheduler: modify API to avoid redundancy
entity has a scheduler field and we don't need the sched argument
in any of the functions where entity is provided.

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-07-25 15:06:19 -05:00
Christian König
bf314ca3f1 drm/amdgpu: reduce the number of placements for a BO
Make struct amdgpu_bo a bit smaller.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-07-25 15:06:13 -05:00
Christian König
c704ab18e0 drm/amdgpu: consistenly name amdgpu_bo_ functions
Just rename functions, no functional change.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-07-25 15:06:06 -05:00
Christian König
6beccb15c4 MAINTAINERS: add entry for AMD PP code
Add separate entry for the power managent code on AMD GPUs.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Rex Zhu <rezhu@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-07-25 15:05:54 -05:00
Christian König
dbae59466f MAINTAINERS: Add separate section for DC
Note that Harry and Leo Li are maintainers for that stuff.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-07-25 15:05:47 -05:00
Christian König
122b5a0598 MAINTAINERS: add new TTM maintainers
Roger unfortunately doesn't work for AMD any longer. So add Rui and
Jerry as co-maintainer as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-07-25 15:05:39 -05:00
Christian König
4d4831a3da drm/amdgpu: expose only the first UVD instance for now
Going to completely rework the context to ring mapping with Nayan's GSoC
work, but for now just stopping to expose the second UVD instance should
do it.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-07-25 15:05:32 -05:00
Christian König
f8a91d4555 drm/amdgpu: clean up coding style a bit
No need to bitcast a boolean and even if we should use "!!" instead.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-07-25 15:05:08 -05:00
Camelia Groza
34786005ec net: phy: prevent PHYs w/o Clause 22 regs from calling genphy_config_aneg
genphy_config_aneg() should be called only by PHYs that implement
the Clause 22 register set. Prevent Clause 45 PHYs that don't implement
the register set from calling the genphy function.

Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25 13:02:37 -07:00
Toshiaki Makita
ecbc42ca5d virtio_net: Fix incosistent received bytes counter
When received packets are dropped in virtio_net driver, received packets
counter is incremented but bytes counter is not.
As a result, for instance if we drop all packets by XDP, only received
is counted and bytes stays 0, which looks inconsistent.
IMHO received packets/bytes should be counted if packets are produced by
the hypervisor, like what common NICs on physical machines are doing.
So fix the bytes counter.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25 12:58:25 -07:00
David S. Miller
7e50b2a51d Merge branch 'virtio_net-Add-ethtool-stat-items'
Toshiaki Makita says:

====================
virtio_net: Add ethtool stat items

Add some ethtool stat items useful for performance analysis.
====================

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25 12:53:38 -07:00
Toshiaki Makita
461f03dc99 virtio_net: Add kick stats
So we can infer the number of VM-Exits.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25 12:53:37 -07:00
Toshiaki Makita
5b8f3c8d30 virtio_net: Add XDP related stats
Add counters below:
* Tx
 - xdp_tx: frames sent by ndo_xdp_xmit or XDP_TX.
 - xdp_tx_drops: dropped frames out of xdp_tx ones.
* Rx
 - xdp_packets: frames went through xdp program.
 - xdp_tx: XDP_TX frames.
 - xdp_redirects: XDP_REDIRECT frames.
 - xdp_drops: any dropped frames out of xdp_packets ones.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25 12:53:37 -07:00
Toshiaki Makita
2a43565c06 virtio_net: Factor out the logic to determine xdp sq
Make sure to use the same logic in all places to determine xdp sq. This
is useful for xdp counters which the following commit will introduce as
well.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25 12:53:37 -07:00
Toshiaki Makita
2c4a2f7d82 virtio_net: Make drop counter per-queue
Since when XDP was introduced, drop counter has been able to be updated
much more frequently than before, as XDP_DROP increments the counter.
Thus for performance analysis per-queue drop counter would be useful.

Also this avoids cache contention and race on updating the counter. It
is currently racy because napi handlers read-modify-write it without any
locks.

There are more counters in dev->stats that are racy, but I left them
per-device, because they are rarely updated and does not worth being
per-queue counters IMHO. To fix them we need atomic ops or some kind of
locks.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25 12:53:37 -07:00
Toshiaki Makita
a0929a44c2 virtio_net: Use temporary storage for accounting rx stats
The purpose is to keep receive_buf arguments simple when more per-queue
counter items are added later.
Also XDP_TX related sq counters will be updated in the following changes
so create a container struct virtnet_rx_stats which will includes both
rq and sq statistics. For now it only covers rq stats.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25 12:53:37 -07:00
Toshiaki Makita
7d9d60fd4a virtio_net: Fix incosistent received bytes counter
When received packets are dropped in virtio_net driver, received packets
counter is incremented but bytes counter is not.
As a result, for instance if we drop all packets by XDP, only received
is counted and bytes stays 0, which looks inconsistent.
IMHO received packets/bytes should be counted if packets are produced by
the hypervisor, like what common NICs on physical machines are doing.
So fix the bytes counter.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25 12:53:37 -07:00