Elan has given me a (GPL-ed) Android driver for their non HID-mt touchpads
to help improve the upstream support.
Acoording to Elan what we are currently reporting as tool-width
really is a per-touch pressure. This always has a maximum of 255, so there
is no need to make the max configurable.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
We never report MT_TOUCH_MAJOR, so lets not claim that we do.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
max_area_x and max_area_y are initialized but never used anywhere,
drop them.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
This patch fixes a bug where configfs_register_group had added
a group in a tree, and userspace has done a rmdir on a dir somewhere
above that group and we hit a kernel crash. The problem is configfs_rmdir
will detach everything under it and unlink groups on the default_groups
list. It will not unlink groups added with configfs_register_group so when
configfs_unregister_group is called to drop its references to the group/items
we crash when we try to access the freed dentrys.
The patch just adds a check for if a rmdir has been done above
us and if so just does the unlink part of unregistration.
Sorry if you are getting this multiple times. I thouhgt I sent
this to some of you and lkml, but I do not see it.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Since for ULP1 PM mode of SAMA5D2 the wakeup sources are limited and
well known add a method to check if these wakeup sources are defined by
user (either via DT or filesystem). In case there are no wakeup sources
defined for ULP1 the PM suspend will fail, otherwise these will be
configured in fast startup registers of PMC. Since wakeup sources of
ULP1 need also to be configured in SHDWC registers the code was a bit
changed to map the SHDWC also in case ULP1 is requested by user (this
was done in the initialization phase). In case the ULP1 initialization
fails the ULP0 mode is used (this mode was also used in case backup mode
initialization failed).
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
In the ULP1 mode, in order to achieve the lowest power consumption
with the system in retention mode and be able to resume on the wake
up events, all the clocks are shut off, inclusive the embedded 12MHz
RC oscillator, and the number of wake up sources is limited as well.
When the wake up event is asserted, the embedded 12MHz RC oscillator
restarts automatically.
The ULP1 (Ultra Low-power mode 1) is introduced by SAMA5D2.
The previous size of pm_suspend.o was 2148 bytes. With the addition of
ULP1 mode the new size of pm_suspend.o raised at 2456 bytes.
Signed-off-by: Wenyou Yang <wenyou.yang@atmel.com>
Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com>
[claudiu.beznea@microchip.com: aligned with 4.18-rc1]
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Switch to use ULP0 naming instead of slow clock naming for power modes, to
be as closed as possible to datasheet. This commit does the necessary
renaming and macro addition to be as close as possible to the namings
from [1].
[1] https://lore.kernel.org/lkml/1470650705-31418-3-git-send-email-wenyou.yang@atmel.com
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
The cooling device properties, like "#cooling-cells" and
"dynamic-power-coefficient", should either be present for all the CPUs
of a cluster or none. If these are present only for a subset of CPUs of
a cluster then things will start falling apart as soon as the CPUs are
brought online in a different order. For example, this will happen
because the operating system looks for such properties in the CPU node
it is trying to bring up, so that it can register a cooling device.
Add such missing properties.
Do minor rearrangement as well to keep ordering consistent.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
This has been unused since commit 44b460cfe5 ("drm: imx: remove struct
imx_drm_crtc and imx_drm_crtc_helper_funcs")
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
This patch unifies the naming of DRM functions for reference counting
of struct drm_device. The resulting code is more aligned with the rest
of the Linux kernel interfaces.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180717084814.18091-1-tzimmermann@suse.de
Upper layer users of SPI device drivers may rely on 'actual_length',
so it is important that information is correctly reported. One such
example is spi_mem_exec_op() function that will fail if
'actual_length' of the data transferred is not what was requested. Add
necessary code to populate 'actual_length.
Cc: Mark Brown <broonie@kernel.org>
Cc: Sanchayan Maity <maitysanchayan@gmail.com>
Cc: Stefan Agner <stefan@agner.ch>
Cc: cphealy@gmail.com
Cc: linux-spi@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
In commit ac0b4145d6 ("btrfs: scrub: Don't use inode pages for device
replace") we removed the branch of copy_nocow_pages() to avoid
corruption for compressed nodatasum extents.
However above commit only solves the problem in scrub_extent(), if
during scrub_pages() we failed to read some pages,
sctx->no_io_error_seen will be non-zero and we go to fixup function
scrub_handle_errored_block().
In scrub_handle_errored_block(), for sctx without csum (no matter if
we're doing replace or scrub) we go to scrub_fixup_nodatasum() routine,
which does the similar thing with copy_nocow_pages(), but does it
without the extra check in copy_nocow_pages() routine.
So for test cases like btrfs/100, where we emulate read errors during
replace/scrub, we could corrupt compressed extent data again.
This patch will fix it just by avoiding any "optimization" for
nodatasum, just falls back to the normal fixup routine by try read from
any good copy.
This also solves WARN_ON() or dead lock caused by lame backref iteration
in scrub_fixup_nodatasum() routine.
The deadlock or WARN_ON() won't be triggered before commit ac0b4145d6
("btrfs: scrub: Don't use inode pages for device replace") since
copy_nocow_pages() have better locking and extra check for data extent,
and it's already doing the fixup work by try to read data from any good
copy, so it won't go scrub_fixup_nodatasum() anyway.
This patch disables the faulty code and will be removed completely in a
followup patch.
Fixes: ac0b4145d6 ("btrfs: scrub: Don't use inode pages for device replace")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The NULL pointer check in __free_irq() triggers a 'dereference before NULL
pointer check' warning in static code analysis. It turns out that the check
is redundant because all callers have a NULL pointer check already.
Remove it.
Signed-off-by: RAGHU Halharvi <raghuhack78@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180717102009.7708-1-raghuhack78@gmail.com
Update entry/exit latency and residency time of hikey960 to use more
realistic figures based on unitary tests done on the platform.
The complete results (in us) :
big cluster
cluster CPU
max entry latency 800 400
max exit latency 2900 550
residency 903Mhz 5000 1500
residency 2363Mhz 0 1500
little cluster
cluster CPU
max entry latency 500 400
max exit latency 1600 650
residency 533Mhz 8000 4500
residency 1844Mhz 0 1500
We can see that the residency time depends of the running OPP which is not
handled for now. Then we also have to take into account the constraint of
a residency time shorter than the tick to get full advantage of idle loop
reordering(tick is stopped if idle duration is higher than tick period).
Finally the selected residency value are :
big cluster
cluster CPU
residency 3700 1500
little cluster
cluster CPU
residency 3500 1500
A simple test with a task waking up every 11.111ms shows improvement:
- 5% a lowest OPP
- 22% at highest OPP
The period has been chosen:
- to be shorter than old cluster residency time and longer than new
residency time of cluster off C-state
- to prevent any sync with tick (4ms) when running tests that can add
some variances between tests
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
The recent change to add printf annotations to xmon inadvertently made
the disassembly output ugly, eg:
c00000002001e058 7ee00026 mfcr r23
c00000002001e05c fffffffffae101a0 std r23,416(r1)
c00000002001e060 fffffffff8230000 std r1,0(r3)
The problem being that negative 32-bit values are being displayed in
full 64-bits.
The printf conversion was actually correct, we are passing unsigned
long so it should use "lx". But powerpc instructions are only 4 bytes
and the code only reads 4 bytes, so inst should really just be
unsigned int, and that also fixes the printing to look the way we
want:
c00000002001e058 7ee00026 mfcr r23
c00000002001e05c fae101a0 std r23,416(r1)
c00000002001e060 f8230000 std r1,0(r3)
Fixes: e70d8f5526 ("powerpc/xmon: Add __printf annotation to xmon_printf()")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Remove the keep-power-in-suspend property because it keeps wifi power
on during suspend. This property is only required when enabling WoWLAN
and should only be enabled based on need.
Signed-off-by: Ryan Grachek <ryan@edited.us>
Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
Remove the keep-power-in-suspend property because it keeps wifi power
on during suspend. This property is only required when enabling WoWLAN
and should only be enabled based on need. Also remove dupplicate property
Signed-off-by: Ryan Grachek <ryan@edited.us>
Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
Certain properties should be moved to the board file to reflect
the specific properties of the board, and not the SoC. Move these
properties to proper location and organize properties in both files.
Signed-off-by: Ryan Grachek <ryan@edited.us>
Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
Add "rockchip,px30-spi", "rockchip,rk3066-spi" for spi on px30 platform.
Signed-off-by: Liang Chen <cl@rock-chips.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
This patch unifies the naming of DRM functions for reference counting
of struct drm_device. The resulting code is more aligned with the rest
of the Linux kernel interfaces.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
In the huge pages tests, we may have lots of objects being trapped on
the freelist as we hold the struct_mutex allowing the free worker no
opportunity to recover the backing store. We also have stricter
requirements and the desire for large contiguous pages, further
increasing the allocation pressure. To reduce the chance of running out
of memory, we could either drop the mutex and flush the free worker, or
we could release the backing store directly. We do the latter in this
patch for simplicity.
References: https://bugs.freedesktop.org/show_bug.cgi?id=107254
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180717082334.18774-1-chris@chris-wilson.co.uk
On modern laptop, there are more and more platforms
have two GPUs, and each of them maybe have audio codec
for HDMP/DP output. For some dGPU which is no output,
audio codec usually is disabled.
In currect HDA audio driver, it will set all codec as
VGA_SWITCHEROO_DIS, the audio which is binded to UMA
will be suspended if user use debugfs to contorl power
In HDA driver side, it is difficult to know which GPU
the audio has binded to. So set the bound gpu pci dev
to vga_switcheroo.
if the audio client is not the third registration, audio
id will set in vga_switcheroo enable function. if the
audio client is the last registration when vga_switcheroo
_ready() get true, we should get audio client id from bound
GPU directly.
Signed-off-by: Jim Qu <Jim.Qu@amd.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
With a total of 20 non-merge commits, we have accumulated quite a few
fixes. These include lot's of fixes the our audio gadget interface, a
build error fix for PPC64 builds for the frescale PHY driver,
sleep-while-atomic fixes on the r8a66597 UDC driver, 3-stage SETUP fix
for the aspeed-vhub UDC and some other misc fixes.
-----BEGIN PGP SIGNATURE-----
iQJRBAABCgA7FiEElLzh7wn96CXwjh2IzL64meEamQYFAltNmBgdHGZlbGlwZS5i
YWxiaUBsaW51eC5pbnRlbC5jb20ACgkQzL64meEamQbp6hAAzShWRGLQowpnIau+
CwrI4cm5Gknz0BRCdgoXy+sofxof7WRJYnr+N/N7GlWHq5uZPpQv+br7aiYVxbQd
7YJfGUgfTS2FGGzcaoUP9S0VC9+U8UKfCKtCkuNhGABG9BGFKuS6R/QdavukxXQU
uBiAUWonxVJlSJ5xnQZQ0pt2ZeUir+qEP0NgpmFQe2tjL9bQNSVu3dCL9/5QHDuk
sGV4UnFSXyy86R7su0XViqpaFozYmhbQTqh5fy7V/Qmf7oaqRRs2hnTuNFt3I1a8
QHOsKkRQS7+4nc5CGWuWmR+6yzNZBjyHagC35KGOKgMvSPfo5Bn8e+eQmBOnfb98
nU8+Bh6FMg/U3JGmb4R0/ksIFzxfxipVODTvH6dPJzYmtvZsCuzItwX3TRnhpnQ9
B9Rg3tkL2BBr2fnCR5IlLlf6i1iSq798BSAokO6vlYa7LdSDRx16HM0oRUZE6d5X
Mg2Zx7SLWV94FYR7RTN05/d/NofRUXycxEgG9toWIGvlu2fTl/Bh3weRenOfAxIr
6N03aXsguexFqP/IMrsfT5Hucjl3UtpQBQxnQV5aJ7QT/np00aoUgaXTkMztNpUs
jceiTNOJawIc93w/1IWHwqcql2igD+R4FCKTpa+9GQ+4Ph5gJFDnXqo1lVmGqti9
kvLP5+uiqrXVu9dpOSFHFpucexM=
=w2qX
-----END PGP SIGNATURE-----
Merge tag 'fixes-for-v4.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-linus
Felipe writes:
usb: fixes for v4.18-rc5
With a total of 20 non-merge commits, we have accumulated quite a few
fixes. These include lot's of fixes the our audio gadget interface, a
build error fix for PPC64 builds for the frescale PHY driver,
sleep-while-atomic fixes on the r8a66597 UDC driver, 3-stage SETUP fix
for the aspeed-vhub UDC and some other misc fixes.
After the commit acf137951367 ("pinctrl: core: Return selector to the
pinctrl driver") and the commit 47f1242d19c3 ("pinctrl: pinmux: Return
selector to the pinctrl driver"), it's necessary to add the fixes
needed for the pin controller drivers to use the appropriate returned
selector for a negative error number returned in case of the fail at
these functions. Otherwise, the driver would have a failed probe and
that causes boot message cannot correctly output and devices fail
to acquire their own pins.
Cc: Kevin Hilman <khilman@baylibre.com>
Fixes: acf137951367 ("pinctrl: core: Return selector to the pinctrl driver")
Fixes: 47f1242d19c3 ("pinctrl: pinmux: Return selector to the pinctrl driver")
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Add the missing unlock before return from function
in the error handling case.
Fixes: 0f5972033509 ("pinctrl: single: Fix group and function selector use")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
With no users left for these functions let's remove them.
Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
Cc: Christ van Willegen <cvwillegen@gmail.com>
Cc: Haojian Zhuang <haojian.zhuang@linaro.org>
Cc: Jacopo Mondi <jacopo+renesas@jmondi.org>
Cc: Paul Cercueil <paul@crapouillou.net>
Cc: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Tested-By: H. Nikolaus Schaller <hns@goldelico.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
We must use a mutex around the generic_add functions and save the
function and group selector in case we need to remove them. Otherwise
the selector use will be racy for deferred probe at least.
Fixes: 5a49b644b3 ("pinctrl: Renesas RZ/A1 pin and gpio controller")
Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
Cc: Christ van Willegen <cvwillegen@gmail.com>
Cc: Haojian Zhuang <haojian.zhuang@linaro.org>
Cc: Paul Cercueil <paul@crapouillou.net>
Cc: Sean Wang <sean.wang@mediatek.com>
Acked-by: Jacopo Mondi <jacopo@jmondi.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Tested-By: H. Nikolaus Schaller <hns@goldelico.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
We must use a mutex around the generic_add functions and save the
function and group selector in case we need to remove them. Otherwise
the selector use will be racy for deferred probe at least.
Note that struct device_node *np is unused in pcs_add_function() we
remove that too and fix a checkpatch warning for bare unsigned while
at it.
Fixes: 571aec4df5 ("pinctrl: single: Use generic pinmux helpers for
managing functions")
Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
Cc: Christ van Willegen <cvwillegen@gmail.com>
Cc: Haojian Zhuang <haojian.zhuang@linaro.org>
Cc: Jacopo Mondi <jacopo+renesas@jmondi.org>
Cc: Paul Cercueil <paul@crapouillou.net>
Cc: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Tested-By: H. Nikolaus Schaller <hns@goldelico.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
We must return the selector from pinmux_generic_add_function() so
pin controller device drivers can remove the right group if needed
for deferred probe for example. And we now must make sure that a
proper name is passed so we can use it to check if the entry already
exists.
Note that fixes are also needed for the pin controller drivers to
use the selector value.
Fixes: a76edc89b1 ("pinctrl: core: Add generic pinctrl functions for
managing groups")
Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
Cc: Christ van Willegen <cvwillegen@gmail.com>
Cc: Haojian Zhuang <haojian.zhuang@linaro.org>
Cc: Jacopo Mondi <jacopo+renesas@jmondi.org>
Cc: Paul Cercueil <paul@crapouillou.net>
Cc: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Tested-By: H. Nikolaus Schaller <hns@goldelico.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
While refactoring the routine before, it occurred to me that this will
make the code much easier to understand.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
We must return the selector from pinctrl_generic_add_group() so
pin controller device drivers can remove the right group if needed
for deferred probe for example. And we now must make sure that a
proper name is passed so we can use it to check if the entry already
exists.
Note that fixes are also needed for the pin controller drivers to
use the selector value.
Fixes: c7059c5ac7 ("pinctrl: core: Add generic pinctrl functions
for managing groups")
Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
Cc: Christ van Willegen <cvwillegen@gmail.com>
Cc: Haojian Zhuang <haojian.zhuang@linaro.org>
Cc: Jacopo Mondi <jacopo+renesas@jmondi.org>
Cc: Paul Cercueil <paul@crapouillou.net>
Cc: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Tested-By: H. Nikolaus Schaller <hns@goldelico.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
To break out of recovery as early as possible, feed back the bus_free
logic state.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Some IP cores have an internal 'bus free' logic which may be more
advanced than just checking if SDA is high. Add a separate callback to
get this status. Filling it is optional.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
After exiting the while loop, we checked if recovery was successful and
sent a STOP to the clients. Meanwhile however, we send a STOP after
every pulse, so it is not needed after the loop. If we move the check
for a free bus to the end of the while loop, we can shorten and simplify
the logic. It is still ensured that at least one STOP will be sent to
the wire even if SDA was not stuck low.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
For bus recovery, we either need to bail out early if we can read SDA or
we need to send STOP after every pulse. Otherwise recovery might be
misinterpreted as an unwanted write. So, require one of those SDA
handling functions to avoid this problem.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
The "Beaglebone Enhanced" by Sancloud is based on the Beaglebone Black,
but with the following differences:
* Gigabit capable PHY
* Extra USB hub, optional i2c control
* lps3331ap barometer connected over i2c
* MPU6050 6 axis MEMS accelerometer/gyro connected over i2c
* 1GiB DDR3 RAM
* RTL8723 Wifi/Bluetooth connected over USB
Tested on a revision G board.
Signed-off-by: Koen Kooi <koen@dominion.thruhere.net>
Signed-off-by: Tony Lindgren <tony@atomide.com>
The input clock of UART0 should be CLK_PERI_UART0_PD.
Fixes: 13f36c326c ("arm64: dts: mt7622: turn uart0 clock to real ones")
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
Song Liu noticed switch_mm_irqs_off() taking a lot of CPU time in recent
kernels,using 1.8% of a 48 CPU system during a netperf to localhost run.
Digging into the profile, we noticed that cpumask_clear_cpu and
cpumask_set_cpu together take about half of the CPU time taken by
switch_mm_irqs_off().
However, the CPUs running netperf end up switching back and forth
between netperf and the idle task, which does not require changes
to the mm_cpumask. Furthermore, the init_mm cpumask ends up being
the most heavily contended one in the system.
Simply skipping changes to mm_cpumask(&init_mm) reduces overhead.
Reported-and-tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: efault@gmx.de
Cc: kernel-team@fb.com
Cc: luto@kernel.org
Link: http://lkml.kernel.org/r/20180716190337.26133-8-riel@surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that CPUs in lazy TLB mode no longer receive TLB shootdown IPIs, except
at page table freeing time, and idle CPUs will no longer get shootdown IPIs
for things like mprotect and madvise, we can always use lazy TLB mode.
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: efault@gmx.de
Cc: kernel-team@fb.com
Cc: luto@kernel.org
Link: http://lkml.kernel.org/r/20180716190337.26133-7-riel@surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
CPUs in !is_lazy have either received TLB flush IPIs earlier on during
the munmap (when the user memory was unmapped), or have context switched
and reloaded during that stage of the munmap.
Page table free TLB flushes only need to be sent to CPUs in lazy TLB
mode, which TLB contents might not yet be up to date yet.
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: efault@gmx.de
Cc: kernel-team@fb.com
Cc: luto@kernel.org
Link: http://lkml.kernel.org/r/20180716190337.26133-6-riel@surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Lazy TLB mode can result in an idle CPU being woken up by a TLB flush,
when all it really needs to do is reload %CR3 at the next context switch,
assuming no page table pages got freed.
Memory ordering is used to prevent race conditions between switch_mm_irqs_off,
which checks whether .tlb_gen changed, and the TLB invalidation code, which
increments .tlb_gen whenever page table entries get invalidated.
The atomic increment in inc_mm_tlb_gen is its own barrier; the context
switch code adds an explicit barrier between reading tlbstate.is_lazy and
next->context.tlb_gen.
Unlike the 2016 version of this patch, CPUs with cpu_tlbstate.is_lazy set
are not removed from the mm_cpumask(mm), since that would prevent the TLB
flush IPIs at page table free time from being sent to all the CPUs
that need them.
This patch reduces total CPU use in the system by about 1-2% for a
memcache workload on two socket systems, and by about 1% for a heavily
multi-process netperf between two systems.
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: efault@gmx.de
Cc: kernel-team@fb.com
Cc: luto@kernel.org
Link: http://lkml.kernel.org/r/20180716190337.26133-5-riel@surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move some code that will be needed for the lazy -> !lazy state
transition when a lazy TLB CPU has gotten out of date.
No functional changes, since the if (real_prev == next) branch
always returns.
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: efault@gmx.de
Cc: kernel-team@fb.com
Link: http://lkml.kernel.org/r/20180716190337.26133-4-riel@surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy discovered that speculative memory accesses while in lazy
TLB mode can crash a system, when a CPU tries to dereference a
speculative access using memory contents that used to be valid
page table memory, but have since been reused for something else
and point into la-la land.
The latter problem can be prevented in two ways. The first is to
always send a TLB shootdown IPI to CPUs in lazy TLB mode, while
the second one is to only send the TLB shootdown at page table
freeing time.
The second should result in fewer IPIs, since operationgs like
mprotect and madvise are very common with some workloads, but
do not involve page table freeing. Also, on munmap, batching
of page table freeing covers much larger ranges of virtual
memory than the batching of unmapped user pages.
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: efault@gmx.de
Cc: kernel-team@fb.com
Cc: luto@kernel.org
Link: http://lkml.kernel.org/r/20180716190337.26133-3-riel@surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The mm_struct always contains a cpumask bitmap, regardless of
CONFIG_CPUMASK_OFFSTACK. That means the first step can be to
simplify things, and simply have one bitmask at the end of the
mm_struct for the mm_cpumask.
This does necessitate moving everything else in mm_struct into
an anonymous sub-structure, which can be randomized when struct
randomization is enabled.
The second step is to determine the correct size for the
mm_struct slab object from the size of the mm_struct
(excluding the CPU bitmap) and the size the cpumask.
For init_mm we can simply allocate the maximum size this
kernel is compiled for, since we only have one init_mm
in the system, anyway.
Pointer magic by Mike Galbraith, to evade -Wstringop-overflow
getting confused by the dynamically sized array.
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-team@fb.com
Cc: luto@kernel.org
Link: http://lkml.kernel.org/r/20180716190337.26133-2-riel@surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>