Commit graph

916837 commits

Author SHA1 Message Date
Srinivas Pandruvada
f5205f4931 tools/power/x86/intel-speed-select: Make target CPU optional for core-power info
Currently "-c" is a mandatory option for "core-power info" command. Make
this optional as this is a per package/die property. When not specified,
it will print info for every package/die.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-03-20 14:46:19 +02:00
Srinivas Pandruvada
f0e0b4d17b tools/power/x86/intel-speed-select: Warn for invalid package id
When CPU is offline, we can't get package id. So print error for this
and don't use output.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-03-20 14:46:19 +02:00
Srinivas Pandruvada
ced2f5304d tools/power/x86/intel-speed-select: Fix last cpu number
Here topology_max_cpus is used for total CPU count, not the last CPU
number. So remove "-1".

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-03-20 14:46:01 +02:00
Srinivas Pandruvada
8ddbda7624 tools/power/x86/intel-speed-select: Fix mailbox usage for CLOS_PM_QOS_CONFIG
Even for the products using MMIO, this message needs to be sent via
mail box. The previous fix done for this didn't properly address this.
That fix simply removed sending command via MMIO, but still didn't
trigger sending via mailbox.

Add additional condition to check for CLOS_PM_QOS_CONFIG, when MMIO
is supported on a platform.

Fixes: cd0e637065 (tools/power/x86/intel-speed-select: Use mailbox for CLOS_PM_QOS_CONFIG)
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-03-20 14:46:01 +02:00
Greg Kroah-Hartman
c23ff2aa3e interconnect changes for 5.7
Here is a pull request with interconnect changes for the 5.7-rc1 merge
 window. It contains just driver updates, and these are:
 
 - Refactoring of the SDM845 driver, which is now improved to better
 represent the hardware.
 - New driver for SC7180 platforms.
 - New driver for OSM L3 interconnect hardware found on SDM845/SC7180
 platforms.
 
 Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJedJ1QAAoJEIDQzArG2BZjHboP/2bxPoImL9MBsPY31qn4G8Uf
 RPHz5pgO0LMygFY+oskbY68H61qQclLkXG/KZ35OnK2vveWZDI3jF2Wims3P7Wjt
 bErrVzSL9wJGYR8efIamPsTK8OGFePGJgf/dKic63yXTPM/x1E+RGavUOXkW0hyo
 XNNfAJiC1Q3l+yAc8ZIEQ+Ys+KR93BYRi9IeAmp5/9CvC2UN3ZrlrC6989WZiBSY
 hHQXdOhcP/n258Nd1nlB2/0zdIar3PbdW0I3mlK7Fhfb04RwiSihQzs6vPJ2mc62
 aoMGdtrAVxd9sTPGAw++2eOsLx01aqTVK7N+aHGygD4buoz53XtOUe5j/wB/Pv5+
 fMtN3ddCoNAEwxD5hgR7iaAOGsEByl4JFdWOKMIXByiAAUoegIiAVx8Gv1qvT6Ma
 NZBuBYbgEW0AHLdSqZ1NDVOD6to0+81RICk3433TdHSbG8RMRRhBB66nW2c+c2qs
 9pD9SX78ax4AtNSTHbXB3D5NJO5ZkXX1keFErEHKY3psdumTa2mvDZYEcS31gJ7P
 E0WEZNac9QvvJtoBBhwhGBCHlYnIMK6C/rMrz+GFnoMwnJWgGjHYFSISoC+8RCVr
 SrHPED+G678JTruYdFOSxHCM3Gv80iSTLdCYPe3VxPLFqyqPO9Lmscl3woZfeoUI
 KOXOoLB0wOLocNLiUzNX
 =jHM0
 -----END PGP SIGNATURE-----

Merge tag 'icc-5.7-rc1' of https://git.linaro.org/people/georgi.djakov/linux into char-misc-next

Georgi writes:

interconnect changes for 5.7

Here is a pull request with interconnect changes for the 5.7-rc1 merge
window. It contains just driver updates, and these are:

- Refactoring of the SDM845 driver, which is now improved to better
represent the hardware.
- New driver for SC7180 platforms.
- New driver for OSM L3 interconnect hardware found on SDM845/SC7180
platforms.

Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>

* tag 'icc-5.7-rc1' of https://git.linaro.org/people/georgi.djakov/linux:
  interconnect: qcom: Add OSM L3 support on SC7180
  dt-bindings: interconnect: Add OSM L3 DT binding on SC7180
  interconnect: qcom: Add OSM L3 interconnect provider support
  dt-bindings: interconnect: Add OSM L3 DT bindings
  interconnect: qcom: Allow icc node to be used across icc providers
  interconnect: qcom: Add SC7180 interconnect provider driver
  dt-bindings: interconnect: Add Qualcomm SC7180 DT bindings
  interconnect: qcom: sdm845: Split qnodes into their respective NoCs
  interconnect: qcom: Consolidate interconnect RPMh support
  dt-bindings: interconnect: Update Qualcomm SDM845 DT bindings
  dt-bindings: interconnect: Add YAML schemas for QCOM bcm-voter
  dt-bindings: interconnect: Convert qcom,sdm845 to DT schema
2020-03-20 13:45:25 +01:00
Takashi Iwai
b40e288bfb platform/x86: sony-laptop: Use scnprintf() for avoiding potential buffer overflow
Since snprintf() returns the would-be-output size instead of the
actual output size, the succeeding calls may go beyond the given
buffer limit.  Fix it by replacing with scnprintf().

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-03-20 14:44:47 +02:00
Hans de Goede
1d6f8c5bac platform/x86: GPD pocket fan: Fix error message when temp-limits are out of range
Commit 1f27dbd826 ("platform/x86: GPD pocket fan: Allow somewhat
lower/higher temperature limits") changed the module-param sanity check
to accept temperature limits between 20 and 90 degrees celcius.

But the error message printed when the module params are outside this
range was not updated. This commit updates the error message to match
the new min and max value for the temp-limits.

Reported-by: Pavel Machek <pavel@denx.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Pavel Machek <pavel@denx.de>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-03-20 14:44:47 +02:00
Srinivas Pandruvada
6cc8f65989 platform/x86: ISST: Fix wrong unregister type
The MMIO driver is not unregistering with the correct type with the ISST
common core during module removal. This should be unregistered with
ISST_IF_DEV_MMIO instead of ISST_IF_DEV_MBOX.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-03-20 14:44:47 +02:00
Leonid Maksymchuk
edeee341fd platform/x86: asus_wmi: Fix return value of fan_boost_mode_store
Function fan_boost_mode_store returns 0 if store is successful,
this leads to infinite loop after any write to it's sysfs entry:

# echo 0 >/sys/devices/platform/asus-nb-wmi/fan_boost_mode

This command never ends, one CPU core is at 100% utilization.
This patch fixes this by returning size of written data.

Fixes: b096f626a6 ("platform/x86: asus-wmi: Switch fan boost mode")
Signed-off-by: Leonid Maksymchuk <leonmaxx@gmail.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-03-20 14:44:46 +02:00
Kristian Klausen
6b3586d45b platform/x86: asus-wmi: Support laptops where the first battery is named BATT
The WMI method to set the charge threshold does not provide a
way to specific a battery, so we assume it is the first/primary
battery (by checking if the name is BAT0).
On some newer ASUS laptops (Zenbook UM431DA) though, the
primary/first battery isn't named BAT0 but BATT, so we need
to support that case.

Fixes: 7973353e92 ("platform/x86: asus-wmi: Refactor charge threshold to use the battery hooking API")
Cc: stable@vger.kernel.org
Signed-off-by: Kristian Klausen <kristian@klausen.dk>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-03-20 14:44:46 +02:00
Peter Zijlstra
f6f48e1804 lockdep: Teach lockdep about "USED" <- "IN-NMI" inversions
nmi_enter() does lockdep_off() and hence lockdep ignores everything.

And NMI context makes it impossible to do full IN-NMI tracking like we
do IN-HARDIRQ, that could result in graph_lock recursion.

However, since look_up_lock_class() is lockless, we can find the class
of a lock that has prior use and detect IN-NMI after USED, just not
USED after IN-NMI.

NOTE: By shifting the lockdep_off() recursion count to bit-16, we can
easily differentiate between actual recursion and off.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Link: https://lkml.kernel.org/r/20200221134215.090538203@infradead.org
2020-03-20 13:06:25 +01:00
Peter Zijlstra
248efb2158 locking/lockdep: Rework lockdep_lock
A few sites want to assert we own the graph_lock/lockdep_lock, provide
a more conventional lock interface for it with a number of trivial
debug checks.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200313102107.GX12561@hirez.programming.kicks-ass.net
2020-03-20 13:06:25 +01:00
Peter Zijlstra
10476e6304 locking/lockdep: Fix bad recursion pattern
There were two patterns for lockdep_recursion:

Pattern-A:
	if (current->lockdep_recursion)
		return

	current->lockdep_recursion = 1;
	/* do stuff */
	current->lockdep_recursion = 0;

Pattern-B:
	current->lockdep_recursion++;
	/* do stuff */
	current->lockdep_recursion--;

But a third pattern has emerged:

Pattern-C:
	current->lockdep_recursion = 1;
	/* do stuff */
	current->lockdep_recursion = 0;

And while this isn't broken per-se, it is highly dangerous because it
doesn't nest properly.

Get rid of all Pattern-C instances and shore up Pattern-A with a
warning.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200313093325.GW12561@hirez.programming.kicks-ass.net
2020-03-20 13:06:25 +01:00
Boqun Feng
25016bd7f4 locking/lockdep: Avoid recursion in lockdep_count_{for,back}ward_deps()
Qian Cai reported a bug when PROVE_RCU_LIST=y, and read on /proc/lockdep
triggered a warning:

  [ ] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
  ...
  [ ] Call Trace:
  [ ]  lock_is_held_type+0x5d/0x150
  [ ]  ? rcu_lockdep_current_cpu_online+0x64/0x80
  [ ]  rcu_read_lock_any_held+0xac/0x100
  [ ]  ? rcu_read_lock_held+0xc0/0xc0
  [ ]  ? __slab_free+0x421/0x540
  [ ]  ? kasan_kmalloc+0x9/0x10
  [ ]  ? __kmalloc_node+0x1d7/0x320
  [ ]  ? kvmalloc_node+0x6f/0x80
  [ ]  __bfs+0x28a/0x3c0
  [ ]  ? class_equal+0x30/0x30
  [ ]  lockdep_count_forward_deps+0x11a/0x1a0

The warning got triggered because lockdep_count_forward_deps() call
__bfs() without current->lockdep_recursion being set, as a result
a lockdep internal function (__bfs()) is checked by lockdep, which is
unexpected, and the inconsistency between the irq-off state and the
state traced by lockdep caused the warning.

Apart from this warning, lockdep internal functions like __bfs() should
always be protected by current->lockdep_recursion to avoid potential
deadlocks and data inconsistency, therefore add the
current->lockdep_recursion on-and-off section to protect __bfs() in both
lockdep_count_forward_deps() and lockdep_count_backward_deps()

Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200312151258.128036-1-boqun.feng@gmail.com
2020-03-20 13:06:25 +01:00
Kan Liang
3442a9ecb8 perf/x86/intel/uncore: Factor out __snr_uncore_mmio_init_box
The IMC uncore unit in Ice Lake server can only be accessed by MMIO,
which is similar as Snow Ridge.
Factor out __snr_uncore_mmio_init_box which can be shared with Ice Lake
server in the following patch.

No functional changes.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1584470314-46657-2-git-send-email-kan.liang@linux.intel.com
2020-03-20 13:06:23 +01:00
Kan Liang
bc88a2fe21 perf/x86/intel/uncore: Add box_offsets for free-running counters
The offset between uncore boxes of free-running counters varies, e.g.
IIO free-running counters on Ice Lake server.

Add box_offsets, an array of offsets between adjacent uncore boxes.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1584470314-46657-1-git-send-email-kan.liang@linux.intel.com
2020-03-20 13:06:23 +01:00
Dan Carpenter
a6763625ae perf/core: Fix reversed NULL check in perf_event_groups_less()
This NULL check is reversed so it leads to a Smatch warning and
presumably a NULL dereference.

    kernel/events/core.c:1598 perf_event_groups_less()
    error: we previously assumed 'right->cgrp->css.cgroup' could be null
	(see line 1590)

Fixes: 95ed6c707f ("perf/cgroup: Order events in RB tree by cgroup id")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200312105637.GA8960@mwanda
2020-03-20 13:06:22 +01:00
Peter Zijlstra
90c91dfb86 perf/core: Fix endless multiplex timer
Kan and Andi reported that we fail to kill rotation when the flexible
events go empty, but the context does not. XXX moar

Fixes: fd7d55172d ("perf/cgroups: Don't rotate events for cgroups unnecessarily")
Reported-by: Andi Kleen <ak@linux.intel.com>
Reported-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200305123851.GX2596@hirez.programming.kicks-ass.net
2020-03-20 13:06:22 +01:00
Peter Zijlstra
d8a7386897 x86/optprobe: Fix OPTPROBE vs UACCESS
While looking at an objtool UACCESS warning, it suddenly occurred to me
that it is entirely possible to have an OPTPROBE right in the middle of
an UACCESS region.

In this case we must of course clear FLAGS.AC while running the KPROBE.
Luckily the trampoline already saves/restores [ER]FLAGS, so all we need
to do is inject a CLAC. Unfortunately we cannot use ALTERNATIVE() in the
trampoline text, so we have to frob that manually.

Fixes: ca0bbc70f147 ("sched/x86_64: Don't save flags on context switch")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lkml.kernel.org/r/20200305092130.GU2596@hirez.programming.kicks-ass.net
2020-03-20 13:06:22 +01:00
Tao Zhou
6c8116c914 sched/fair: Fix condition of avg_load calculation
In update_sg_wakeup_stats(), the comment says:

Computing avg_load makes sense only when group is fully
busy or overloaded.

But, the code below this comment does not check like this.

From reading the code about avg_load in other functions, I
confirm that avg_load should be calculated in fully busy or
overloaded case. The comment is correct and the checking
condition is wrong. So, change that condition.

Fixes: 57abff067a ("sched/fair: Rework find_idlest_group()")
Signed-off-by: Tao Zhou <ouwen210@hotmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/Message-ID:
2020-03-20 13:06:20 +01:00
Qais Yousef
e94f80f6c4 sched/rt: cpupri_find: Trigger a full search as fallback
If we failed to find a fitting CPU, in cpupri_find(), we only fallback
to the level we found a hit at.

But Steve suggested to fallback to a second full scan instead as this
could be a better effort.

	https://lore.kernel.org/lkml/20200304135404.146c56eb@gandalf.local.home/

We trigger the 2nd search unconditionally since the argument about
triggering a full search is that the recorded fall back level might have
become empty by then. Which means storing any data about what happened
would be meaningless and stale.

I had a humble try at timing it and it seemed okay for the small 6 CPUs
system I was running on

	https://lore.kernel.org/lkml/20200305124324.42x6ehjxbnjkklnh@e107158-lin.cambridge.arm.com/

On large system this second full scan could be expensive. But there are
no users outside capacity awareness for this fitness function at the
moment. Heterogeneous systems tend to be small with 8cores in total.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20200310142219.syxzn5ljpdxqtbgx@e107158-lin.cambridge.arm.com
2020-03-20 13:06:20 +01:00
Liang Chen
26c7295be0 kthread: Do not preempt current task if it is going to call schedule()
when we create a kthread with ktrhead_create_on_cpu(),the child thread
entry is ktread.c:ktrhead() which will be preempted by the parent after
call complete(done) while schedule() is not called yet,then the parent
will call wait_task_inactive(child) but the child is still on the runqueue,
so the parent will schedule_hrtimeout() for 1 jiffy,it will waste a lot of
time,especially on startup.

  parent                             child
ktrhead_create_on_cpu()
  wait_fo_completion(&done) -----> ktread.c:ktrhead()
                             |----- complete(done);--wakeup and preempted by parent
 kthread_bind() <------------|  |-> schedule();--dequeue here
  wait_task_inactive(child)     |
   schedule_hrtimeout(1 jiffy) -|

So we hope the child just wakeup parent but not preempted by parent, and the
child is going to call schedule() soon,then the parent will not call
schedule_hrtimeout(1 jiffy) as the child is already dequeue.

The same issue for ktrhead_park()&&kthread_parkme().
This patch can save 120ms on rk312x startup with CONFIG_HZ=300.

Signed-off-by: Liang Chen <cl@rock-chips.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20200306070133.18335-2-cl@rock-chips.com
2020-03-20 13:06:20 +01:00
Vincent Guittot
c32b430829 sched/fair: Improve spreading of utilization
During load_balancing, a group with spare capacity will try to pull some
utilizations from an overloaded group. In such case, the load balance
looks for the runqueue with the highest utilization. Nevertheless, it
should also ensure that there are some pending tasks to pull otherwise
the load balance will fail to pull a task and the spread of the load will
be delayed.

This situation is quite transient but it's possible to highlight the
effect with a short run of sysbench test so the time to spread task impacts
the global result significantly.

Below are the average results for 15 iterations on an arm64 octo core:
sysbench --test=cpu --num-threads=8  --max-requests=1000 run

                           tip/sched/core  +patchset
total time:                172ms           158ms
per-request statistics:
         avg:                1.337ms         1.244ms
         max:               21.191ms        10.753ms

The average max doesn't fully reflect the wide spread of the value which
ranges from 1.350ms to more than 41ms for the tip/sched/core and from
1.350ms to 21ms with the patch.

Other factors like waiting for an idle load balance or cache hotness
can delay the spreading of the tasks which explains why we can still
have up to 21ms with the patch.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200312165429.990-1-vincent.guittot@linaro.org
2020-03-20 13:06:20 +01:00
Michael Wang
26cf52229e sched: Avoid scale real weight down to zero
During our testing, we found a case that shares no longer
working correctly, the cgroup topology is like:

  /sys/fs/cgroup/cpu/A		(shares=102400)
  /sys/fs/cgroup/cpu/A/B	(shares=2)
  /sys/fs/cgroup/cpu/A/B/C	(shares=1024)

  /sys/fs/cgroup/cpu/D		(shares=1024)
  /sys/fs/cgroup/cpu/D/E	(shares=1024)
  /sys/fs/cgroup/cpu/D/E/F	(shares=1024)

The same benchmark is running in group C & F, no other tasks are
running, the benchmark is capable to consumed all the CPUs.

We suppose the group C will win more CPU resources since it could
enjoy all the shares of group A, but it's F who wins much more.

The reason is because we have group B with shares as 2, since
A->cfs_rq.load.weight == B->se.load.weight == B->shares/nr_cpus,
so A->cfs_rq.load.weight become very small.

And in calc_group_shares() we calculate shares as:

  load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg);
  shares = (tg_shares * load) / tg_weight;

Since the 'cfs_rq->load.weight' is too small, the load become 0
after scale down, although 'tg_shares' is 102400, shares of the se
which stand for group A on root cfs_rq become 2.

While the se of D on root cfs_rq is far more bigger than 2, so it
wins the battle.

Thus when scale_load_down() scale real weight down to 0, it's no
longer telling the real story, the caller will have the wrong
information and the calculation will be buggy.

This patch add check in scale_load_down(), so the real weight will
be >= MIN_SHARES after scale, after applied the group C wins as
expected.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/38e8e212-59a1-64b2-b247-b6d0b52d8dc1@linux.alibaba.com
2020-03-20 13:06:19 +01:00
Yafang Shao
1066d1b697 psi: Move PF_MEMSTALL out of task->flags
The task->flags is a 32-bits flag, in which 31 bits have already been
consumed. So it is hardly to introduce other new per process flag.
Currently there're still enough spaces in the bit-field section of
task_struct, so we can define the memstall state as a single bit in
task_struct instead.
This patch also removes an out-of-date comment pointed by Matthew.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lkml.kernel.org/r/1584408485-1921-1-git-send-email-laoar.shao@gmail.com
2020-03-20 13:06:19 +01:00
Johannes Weiner
a0fe6ba690 MAINTAINERS: Add maintenance information for psi
Add a maintainer section for psi, as it's a user-visible, configurable
kernel feature.

The patches are still routed through the scheduler tree due to the
close integration with that code, but get_maintainers.pl does the
right thing and makes sure everybody gets CCd:

$ ./scripts/get_maintainer.pl -f kernel/sched/psi.c
Johannes Weiner <hannes@cmpxchg.org> (maintainer:PRESSURE STALL INFORMATION (PSI))
Ingo Molnar <mingo@redhat.com> (maintainer:SCHEDULER)
Peter Zijlstra <peterz@infradead.org> (maintainer:SCHEDULER)
...

Reported-by: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200316191333.115523-4-hannes@cmpxchg.org
2020-03-20 13:06:19 +01:00
Johannes Weiner
36b238d571 psi: Optimize switching tasks inside shared cgroups
When switching tasks running on a CPU, the psi state of a cgroup
containing both of these tasks does not change. Right now, we don't
exploit that, and can perform many unnecessary state changes in nested
hierarchies, especially when most activity comes from one leaf cgroup.

This patch implements an optimization where we only update cgroups
whose state actually changes during a task switch. These are all
cgroups that contain one task but not the other, up to the first
shared ancestor. When both tasks are in the same group, we don't need
to update anything at all.

We can identify the first shared ancestor by walking the groups of the
incoming task until we see TSK_ONCPU set on the local CPU; that's the
first group that also contains the outgoing task.

The new psi_task_switch() is similar to psi_task_change(). To allow
code reuse, move the task flag maintenance code into a new function
and the poll/avg worker wakeups into the shared psi_group_change().

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200316191333.115523-3-hannes@cmpxchg.org
2020-03-20 13:06:19 +01:00
Johannes Weiner
b05e75d611 psi: Fix cpu.pressure for cpu.max and competing cgroups
For simplicity, cpu pressure is defined as having more than one
runnable task on a given CPU. This works on the system-level, but it
has limitations in a cgrouped reality: When cpu.max is in use, it
doesn't capture the time in which a task is not executing on the CPU
due to throttling. Likewise, it doesn't capture the time in which a
competing cgroup is occupying the CPU - meaning it only reflects
cgroup-internal competitive pressure, not outside pressure.

Enable tracking of currently executing tasks, and then change the
definition of cpu pressure in a cgroup from

	NR_RUNNING > 1

to

	NR_RUNNING > ON_CPU

which will capture the effects of cpu.max as well as competition from
outside the cgroup.

After this patch, a cgroup running `stress -c 1` with a cpu.max
setting of 5000 10000 shows ~50% continuous CPU pressure.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200316191333.115523-2-hannes@cmpxchg.org
2020-03-20 13:06:18 +01:00
Paul Turner
46a87b3851 sched/core: Distribute tasks within affinity masks
Currently, when updating the affinity of tasks via either cpusets.cpus,
or, sched_setaffinity(); tasks not currently running within the newly
specified mask will be arbitrarily assigned to the first CPU within the
mask.

This (particularly in the case that we are restricting masks) can
result in many tasks being assigned to the first CPUs of their new
masks.

This:
 1) Can induce scheduling delays while the load-balancer has a chance to
    spread them between their new CPUs.
 2) Can antogonize a poor load-balancer behavior where it has a
    difficult time recognizing that a cross-socket imbalance has been
    forced by an affinity mask.

This change adds a new cpumask interface to allow iterated calls to
distribute within the intersection of the provided masks.

The cases that this mainly affects are:
 - modifying cpuset.cpus
 - when tasks join a cpuset
 - when modifying a task's affinity via sched_setaffinity(2)

Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Josh Don <joshdon@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Tested-by: Qais Yousef <qais.yousef@arm.com>
Link: https://lkml.kernel.org/r/20200311010113.136465-1-joshdon@google.com
2020-03-20 13:06:18 +01:00
Vincent Guittot
fe61468b2c sched/fair: Fix enqueue_task_fair warning
When a cfs rq is throttled, the latter and its child are removed from the
leaf list but their nr_running is not changed which includes staying higher
than 1. When a task is enqueued in this throttled branch, the cfs rqs must
be added back in order to ensure correct ordering in the list but this can
only happens if nr_running == 1.
When cfs bandwidth is used, we call unconditionnaly list_add_leaf_cfs_rq()
when enqueuing an entity to make sure that the complete branch will be
added.

Similarly unthrottle_cfs_rq() can stop adding cfs in the list when a parent
is throttled. Iterate the remaining entity to ensure that the complete
branch will be added in the list.

Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: stable@vger.kernel.org
Cc: stable@vger.kernel.org #v5.1+
Link: https://lkml.kernel.org/r/20200306135257.25044-1-vincent.guittot@linaro.org
2020-03-20 13:06:18 +01:00
Yuantian Tang
fd96a316d2 dt-bindings: thermal: make cooling-maps property optional
Cooling-maps doesn't have to be a required property because there may
be no cooling device on system, or there are no enough cooling devices for
each thermal zone in multiple thermal zone cases since cooling devices
can't be shared.
So make this property optional to remove such limitations.

For thermal zones with no cooling-maps, there could be critic trips
that can trigger CPU reset or shutdown. So they still can take actions.

Signed-off-by: Yuantian Tang <andy.tang@nxp.com>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200309045411.21859-1-andy.tang@nxp.com
2020-03-20 12:17:48 +01:00
Rob Herring
01c354e2ec dt-bindings: thermal: qcom-tsens: Remove redundant 'maxItems'
There's no need to specify 'maxItems' with the same value as the number
of entries in 'items'. A meta-schema update will catch future cases.

Cc: Andy Gross <agross@kernel.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Amit Kucheria <amit.kucheria@linaro.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: linux-arm-msm@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: devicetree@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200313214552.845-2-robh@kernel.org
2020-03-20 12:17:48 +01:00
Rob Herring
8698977867 dt-bindings: thermal: sprd: Remove redundant 'maxItems'
There's no need to specify 'maxItems' with the same value as the number
of entries in 'items'. A meta-schema update will catch future cases.

Cc: Orson Zhai <orsonzhai@gmail.com>
Cc: Baolin Wang <baolin.wang7@gmail.com>
Cc: Chunyan Zhang <zhang.lyra@gmail.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: linux-pm@vger.kernel.org
Cc: devicetree@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Baolin Wang <baolin.wang7@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200313214552.845-1-robh@kernel.org
2020-03-20 12:17:48 +01:00
Anson Huang
9db11010f2 thermal: imx: Calling imx_thermal_unregister_legacy_cooling() in .remove
imx_thermal_unregister_legacy_cooling() should be used for handling
legacy cpufreq cooling cleanups in .remove callback instead of
calling cpufreq_cooling_unregister() and cpufreq_cpu_put() directly,
especially for !CONFIG_CPU_FREQ scenario, no operation needed for
handling legacy cpufreq cooling cleanups at all.

Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/1584088094-24857-1-git-send-email-Anson.Huang@nxp.com
2020-03-20 12:17:48 +01:00
Anson Huang
ce68eeca8f thermal: qoriq: Sort includes alphabetically
Sort includes alphabetically for consistency, and take this chance
to remove unused include of of_address.h.

Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/1583903252-2058-2-git-send-email-Anson.Huang@nxp.com
2020-03-20 12:17:48 +01:00
Anson Huang
85f0b61a6b thermal: qoriq: Use devm_add_action_or_reset() to handle all cleanups
Use devm_add_action_or_reset() to handle all cleanups of failure in
.probe and .remove, then .remove callback can be dropped.

Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/1583903252-2058-1-git-send-email-Anson.Huang@nxp.com
2020-03-20 12:17:48 +01:00
Niklas Söderlund
0fa0420207 thermal: rcar_thermal: Remove lock in rcar_thermal_get_current_temp()
With the ctemp value returned instead of cached in the private data
structure their is no need to take the lock when translating ctemp into
a temperature.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200310170029.1648996-4-niklas.soderlund+renesas@ragnatech.se
2020-03-20 12:17:48 +01:00
Niklas Söderlund
57ed737f16 thermal: rcar_thermal: Do not store ctemp in rcar_thermal_priv
There is no need to cache the ctemp value in the private data structure
as it's always prefetched before it's used. Remove it from the structure
and have rcar_thermal_update_temp return the value instead of storing
it.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200310170029.1648996-3-niklas.soderlund+renesas@ragnatech.se
2020-03-20 12:17:48 +01:00
Niklas Söderlund
7617e771c1 thermal: rcar_thermal: Always update thermal zone on interrupt
Since commit a1ade56538 ("thermal: rcar: check every
rcar_thermal_update_temp() return value") the temperature is always read
in rcar_thermal_get_current_temp() so comparing it before and after
enabling interrupts have little effect. Remove the check and always
update the thermal zone when we get an interrupt that the temperature
have changed.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200310170029.1648996-2-niklas.soderlund+renesas@ragnatech.se
2020-03-20 12:17:48 +01:00
Amit Kucheria
8d3a6d4f43 drivers: thermal: tsens: Remove unnecessary irq flag
IRQF_TRIGGER_HIGH is already specified through devicetree interrupts
property. Remove it from code.

Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/8ac92e45b65fe411f4aaf70dcde4e7e7c3169b2d.1584015867.git.amit.kucheria@linaro.org
2020-03-20 12:17:48 +01:00
Amit Kucheria
348596969d drivers: thermal: tsens: kernel-doc fixup
Document ul_lock, threshold and control structure members and make
the following kernel-doc invocation happy:

$ scripts/kernel-doc -v -none drivers/thermal/qcom/*

drivers/thermal/qcom/qcom-spmi-temp-alarm.c:105: info: Scanning doc for qpnp_tm_get_temp_stage
drivers/thermal/qcom/tsens-common.c:18: info: Scanning doc for struct tsens_irq_data
drivers/thermal/qcom/tsens-common.c:130: info: Scanning doc for tsens_hw_to_mC
drivers/thermal/qcom/tsens-common.c:163: info: Scanning doc for tsens_mC_to_hw
drivers/thermal/qcom/tsens-common.c:245: info: Scanning doc for tsens_set_interrupt
drivers/thermal/qcom/tsens-common.c:268: info: Scanning doc for tsens_threshold_violated
drivers/thermal/qcom/tsens-common.c:362: info: Scanning doc for tsens_critical_irq_thread
drivers/thermal/qcom/tsens-common.c:438: info: Scanning doc for tsens_irq_thread
drivers/thermal/qcom/tsens.h:41: info: Scanning doc for struct tsens_sensor
drivers/thermal/qcom/tsens.h:59: info: Scanning doc for struct tsens_ops
drivers/thermal/qcom/tsens.h:494: info: Scanning doc for struct tsens_features
drivers/thermal/qcom/tsens.h:513: info: Scanning doc for struct tsens_plat_data
drivers/thermal/qcom/tsens.h:529: info: Scanning doc for struct tsens_context

Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/7ea9c9ead90a91205a3f1717c0c86db9a51780ce.1584015867.git.amit.kucheria@linaro.org
2020-03-20 12:17:48 +01:00
Amit Kucheria
d22066c1af drivers: thermal: tsens: Add watchdog support
TSENS IP v2.3 onwards adds support for a watchdog to detect if the TSENS
HW FSM is stuck. Add support to detect and restart the FSM in the
driver. The watchdog is configured by the bootloader, we just enable the
watchdog bark as a debug feature in the kernel.

Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/a314747664a065db592ad77da7beae68128a5b6e.1584015867.git.amit.kucheria@linaro.org
2020-03-20 12:17:48 +01:00
Amit Kucheria
79125e03db drivers: thermal: tsens: Add critical interrupt support
TSENS IP v2.x adds critical threshold interrupt support for each sensor
in addition to the upper/lower threshold interrupt. Add support in the
driver.

While the critical interrupts themselves aren't currently used by Linux,
the HW line is also used by the TSENS watchdog. So this patch acts as
infrastructure to enable watchdog functionality for the TSENS IP.

Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/51b22461d4b5f85a817274568459db4579fd4298.1584015867.git.amit.kucheria@linaro.org
2020-03-20 12:17:48 +01:00
Amit Kucheria
f22a3bf0d2 drivers: thermal: tsens: Release device in success path
We don't currently call put_device in case of successfully initialising
the device. So we hold the reference and keep the device pinned forever.

Allow control to fall through so we can use same code for success and
error paths to put_device.

As a part of this fixup, change devm_ioremap_resource to act on the same
device pointer as that used to allocate regmap memory. That ensures that
we are free to release op->dev after examining its resources.

Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/d3996667e9f976bb30e97e301585cb1023be422e.1584015867.git.amit.kucheria@linaro.org
2020-03-20 12:17:48 +01:00
Amit Kucheria
c1c6f3b39c drivers: thermal: tsens: use simpler variables
We already dereference the sensor and save it into a variable. Use the
variable directly to make the code easier to read.

Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/5dc4356edfb8dffa377fb561359bf41a6f1fdf17.1584015867.git.amit.kucheria@linaro.org
2020-03-20 12:17:48 +01:00
Amit Kucheria
e604bdd2a7 drivers: thermal: tsens: Pass around struct tsens_sensor as a constant
All the sensor data is initialised at init time. Lock it down by passing
it to functions as a constant.

Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/740f9254484c08d65869df578628eb523c0049ff.1584015867.git.amit.kucheria@linaro.org
2020-03-20 12:17:48 +01:00
Amit Kucheria
0aef1ee5af drivers: thermal: tsens: De-constify struct tsens_features
struct tsens_features is currently initialized as part of platform data
at compile-time and not modifiable. We now have some usecases in feature
detection across IP versions where it is more flexible to update the
features after probing registers.

Remove const qualifier from tsens_features and the encapsulating
tsens_plat_data.

Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/2919a72220470366ae11e0bb5330a4ea39838f71.1584015867.git.amit.kucheria@linaro.org
2020-03-20 12:17:48 +01:00
Niklas Söderlund
39056e8a98 thermal: rcar_thermal: Handle probe error gracefully
If the common register memory resource is not available the driver needs
to fail gracefully to disable PM. Instead of returning the error
directly store it in ret and use the already existing error path.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200310114709.1483860-1-niklas.soderlund+renesas@ragnatech.se
2020-03-20 12:17:48 +01:00
Anson Huang
a9d8e61b93 thermal: imx: Remove unused includes
Remove unused includes to simplify the code.

Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/1583762668-12099-1-git-send-email-Anson.Huang@nxp.com
2020-03-20 12:17:48 +01:00
Geert Uytterhoeven
8d74bf79df thermal: rcar_gen3_thermal: Add r8a77961 support
Add support for the Thermal Sensor/Chip Internal Voltage Monitor in the
R-Car M3-W+ (R8A77961) SoC.

According to the R-Car Gen3 Hardware Manual Errata for Revision 2.00 of
Jan 31, 2020, the thermal parameters for R-Car M3-W+ are the same as for
R-Car M3-W.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200306105503.24267-3-geert+renesas@glider.be
2020-03-20 12:17:48 +01:00