linux-xiaomi-chiron

History

Peter Zijlstra 94aafc3ee3 sched/fair: Age the average idle time This is a partial forward-port of Peter Ziljstra's work first posted at: https://lore.kernel.org/lkml/20180530142236.667774973@infradead.org/ Currently select_idle_cpu()'s proportional scheme uses the average idle time for when we are idle, that is temporally challenged. When a CPU is not at all idle, we'll happily continue using whatever value we did see when the CPU goes idle. To fix this, introduce a separate average idle and age it (the existing value still makes sense for things like new-idle balancing, which happens when we do go idle). The overall goal is to not spend more time scanning for idle CPUs than we're idle for. Otherwise we're inhibiting work. This means that we need to consider the cost over all the wake-ups between consecutive idle periods. To track this, the scan cost is subtracted from the estimated average idle time. The impact of this patch is related to workloads that have domains that are fully busy or overloaded. Without the patch, the scan depth may be too high because a CPU is not reaching idle. Due to the nature of the patch, this is a regression magnet. It potentially wins when domains are almost fully busy or overloaded -- at that point searches are likely to fail but idle is not being aged as CPUs are active so search depth is too large and useless. It will potentially show regressions when there are idle CPUs and a deep search is beneficial. This tbench result on a 2-socket broadwell machine partially illustates the problem 5.13.0-rc2 5.13.0-rc2 vanilla sched-avgidle-v1r5 Hmean 1 445.02 ( 0.00%) 451.36 * 1.42%* Hmean 2 830.69 ( 0.00%) 846.03 * 1.85%* Hmean 4 1350.80 ( 0.00%) 1505.56 * 11.46%* Hmean 8 2888.88 ( 0.00%) 2586.40 * -10.47%* Hmean 16 5248.18 ( 0.00%) 5305.26 * 1.09%* Hmean 32 8914.03 ( 0.00%) 9191.35 * 3.11%* Hmean 64 10663.10 ( 0.00%) 10192.65 * -4.41%* Hmean 128 18043.89 ( 0.00%) 18478.92 * 2.41%* Hmean 256 16530.89 ( 0.00%) 17637.16 * 6.69%* Hmean 320 16451.13 ( 0.00%) 17270.97 * 4.98%* Note that 8 was a regression point where a deeper search would have helped but it gains for high thread counts when searches are useless. Hackbench is a more extreme example although not perfect as the tasks idle rapidly hackbench-process-pipes 5.13.0-rc2 5.13.0-rc2 vanilla sched-avgidle-v1r5 Amean 1 0.3950 ( 0.00%) 0.3887 ( 1.60%) Amean 4 0.9450 ( 0.00%) 0.9677 ( -2.40%) Amean 7 1.4737 ( 0.00%) 1.4890 ( -1.04%) Amean 12 2.3507 ( 0.00%) 2.3360 * 0.62%* Amean 21 4.0807 ( 0.00%) 4.0993 * -0.46%* Amean 30 5.6820 ( 0.00%) 5.7510 * -1.21%* Amean 48 8.7913 ( 0.00%) 8.7383 ( 0.60%) Amean 79 14.3880 ( 0.00%) 13.9343 * 3.15%* Amean 110 21.2233 ( 0.00%) 19.4263 * 8.47%* Amean 141 28.2930 ( 0.00%) 25.1003 * 11.28%* Amean 172 34.7570 ( 0.00%) 30.7527 * 11.52%* Amean 203 41.0083 ( 0.00%) 36.4267 * 11.17%* Amean 234 47.7133 ( 0.00%) 42.0623 * 11.84%* Amean 265 53.0353 ( 0.00%) 47.7720 * 9.92%* Amean 296 60.0170 ( 0.00%) 53.4273 * 10.98%* Stddev 1 0.0052 ( 0.00%) 0.0025 ( 51.57%) Stddev 4 0.0357 ( 0.00%) 0.0370 ( -3.75%) Stddev 7 0.0190 ( 0.00%) 0.0298 ( -56.64%) Stddev 12 0.0064 ( 0.00%) 0.0095 ( -48.38%) Stddev 21 0.0065 ( 0.00%) 0.0097 ( -49.28%) Stddev 30 0.0185 ( 0.00%) 0.0295 ( -59.54%) Stddev 48 0.0559 ( 0.00%) 0.0168 ( 69.92%) Stddev 79 0.1559 ( 0.00%) 0.0278 ( 82.17%) Stddev 110 1.1728 ( 0.00%) 0.0532 ( 95.47%) Stddev 141 0.7867 ( 0.00%) 0.0968 ( 87.69%) Stddev 172 1.0255 ( 0.00%) 0.0420 ( 95.91%) Stddev 203 0.8106 ( 0.00%) 0.1384 ( 82.92%) Stddev 234 1.1949 ( 0.00%) 0.1328 ( 88.89%) Stddev 265 0.9231 ( 0.00%) 0.0820 ( 91.11%) Stddev 296 1.0456 ( 0.00%) 0.1327 ( 87.31%) Again, higher thread counts benefit and the standard deviation shows that results are also a lot more stable when the idle time is aged. The patch potentially matters when a socket was multiple LLCs as the maximum search depth is lower. However, some of the test results were suspiciously good (e.g. specjbb2005 gaining 50% on a Zen1 machine) and other results were not dramatically different to other mcahines. Given the nature of the patch, Peter's full series is not being forward ported as each part should stand on its own. Preferably they would be merged at different times to reduce the risk of false bisections. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210615111611.GH30378@techsingularity.net		2021-06-17 14:11:44 +02:00
..
autogroup.c	sched/autogroup: Make autogroup_path() always available	2019-06-24 19:23:40 +02:00
autogroup.h	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
clock.c	sched: Fix various typos	2021-03-22 00:11:52 +01:00
completion.c	completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all()	2020-03-23 18:40:25 +01:00
core.c	sched/fair: Age the average idle time	2021-06-17 14:11:44 +02:00
core_sched.c	sched: prctl() core-scheduling interface	2021-05-12 11:43:31 +02:00
cpuacct.c	sched: Wrap rq::lock access	2021-05-12 11:43:26 +02:00
cpudeadline.c	sched,rt: Use the full cpumask for balancing	2020-11-10 18:39:00 +01:00
cpudeadline.h	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
cpufreq.c	cpufreq: Avoid leaving stale IRQ work items during CPU offline	2019-12-12 17:59:43 +01:00
cpufreq_schedutil.c	sched/cpufreq: Consider reduced CPU capacity in energy calculation	2021-06-17 14:11:43 +02:00
cpupri.c	sched: Fix various typos	2021-03-22 00:11:52 +01:00
cpupri.h	sched/cpupri: Add CPUPRI_HIGHER	2020-10-29 11:00:30 +01:00
cputime.c	Scheduler updates for this cycle are:	2021-04-28 13:33:57 -07:00
deadline.c	sched: Introduce sched_class::pick_task()	2021-05-12 11:43:28 +02:00
debug.c	sched: Wrap rq::lock access	2021-05-12 11:43:26 +02:00
fair.c	sched/fair: Age the average idle time	2021-06-17 14:11:44 +02:00
features.h	sched: Warn on long periods of pending need_resched	2021-04-21 13:55:41 +02:00
idle.c	sched: Trivial forced-newidle balancer	2021-05-12 11:43:30 +02:00
isolation.c	sched/isolation: Reconcile rcu_nocbs= and nohz_full=	2021-05-13 14:12:47 +02:00
loadavg.c	sched: Make multiple runqueue task counters 32-bit	2021-05-12 21:34:17 +02:00
Makefile	sched: Trivial core scheduling cookie management	2021-05-12 11:43:31 +02:00
membarrier.c	sched/membarrier: fix missing local execution of ipi_sync_rq_state()	2021-03-06 12:40:21 +01:00
pelt.c	sched: Fix various typos	2021-03-22 00:11:52 +01:00
pelt.h	sched: Wrap rq::lock access	2021-05-12 11:43:26 +02:00
psi.c	psi: Fix psi state corruption when schedule() races with cgroup move	2021-05-06 15:33:26 +02:00
rt.c	sched: Introduce sched_class::pick_task()	2021-05-12 11:43:28 +02:00
sched-pelt.h	sched/fair: Fix "runnable_avg_yN_inv" not used warnings	2019-06-17 12:15:58 +02:00
sched.h	sched/fair: Age the average idle time	2021-06-17 14:11:44 +02:00
smp.h	sched/headers: Split out open-coded prototypes into kernel/sched/smp.h	2020-05-28 11:03:20 +02:00
stats.c	sched: Fix various typos	2021-03-22 00:11:52 +01:00
stats.h	sched,stats: Further simplify sched_info	2021-05-18 12:53:53 +02:00
stop_task.c	sched: Introduce sched_class::pick_task()	2021-05-12 11:43:28 +02:00
swait.c	sched/swait: Prepare usage in completions	2020-03-21 16:00:23 +01:00
topology.c	sched: Wrap rq::lock access	2021-05-12 11:43:26 +02:00
wait.c	sched/wait: Add add_wait_queue_priority()	2020-11-15 09:49:09 -05:00
wait_bit.c	sched/wait: fix ___wait_var_event(exclusive)	2019-12-17 13:32:50 +01:00