linux-xiaomi-chiron/include
Tim Chen 2554db9165 sched/wait: Break up long wake list walk
We encountered workloads that have very long wake up list on large
systems. A waker takes a long time to traverse the entire wake list and
execute all the wake functions.

We saw page wait list that are up to 3700+ entries long in tests of
large 4 and 8 socket systems. It took 0.8 sec to traverse such list
during wake up. Any other CPU that contends for the list spin lock will
spin for a long time. It is a result of the numa balancing migration of
hot pages that are shared by many threads.

Multiple CPUs waking are queued up behind the lock, and the last one
queued has to wait until all CPUs did all the wakeups.

The page wait list is traversed with interrupt disabled, which caused
various problems. This was the original cause that triggered the NMI
watch dog timer in: https://patchwork.kernel.org/patch/9800303/ . Only
extending the NMI watch dog timer there helped.

This patch bookmarks the waker's scan position in wake list and break
the wake up walk, to allow access to the list before the waker resume
its walk down the rest of the wait list. It lowers the interrupt and
rescheduling latency.

This patch also provides a performance boost when combined with the next
patch to break up page wakeup list walk. We saw 22% improvement in the
will-it-scale file pread2 test on a Xeon Phi system running 256 threads.

[ v2: Merged in Linus' changes to remove the bookmark_wake_function, and
  simply access to flags. ]

Reported-by: Kan Liang <kan.liang@intel.com>
Tested-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-14 09:56:17 -07:00
..
acpi Device properties framework updates for v4.14-rc1 2017-09-05 12:50:00 -07:00
asm-generic MTD changes for 4.14: 2017-09-09 14:48:21 -07:00
clocksource
crypto crypto: hash - add crypto_(un)register_ahashes() 2017-08-22 14:54:52 +08:00
drm lib/interval_tree: fast overlap detection 2017-09-08 18:26:49 -07:00
dt-bindings The diff is dominated by the Allwinner A10/A20 SoCs getting converted to 2017-09-13 11:04:14 -07:00
keys net: rxrpc: Replace time_t type with time64_t type 2017-08-29 10:16:00 +01:00
kvm KVM: arm/arm64: PMU: Fix overflow interrupt injection 2017-07-25 14:18:01 +01:00
linux sched/wait: Break up long wake list walk 2017-09-14 09:56:17 -07:00
math-emu
media media updates for v4.14-rc1 2017-09-07 12:53:14 -07:00
memory
misc
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf 2017-09-08 11:35:55 -07:00
pcmcia
ras
rdma More RDMA work and some op-structure constification from Chuck Lever, 2017-09-09 13:31:49 -07:00
scsi SCSI misc on 20170913 2017-09-13 10:47:14 -07:00
soc ARM: SoC driver updates for v4.14 2017-09-10 20:40:00 -07:00
sound sound updates for 4.14-rc1 2017-09-07 12:44:53 -07:00
target iscsi-target: Fix iscsi_np reset hung task during parallel delete 2017-08-06 14:41:41 -07:00
trace Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2017-09-13 10:20:41 -07:00
uapi Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2017-09-11 18:34:47 -07:00
video
xen xen: cleanup xen.h 2017-08-31 09:45:55 -04:00