Thanks to commit f91eb62f71 ("init: scream bloody murder if interrupts
are enabled too early"), "bloody murder" is now being screamed.
With a MIPS OCTEON config, we use on_each_cpu() in our
irq_chip.irq_bus_sync_unlock() function. This gets called early as a
result of the time_init() call. Because the !SMP version of
on_each_cpu() unconditionally enables irqs, we get:
WARNING: at init/main.c:560 start_kernel+0x250/0x410()
Interrupts were enabled early
CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.0-rc5-Cavium-Octeon+ #801
Call Trace:
show_stack+0x68/0x80
warn_slowpath_common+0x78/0xb0
warn_slowpath_fmt+0x38/0x48
start_kernel+0x250/0x410
Suggested fix: Do what we already do in the SMP version of
on_each_cpu(), and use local_irq_save/local_irq_restore. Because we
need a flags variable, make it a static inline to avoid name space
issues.
[ Change from v1: Convert on_each_cpu to a static inline function, add
#include <linux/irqflags.h> to avoid build breakage on some files.
on_each_cpu_mask() and on_each_cpu_cond() suffer the same problem as
on_each_cpu(), but they are not causing !SMP bugs for me, so I will
defer changing them to a less urgent patch. ]
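
The difference between the old and the fixed !SMP behaviour can be
pictured with a small user-space model. This is only a sketch under
stated assumptions: irqs_enabled and the local_irq_*() macros below are
stand-ins that mimic the kernel primitives with a plain boolean,
on_each_cpu_old() is a hypothetical name for the previous macro
behaviour, and count_call() is an illustrative callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the CPU interrupt-enable state (user-space model only). */
static bool irqs_enabled;

/* Mock versions of the kernel primitives, modelled on a single boolean. */
#define local_irq_save(flags) \
	do { (flags) = irqs_enabled; irqs_enabled = false; } while (0)
#define local_irq_restore(flags) \
	do { irqs_enabled = ((flags) != 0); } while (0)
#define local_irq_disable()	do { irqs_enabled = false; } while (0)
#define local_irq_enable()	do { irqs_enabled = true; } while (0)

typedef void (*smp_call_func_t)(void *info);

/* Old !SMP behaviour: interrupts are unconditionally enabled on exit. */
static inline int on_each_cpu_old(smp_call_func_t func, void *info, int wait)
{
	(void)wait;
	local_irq_disable();
	func(info);
	local_irq_enable();
	return 0;
}

/* Fixed behaviour: restore whatever interrupt state the caller had. */
static inline int on_each_cpu(smp_call_func_t func, void *info, int wait)
{
	unsigned long flags;

	(void)wait;
	assert(func != NULL);
	local_irq_save(flags);
	func(info);
	local_irq_restore(flags);
	return 0;
}

/* Illustrative callback: counts how many times it ran. */
static void count_call(void *info)
{
	++*(int *)info;
}
```

In this model, calling on_each_cpu_old() with interrupts disabled leaves
them enabled afterwards, which is the condition the early-boot warning
reports, while on_each_cpu() leaves the caller's state untouched.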
Signed-off-by: David Daney <david.daney@cavium.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'renesas-gpio-rcar-for-v3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into next/drivers
From Simon Horman:
Renesas ARM based SoC GPIO R-Car updates for v3.11
DT support to GPIO R-Car driver by Laurent Pinchart.
* tag 'renesas-gpio-rcar-for-v3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas: (131 commits)
gpio-rcar: Add DT support
Merge tag 'renesas-phy-rcar-usb-for-v3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into next/soc
From Simon Horman:
Renesas USB updates for v3.11
These updates are by Sergei Shtylyov to clean-up USB support
present for R8A7779/Marzen and then extend USB support coverage to
R8A7778/BOCK-W.
* tag 'renesas-phy-rcar-usb-for-v3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
ARM: shmobile: BOCK-W: add USB support
ARM: shmobile: r8a7778: add USB support
phy-rcar-usb: add R8A7778 support
phy-rcar-usb: handle platform data
ARM: shmobile: Marzen: pass platform data to USB PHY device
phy-rcar-usb: add platform data
phy-rcar-usb: correct base address
ARM: shmobile: r8a7779: remove USB PHY 2nd memory resource
phy-rcar-usb: remove EHCI internal buffer setup
ARM: shmobile: r8a7779: setup EHCI internal buffer
ehci-platform: add pre_setup() method to platform data
ARM: shmobile: Marzen: move USB EHCI, OHCI, and PHY devices to R8A7779 code
Conflicts:
arch/arm/mach-shmobile/board-marzen.c
arch/arm/mach-shmobile/setup-r8a7778.c
Signed-off-by: Olof Johansson <olof@lixom.net>
Merge tag 'ux500-dma40-for-arm-soc-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson into next/drivers
From Linus Walleij:
Second set of DMA40 changes: refactorings and device tree
support for the DMA40. Now with MUSB and some platform
data removal.
* tag 'ux500-dma40-for-arm-soc-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson:
dmaengine: ste_dma40: Fetch disabled channels from DT
dmaengine: ste_dma40: Fetch the number of physical channels from DT
ARM: ux500: Stop passing DMA platform data though AUXDATA
dmaengine: ste_dma40: Allow memcpy channels to be configured from DT
dmaengine: ste_dma40_ll: Replace meaningless register set with comment
dmaengine: ste_dma40: Convert data_width from register bit format to value
dmaengine: ste_dma40_ll: Use the BIT macro to replace ugly '(1 << x)'s
ARM: ux500: Remove recently unused stedma40_xfer_dir enums
dmaengine: ste_dma40: Replace ST-E's home-brew DMA direction defs with generic ones
ARM: ux500: Replace ST-E's home-brew DMA direction definition with the generic one
dmaengine: ste_dma40: Use the BIT macro to replace ugly '(1 << x)'s
ARM: ux500: Remove empty function u8500_of_init_devices()
ARM: ux500: Remove ux500-musb platform registation when booting with DT
usb: musb: ux500: add device tree probing support
usb: musb: ux500: attempt to find channels by name before using pdata
usb: musb: ux500: harden checks for platform data
usb: musb: ux500: take the dma_mask from coherent_dma_mask
usb: musb: ux500: move the MUSB HDRC configuration into the driver
usb: musb: ux500: move channel number knowledge into the driver
Merge tag 'omap-for-v3.11/soc-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into next/soc
From Tony Lindgren:
Omap SoC changes. Mostly improves am33xx support, and adds
minimal support for am43x SoCs.
* tag 'omap-for-v3.11/soc-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: OMAP2+: AM43x: SRAM base and size
ARM: OMAP2+: AM43x: GP or HS ?
ARM: OMAP2+: AM43x: early init
ARM: OMAP2+: AM43x: static mapping
ARM: OMAP2+: AM437x: SoC revision detection
ARM: OMAP2+: AM43x: soc_is support
ARM: OMAP2+: AM43x: kbuild
ARM: OMAP2+: AM43x: Kconfig
ARM: OMAP2+: separate out OMAP4 restart
ARM: AM33XX: clk: Add clock node for EHRPWM TBCLK
ARM: OMAP3: clock data: get rid of unused USB host clock aliases and dummies
ARM: OMAP2+: AM33xx: Add missing reset status info to GFX hwmod
+ Linux 3.10-rc5
Signed-off-by: Olof Johansson <olof@lixom.net>
Merge tag 'omap-for-v3.11/cleanup-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into next/cleanup
From Tony Lindgren:
Move omap4 over to device tree based booting. This allows us to get rid of
a big pile of platform init code for things that are already handled by
device tree related code. As am33xx is already device tree based, we
can also remove the same data for am33xx.
* tag 'omap-for-v3.11/cleanup-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: OMAP4: hwmod data: Remove irq entries from mcspi, mmc hwmods
ARM: OMAP4: hwmod data: add DSS data back
ARM: OMAP4: hwmod data: Clean up the data file
ARM: AM33XX: hwmod data: irq, dma and addr info clean up
ARM: OMAP2+: Remove omap4 ocp2scp pdata
ARM: OMAP2+: Remove omap4 pdata for USB
ARM: OMAP2+: Remove omap4 pdata from hsmmc.c
ARM: OMAP2+: Remove legacy mux data for omap4
ARM: OMAP2+: Remove board-omap4panda.c
ARM: OMAP2+: Remove board-4430sdp.c
ARM: OMAP2+: Legacy support for wl12xx when booted with devicetree
Resolved merge conflict due to a fix for 3.10 (the fix is removed since
the code is no longer used -- data comes from device tree).
Signed-off-by: Olof Johansson <olof@lixom.net>
Merge tag 'omap-for-v3.11/fixes-non-critical-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into next/fixes-non-critical
From Tony Lindgren:
Non-critical fixes for omaps for v3.11 merge window.
* tag 'omap-for-v3.11/fixes-non-critical-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: OMAP2+: omap-usb-host: Fix memory leaks
ARM: OMAP2+: Fix serial init for device tree based booting
arm/omap: use const char properly
ARM: OMAP2+: devices: Do not print error when dss_hdmi hwmod lookup fails
ARM: OMAP2+: devices: Do not print error when DMIC hwmod lookup fails
ARM: OMAP2+: devices: Do not print error when McPDM hwmod lookup fails
ARM: OMAP: add vdds_sdi supply for omapdss_sdi.0
ARM: OMAP: add vdds_dsi supply for omapdss_dpi.0
ARM: OMAP: fix dsi regulator names
+ Linux 3.10-rc5
Signed-off-by: Olof Johansson <olof@lixom.net>
All Transparent Huge Pages are allocated by the buddy allocator.
A compile time check is in place that fails when the order of a
transparent huge page is too large to be allocated by the buddy
allocator. Unfortunately that compile time check passes when:
HPAGE_PMD_ORDER == MAX_ORDER
( which is incorrect as the buddy allocator can only allocate
memory of order strictly less than MAX_ORDER. )
This patch updates the compile time check to fail in the above
case.
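
The off-by-one can be illustrated with a short sketch. The values of
MAX_ORDER and HPAGE_PMD_ORDER below are example values chosen for
illustration (real values depend on architecture and configuration),
and the three helper functions are hypothetical names that only model
the old and new comparisons.

```c
#include <assert.h>
#include <stdbool.h>

/* Example values only (assumption); not taken from any specific config. */
#define MAX_ORDER		11
#define HPAGE_PMD_ORDER		9

/* The buddy allocator serves orders 0 .. MAX_ORDER - 1. */
static bool buddy_can_allocate(unsigned int order)
{
	return order < MAX_ORDER;
}

/* Old-style check: only rejects orders strictly greater than MAX_ORDER. */
static bool old_check_passes(unsigned int order)
{
	return !(order > MAX_ORDER);
}

/* Updated check: also rejects order == MAX_ORDER. */
static bool new_check_passes(unsigned int order)
{
	return !(order >= MAX_ORDER);
}

/* Compile-time form of the updated check (C11 static_assert). */
static_assert(HPAGE_PMD_ORDER < MAX_ORDER,
	      "huge page order too large for the buddy allocator");
```

For order == MAX_ORDER the old comparison passes even though the
allocator cannot satisfy the request; the updated comparison agrees
with buddy_can_allocate() for every order.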
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Under x86, multiple puds can be made to reference the same bank of
huge pmds provided that they represent a full PUD_SIZE of shared
huge memory that is aligned to a PUD_SIZE boundary.
The code to share pmds does not require any architecture specific
knowledge other than the fact that pmds can be indexed, thus can
be beneficial to some other architectures.
This patch copies the huge pmd sharing (and unsharing) logic from
x86/ to mm/ and introduces a new config option to activate it:
CONFIG_ARCH_WANTS_HUGE_PMD_SHARE
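
The alignment condition described above can be sketched as follows.
This is a simplified user-space model, not the code being moved: the
PUD_SIZE value is an assumed example (1 GiB), and
pud_window_shareable() is a hypothetical helper that only checks
whether the PUD-sized, PUD-aligned window around an address lies fully
inside a shared mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed example value: 1 GiB per pud entry. */
#define PUD_SIZE	((uint64_t)1 << 30)
#define PUD_MASK	(~(PUD_SIZE - 1))

/*
 * Returns true if the PUD-sized, PUD-aligned window containing addr
 * lies entirely inside the shared mapping [vm_start, vm_end).
 */
static bool pud_window_shareable(uint64_t vm_start, uint64_t vm_end,
				 uint64_t addr, bool shared)
{
	uint64_t base = addr & PUD_MASK;
	uint64_t end = base + PUD_SIZE;

	/* The address must belong to the mapping being examined. */
	assert(vm_start <= addr && addr < vm_end);

	return shared && vm_start <= base && end <= vm_end;
}
```

A mapping that starts or ends in the middle of a PUD-sized window
cannot share the pmds for that window, since the window would also
cover addresses outside the mapping.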
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
f12dc02014 ("cgroup: mark "tasks" cgroup file as insane") and
cc5943a781 ("cgroup: mark "notify_on_release" and "release_agent"
cgroup files insane") forgot to update the changed behavior
documentation in cgroup.h. Update it.
Signed-off-by: Tejun Heo <tj@kernel.org>
A css (cgroup_subsys_state) is how each cgroup is represented to a
controller. As such, it can be used in hot paths across the various
subsystems different controllers are associated with.
One of the common operations is reference counting, which up until now
has been implemented using a global atomic counter and can have
significant adverse impact on scalability. For example, css refcnt
can be gotten and put multiple times by blkcg for each IO request.
For high-IOPS configurations which try to do as much per-cpu as
possible, the global frequent refcnting can be very expensive.
In general, given the various and hugely diverse paths css's end up
being used from, we need to make it cheap and highly scalable. In its
usage, css refcnting isn't very different from module refcnting.
This patch converts css refcnting to use the recently added
percpu_ref. css_get/tryget/put() map directly to the matching
percpu_ref operations, and the deactivation logic is no longer
necessary as percpu_ref already has refcnt killing.
The only complication is that as the refcnt is per-cpu,
percpu_ref_kill() in itself doesn't ensure that further tryget
operations will fail, which we need to guarantee before invoking
->css_offline()'s. This is resolved by collecting kill confirmation
using percpu_ref_kill_and_confirm() and initiating the offline phase
of destruction after all css refcnt's are confirmed to be seen as
killed on all CPUs. The previous patches already split destruction
into two phases, so percpu_ref_kill_and_confirm() can be hooked up
easily.
This patch removes css_refcnt() which is used for rcu dereference
sanity check in css_id(). While we can add a percpu refcnt API to ask
the same question, css_id() itself is scheduled to be removed fairly
soon, so let's not bother with it. Just drop the sanity check and use
rcu_dereference_raw() instead.
v2: - init_cgroup_css() was calling percpu_ref_init() without checking
the return value. This causes two problems - the obvious lack
of error handling and percpu_ref_init() being called from
cgroup_init_subsys() before the allocators are up, which
triggers warnings but doesn't cause actual problems as the
refcnt isn't used for roots anyway. Fix both by moving
percpu_ref_init() to cgroup_create().
- The base references were put too early by
percpu_ref_kill_and_confirm() and cgroup_offline_fn() put the
refs one extra time. This wasn't noticeable because css's go
through another RCU grace period before being freed. Update
cgroup_destroy_locked() to grab an extra reference before
killing the refcnts. This problem was noticed by Kent.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Kent Overstreet <koverstreet@google.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: "Alasdair G. Kergon" <agk@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Glauber Costa <glommer@gmail.com>
Split cgroup_destroy_locked() into two steps and put the latter half
into cgroup_offline_fn() which is executed from a work item. The
latter half is responsible for offlining the css's, removing the
cgroup from internal lists, and propagating release notification to
the parent. The separation is to allow using percpu refcnt for css.
Note that this allows for other cgroup operations to happen between
the first and second halves of destruction, including creating a new
cgroup with the same name. As the target cgroup is marked DEAD in the
first half and cgroup internals don't care about the names of cgroups,
this should be fine. A comment explaining this will be added by the
next patch which implements the actual percpu refcnting.
As RCU freeing is guaranteed to happen after the second step of
destruction, we can use the same work item for both. This patch
renames cgroup->free_work to ->destroy_work and uses it for both
purposes. INIT_WORK() is now performed right before queueing the work
item.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Implement percpu_ref_tryget() which stops giving out references once
the percpu_ref is visible as killed. Because the refcnt is per-cpu,
different CPUs will start to see a refcnt as killed at different
points in time and tryget() may continue to succeed on a subset of
CPUs for a while after percpu_ref_kill() returns.
For use cases where it's necessary to know when all CPUs start to see
the refcnt as dead, percpu_ref_kill_and_confirm() is added. The new
function takes an extra argument @confirm_kill which is invoked when
the refcnt is guaranteed to be viewed as killed on all CPUs.
While this isn't the prettiest interface, it doesn't force synchronous
wait and is much safer than requiring the caller to do its own
call_rcu().
v2: Patch description rephrased to emphasize that tryget() may
continue to succeed on some CPUs after kill() returns as suggested
by Kent.
v3: Function comment in percpu_ref_kill_and_confirm() updated warning
people to not depend on the implied RCU grace period from the
confirm callback as it's an implementation detail.
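
The visibility behaviour described above can be modelled with a toy,
single-threaded sketch. Everything here is hypothetical: the toy_*
names, the fixed CPU count and the explicit toy_cpu_sees_kill() step
stand in for the real per-cpu counters and the RCU machinery, which
this model does not attempt to reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 4

struct toy_ref;
typedef void (toy_ref_func_t)(struct toy_ref *ref);

struct toy_ref {
	long count[TOY_NR_CPUS];	/* per-cpu reference counts */
	bool seen_dead[TOY_NR_CPUS];	/* has this CPU observed the kill? */
	bool killed;
	toy_ref_func_t *confirm_kill;
	int confirmations;		/* how often confirm_kill ran */
};

/* Succeeds until the calling CPU has observed the kill. */
static bool toy_ref_tryget(struct toy_ref *ref, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	if (ref->seen_dead[cpu])
		return false;
	ref->count[cpu]++;
	return true;
}

/* Start the kill; returns before every CPU has observed it. */
static void toy_ref_kill_and_confirm(struct toy_ref *ref,
				     toy_ref_func_t *confirm_kill)
{
	ref->killed = true;
	ref->confirm_kill = confirm_kill;
}

/* Model one CPU catching up; confirm once all CPUs have done so. */
static void toy_cpu_sees_kill(struct toy_ref *ref, int cpu)
{
	int i;

	assert(ref->killed);
	if (ref->seen_dead[cpu])
		return;
	ref->seen_dead[cpu] = true;

	for (i = 0; i < TOY_NR_CPUS; i++)
		if (!ref->seen_dead[i])
			return;
	if (ref->confirm_kill)
		ref->confirm_kill(ref);
}

/* Illustrative confirmation callback. */
static void toy_count_confirm(struct toy_ref *ref)
{
	ref->confirmations++;
}
```

In this model toy_ref_tryget() keeps succeeding on CPUs that have not
yet observed the kill, and the confirmation callback runs exactly once,
only after the last CPU has caught up.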
Signed-off-by: Tejun Heo <tj@kernel.org>
Slightly-Grumpily-Acked-by: Kent Overstreet <koverstreet@google.com>
Add support to change the link state of VF (vPort)
Signed-off-by: Rony Efraim <ronye@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add netlink directives and ndo entry to allow for controlling
VF link, which can be in one of three states:
Auto - VF link state reflects the PF link state (default)
Up - VF link state is up, traffic from VF to VF works even if
the actual PF link is down
Down - VF link state is down, no traffic from/to this VF, can be of
use while configuring the VF
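
The three states can be summarised in a short sketch. The enum and
function names below are hypothetical and chosen for illustration only;
they are not the identifiers added by this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three states described above. */
enum vf_link_state {
	VF_LINK_AUTO,	/* follow the PF link state (default) */
	VF_LINK_UP,	/* always up, even if the PF link is down */
	VF_LINK_DOWN,	/* always down, no traffic from/to the VF */
};

/* Link state as seen by the VF, given the physical PF link state. */
static bool vf_link_is_up(enum vf_link_state state, bool pf_link_up)
{
	switch (state) {
	case VF_LINK_AUTO:
		return pf_link_up;
	case VF_LINK_UP:
		return true;
	case VF_LINK_DOWN:
		return false;
	}
	assert(!"unknown VF link state");
	return false;
}
```

Only the Auto state depends on the PF link; the other two states
override it in either direction.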
Signed-off-by: Rony Efraim <ronye@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Caught by sparse:
- __rcu: missing annotation to sd->flow_limit
- __user: direct access in cpumask_scnprintf
Also
- add endline character when printing bitmap if room in buffer
- avoid bucket overflow by reducing FLOW_LIMIT_HISTORY
The last item warrants some explanation. The hashtable buckets are
subject to overflow if FLOW_LIMIT_HISTORY is larger than or equal
to bucket size, since all packets may end up in a single bucket. The
current (rather arbitrary) history value of 256 happens to match the
buffer size (u8).
As a result, with a single flow, the first 128 packets are accepted
(correct), the second 128 packets dropped (correct) and then the
history[] array has filled, so that each subsequent new packet
causes an increment in the bucket for new_flow plus a decrement
for old_flow: a steady state.
This is fine if packets are dropped, as the steady state goes away
as soon as a mix of traffic reappears. But, because the 256th packet
overflowed the bucket to 0: no packets are dropped.
Instead of explicitly adding an overflow check, this patch changes
FLOW_LIMIT_HISTORY to never be able to overflow a single bucket.
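
The overflow can be reproduced with a small user-space simulation. This
is a sketch modelled on the description above, not the kernel code: the
function name, the two-bucket table and the fixed flow index are
assumptions made for illustration, and the history length is passed as
a parameter so both values can be compared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_HISTORY	256
#define NUM_BUCKETS	2

/*
 * Feed `packets` packets of one flow through a flow-limit table with
 * u8 buckets and return how many of them would be dropped.
 */
static size_t count_drops_single_flow(unsigned int history_len,
				      size_t packets)
{
	uint16_t history[MAX_HISTORY];
	uint8_t buckets[NUM_BUCKETS];
	const uint16_t new_flow = 1;	/* every packet hashes to bucket 1 */
	unsigned int head = 0;
	size_t drops = 0;
	size_t i;

	/* history_len must be a power of two no larger than MAX_HISTORY. */
	assert(history_len && history_len <= MAX_HISTORY &&
	       (history_len & (history_len - 1)) == 0);

	memset(history, 0, sizeof(history));
	memset(buckets, 0, sizeof(buckets));

	for (i = 0; i < packets; i++) {
		uint16_t old_flow = history[head];

		history[head] = new_flow;
		head = (head + 1) & (history_len - 1);

		/* Forget the oldest packet in the history window. */
		if (buckets[old_flow])
			buckets[old_flow]--;

		/* The increment wraps from 255 to 0 in a u8 bucket. */
		if (++buckets[new_flow] > (history_len >> 1))
			drops++;
	}
	return drops;
}
```

In this model a history length of 256 drops only 127 of 1000
single-flow packets, because the 256th packet wraps the bucket to 0 and
nothing is dropped afterwards. With a history length of 128 the bucket
cannot wrap, and 936 of the 1000 packets are dropped.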
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
(first item)
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull RCU fixes from Paul McKenney:
"I must confess that this past merge window was not RCU's best showing.
This series contains three more fixes for RCU regressions:
1. A fix to __DECLARE_TRACE_RCU() that causes it to act as an
interrupt from idle rather than as a task switch from idle.
This change is needed due to the recent use of _rcuidle()
tracepoints that can be invoked from interrupt handlers as well
as from idle. Without this fix, invoking _rcuidle() tracepoints
from interrupt handlers results in splats and (more seriously)
confusion on RCU's part as to whether a given CPU is idle or not.
This confusion can in turn result in too-short grace periods and
therefore random memory corruption.
2. A fix to a subtle deadlock that could result due to RCU doing
a wakeup while holding one of its rcu_node structure's locks.
Although the probability of occurrence is low, it really
does happen. The fix, courtesy of Steven Rostedt, uses
irq_work_queue() to avoid the deadlock.
3. A fix to a silent deadlock (invisible to lockdep) due to the
interaction of timeouts posted by RCU debug code enabled by
CONFIG_PROVE_RCU_DELAY=y, grace-period initialization, and CPU
hotplug operations. This will not occur in production kernels,
but really does occur in randconfig testing. Diagnosis courtesy
of Steven Rostedt"
* 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
rcu: Fix deadlock with CPU hotplug, RCU GP init, and timer migration
rcu: Don't call wakeup() with rcu_node structure ->lock held
trace: Allow idle-safe tracepoints to be called from irq
Normally, percpu_ref_init() initializes and percpu_ref_kill()
initiates destruction which completes asynchronously. The
asynchronous destruction can be problematic in the init failure path
where the caller wants to destroy a half-constructed object:
distinguishing half-constructed objects from the usual release method
can be painful for complex objects.
This patch implements percpu_ref_cancel_init() which synchronously
destroys the percpu_ref without invoking release. To avoid
unintentional misuse, the function requires the ref to have finished
percpu_ref_init() but never been used, and triggers WARN otherwise.
v2: Explain the weird name and usage restriction in the function
comment.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Kent Overstreet <koverstreet@google.com>
Two small changes.
* Unlike most init functions, percpu_ref_init() allocates memory and
may fail. Let's mark it with __must_check in case the caller
forgets.
* percpu_ref_kill_rcu() is unnecessarily using ACCESS_ONCE() to
dereference @ref->pcpu_count, which can be misleading. The pointer
is guaranteed to be valid and visible and can't change underneath
the function. Drop ACCESS_ONCE().
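
The effect of the first change can be sketched in user space. The
names below are hypothetical: demo_must_check mirrors the idea of the
kernel's __must_check using the GCC/Clang warn_unused_result attribute,
and demo_ref_init() is an illustrative init function that allocates
memory and can therefore fail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* User-space stand-in for a must-check annotation (GCC/Clang). */
#define demo_must_check __attribute__((warn_unused_result))

struct demo_ref {
	long *pcpu_count;	/* one counter per (simulated) CPU */
};

/* Allocates the counters; the caller must check the return value. */
static int demo_must_check demo_ref_init(struct demo_ref *ref,
					 size_t nr_cpus)
{
	assert(ref != NULL);
	ref->pcpu_count = calloc(nr_cpus, sizeof(*ref->pcpu_count));
	return ref->pcpu_count ? 0 : -ENOMEM;
}

/* Releases the counters allocated by demo_ref_init(). */
static void demo_ref_exit(struct demo_ref *ref)
{
	free(ref->pcpu_count);
	ref->pcpu_count = NULL;
}
```

With the attribute in place, a bare call such as demo_ref_init(&ref, 4);
that ignores the result produces a compiler warning, which is the kind
of reminder the annotation is meant to give.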
Signed-off-by: Tejun Heo <tj@kernel.org>
cgroup->count tracks the number of css_sets associated with the cgroup
and is used only to verify that no css_set is associated when the
cgroup is being destroyed. It's superfluous as the destruction path
can simply check whether cgroup->cset_links is empty instead.
Drop cgroup->count and check ->cset_links directly from
cgroup_destroy_locked().
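
The idea can be sketched with a minimal user-space list. The list
helpers below are simplified re-implementations for illustration, and
struct toy_cgroup and toy_cgroup_destroy_check() are hypothetical names;
the real destruction path performs additional checks that are not
modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, in the style of the kernel's. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = entry;
}

static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}

struct toy_cgroup {
	struct list_head cset_links;	/* links to associated css_sets */
};

/* No separate counter needed: an empty list means no css_set is linked. */
static int toy_cgroup_destroy_check(const struct toy_cgroup *cgrp)
{
	assert(cgrp != NULL);
	return list_empty(&cgrp->cset_links) ? 0 : -EBUSY;
}
```

Because the list is the single source of truth, there is no counter
that could drift out of sync with the links it is meant to count.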
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
We will add another flag indicating that the cgroup is in the process
of being killed. REMOVING / REMOVED is more difficult to distinguish
and cgroup_is_removing()/cgroup_is_removed() are a bit awkward. Also,
later percpu_ref usage will involve "kill"ing the refcnt.
s/CGRP_REMOVED/CGRP_DEAD/
s/cgroup_is_removed()/cgroup_is_dead()
This patch is purely cosmetic.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
* __css_get() isn't used by anyone. Fold it into css_get().
* Add proper function comments to all css reference functions.
This patch is purely cosmetic.
v2: Typo fix as per Li.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
cgroups and css_sets are mapped M:N and this M:N mapping is
represented by struct cg_cgroup_link which forms linked lists on both
sides. The naming around this mapping is already confusing and struct
cg_cgroup_link exacerbates the situation quite a bit.
From the cgroup side, it starts off ->css_sets and runs through
->cgrp_link_list. From the css_set side, it starts off ->cg_links and
runs through ->cg_link_list. This is rather reversed as
cgrp_link_list is used to iterate css_sets and cg_link_list cgroups.
Also, this is the only place which is still using the confusing "cg"
for css_sets. This patch cleans it up a bit.
* s/cgroup->css_sets/cgroup->cset_links/
s/css_set->cg_links/css_set->cgrp_links/
s/cgroup_iter->cg_link/cgroup_iter->cset_link/
* s/cg_cgroup_link/cgrp_cset_link/
* s/cgrp_cset_link->cg/cgrp_cset_link->cset/
s/cgrp_cset_link->cgrp_link_list/cgrp_cset_link->cset_link/
s/cgrp_cset_link->cg_link_list/cgrp_cset_link->cgrp_link/
* s/init_css_set_link/init_cgrp_cset_link/
s/free_cg_links/free_cgrp_cset_links/
s/allocate_cg_links/allocate_cgrp_cset_links/
* s/cgl[12]/link[12]/ in compare_css_sets()
* s/saved_link/tmp_link/ s/tmp/tmp_links/ and a couple similar
adjustments.
* Comment and whiteline adjustments.
After the changes, we have
	list_for_each_entry(link, &cont->cset_links, cset_link) {
		struct css_set *cset = link->cset;
instead of
	list_for_each_entry(link, &cont->css_sets, cgrp_link_list) {
		struct css_set *cset = link->cg;
This patch is purely cosmetic.
v2: Fix broken sentences in the patch description.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Currently some cpuset behaviors are not friendly when cpuset is co-mounted
with other cgroup controllers.
Now with this patchset if cpuset is mounted with sane_behavior option,
it behaves differently:
- Tasks will be kept in empty cpusets when hotplug happens and take
masks of ancestors with non-empty cpus/mems, instead of being moved to
an ancestor.
- A task can be moved into an empty cpuset, and again it takes masks of
ancestors, so the user can drop a task into a newly created cgroup without
having to do anything for it.
As tasks can reside in empty cpusets, here are some rules:
- They can be moved to another cpuset, regardless of whether it's
empty or not.
- Though they take masks from ancestors, they take other configs from
the empty cpuset.
- If the ancestors' masks are changed, those tasks will also be updated
to take the new masks.
v2: add documentation in include/linux/cgroup.h
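
The mask inheritance described above can be sketched as a walk up the
hierarchy. This is a simplified model: struct toy_cpuset, the 64-bit
mask and effective_cpus() are hypothetical stand-ins for the real
cpuset structures and cpumask API, and the model assumes the root
cpuset is never empty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_cpuset {
	struct toy_cpuset *parent;	/* NULL for the root cpuset */
	uint64_t cpus_allowed;		/* bitmask of CPUs; 0 means empty */
};

/*
 * Mask used by tasks in cs: its own mask if non-empty, otherwise the
 * mask of the nearest ancestor with a non-empty mask.
 */
static uint64_t effective_cpus(const struct toy_cpuset *cs)
{
	while (cs->parent && cs->cpus_allowed == 0)
		cs = cs->parent;

	/* Assumption of this model: the root cpuset is never empty. */
	assert(cs->cpus_allowed != 0);
	return cs->cpus_allowed;
}
```

Because the mask is resolved by walking up at lookup time in this
model, a change to an ancestor's mask is immediately reflected for
tasks in empty descendants, matching the third rule above.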
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
To achieve this:
- We call update_tasks_cpumask/nodemask() for empty cpusets when
hotplug happens, instead of moving tasks out of them.
- When a cpuset's masks are changed by writing cpuset.cpus/mems,
we also update tasks in child cpusets which are empty.
v3:
- do propagation work in one place for both hotplug and unplug
v2:
- drop rcu_read_lock before calling update_task_nodemask() and
update_task_cpumask(), instead of using workqueue.
- add documentation in include/linux/cgroup.h
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Merge tag 'am335x_tsc-adc' of git://breakpoint.cc/bigeasy/linux
A completely refurbished series including:
- DT support for the MFD, TSC and ADC driver & platform device support,
which has no users, has been killed.
- iio_map from last series is gone and replaced by proper nodes in the
device tree.
- suspend fixes, which means correct data structs are taken and no
interrupt storm
- fifo split which should fix the problem with TSC & ADC being used at
the same time
- The ADC channels are now checked before blindly applied. That means the
touch part reads X, Y and Z coordinates and does not mix them up. Same
goes for the IIO ADC driver.
- The IIO ADC driver now creates files named in_voltageX_raw where X
represents the ADC line instead of a number starting at 0. A read from
this file can return -EBUSY in case touch is busy and the ADC didn't
collect a value.
The AB8540 RTC has changed between AB8540_cut1 and AB8540_cut2. Different
resources need to be defined for those two versions.
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Julien Delacou <julien.delacou@stericsson.com>
Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Merge tag 'mfd-arizona-v3.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc
mfd: arizona: Updates for v3.10
A bunch of enhancements and fixes for the arizona devices, adding a few
new features (the main one being device tree) and improving robustness.
* s/percpu_ref_release/percpu_ref_func_t/ as it's customary to have _t
postfix for types and the type is gonna be used for a different type
of callback too.
* Add @ARG to function comments.
* Drop unnecessary and unaligned indentation from percpu_ref_init()
function comment.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Kent Overstreet <koverstreet@google.com>
percpu_ref_get/put() are using preempt_disable/enable() while
percpu_ref_kill() is using plain call_rcu() instead of
call_rcu_sched(). This is buggy as grace periods of the two may not
match. Fix it by using plain RCU in percpu_ref_get/put().
(I suggested using sched RCU in the first place but there's no actual
benefit in doing so unless we're gonna introduce different variants
of get/put to be called while preemption is already disabled, which we
definitely shouldn't.)
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Kent Overstreet <koverstreet@google.com>
Since the jbd_debug() is implemented with two separate printk()
calls, it can lead to corrupted and misleading debug output like
the following (see lines marked with "*"):
[ 290.339362] (fs/jbd2/journal.c, 203): kjournald2: kjournald2 wakes
[ 290.339365] (fs/jbd2/journal.c, 155): kjournald2: commit_sequence=42103, commit_request=42104
[ 290.339369] (fs/jbd2/journal.c, 158): kjournald2: OK, requests differ
[* 290.339376] (fs/jbd2/journal.c, 648): jbd2_log_wait_commit:
[* 290.339379] (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: want 42104, j_commit_sequence=42103
[* 290.339382] JBD2: starting commit of transaction 42104
[ 290.339410] (fs/jbd2/revoke.c, 566): jbd2_journal_write_revoke_records: Wrote 0 revoke records
[ 290.376555] (fs/jbd2/commit.c, 1088): jbd2_journal_commit_transaction: JBD2: commit 42104 complete, head 42079
i.e. the debug output from log_wait_commit and journal_commit_transaction
have become interleaved. The output should have been:
(fs/jbd2/journal.c, 648): jbd2_log_wait_commit: JBD2: want 42104, j_commit_sequence=42103
(fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: starting commit of transaction 42104
It is expected that this is not easy to replicate -- I was only able
to cause it on preempt-rt kernels, and even then only under heavy
I/O load.
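The interleaving described above can be avoided by building the whole
message first and emitting it in one call. The following is a minimal
userspace sketch of that idea, not the kernel change itself: the names
format_debug() and debug_print() are hypothetical, and it assumes a POSIX
stdio, where a single fputs() call on a stream is not interleaved with
other stdio calls on the same stream.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the location prefix and the caller's
 * message into ONE buffer, so that the prefix and the body can never be
 * separated by output from another thread.
 * Returns the number of characters that would have been written, or a
 * negative value on a formatting error.
 */
static int format_debug(char *buf, size_t len, const char *file, int line,
                        const char *func, const char *fmt, ...)
{
    va_list args;
    int n, m;

    n = snprintf(buf, len, "(%s, %d): %s: ", file, line, func);
    if (n < 0 || (size_t)n >= len)
        return n;               /* error, or prefix already truncated */

    va_start(args, fmt);
    m = vsnprintf(buf + n, len - (size_t)n, fmt, args);
    va_end(args);

    return m < 0 ? m : n + m;
}

/* Emit the complete line with a single stdio call. */
#define debug_print(...)                                          \
    do {                                                          \
        char buf_[256];                                           \
        if (format_debug(buf_, sizeof(buf_), __FILE__, __LINE__,  \
                         __func__, __VA_ARGS__) >= 0)             \
            fputs(buf_, stderr);                                  \
    } while (0)
```

A caller would write debug_print("JBD2: want %d\n", 42104); the prefix and
the message then reach the stream together.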
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Suggested-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The bit_spinlock functions are only used for the jbd_lock_bh_state
functions (and friends) in jbd_common.h and are not directly used
by either of jbd.h or jbd2.h content.
The jbd_common file is new as of commit 446066724c ("jdb/jbd2: factor
out common functions from the jbd[2] header files") but common
(and isolated) headers were not considered for factoring at that time.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Inode's data or non journaled quota may be written w/o journal so we
_must_ send a barrier at the end of ext4_sync_fs. But it can be
skipped if journal commit will do it for us.
Also fix data integrity for nojournal mode.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The current implementation of jbd2_journal_force_commit() is suboptimal
because it results in empty and useless commits, while callers just want
to force and wait for any unfinished commits. We already have
jbd2_journal_force_commit_nested(), which does exactly what we want,
except that we are guaranteed not to hold a journal transaction open.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Pull networking update from David Miller:
1) Fix dump iterator in nfnl_acct_dump() and ctnl_timeout_dump() to
dump all objects properly, from Pablo Neira Ayuso.
2) xt_TCPMSS must use the default MSS of 536 when no MSS TCP option is
present. Fix from Phil Oester.
3) qdisc_get_rtab() looks for an existing matching rate table and uses
that instead of creating a new one. However, its key matching is
incomplete: it fails to check that the ->data[] array is identical
too. Fix from Eric Dumazet.
4) ip_vs_dest_entry isn't fully initialized before copying back to
userspace, fix from Dan Carpenter.
5) Fix ubuf reference counting regression in vhost_net, from Jason
Wang.
6) When sock_diag dumps a socket filter back to userspace, we have to
translate it out of the kernel's internal representation first.
From Nicolas Dichtel.
7) davinci_mdio holds a spinlock while calling pm_runtime, which
sleeps. Fix from Sebastian Siewior.
8) Timeout check in sh_eth_check_reset is off by one, from Sergei
Shtylyov.
9) If sctp socket init fails, we can NULL deref during cleanup. Fix
from Daniel Borkmann.
10) netlink_mmap() does not propagate errors properly, from Patrick
McHardy.
11) Disable powersave and use minstrel by default in ath9k. From Sujith
Manoharan.
12) Fix a regression in that SOCK_ZEROCOPY is not set on tuntap sockets
which prevents vhost from being able to use zerocopy. From Jason
Wang.
13) Fix race between port lookup and TX path in team driver, from Jiri
Pirko.
14) Missing length checks in bluetooth L2CAP packet parsing, from Johan
Hedberg.
15) rtlwifi fails to connect to networks using any encryption method
other than WPA2. Fix from Larry Finger.
16) Fix iwlegacy build due to incorrect CONFIG_* ifdeffing for power
management stuff. From Yijing Wang.
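To illustrate item 3 above, here is a minimal sketch of a rate table
lookup whose key covers both the rate specification and the table
contents. The struct layouts, the table size and the names rtab_matches()
and rtab_find() are simplified assumptions made for this example; they are
not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RTAB_ENTRIES 256        /* assumed table size for this sketch */

/* Hypothetical, padding-free stand-in for a rate specification. */
struct rate_spec {
    uint32_t rate;
    uint16_t overhead;
    uint16_t cell_log;
};

/* Hypothetical cached rate table, kept on a singly linked list. */
struct rate_table {
    struct rate_spec spec;
    uint32_t data[RTAB_ENTRIES];
    int refcnt;
    struct rate_table *next;
};

/* A table is reusable only if BOTH the spec and the data[] array match. */
static bool rtab_matches(const struct rate_table *tab,
                         const struct rate_spec *spec,
                         const uint32_t *data)
{
    return memcmp(&tab->spec, spec, sizeof(*spec)) == 0 &&
           memcmp(tab->data, data, sizeof(tab->data)) == 0;
}

/* Return an existing matching table (taking a reference), or NULL. */
static struct rate_table *rtab_find(struct rate_table *head,
                                    const struct rate_spec *spec,
                                    const uint32_t *data)
{
    struct rate_table *tab;

    for (tab = head; tab != NULL; tab = tab->next) {
        if (rtab_matches(tab, spec, data)) {
            tab->refcnt++;
            return tab;
        }
    }
    return NULL;
}
```

Comparing only the spec would hand back a cached table whose contents
differ from what the caller supplied, which is the kind of incomplete key
matching the item describes.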
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
b43: stop format string leaking into error msgs
ath9k: Use minstrel rate control by default
Revert "ath9k_hw: Update rx gain initval to improve rx sensitivity"
ath9k: Disable PowerSave by default
net: wireless: iwlegacy: fix build error for il_pm_ops
rtlwifi: Fix a false leak indication for PCI devices
wl12xx/wl18xx: scan all 5ghz channels
wl12xx: increase minimum singlerole firmware version required
wl12xx: fix minimum required firmware version for wl127x multirole
rtlwifi: rtl8192cu: Fix problem in connecting to WEP or WPA(1) networks
mwifiex: debugfs: Fix out of bounds array access
Bluetooth: Fix mgmt handling of power on failures
Bluetooth: Fix missing length checks for L2CAP signalling PDUs
Bluetooth: btmrvl: support Marvell Bluetooth device SD8897
Bluetooth: Fix checks for LE support on LE-only controllers
team: fix checks in team_get_first_port_txable_rcu()
team: move add to port list before port enablement
team: check return value of team_get_port_by_index_rcu() for NULL
tuntap: set SOCK_ZEROCOPY flag during open
netlink: fix error propagation in netlink_mmap()
...
Pull block layer fixes from Jens Axboe:
"Outside of bcache (which really isn't super big), these are all
few-liners. There are a few important fixes in here:
- Fix blk pm sleeping when holding the queue lock
- A small collection of bcache fixes that have been done and tested
since bcache was included in this merge window.
- A fix for a raid5 regression introduced with the bio changes.
- Two important fixes for mtip32xx, fixing an oops and potential data
corruption (or hang) due to wrong bio iteration on stacked devices."
* 'for-linus' of git://git.kernel.dk/linux-block:
scatterlist: sg_set_buf() argument must be in linear mapping
raid5: Initialize bi_vcnt
pktcdvd: silence static checker warning
block: remove refs to XD disks from documentation
blkpm: avoid sleep when holding queue lock
mtip32xx: Correctly handle bio->bi_idx != 0 conditions
mtip32xx: Fix NULL pointer dereference during module unload
bcache: Fix error handling in init code
bcache: clarify free/available/unused space
bcache: drop "select CLOSURES"
bcache: Fix incompatible pointer type warning
There is div64_long() to handle s64/long division, but no macro to do
u64/ul division. It is necessary in some scenarios, so add this
function.
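As a rough illustration, a u64/unsigned long division helper can select
its implementation by the width of unsigned long. The sketch below is a
userspace approximation under that assumption; the *_sketch names are
hypothetical stand-ins and this is not the patch itself.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for a full 64-bit by 64-bit unsigned division. */
static inline uint64_t div64_u64_sketch(uint64_t dividend, uint64_t divisor)
{
    return dividend / divisor;
}

/* Hypothetical stand-in for a cheaper 64-bit by 32-bit unsigned division. */
static inline uint64_t div_u64_sketch(uint64_t dividend, uint32_t divisor)
{
    return dividend / divisor;
}

/*
 * Pick the variant that matches the width of unsigned long: when long is
 * 64 bits the divisor needs the full 64-bit path, otherwise the 32-bit
 * divisor path is enough.
 */
#if ULONG_MAX > 0xffffffffUL
#define div64_ul_sketch(x, y) div64_u64_sketch((x), (y))
#else
#define div64_ul_sketch(x, y) div_u64_sketch((x), (y))
#endif
```

For example, div64_ul_sketch(UINT64_C(10000000000), 3UL) yields 3333333333
on both 32-bit and 64-bit targets.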
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alex Shi <alex.shi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When we have a page fault for an address which is backed by a hugepage
under migration, the kernel can't wait correctly and busy-loops on the
hugepage fault until the migration finishes. As a result, users who try
to kick hugepage migration (via soft offlining, for example) occasionally
experience long delays or soft lockups.
This is because pte_offset_map_lock() can't get a correct migration entry
or a correct page table lock for hugepage. This patch introduces
migration_entry_wait_huge() to solve this.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <stable@vger.kernel.org> [2.6.35+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The dmesg_restrict sysctl currently covers the syslog method for accessing
dmesg, however /dev/kmsg isn't covered by the same protections. Most
people haven't noticed because util-linux dmesg(1) defaults to using the
syslog method for access in older versions. Newer versions of util-linux
dmesg(1) default to reading directly from /dev/kmsg.
To fix /dev/kmsg, let's compare the existing interfaces and what they
allow:
- /proc/kmsg allows:
- open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG since it uses a destructive
single-reader interface (SYSLOG_ACTION_READ).
- everything, after an open.
- syslog syscall allows:
- anything, if CAP_SYSLOG.
- SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if
dmesg_restrict==0.
- nothing else (EPERM).
The use-cases were:
- dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs.
- sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the
destructive SYSLOG_ACTION_READs.
AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't
clear the ring buffer.
Based on the comments in devkmsg_llseek, it sounds like actions besides
reading aren't going to be supported by /dev/kmsg (i.e.
SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive
syslog syscall actions.
To this end, move the check as Josh had done, but also rename the
constants to reflect their new uses (SYSLOG_FROM_CALL becomes
SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC).
SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC
allows destructive actions after a capabilities-constrained
SYSLOG_ACTION_OPEN check.
- /dev/kmsg allows:
- open if CAP_SYSLOG or dmesg_restrict==0
- reading/polling, after open
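The access rules listed above can be summarised as a small decision
function. This is only a sketch of the policy as described in this
message: the enum values, struct log_ctx and check_log_permission() are
hypothetical names, and a /dev/kmsg open is modelled here as a
non-destructive "read all" request coming from a reader.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical action and source identifiers for this sketch. */
enum log_action {
    ACT_OPEN,
    ACT_READ,           /* destructive, single-reader read */
    ACT_READ_ALL,       /* non-destructive read */
    ACT_CLEAR,
    ACT_SIZE_BUFFER
};

enum log_source {
    FROM_READER,        /* syslog syscall or /dev/kmsg */
    FROM_PROC           /* /proc/kmsg */
};

struct log_ctx {
    bool cap_syslog;      /* caller holds CAP_SYSLOG */
    bool dmesg_restrict;  /* value of the dmesg_restrict sysctl */
};

/* Return 0 if the action is allowed, -EPERM otherwise. */
static int check_log_permission(const struct log_ctx *ctx,
                                enum log_action act, enum log_source src)
{
    /* /proc/kmsg: only the open is checked, everything after is allowed. */
    if (src == FROM_PROC && act != ACT_OPEN)
        return 0;

    /* CAP_SYSLOG allows anything. */
    if (ctx->cap_syslog)
        return 0;

    /* Unprivileged readers get non-destructive actions if unrestricted. */
    if (src == FROM_READER && !ctx->dmesg_restrict &&
        (act == ACT_READ_ALL || act == ACT_SIZE_BUFFER))
        return 0;

    return -EPERM;
}
```

With dmesg_restrict set, an unprivileged reader is refused even the
non-destructive actions, which is the protection /dev/kmsg was missing.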
Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192
[akpm@linux-foundation.org: use pr_warn_once()]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Tested-by: Josh Boyer <jwboyer@redhat.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are instances in the kernel where we would like to disable CPU
hotplug (from sysfs) during some important operation. Today the freezer
code depends on this and the code to do it was kinda tailor-made for
that.
Restructure the code and make it generic enough to be useful for other
usecases too.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Russ Anderson <rja@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'usb-for-v3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-next
Felipe writes:
usb: patches for v3.11 merge window
All function drivers are now converted to our new configfs-based
binding. Eventually this will help us get rid of in-kernel
gadget drivers and only keep function drivers in the kernel.
MUSB was taught that it needs to be built for host-only and
device-only modes too. We had this support long ago but it
involved a ridiculous amount of ifdefs. Now we have a much
cleaner approach.
Samsung Exynos4 platform now implements HSIC support.
We're introducing support for AB8540 and AB9540 PHYs.
MUSB module reinsertion now works as expected; before, the resource
checks done in the driver core were returning -EBUSY.
DWC3 now has minimum support for TI's AM437x series of SoCs.
OMAP5 USB3 PHY learned one extra DPLL configuration value because
that PHY is reused in TI's DRA7xx devices.
We're introducing support for Faraday fotg210 UDCs.
Last, but not least, the usual set of non-critical fixes and cleanups
ranging from usage of platform_{get,set}_drvdata to lock improvements.
Signed-off-by: Felipe Balbi <balbi@ti.com>