-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJRuK6ZAAoJEBvUPslcq6VzNRIP/2A20ibUpXBP4tJlY3g2RAfz
F3n0xy9PpPlkVYuswKK6CLzdO2Z79GLF2obqPhZQ2K6S4FIVOk3r6CSveXaEeybh
AWeweD4/9TzO8AwOgydQ35dajXlXUoXH2tgBnjC9BxwzIyROWOrluLTorHtBuHr8
xSb3orjVwYg2v9pPOhoKmen88rAyw/AIeSl+1cTVuEhF3/h02dV6SrP/mzUwofxM
H3S48N1BDaGkdP2urCLb2lATK/mhrKI+7NnbJetHfk3l4sb/fkia0XPBaEI5weeA
QbZSDayN0ykguRVim5hZmkjKBwh5PO6WlrFutnLWVCao6zHnFqSQUqsbrG8eIkPi
m3x3aJAY8XGNc0QSIuk4QJ0FA+inbaRtZo9hld9deAHMCeJlcBO4C53GtimlVZuj
tdEmu8WJYFPs/mKGNaYjlz2h4JqPtwvBpg7zfh3n8NAKjVXcpXdI50vurHNGYlHL
4o5+tnFy2b5L/YKnuhXajvJeMedJjvG80liliyS6DbQRHs0+aZQamAcDnMcENgBu
hzw/aHV0060mdjZDIkNcS0Z5AJZ1EXMTeGkO+f6XmlGg4gqIeMsUZGojpw418cyT
Z6fDvrir2NRLume0tJZQNd4DzDDf85zQ9qFmtL6WUNPSrJ6Y5a0BfZnzkGZgRZgu
hg55Xb0qnyfTaUNTCLUb
=vzQW
-----END PGP SIGNATURE-----
Merge tag 'omap-for-v3.11/fixes-non-critical-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into next/fixes-non-critical
From Tony Lindgren:
Non-critical fixes for omaps for v3.11 merge window.
* tag 'omap-for-v3.11/fixes-non-critical-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: OMAP2+: omap-usb-host: Fix memory leaks
ARM: OMAP2+: Fix serial init for device tree based booting
arm/omap: use const char properly
ARM: OMAP2+: devices: Do not print error when dss_hdmi hwmod lookup fails
ARM: OMAP2+: devices: Do not print error when DMIC hwmod lookup fails
ARM: OMAP2+: devices: Do not print error when McPDM hwmod lookup fails
ARM: OMAP: add vdds_sdi supply for omapdss_sdi.0
ARM: OMAP: add vdds_dsi supply for omapdss_dpi.0
ARM: OMAP: fix dsi regulator names
+ Linux 3.10-rc5
Signed-off-by: Olof Johansson <olof@lixom.net>
All Transparent Huge Pages are allocated by the buddy allocator.
A compile time check is in place that fails when the order of a
transparent huge page is too large to be allocated by the buddy
allocator. Unfortunately that compile time check passes when:
HPAGE_PMD_ORDER == MAX_ORDER
( which is incorrect as the buddy allocator can only allocate
memory of order strictly less than MAX_ORDER. )
This patch updates the compile time check to fail in the above
case.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Under x86, multiple puds can be made to reference the same bank of
huge pmds provided that they represent a full PUD_SIZE of shared
huge memory that is aligned to a PUD_SIZE boundary.
The code to share pmds does not require any architecture specific
knowledge other than the fact that pmds can be indexed, thus can
be beneficial to some other architectures.
This patch copies the huge pmd sharing (and unsharing) logic from
x86/ to mm/ and introduces a new config option to activate it:
CONFIG_ARCH_WANTS_HUGE_PMD_SHARE
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
f12dc02014 ("cgroup: mark "tasks" cgroup file as insane") and
cc5943a781 ("cgroup: mark "notify_on_release" and "release_agent"
cgroup files insane") forgot to update the changed behavior
documentation in cgroup.h. Update it.
Signed-off-by: Tejun Heo <tj@kernel.org>
A css (cgroup_subsys_state) is how each cgroup is represented to a
controller. As such, it can be used in hot paths across the various
subsystems different controllers are associated with.
One of the common operations is reference counting, which up until now
has been implemented using a global atomic counter and can have
significant adverse impact on scalability. For example, css refcnt
can be gotten and put multiple times by blkcg for each IO request.
For highops configurations which try to do as much per-cpu as
possible, the global frequent refcnting can be very expensive.
In general, given the various and hugely diverse paths css's end up
being used from, we need to make it cheap and highly scalable. In its
usage, css refcnting isn't very different from module refcnting.
This patch converts css refcnting to use the recently added
percpu_ref. css_get/tryget/put() directly maps to the matching
percpu_ref operations and the deactivation logic is no longer
necessary as percpu_ref already has refcnt killing.
The only complication is that as the refcnt is per-cpu,
percpu_ref_kill() in itself doesn't ensure that further tryget
operations will fail, which we need to guarantee before invoking
->css_offline()'s. This is resolved collecting kill confirmation
using percpu_ref_kill_and_confirm() and initiating the offline phase
of destruction after all css refcnt's are confirmed to be seen as
killed on all CPUs. The previous patches already splitted destruction
into two phases, so percpu_ref_kill_and_confirm() can be hooked up
easily.
This patch removes css_refcnt() which is used for rcu dereference
sanity check in css_id(). While we can add a percpu refcnt API to ask
the same question, css_id() itself is scheduled to be removed fairly
soon, so let's not bother with it. Just drop the sanity check and use
rcu_dereference_raw() instead.
v2: - init_cgroup_css() was calling percpu_ref_init() without checking
the return value. This causes two problems - the obvious lack
of error handling and percpu_ref_init() being called from
cgroup_init_subsys() before the allocators are up, which
triggers warnings but doesn't cause actual problems as the
refcnt isn't used for roots anyway. Fix both by moving
percpu_ref_init() to cgroup_create().
- The base references were put too early by
percpu_ref_kill_and_confirm() and cgroup_offline_fn() put the
refs one extra time. This wasn't noticeable because css's go
through another RCU grace period before being freed. Update
cgroup_destroy_locked() to grab an extra reference before
killing the refcnts. This problem was noticed by Kent.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Kent Overstreet <koverstreet@google.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: "Alasdair G. Kergon" <agk@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Glauber Costa <glommer@gmail.com>
Split cgroup_destroy_locked() into two steps and put the latter half
into cgroup_offline_fn() which is executed from a work item. The
latter half is responsible for offlining the css's, removing the
cgroup from internal lists, and propagating release notification to
the parent. The separation is to allow using percpu refcnt for css.
Note that this allows for other cgroup operations to happen between
the first and second halves of destruction, including creating a new
cgroup with the same name. As the target cgroup is marked DEAD in the
first half and cgroup internals don't care about the names of cgroups,
this should be fine. A comment explaining this will be added by the
next patch which implements the actual percpu refcnting.
As RCU freeing is guaranteed to happen after the second step of
destruction, we can use the same work item for both. This patch
renames cgroup->free_work to ->destroy_work and uses it for both
purposes. INIT_WORK() is now performed right before queueing the work
item.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Implement percpu_tryget() which stops giving out references once the
percpu_ref is visible as killed. Because the refcnt is per-cpu,
different CPUs will start to see a refcnt as killed at different
points in time and tryget() may continue to succeed on subset of cpus
for a while after percpu_ref_kill() returns.
For use cases where it's necessary to know when all CPUs start to see
the refcnt as dead, percpu_ref_kill_and_confirm() is added. The new
function takes an extra argument @confirm_kill which is invoked when
the refcnt is guaranteed to be viewed as killed on all CPUs.
While this isn't the prettiest interface, it doesn't force synchronous
wait and is much safer than requiring the caller to do its own
call_rcu().
v2: Patch description rephrased to emphasize that tryget() may
continue to succeed on some CPUs after kill() returns as suggested
by Kent.
v3: Function comment in percpu_ref_kill_and_confirm() updated warning
people to not depend on the implied RCU grace period from the
confirm callback as it's an implementation detail.
Signed-off-by: Tejun Heo <tj@kernel.org>
Slightly-Grumpily-Acked-by: Kent Overstreet <koverstreet@google.com>
Add support to change the link state of VF (vPort)
Signed-off-by: Rony Efraim <ronye@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add netlink directives and ndo entry to allow for controling
VF link, which can be in one of three states:
Auto - VF link state reflects the PF link state (default)
Up - VF link state is up, traffic from VF to VF works even if
the actual PF link is down
Down - VF link state is down, no traffic from/to this VF, can be of
use while configuring the VF
Signed-off-by: Rony Efraim <ronye@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Caught by sparse:
- __rcu: missing annotation to sd->flow_limit
- __user: direct access in cpumask_scnprintf
Also
- add endline character when printing bitmap if room in buffer
- avoid bucket overflow by reducing FLOW_LIMIT_HISTORY
The last item warrants some explanation. The hashtable buckets are
subject to overflow if FLOW_LIMIT_HISTORY is larger than or equal
to bucket size, since all packets may end up in a single bucket. The
current (rather arbitrary) history value of 256 happens to match the
buffer size (u8).
As a result, with a single flow, the first 128 packets are accepted
(correct), the second 128 packets dropped (correct) and then the
history[] array has filled, so that each subsequent new packet
causes an increment in the bucket for new_flow plus a decrement
for old_flow: a steady state.
This is fine if packets are dropped, as the steady state goes away
as soon as a mix of traffic reappears. But, because the 256th packet
overflowed the bucket to 0: no packets are dropped.
Instead of explicitly adding an overflow check, this patch changes
FLOW_LIMIT_HISTORY to never be able to overflow a single bucket.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
(first item)
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull RCU fixes from Paul McKenney:
"I must confess that this past merge window was not RCU's best showing.
This series contains three more fixes for RCU regressions:
1. A fix to __DECLARE_TRACE_RCU() that causes it to act as an
interrupt from idle rather than as a task switch from idle.
This change is needed due to the recent use of _rcuidle()
tracepoints that can be invoked from interrupt handlers as well
as from idle. Without this fix, invoking _rcuidle() tracepoints
from interrupt handlers results in splats and (more seriously)
confusion on RCU's part as to whether a given CPU is idle or not.
This confusion can in turn result in too-short grace periods and
therefore random memory corruption.
2. A fix to a subtle deadlock that could result due to RCU doing
a wakeup while holding one of its rcu_node structure's locks.
Although the probability of occurrence is low, it really
does happen. The fix, courtesy of Steven Rostedt, uses
irq_work_queue() to avoid the deadlock.
3. A fix to a silent deadlock (invisible to lockdep) due to the
interaction of timeouts posted by RCU debug code enabled by
CONFIG_PROVE_RCU_DELAY=y, grace-period initialization, and CPU
hotplug operations. This will not occur in production kernels,
but really does occur in randconfig testing. Diagnosis courtesy
of Steven Rostedt"
* 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
rcu: Fix deadlock with CPU hotplug, RCU GP init, and timer migration
rcu: Don't call wakeup() with rcu_node structure ->lock held
trace: Allow idle-safe tracepoints to be called from irq
Normally, percpu_ref_init() initializes and percpu_ref_kill()
initiates destruction which completes asynchronously. The
asynchronous destruction can be problematic in init failure path where
the caller wants to destroy half-constructed object - distinguishing
half-constructed objects from the usual release method can be painful
for complex objects.
This patch implements percpu_ref_cancel_init() which synchronously
destroys the percpu_ref without invoking release. To avoid
unintentional misuses, the function requires the ref to have finished
percpu_ref_init() but never used and triggers WARN otherwise.
v2: Explain the weird name and usage restriction in the function
comment.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Kent Overstreet <koverstreet@google.com>
Two small changes.
* Unlike most init functions, percpu_ref_init() allocates memory and
may fail. Let's mark it with __must_check in case the caller
forgets.
* percpu_ref_kill_rcu() is unnecessarily using ACCESS_ONCE() to
dereference @ref->pcpu_count, which can be misleading. The pointer
is guaranteed to be valid and visible and can't change underneath
the function. Drop ACCESS_ONCE().
Signed-off-by: Tejun Heo <tj@kernel.org>
cgroup->count tracks the number of css_sets associated with the cgroup
and used only to verify that no css_set is associated when the cgroup
is being destroyed. It's superflous as the destruction path can
simply check whether cgroup->cset_links is empty instead.
Drop cgroup->count and check ->cset_links directly from
cgroup_destroy_locked().
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
We will add another flag indicating that the cgroup is in the process
of being killed. REMOVING / REMOVED is more difficult to distinguish
and cgroup_is_removing()/cgroup_is_removed() are a bit awkward. Also,
later percpu_ref usage will involve "kill"ing the refcnt.
s/CGRP_REMOVED/CGRP_DEAD/
s/cgroup_is_removed()/cgroup_is_dead()
This patch is purely cosmetic.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
* __css_get() isn't used by anyone. Fold it into css_get().
* Add proper function comments to all css reference functions.
This patch is purely cosmetic.
v2: Typo fix as per Li.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
cgroups and css_sets are mapped M:N and this M:N mapping is
represented by struct cg_cgroup_link which forms linked lists on both
sides. The naming around this mapping is already confusing and struct
cg_cgroup_link exacerbates the situation quite a bit.
>From cgroup side, it starts off ->css_sets and runs through
->cgrp_link_list. From css_set side, it starts off ->cg_links and
runs through ->cg_link_list. This is rather reversed as
cgrp_link_list is used to iterate css_sets and cg_link_list cgroups.
Also, this is the only place which is still using the confusing "cg"
for css_sets. This patch cleans it up a bit.
* s/cgroup->css_sets/cgroup->cset_links/
s/css_set->cg_links/css_set->cgrp_links/
s/cgroup_iter->cg_link/cgroup_iter->cset_link/
* s/cg_cgroup_link/cgrp_cset_link/
* s/cgrp_cset_link->cg/cgrp_cset_link->cset/
s/cgrp_cset_link->cgrp_link_list/cgrp_cset_link->cset_link/
s/cgrp_cset_link->cg_link_list/cgrp_cset_link->cgrp_link/
* s/init_css_set_link/init_cgrp_cset_link/
s/free_cg_links/free_cgrp_cset_links/
s/allocate_cg_links/allocate_cgrp_cset_links/
* s/cgl[12]/link[12]/ in compare_css_sets()
* s/saved_link/tmp_link/ s/tmp/tmp_links/ and a couple similar
adustments.
* Comment and whiteline adjustments.
After the changes, we have
list_for_each_entry(link, &cont->cset_links, cset_link) {
struct css_set *cset = link->cset;
instead of
list_for_each_entry(link, &cont->css_sets, cgrp_link_list) {
struct css_set *cset = link->cg;
This patch is purely cosmetic.
v2: Fix broken sentences in the patch description.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Currently some cpuset behaviors are not friendly when cpuset is co-mounted
with other cgroup controllers.
Now with this patchset if cpuset is mounted with sane_behavior option,
it behaves differently:
- Tasks will be kept in empty cpusets when hotplug happens and take
masks of ancestors with non-empty cpus/mems, instead of being moved to
an ancestor.
- A task can be moved into an empty cpuset, and again it takes masks of
ancestors, so the user can drop a task into a newly created cgroup without
having to do anything for it.
As tasks can reside in empy cpusets, here're some rules:
- They can be moved to another cpuset, regardless it's empty or not.
- Though it takes masks from ancestors, it takes other configs from the
empty cpuset.
- If the ancestors' masks are changed, those tasks will also be updated
to take new masks.
v2: add documentation in include/linux/cgroup.h
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
To achieve this:
- We call update_tasks_cpumask/nodemask() for empty cpusets when
hotplug happens, instead of moving tasks out of them.
- When a cpuset's masks are changed by writing cpuset.cpus/mems,
we also update tasks in child cpusets which are empty.
v3:
- do propagation work in one place for both hotplug and unplug
v2:
- drop rcu_read_lock before calling update_task_nodemask() and
update_task_cpumask(), instead of using workqueue.
- add documentation in include/linux/cgroup.h
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
- DT support for the MFD, TSC and ADC driver & platform device support,
which has no users, has been killed.
- iio_map from last series is gone and replaced by proper nodes in the
device tree.
- suspend fixes which means correct data structs are taken and no
interrupt storm
- fifo split which should problem with TSC & ADC beeing used at the same
time
- The ADC channels are now checked before blindly applied. That means the
touch part reads X, Y and Z coordinates and does not mix them up. Same
goes for the IIO ADC driver.
- The IIO ADC driver now creates files named in_voltageX_raw where X
represents the ADC line instead of a number starting at 0. A read from
this file can return -EBUSY in case touch is busy and the ADC didn't
collect a value.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJRuKbeAAoJEHuW6BYqjPXR7D4QAMCjJsNQiUceUG6mNSXVeSx2
BkRvltixFHEzQTdMyEF4yQ8P3u8fo58R6cThMJ/1dphMKMhDf7rMVGd8wwdKsiCf
1Vz1QrBUI47Pm2upGBEa40UpqeeL6uZc4ENOG1bqpeeUTEfgao3Iefk8pHNQVpOO
UE7hj8xrdgJ8Hp4fjdXmjLmlTU6ek+s7BeeqAVsaz/eJZv+kkHPPwoJ9zyAx5sAi
KBjO8Fch38jcU8rKGl8iZUAn8skGVckJnsljys6RP+1KwqxB7y2jU3uhLgU6ciBc
zZl3IpsYrsG0vHTl72DmJrtPu7YIQLiydMyFe+zSiA+dKkzI3GnxtzRr4SKGTZbj
u3wk8JbQCh3K2LW9gaHFOR+0FJMfF3w62MM19XjcfIGow7ZKnHZJJC0HeOLeUW7V
TPFu3jFNCqYTMq2shC7VaUNI6fYiswtAuXLzEQJ0PBaeRMamyrdXyq3EB71zMCJ6
BWW1ifz3R/Xusv7g1QYQpaLJCfD+bu+zWK1LbWO9knkLYZFJahnicmkJiImQPL8t
3aH+quz+hmgZg6Agvf0EVf9y4sFCtJD0qsFeL+SlZxL/vJNnjiGPTzIMP078j61m
RzQ3XYOq0AvhXBcU9+dHTM/UsmIM7mE8lz9W7NHnvE9NUuT4z6VE6w9p75YgtSTW
yDHv8csxNba5XKi3JfH0
=VGdP
-----END PGP SIGNATURE-----
Merge tag 'am335x_tsc-adc' of git://breakpoint.cc/bigeasy/linux
A complete refurbished series inclunding:
- DT support for the MFD, TSC and ADC driver & platform device support,
which has no users, has been killed.
- iio_map from last series is gone and replaced by proper nodes in the
device tree.
- suspend fixes which means correct data structs are taken and no
interrupt storm
- fifo split which should problem with TSC & ADC beeing used at the same
time
- The ADC channels are now checked before blindly applied. That means the
touch part reads X, Y and Z coordinates and does not mix them up. Same
goes for the IIO ADC driver.
- The IIO ADC driver now creates files named in_voltageX_raw where X
represents the ADC line instead of a number starting at 0. A read from
this file can return -EBUSY in case touch is busy and the ADC didn't
collect a value.
AB8540 RTC have changed between AB8540_cut1 and AB8540_cut2.Different
ressources to define for those two version.
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Julien Delacou <julien.delacou@stericsson.com>
Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
A bunch of enhancements and fixes for the arizona devices, adding a few
new features (the main one being device tree) and improving robustness.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJRnV9SAAoJELSic+t+oim9TsAQAJgDyiR+7bJ80uj6cVWJqq4L
geAoaWv6WHaHYwpNXjo6srUngRY3/cJwq2S3rNjnq3hxX6tbVMD0MwoyN/+AAfRQ
u/QQnYQWieE6JTGIpqLiKOVe4hm+uBXhHVwPOogVr1MngA5MRZPmk3SeqUZ6io7N
GdTjbh2VaffgHIsle+KjWfeA+OXTcEV3MFhY0rjrP+zIdmiB+csQVo286aYnuClN
wAiGyyAfRZ0uwYa+adA90zqmPKCL2Vvqeb8fbPHKJ2DVX1Y9N3mX8hpqhYrYJ+7j
FNK3AmZ8DN28VRwiuDtV0BSEcoIk8ejhXOIcX3lgGd0Vg2+5J4A0PnwCdMtHH8eb
MmUp2hjg3sGZlxabO5q1X1DGwx2n/Ilk1iWI8XIfQJWT8aLnxCuCVQtVX9Zx2tVX
kE9rCXOQ7rePDDKt1DVU7gX6nZ08CZ9Ixgif2PIFazYN4go8Bto48VLCDVUA3NQB
3uxrcxF2xzyMr8A+j0e8AkBh9sdx1fwmNElOxHVnd7HJ9uGxzDoOIlH628PvTZTZ
5R7PHKzr6atgjC3a3PfBvRtZeJ7GxMjElqO2gJNOUHxIRgNsRTlHhcxhTbMmAuP7
0d/ZQoCqXD5/HaqUcSPSrb8hrPHabYDdoIMyi1zzflLAcimnrGgf+6m5VRg/GKa5
w7WgVftJlkdqm+2yl4ih
=X6Aa
-----END PGP SIGNATURE-----
Merge tag 'mfd-arizona-v3.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc
mfd: arizona: Updates for v3.10
A bunch of enhancements and fixes for the arizona devices, adding a few
new features (the main one being device tree) and improving robustness.
* s/percpu_ref_release/percpu_ref_func_t/ as it's customary to have _t
postfix for types and the type is gonna be used for a different type
of callback too.
* Add @ARG to function comments.
* Drop unnecessary and unaligned indentation from percpu_ref_init()
function comment.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Kent Overstreet <koverstreet@google.com>
percpu_ref_get/put() are using preempt_disable/enable() while
percpu_ref_kill() is using plain call_rcu() instead of
call_rcu_sched(). This is buggy as grace periods of the two may not
match. Fix it by using plain RCU in percpu_ref_get/put().
(I suggested using sched RCU in the first place but there's no actual
benefit in doing so unless we're gonna introduce different variants
of get/put to be called while preemption is alredy disabled, which we
definitely shouldn't.)
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Kent Overstreet <koverstreet@google.com>
Since the jbd_debug() is implemented with two separate printk()
calls, it can lead to corrupted and misleading debug output like
the following (see lines marked with "*"):
[ 290.339362] (fs/jbd2/journal.c, 203): kjournald2: kjournald2 wakes
[ 290.339365] (fs/jbd2/journal.c, 155): kjournald2: commit_sequence=42103, commit_request=42104
[ 290.339369] (fs/jbd2/journal.c, 158): kjournald2: OK, requests differ
[* 290.339376] (fs/jbd2/journal.c, 648): jbd2_log_wait_commit:
[* 290.339379] (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: want 42104, j_commit_sequence=42103
[* 290.339382] JBD2: starting commit of transaction 42104
[ 290.339410] (fs/jbd2/revoke.c, 566): jbd2_journal_write_revoke_records: Wrote 0 revoke records
[ 290.376555] (fs/jbd2/commit.c, 1088): jbd2_journal_commit_transaction: JBD2: commit 42104 complete, head 42079
i.e. the debug output from log_wait_commit and journal_commit_transaction
have become interleaved. The output should have been:
(fs/jbd2/journal.c, 648): jbd2_log_wait_commit: JBD2: want 42104, j_commit_sequence=42103
(fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: starting commit of transaction 42104
It is expected that this is not easy to replicate -- I was only able
to cause it on preempt-rt kernels, and even then only under heavy
I/O load.
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Suggested-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The bit_spinlock functions are only used for the jbd_lock_bh_state
functions (and friends) in jbd_common.h and are not directly used
by either of jbd.h or jbd2.h content.
The jbd_common file is new as of commit 446066724c ("jdb/jbd2: factor
out common functions from the jbd[2] header files") but common
(and isolated) headers were not considered for factoring at that time.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Inode's data or non journaled quota may be written w/o jounral so we
_must_ send a barrier at the end of ext4_sync_fs. But it can be
skipped if journal commit will do it for us.
Also fix data integrity for nojournal mode.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Current implementation of jbd2_journal_force_commit() is suboptimal because
result in empty and useless commits. But callers just want to force and wait
any unfinished commits. We already have jbd2_journal_force_commit_nested()
which does exactly what we want, except we are guaranteed that we do not hold
journal transaction open.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Pull networking update from David Miller:
1) Fix dump iterator in nfnl_acct_dump() and ctnl_timeout_dump() to
dump all objects properly, from Pablo Neira Ayuso.
2) xt_TCPMSS must use the default MSS of 536 when no MSS TCP option is
present. Fix from Phil Oester.
3) qdisc_get_rtab() looks for an existing matching rate table and uses
that instead of creating a new one. However, it's key matching is
incomplete, it fails to check to make sure the ->data[] array is
identical too. Fix from Eric Dumazet.
4) ip_vs_dest_entry isn't fully initialized before copying back to
userspace, fix from Dan Carpenter.
5) Fix ubuf reference counting regression in vhost_net, from Jason
Wang.
6) When sock_diag dumps a socket filter back to userspace, we have to
translate it out of the kernel's internal representation first.
From Nicolas Dichtel.
7) davinci_mdio holds a spinlock while calling pm_runtime, which
sleeps. Fix from Sebastian Siewior.
8) Timeout check in sh_eth_check_reset is off by one, from Sergei
Shtylyov.
9) If sctp socket init fails, we can NULL deref during cleanup. Fix
from Daniel Borkmann.
10) netlink_mmap() does not propagate errors properly, from Patrick
McHardy.
11) Disable powersave and use minstrel by default in ath9k. From Sujith
Manoharan.
12) Fix a regression in that SOCK_ZEROCOPY is not set on tuntap sockets
which prevents vhost from being able to use zerocopy. From Jason
Wang.
13) Fix race between port lookup and TX path in team driver, from Jiri
Pirko.
14) Missing length checks in bluetooth L2CAP packet parsing, from Johan
Hedberg.
15) rtlwifi fails to connect to networking using any encryption method
other than WPA2. Fix from Larry Finger.
16) Fix iwlegacy build due to incorrect CONFIG_* ifdeffing for power
management stuff. From Yijing Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
b43: stop format string leaking into error msgs
ath9k: Use minstrel rate control by default
Revert "ath9k_hw: Update rx gain initval to improve rx sensitivity"
ath9k: Disable PowerSave by default
net: wireless: iwlegacy: fix build error for il_pm_ops
rtlwifi: Fix a false leak indication for PCI devices
wl12xx/wl18xx: scan all 5ghz channels
wl12xx: increase minimum singlerole firmware version required
wl12xx: fix minimum required firmware version for wl127x multirole
rtlwifi: rtl8192cu: Fix problem in connecting to WEP or WPA(1) networks
mwifiex: debugfs: Fix out of bounds array access
Bluetooth: Fix mgmt handling of power on failures
Bluetooth: Fix missing length checks for L2CAP signalling PDUs
Bluetooth: btmrvl: support Marvell Bluetooth device SD8897
Bluetooth: Fix checks for LE support on LE-only controllers
team: fix checks in team_get_first_port_txable_rcu()
team: move add to port list before port enablement
team: check return value of team_get_port_by_index_rcu() for NULL
tuntap: set SOCK_ZEROCOPY flag during open
netlink: fix error propagation in netlink_mmap()
...
Pull block layer fixes from Jens Axboe:
"Outside of bcache (which really isn't super big), these are all
few-liners. There are a few important fixes in here:
- Fix blk pm sleeping when holding the queue lock
- A small collection of bcache fixes that have been done and tested
since bcache was included in this merge window.
- A fix for a raid5 regression introduced with the bio changes.
- Two important fixes for mtip32xx, fixing an oops and potential data
corruption (or hang) due to wrong bio iteration on stacked devices."
* 'for-linus' of git://git.kernel.dk/linux-block:
scatterlist: sg_set_buf() argument must be in linear mapping
raid5: Initialize bi_vcnt
pktcdvd: silence static checker warning
block: remove refs to XD disks from documentation
blkpm: avoid sleep when holding queue lock
mtip32xx: Correctly handle bio->bi_idx != 0 conditions
mtip32xx: Fix NULL pointer dereference during module unload
bcache: Fix error handling in init code
bcache: clarify free/available/unused space
bcache: drop "select CLOSURES"
bcache: Fix incompatible pointer type warning
There is div64_long() to handle the s64/long division, but no mocro do
u64/ul division. It is necessary in some scenarios, so add this
function.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alex Shi <alex.shi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When we have a page fault for the address which is backed by a hugepage
under migration, the kernel can't wait correctly and do busy looping on
hugepage fault until the migration finishes. As a result, users who try
to kick hugepage migration (via soft offlining, for example) occasionally
experience long delay or soft lockup.
This is because pte_offset_map_lock() can't get a correct migration entry
or a correct page table lock for hugepage. This patch introduces
migration_entry_wait_huge() to solve this.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <stable@vger.kernel.org> [2.6.35+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The dmesg_restrict sysctl currently covers the syslog method for access
dmesg, however /dev/kmsg isn't covered by the same protections. Most
people haven't noticed because util-linux dmesg(1) defaults to using the
syslog method for access in older versions. With util-linux dmesg(1)
defaults to reading directly from /dev/kmsg.
To fix /dev/kmsg, let's compare the existing interfaces and what they
allow:
- /proc/kmsg allows:
- open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG since it uses a destructive
single-reader interface (SYSLOG_ACTION_READ).
- everything, after an open.
- syslog syscall allows:
- anything, if CAP_SYSLOG.
- SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if
dmesg_restrict==0.
- nothing else (EPERM).
The use-cases were:
- dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs.
- sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the
destructive SYSLOG_ACTION_READs.
AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't
clear the ring buffer.
Based on the comments in devkmsg_llseek, it sounds like actions besides
reading aren't going to be supported by /dev/kmsg (i.e.
SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive
syslog syscall actions.
To this end, move the check as Josh had done, but also rename the
constants to reflect their new uses (SYSLOG_FROM_CALL becomes
SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC).
SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC
allows destructive actions after a capabilities-constrained
SYSLOG_ACTION_OPEN check.
- /dev/kmsg allows:
- open if CAP_SYSLOG or dmesg_restrict==0
- reading/polling, after open
Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192
[akpm@linux-foundation.org: use pr_warn_once()]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Tested-by: Josh Boyer <jwboyer@redhat.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are instances in the kernel where we would like to disable CPU
hotplug (from sysfs) during some important operation. Today the freezer
code depends on this and the code to do it was kinda tailor-made for
that.
Restructure the code and make it generic enough to be useful for other
usecases too.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Russ Anderson <rja@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All function drivers are now converted to our new configfs-based
binding. Eventually this will help us getting rid of in-kernel
gadget drivers and only keep function drivers in the kernel.
MUSB was taught that it needs to be built for host-only and
device-only modes too. We had this support long ago but it
involved a ridiculous amount of ifdefs. Now we have a much
cleaner approach.
Samsung Exynos4 platform now implements HSIC support.
We're introducing support for AB8540 and AB9540 PHYs.
MUSB module reinsertion now works as expected, before we were
getting -EBUSY being returned by the resource checks done on
driver core.
DWC3 now has minimum support for TI's AM437x series of SoCs.
OMAP5 USB3 PHY learned one extra DPLL configuration values because
that PHY is reused in TI's DRA7xx devices.
We're introducing support for Faraday fotg210 UDCs.
Last, but not least, the usual set of non-critical fixes and cleanups
ranging from usage of platform_{get,set}_drvdata to lock improvements.
Signed-of-by: Felipe Balbi <balbi@ti.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJRuODuAAoJEIaOsuA1yqREEYMP/212PIcMM/niwl2T97l+Ispc
EVe8ebg69/t+LjEHmipyw00HvBuGv+6ccJbuU+NBSSi229iIkxXlE+Q7MoywHOZg
eSozqiIXIotkNTPg4vT6YfWspyNaoiDrl9TK3KMP9SyctlgxqMdcfke5dqpGpdUP
xqYhWCAbZ6uvu6Lq6r3NwX1pMKhXxbnTDCY77YOCb/H8UPlSHSW4nwjAKYvsEWwD
RLXn0UKDZF4FRm296ftIHDD8rDazCaQPkkglQejFrqheNpbR7SUkC672veca7xF5
2iaWS62p7SWDHsfzyLpeJwoglHcxRa3E8ZqdT9ALvrimMTm0jVM0pzDSCF2xBpFq
UP78YX2S94o/YC8NXfp6GMf5CFSlLDxQ7oahcUpUBVtx5l2v8bfyb2/KOrB6kHBS
v8RJqFbcYXHHygaYS0oXGqKg2ScwYeVIenlrk8ByPrfkJqS3v7CKLB0wNrV5ZWyC
nnfyMF+bW+M00nb9jKjS+Utni8looKpWdKcmAdP/zPVKDZE5zh5WL2q/zWepWdgP
8nIslvivXmAkNs8wN5ji/E/w9qqkXiYCVkSQXfXPgBLWesaQqBR2geRWduSetKSm
AHINjU4+wXkRR0V1HyKzn+b1v5yZ5ksV7n5SXltyXKNeO0IeBDHNBHRVPFqHdgau
u2prz3aPvqEFENqgr7z5
=O9AH
-----END PGP SIGNATURE-----
Merge tag 'usb-for-v3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-next
Felipe writes:
usb: patches for v3.11 merge window
All function drivers are now converted to our new configfs-based
binding. Eventually this will help us getting rid of in-kernel
gadget drivers and only keep function drivers in the kernel.
MUSB was taught that it needs to be built for host-only and
device-only modes too. We had this support long ago but it
involved a ridiculous amount of ifdefs. Now we have a much
cleaner approach.
Samsung Exynos4 platform now implements HSIC support.
We're introducing support for AB8540 and AB9540 PHYs.
MUSB module reinsertion now works as expected, before we were
getting -EBUSY being returned by the resource checks done on
driver core.
DWC3 now has minimum support for TI's AM437x series of SoCs.
OMAP5 USB3 PHY learned one extra DPLL configuration values because
that PHY is reused in TI's DRA7xx devices.
We're introducing support for Faraday fotg210 UDCs.
Last, but not least, the usual set of non-critical fixes and cleanups
ranging from usage of platform_{get,set}_drvdata to lock improvements.
Signed-of-by: Felipe Balbi <balbi@ti.com>
John W. Linville says:
====================
This pull request is intended for the 3.11 stream...
One big highlight is the cw1200 driver the ST-E CW1100 & CW1200
WLAN chipsets. This one has been lingering for a while, lacking
some review comments. Once started getting pulled into linux-next,
it got a bit more attention and a number of improvements were made
over the initial cut. No doubt there will be more changes ahead,
but I think it is looking alright at this point.
Along with that, there is the usual flurry of updates to the mac80211
core and the iwlwifi, mwifiex, ath9k, rt2x00, wil6210, and other
drivers. A few of the highlights are some rt2x00 refactoring/cleanup
by Gabor Juhos, some rt2800 hardware support enhancements by Stanislaw
Gruszka, some iwlwifi power management updates from Alexander Bondar,
some enhanced bcma SPROM support from Rafał Miłecki, and a variety
of other things here and there.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nothing about the sched_clock implementation in the ARM port is
specific to the architecture. Generalize the code so that other
architectures can use it by selecting GENERIC_SCHED_CLOCK.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
[jstultz: Merge minor collisions with other patches in my tree]
Signed-off-by: John Stultz <john.stultz@linaro.org>
- use DECLARE_CLOCKSOURCE_OF and convert its users
- handle the sptimer not being present as sched_clock
- add optional handling of timer clocks
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABCAAGBQJRuGStAAoJEPOmecmc0R2BLWcH/iHYyHg0IimZFFngD5H50F+S
WO+R3xVUz9Y1/Hlsh9IQK9R2f/z2QUje1sQOVZhx4WYMAarUTAsX/oDs7TnPyoZ6
MaFJHKsAysXukiWWrrb1MaG8NGUqh4nt4YwCVsBHkG02i9dA3qy55+3BoMg3YiKP
afWyXbJ/9IP1q9U9/tIjYO7fHDUmHtKRIo1MTx7erCm3gBSMUw7fWvs0Gc0QqUAn
cfCnwilK8/BAnCTOi8RiTdUmQaI5CqGrNcBTw47E88HmoQVkp5W6BCY0brw4CfE8
XeqchZH6wEH89hRmf/0+zGuLT4qQ+jBL8bM88+KEbnsXREiIKFU/oZYygqskY60=
=geqn
-----END PGP SIGNATURE-----
Merge tag 'dw_apb_timer_of' of git://github.com/mmind/linux-rockchip into next/drivers
From Heiko Stuebner, enhancements for dw_apb_timer:
- use DECLARE_CLOCKSOURCE_OF and convert its users
- handle the sptimer not being present as sched_clock
- add optional handling of timer clocks
* tag 'dw_apb_timer_of' of git://github.com/mmind/linux-rockchip:
clocksource: dw_apb_timer_of: use clocksource_of_init
clocksource: dw_apb_timer_of: select DW_APB_TIMER
clocksource: dw_apb_timer_of: add clock-handling
clocksource: dw_apb_timer_of: enable the use the clocksource as sched clock
Certain use cases may require specific DRE settings so expose the
necessary registers.
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
The driver programs a threshold of "coordinate_readouts" say 5. The
REG_FIFO0THR registers says it should it be programmed to "threshold
minus one". The driver does not expect just 5 coordinates but 5 * 2 + 2.
Multiplied by two because 5 for X and 5 for Y and plus 2 because we have
two Z.
The whole thing kind of works because It reads the 5 coordinates for X
and Y from FIFO0 and FIFO1 and the last element in each FIFO is ignored
within the loop and read later.
Nothing guaranties that FIFO1 is ready by the time it is read. In fact I
could see that that FIFO1 reaturns for Y channels 8,9, 10, 12, 6 and for
Y channel 7 for Z. The problem is that channel 7 and channel 12 got
somehow mixed up.
The other Problem is that FIFO1 is also used by the IIO part leading to
wrong results if both (tsc & adc) are used.
The patch tries to clean up the whole thing a little:
- Remove the +1 and -1 in REG_STEPCONFIG, REG_STEPDELAY and its counter
part in the for loop. This is just confusing.
- Use only FIFO0 in TSC. The fifo has space for 64 entries so should be
fine.
- Read the whole FIFO in one function and check the channel.
- in case we dawdle around, make sure we only read a multiple of our
coordinate set. On the second interrupt we will cleanup the remaining
enties.
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
The two header files removed here are unused and have no users as this
platform was never used with platform devices.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Fix the mfd device in the case where a subdevice might not be activated.
Signed-off-by: Pantelis Antoniou <panto@antoniou-consulting.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>