When a filesystem outside of init_user_ns is mounted it could have
uids and gids stored in it that do not map to init_user_ns.
The plan is to allow those filesystems to set i_uid to INVALID_UID and
i_gid to INVALID_GID for unmapped uids and gids and then to handle
that strange case in the vfs to ensure there is consistent robust
handling of the weirdness.
Upon a careful review of the vfs and filesystems about the only case
where there is any possibility of confusion or trouble is when the
inode is written back to disk. In that case filesystems typically
read the inode->i_uid and inode->i_gid and write them to disk even
when just an inode timestamp is being updated.
Which leads to a rule that is very simple to implement and understand
inodes whose i_uid or i_gid is not valid may not be written.
In dealing with access times this means treat those inodes as if the
inode flag S_NOATIME was set. Reads of the inodes appear safe and
useful, but any write or modification is disallowed. The only inode
write that is allowed is a chown that sets the uid and gid on the
inode to valid values. After such a chown the inode is normal and may
be treated as such.
Denying all writes to inodes with uids or gids unknown to the vfs also
prevents several oddball cases where corruption would have occurred
because the vfs does not have complete information.
One problem case that is prevented is attempting to use the gid of a
directory for new inodes where the directories sgid bit is set but the
directories gid is not mapped.
Another problem case avoided is attempting to update the evm hash
after setxattr, removexattr, and setattr. As the evm hash includeds
the inode->i_uid or inode->i_gid not knowning the uid or gid prevents
a correct evm hash from being computed. evm hash verification also
fails when i_uid or i_gid is unknown but that is essentially harmless
as it does not cause filesystem corruption.
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
KAS: keep-alive support and granularity of kato in units of 100 ms
nvme_admin_keep_alive opcode: 0x18
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The NVMe over Fabrics specification defines a protocol interface and
related extensions to NVMe that enable operation over network protocols.
The NVMe over Fabrics specification has an NVMe Transport binding for
each NVMe Transport.
This patch adds the fabrics related definitions:
- fabric specific command set and error codes
- transport addressing and binding definitions
- fabrics sgl extensions
- controller identification fabrics enhancements
- discovery log page definition
Signed-off-by: Armen Baloyan <armenx.baloyan@intel.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
For some protocols like NVMe over Fabrics we need to be able to send
initialization commands to a specific queue.
Based on an earlier patch from Christoph Hellwig <hch@lst.de>.
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
[hch: disallow sleeping allocation, req_op fixes]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
L2 upper device needs to propagate neigh_construct/destroy calls down to
lower devices. Do this by defining default ndo functions and use them in
team, bond, bridge and vlan.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the following patch will allow upper devices to follow the call down
lower devices, we need to add dev here and not rely on n->dev.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement etrhtool set_rxnfc callback to support ethtool flow spec
direct steering. This patch adds only the support of ether flow type
spec. L3/L4 flow specs support will be added in downstream patches.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of having all steering private name spaces and
steering module fields flat in mlx5_core_priv, we wrap
them in mlx5_flow_steering for better modularity and
API exposure.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce the set of arguments passed to mlx5_add_flow_rule
by introducing flow_spec structure.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Rework of SCM driver
* Add file patterns for Qualcomm Maintainers entry
* Add worker for wcnss_ctrl signaling
* Fixes for smp2p
* Update smem_state properties to match documentation
* Add SCM Peripheral Authentication service
* Expose SCM PAS command 10 as a reset controller
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXcaVxAAoJEFKiBbHx2RXVoxwQAMPI+3iN8M60JJ0SJvhP/VQA
AlPUcuaS2j50K16srmLif4aG8JYReRI4SNdy0J1kaRRi9OAEHiyQ5yVVlDA+FTTT
RErBxVzRUYyeCzrBpP1CL4khT+691QVx0dMcdlhnNRUttP3Cd42FM0J4GExWuDJO
UuOVn1UV0PmgSNfmYzyk87LXOA1di7ZGht02Vnjnh+kcoVjUy4Ty/yLHzxGPYcaF
RIj6lgYREtU7TN63MNkbYyx2FF+8WS9l5Q6U9E4Z/3nwRzhzhtPimWGtz1EHmxfy
a8YXmkQYM3RvPa8Cigiyg5ubkLlPC7w2rkNExKUyC/Y/jbvE4l/XJNleCEiGZmMT
f48F+7iAGgOwITL0+Lqj4VbKlUihwW28+OjcS/M5fTpDR6k0dmU0zHMhJKR9pYVU
Cr7ucYcPvzF2M6d4Frre9tZski4DZkpke81xjf6HQ00gI2CaDV/lieXtFjikL/Cl
ktdwQAo2+ydJ+7Ps5Qas2A+d66REe90vx2RCZWFRs9Nxe/CfSSz01LM6EeyONBKf
XfQvbKc6V/ZVHkWkA/EILlfRFCho6KwpEAHpZpFt1NiplJXfUcDcQyy/CuIKLR4H
c/UfKeGZCK+VyezbMWAKDb7ig3KywHD8yRieQpCyysVB/97oulJ/bHEs1KT5Bl5I
uDjnR7KqyXJEOGANr5S2
=WprH
-----END PGP SIGNATURE-----
Merge tag 'qcom-drivers-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux into next/drivers
Qualcomm ARM Based Driver Updates for v4.8
* Rework of SCM driver
* Add file patterns for Qualcomm Maintainers entry
* Add worker for wcnss_ctrl signaling
* Fixes for smp2p
* Update smem_state properties to match documentation
* Add SCM Peripheral Authentication service
* Expose SCM PAS command 10 as a reset controller
* tag 'qcom-drivers-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux:
firmware: qcom: scm: Expose PAS command 10 as reset-controller
firmware: qcom: scm: Peripheral Authentication Service
soc: qcom: Update properties for smem state referencing
soc: qcom: smp2p: Drop io-accessors
soc: qcom: smp2p: Correct addressing of outgoing value
soc: qcom: wcnss_ctrl: Make wcnss_ctrl parent the other components
firmware: qcom: scm: Add support for ARM64 SoCs
firmware: qcom: scm: Convert to streaming DMA APIS
firmware: qcom: scm: Generalize shared error map
firmware: qcom: scm: Use atomic SCM for cold boot
firmware: qcom: scm: Convert SCM to platform driver
MAINTAINERS: Add file patterns for qcom device tree bindings
Signed-off-by: Olof Johansson <olof@lixom.net>
1. Adds support for device power state management using generic power
domains and runtime PM
2. Other minor/miscellaneous fixes to the driver
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJXaQqwAAoJEABBurwxfuKYcL4QANeUJ+5igWpK97M64GQ44fR1
msDxt44ylTY1CiAeHnDDrC0tFSUprOmigSCxx6ve8USHX6cuSmkBatt1eltVdAo5
7/BZUGIAN8RzLpcHkNgYRu3DjOnEchq/vjSWAmfcF6HqLpInNtlaQPPh4ZC5UCSj
AJljaeBPLXkTXzN1w6sLsqBghyTwEI9aj5EQkzdpRDLlKQe7TtVbxfIbZpj4a/JH
gwvNsyfRbEBfQ46bhA4XNgkACVjAEHevCzJlbjfCQ/a2KS45Hr9D2PhieQ8uDRsP
J19dGdwuUjsVmh1esWY/7jrm5SUj4dpmrbcyQBTO2G9PJuxmuwr++LOkuBprjSQe
PFjWQ6JsL+jjVQGXNS6OWDkHPhwOOlvYYGEdOI5nqFRRcffiaFdXiowOmC8OoOxf
QTNVXHGuMQGT+bL09b79OXptDuixKoroXQcvkLe4w/hJypHVRUfB/6UMn+6mlkF8
rx2Wr62ZmrVHYLVAJFNm20+hYmCOuYeOMvhXtbwBThynWsR14nsCypE/mxmRObIf
QBlVvi8gsRnqjM8j1BTRNCEi5U5dIOy15Mca5YgUSLgtzow7TD2z4e8+MUBKEm1k
68JCS20eZ4iYG+LQsCX44Lcvj35tJxcQYmQ40wm4rTgGDM7hD5w0WunTuYdoHWHw
/U17mAATlz/xDqHyGYKH
=V2AG
-----END PGP SIGNATURE-----
Merge tag 'scpi-updates-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into next/drivers
SCPI updates and fixes for v4.8
1. Adds support for device power state management using generic power
domains and runtime PM
2. Other minor/miscellaneous fixes to the driver
* tag 'scpi-updates-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
firmware: scpi: add device power domain support using genpd
Documentation: add DT bindings for ARM SCPI power domains
firmware: arm_scpi: add support for device power state management
firmware: arm_scpi: make it depend on MAILBOX instead of ARM_MHU
firmware: arm_scpi: mark scpi_get_sensor_value as static
firmware: arm_scpi: remove dvfs_get packed structure
Signed-off-by: Olof Johansson <olof@lixom.net>
Add functions that iterate over lower devices and find port device.
As a dependency add netdev_for_each_all_lower_dev and
netdev_for_each_all_lower_dev_rcu macro with
netdev_all_lower_get_next and netdev_all_lower_get_next_rcu shelpers.
Also, add functions to return mlxsw struct according to lower device
found and mlxsw_port struct with a reference to lower device.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The initial use for this is for PHYs that have a mode related to USB OTG.
There are several SoCs (e.g. TI OMAP and DA8xx) that have a mode setting
in the USB PHY to override OTG VBUS and ID signals.
Of course, the enum can be expaned in the future to include modes for
other types of PHYs as well.
Suggested-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: David Lechner <david@lechnology.com>
netfilter uses multiple FWINV #defines with identical form that hide a
specific structure variable and dereference it with a invflags member.
$ git grep "#define FWINV"
include/linux/netfilter_bridge/ebtables.h:#define FWINV(bool,invflg) ((bool) ^ !!(info->invflags & invflg))
net/bridge/netfilter/ebtables.c:#define FWINV2(bool, invflg) ((bool) ^ !!(e->invflags & invflg))
net/ipv4/netfilter/arp_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(arpinfo->invflags & (invflg)))
net/ipv4/netfilter/ip_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(ipinfo->invflags & (invflg)))
net/ipv6/netfilter/ip6_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(ip6info->invflags & (invflg)))
net/netfilter/xt_tcpudp.c:#define FWINVTCP(bool, invflg) ((bool) ^ !!(tcpinfo->invflags & (invflg)))
Consolidate these macros into a single NF_INVF macro.
Miscellanea:
o Neaten the alignment around these uses
o A few lines are > 80 columns for intelligibility
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Leonard Crestez observed the following phenomenon: when using
hard interrupt triggers (the DRDY line coming out of an ST
sensor) sometimes a new value would arrive while reading the
previous value, due to latencies in the system.
We discovered that the ST hardware as far as can be observed
is designed for level interrupts: the DRDY line will be held
asserted as long as there are new values coming. The interrupt
handler should be re-entered until we're out of values to
handle from the sensor.
If interrupts were handled as occurring on the edges (usually
low-to-high) new values could appear and the line be held
asserted after that, and these values would be missed, the
interrupt handler would also lock up as new data was
available, but as no new edges occurs on the DRDY signal,
nothing happens: the edge detector only detects edges.
To counter this, do the following:
- Accept interrupt lines to be flagged as level interrupts
using IRQF_TRIGGER_HIGH and IRQF_TRIGGER_LOW. If the line
is marked like this (in the device tree node or ACPI
table or similar) it will be utilized as a level IRQ.
We mark the line with IRQF_ONESHOT and mask the IRQ
while processing a sample, then the top half will be
entered again if new values are available.
- If we are flagged as using edge interrupts with
IRQF_TRIGGER_RISING or IRQF_TRIGGER_FALLING: remove
IRQF_ONESHOT so that the interrupt line is not
masked while running the thread part of the interrupt.
This way we will never miss an interrupt, then introduce
a loop that polls the data ready registers repeatedly
until no new samples are available, then exit the
interrupt handler. This way we know no new values are
available when the interrupt handler exits and
new (edge) interrupts will be triggered when data arrives.
Take some extra care to update the timestamp in the poll
loop if this happens. The timestamp will not be 100%
perfect, but it will at least be closer to the actual
events. Usually the extra poll loop will handle the new
samples, but once in a blue moon, we get a new IRQ
while exiting the loop, before returning from the
thread IRQ bottom half with IRQ_HANDLED. On these rare
occasions, the removal of IRQF_ONESHOT means the
interrupt will immediately fire again.
- If no interrupt type is indicated from the DT/ACPI,
choose IRQF_TRIGGER_RISING as default, as this is necessary
for legacy boards.
Tested successfully on the LIS331DL and L3G4200D by setting
sampling frequency to 400Hz/800Hz and stressing the system:
extra reads in the threaded interrupt handler occurs.
Cc: Giuseppe Barba <giuseppe.barba@st.com>
Cc: Denis Ciocca <denis.ciocca@st.com>
Tested-by: Crestez Dan Leonard <cdleonard@gmail.com>
Reported-by: Crestez Dan Leonard <cdleonard@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
To allow creating more than one netdev over the same PCI function, we
change the driver such that global NIC resources are created once and
later be shared amongst all the mlx5e netdevs running over that port.
Move the CQ UAR, PD (pdn), Transport Domain (tdn), MKey resources from
being kept in the mlx5e priv part to a new resources structure
(mlx5e_resources) placed under the mlx5_core device.
This patch doesn't add any new functionality.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new namespace (MLX5_FLOW_NAMESPACE_OFFLOADS) to be populated
with flow steering rules that deal with rules that have have to
be executed before the EN NIC steering rules are matched.
The namespace is located after the bypass name-space and before the
kernel name-space. Therefore, it precedes the HW processing done for
rules set for the kernel NIC name-space.
Under SRIOV, it would allow us to match on e-switch missed packet
and forward them to the relevant VF representor TIR.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXcHi9AAoJEHm+PkMAQRiGSJ0H/2o4t9VWYmhyPC1sdIHoCExJ
P4tBrcZYBmKcsOmIfnJDa5g/+IdhouEUM0v0fHPogS2UUWT9eRuJWYD3sY+HpEQ+
heKTli8X73gsFB25odeIbIt0jAoSiiMYWDrWqLNsuUV1tjEYVA8rH0SM94FiOC/5
7WVWXLTuH+Rm7JHP18BnKxmMMbzrTFmwisLMqFKyfZRRSlS+/ix7iLUNO9AFa39B
YHxNPihLrZ0oONyCOAQoHTIXXrw0cQbxV2utg3vnMcCZdme2xOn+iXMntTSKfZ39
iC9/T0vsO3R6OrRo2aDZAnCPUAniXnMEIhrKG37WMyXpj6cucZ/2QiNXcXviGV4=
=iLte
-----END PGP SIGNATURE-----
Back-merge tag 'v4.7-rc5' into drm-next
Linux 4.7-rc5
The fsl-dcu pull needs -rc3 so go to -rc5 for now.
adding suspend and resume funtionality for extcon-adc-jack
driver to configure system wake up for extcon events,
also adding support to enable/disable system wakeup
through flag wakeup_source based on platform requirement.
Signed-off-by: Venkat Reddy Talla <vreddytalla@nvidia.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
On Freescale i.MX7D platform, all clocks operations, including
enable/disable, rate change and re-parent, requires its parent
clock enable. Current clock core can not support it well.
This patch introduce a new flag CLK_OPS_PARENT_ENABLE to handle this
special case in clock core that enable its parent clock firstly for
each operation and disable it later after operation complete.
The patch part 1 fixes the possible disabling clocks while its parent
is off during kernel booting phase in clk_disable_unused_subtree().
Before the completion of kernel booting, clock tree is still not built
completely, there may be a case that the child clock is on but its
parent is off which could be caused by either HW initial reset state
or bootloader initialization.
Taking bootloader as an example, we may enable all clocks in HW by default.
And during kernel booting time, the parent clock could be disabled in its
driver probe due to calling clk_prepare_enable and clk_disable_unprepare.
Because it's child clock is only enabled in HW while its SW usecount
in clock tree is still 0, so clk_disable of parent clock will gate
the parent clock in both HW and SW usecount ultimately. Then there will
be a child clock is still on in HW but its parent is already off.
Later in clk_disable_unused(), this clock disable accessing while its
parent off will cause system hang due to the limitation of HW which
must require its parent on.
This patch simply enables the parent clock first before disabling
if flag CLK_OPS_PARENT_ENABLE is set in clk_disable_unused_subtree().
This is a simple solution and only affects booting time.
After kernel booting up the clock tree is already created, there will
be no case that child is off but its parent is off.
So no need do this checking for normal clk_disable() later.
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
power_supply_get_property() should ideally return -EAGAIN if it is
called while the power_supply is being registered. There was no way
previously to determine if use_cnt == 0 meant that the power_supply
wasn't fully registered yet, or if it had already been unregistered.
Add a new boolean to the power_supply struct to simply show if
registration is completed. Lastly, modify the check in
power_supply_show_property() to also ignore -EAGAIN when so it
doesn't complain about not returning the property.
Signed-off-by: Rhyland Klein <rklein@nvidia.com>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Add a helper function to get a cgroup2 from a fd. It will be
stored in a bpf array (BPF_MAP_TYPE_CGROUP_ARRAY) which will
be introduced in the later patch.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to commit 9b368814b3 ("net: fix bridge multicast packet checksum validation")
we need to fixup the checksum for CHECKSUM_COMPLETE when
pushing skb on RX path. Otherwise we get similar splats.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
People who use PACKET_FANOUT_HASH want a symmetric hash, meaning that
they want packets going in both directions on a flow to hash to the
same bucket.
The core kernel SKB hash became non-symmetric when the ipv6 flow label
and other entities were incorporated into the standard flow hash order
to increase entropy.
But there are no users of PACKET_FANOUT_HASH who want an assymetric
hash, they all want a symmetric one.
Therefore, use the flow dissector to compute a flat symmetric hash
over only the protocol, addresses and ports. This hash does not get
installed into and override the normal skb hash, so this change has
no effect whatsoever on the rest of the stack.
Reported-by: Eric Leblond <eric@regit.org>
Tested-by: Eric Leblond <eric@regit.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since bpf_prog_get() and program type check is used in a couple of places,
refactor this into a small helper function that we can make use of. Since
the non RO prog->aux part is not used in performance critical paths and a
program destruction via RCU is rather very unlikley when doing the put, we
shouldn't have an issue just doing the bpf_prog_get() + prog->type != type
check, but actually not taking the ref at all (due to being in fdget() /
fdput() section of the bpf fd) is even cleaner and makes the diff smaller
as well, so just go for that. Callsites are changed to make use of the new
helper where possible.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jann Horn reported following analysis that could potentially result
in a very hard to trigger (if not impossible) UAF race, to quote his
event timeline:
- Set up a process with threads T1, T2 and T3
- Let T1 set up a socket filter F1 that invokes another filter F2
through a BPF map [tail call]
- Let T1 trigger the socket filter via a unix domain socket write,
don't wait for completion
- Let T2 call PERF_EVENT_IOC_SET_BPF with F2, don't wait for completion
- Now T2 should be behind bpf_prog_get(), but before bpf_prog_put()
- Let T3 close the file descriptor for F2, dropping the reference
count of F2 to 2
- At this point, T1 should have looked up F2 from the map, but not
finished executing it
- Let T3 remove F2 from the BPF map, dropping the reference count of
F2 to 1
- Now T2 should call bpf_prog_put() (wrong BPF program type), dropping
the reference count of F2 to 0 and scheduling bpf_prog_free_deferred()
via schedule_work()
- At this point, the BPF program could be freed
- BPF execution is still running in a freed BPF program
While at PERF_EVENT_IOC_SET_BPF time it's only guaranteed that the perf
event fd we're doing the syscall on doesn't disappear from underneath us
for whole syscall time, it may not be the case for the bpf fd used as
an argument only after we did the put. It needs to be a valid fd pointing
to a BPF program at the time of the call to make the bpf_prog_get() and
while T2 gets preempted, F2 must have dropped reference to 1 on the other
CPU. The fput() from the close() in T3 should also add additionally delay
to the reference drop via exit_task_work() when bpf_prog_release() gets
called as well as scheduling bpf_prog_free_deferred().
That said, it makes nevertheless sense to move the BPF prog destruction
generally after RCU grace period to guarantee that such scenario above,
but also others as recently fixed in ceb5607035 ("bpf, perf: delay release
of BPF prog after grace period") with regards to tail calls won't happen.
Integrating bpf_prog_free_deferred() directly into the RCU callback is
not allowed since the invocation might happen from either softirq or
process context, so we're not permitted to block. Reviewing all bpf_prog_put()
invocations from eBPF side (note, cBPF -> eBPF progs don't use this for
their destruction) with call_rcu() look good to me.
Since we don't know whether at the time of attaching the program, we're
already part of a tail call map, we need to use RCU variant. However, due
to this, there won't be severely more stress on the RCU callback queue:
situations with above bpf_prog_get() and bpf_prog_put() combo in practice
normally won't lead to releases, but even if they would, enough effort/
cycles have to be put into loading a BPF program into the kernel already.
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here are a number of small USB and PHY driver fixes for 4.7-rc6.
Nothing major here, all are described in the shortlog below. All have
been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iFYEABECABYFAld2ljcPHGdyZWdAa3JvYWguY29tAAoJEDFH1A3bLfspN+sAn28Z
+GbDzyY5XwZ7Qs5/5uPYoGONAKCObDpgZr1A+/EMvDd57gfHWmFqTw==
=uYl3
-----END PGP SIGNATURE-----
Merge tag 'usb-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB and PHY fixes from Greg KH:
"Here are a number of small USB and PHY driver fixes for 4.7-rc6.
Nothing major here, all are described in the shortlog below. All have
been in linux-next with no reported issues"
* tag 'usb-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: don't free bandwidth_mutex too early
USB: EHCI: declare hostpc register as zero-length array
phy-sun4i-usb: Fix irq free conditions to match request conditions
phy: bcm-ns-usb2: checking the wrong variable
phy-sun4i-usb: fix missing __iomem *
phy: phy-sun4i-usb: Fix optional gpios failing probe
phy: rockchip-dp: fix return value check in rockchip_dp_phy_probe()
phy: rcar-gen3-usb2: fix unexpected repeat interrupts of VBUS change
usb: common: otg-fsm: add license to usb-otg-fsm
Every implementation of RSA that we have naturally generates
output with leading zeroes. The one and only user of RSA,
pkcs1pad wants to have those leading zeroes in place, in fact
because they are currently absent it has to write those zeroes
itself.
So we shouldn't be stripping leading zeroes in the first place.
In fact this patch makes rsa-generic produce output with fixed
length so that pkcs1pad does not need to do any extra work.
This patch also changes DH to use the new interface.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There are code duplications of a masked ethernet address comparison here
so make it a separate function instead.
Miscellanea:
o Neaten alignment of FWINV macro uses to make it clearer for the reader
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The current implementation does not handle timeout in case of command
with callback request, and this can lead to deadlock if the command
doesn't get fw response.
Add delayed callback timeout work before posting the command to fw.
In case of real fw command completion we will cancel the delayed work.
In case of fw command timeout the callback timeout handler will be
called and it will simulate fw completion with timeout error.
Fixes: e126ba97db ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We used to queue tx packets in sk_receive_queue, this is less
efficient since it requires spinlocks to synchronize between producer
and consumer.
This patch tries to address this by:
- switch from sk_receive_queue to a skb_array, and resize it when
tx_queue_len was changed.
- introduce a new proto_ops peek_len which was used for peeking the
skb length.
- implement a tun version of peek_len for vhost_net to use and convert
vhost_net to use peek_len if possible.
Pktgen test shows about 15.3% improvement on guest receiving pps for small
buffers:
Before: ~1300000pps
After : ~1500000pps
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces a new event - NETDEV_CHANGE_TX_QUEUE_LEN, this
will be triggered when tx_queue_len. It could be used by net device
who want to do some processing at that time. An example is tun who may
want to resize tx array when tx_queue_len is changed.
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sometimes, we need support resizing multiple queues at once. This is
because it was not easy to recover to recover from a partial failure
of multiple queues resizing.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sometimes, we need zero length ring. But current code will crash since
we don't do any check before accessing the ring. This patch fixes this.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the functions from context_tracking.h directly.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>