Commit graph

871037 commits

Author SHA1 Message Date
Dan Robertson
fdc7d8e829 hwmon: (shtc1) fix shtc1 and shtw1 id mask
Fix an error in the bitmaskfor the shtc1 and shtw1 bitmask used to
retrieve the chip ID from the ID register. See section 5.7 of the shtw1
or shtc1 datasheet for details.

Fixes: 1a539d372e ("hwmon: add support for Sensirion SHTC1 sensor")
Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Link: https://lore.kernel.org/r/20190905014554.21658-3-dan@dlrobertson.com
[groeck: Reordered to be first in series and adjusted accordingly]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2019-09-10 11:42:22 -07:00
Tejun Heo
7c1ee704a1 iocost_monitor: Report debt
Report debt and rename del_ms row to delay for consistency.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10 12:31:39 -06:00
Tejun Heo
b06f2d35c6 iocost_monitor: Report more info with higher accuracy
When outputting json:

* Don't truncate numbers.

* Report address of iocg to ease drilling down further.

When outputting table:

* Use math.ceil() for delay_ms so that small delays don't read as 0.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10 12:31:39 -06:00
Tejun Heo
e742bd5cb5 iocost_monitor: Always use strings for json values
Json has limited accuracy for numbers and can silently truncate 64bit
values, which can be extremely confusing.  Let's consistently use
string encapsulated values for json output.

While at it, convert an unnecesary f-string to str().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10 12:31:39 -06:00
Tejun Heo
e1518f63f2 blk-iocost: Don't let merges push vtime into the future
Merges have the same problem that forced-bios had which is fixed by
the previous patch.  The cost of a merge is calculated at the time of
issue and force-advances vtime into the future.  Until global vtime
catches up, how the cgroup's hweight changes in the meantime doesn't
matter and it often leads to situations where the cost is calculated
at one hweight and paid at a very different one.  See the previous
patch for more details.

Fix it by never advancing vtime into the future for merges.  If budget
is available, vtime is advanced.  Otherwise, the cost is charged as
debt.

This brings merge cost handling in line with issue cost handling in
ioc_rqos_throttle().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10 12:31:39 -06:00
Tejun Heo
36a524814f blk-iocost: Account force-charged overage in absolute vtime
Currently, when a bio needs to be force-charged and there isn't enough
budget, vtime is simply pushed into the future.  This means that the
cost of the whole bio is scaled using the current hweight and then
charged immediately.  Until the global vtime advances beyond this
future vtime, the cgroup won't be allowed to issue normal IOs.

This is incorrect and can lead to, for example, exploding vrate or
extended stalls if vrate range is constrained.  Consider the following
scenario.

1. A cgroup with a very low hweight runs out of budget.

2. A storm of swap-out happens on it.  All of them are scaled
   according to the current low hweight and charged to vtime pushing
   it to a far future.

3. All other cgroups go idle and now the above cgroup has access to
   the whole device.  However, because vtime is already wound using
   the past low hweight, what its current hweight is doesn't matter
   until global vtime catches up to the local vtime.

4. As a result, either vrate gets ramped up extremely or the IOs stall
   while the underlying device is idle.

This is because the hweight the overage is calculated at is different
from the hweight that it's being paid at.

Fix it by remembering the overage in absoulte vtime and continuously
paying with the actual budget according to the current hweight at each
period.

Note that non-forced bios which wait already remembers the cost in
absolute vtime.  This brings forced-bio accounting in line.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10 12:31:39 -06:00
Tejun Heo
e036c4caba blk-iocost: Fix incorrect operation order during iocg free
ioc_pd_free() first cancels the hrtimers and then deactivates the
iocg.  However, the iocg timer can run inbetween and reschedule the
hrtimers which will end up running after the iocg is freed leading to
crashes like the following.

  general protection fault: 0000 [#1] SMP
  ...
  RIP: 0010:iocg_kick_delay+0xbe/0x1b0
  RSP: 0018:ffffc90003598ea0 EFLAGS: 00010046
  RAX: 1cee00fd69512b54 RBX: ffff8881bba48400 RCX: 00000000000003e8
  RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8881bba48400
  RBP: 0000000000004e20 R08: 0000000000000002 R09: 00000000000003e8
  R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90003598ef0
  R13: 00979f3810ad461f R14: ffff8881bba4b400 R15: 25439f950d26e1d1
  FS:  0000000000000000(0000) GS:ffff88885f800000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f64328c7e40 CR3: 0000000002409005 CR4: 00000000003606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <IRQ>
   iocg_delay_timer_fn+0x3d/0x60
   __hrtimer_run_queues+0xfe/0x270
   hrtimer_interrupt+0xf4/0x210
   smp_apic_timer_interrupt+0x5e/0x120
   apic_timer_interrupt+0xf/0x20
   </IRQ>

Fix it by canceling hrtimers after deactivating the iocg.

Fixes: 7caa47151a ("blkcg: implement blk-iocost")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10 12:17:04 -06:00
Xin Long
f794dc2304 sctp: fix the missing put_user when dumping transport thresholds
This issue causes SCTP_PEER_ADDR_THLDS sockopt not to be able to dump
a transport thresholds info.

Fix it by adding 'goto' put_user in sctp_getsockopt_paddr_thresholds.

Fixes: 8add543e36 ("sctp: add SCTP_FUTURE_ASSOC for SCTP_PEER_ADDR_THLDS sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 18:32:28 +01:00
Cong Wang
d4d6ec6dac sch_hhf: ensure quantum and hhf_non_hh_weight are non-zero
In case of TCA_HHF_NON_HH_WEIGHT or TCA_HHF_QUANTUM is zero,
it would make no progress inside the loop in hhf_dequeue() thus
kernel would get stuck.

Fix this by checking this corner case in hhf_change().

Fixes: 10239edf86 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc")
Reported-by: syzbot+bc6297c11f19ee807dc2@syzkaller.appspotmail.com
Reported-by: syzbot+041483004a7f45f1f20a@syzkaller.appspotmail.com
Reported-by: syzbot+55be5f513bed37fc4367@syzkaller.appspotmail.com
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Terry Lam <vtlam@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 18:31:00 +01:00
Cong Wang
8b142a00ed net_sched: check cops->tcf_block in tc_bind_tclass()
At least sch_red and sch_tbf don't implement ->tcf_block()
while still have a non-zero tc "class".

Instead of adding nop implementations to each of such qdisc's,
we can just relax the check of cops->tcf_block() in
tc_bind_tclass(). They don't support TC filter anyway.

Reported-by: syzbot+21b29db13c065852f64b@syzkaller.appspotmail.com
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 18:28:56 +01:00
Sean Christopherson
1edce0a9eb KVM: x86: Add kvm_emulate_{rd,wr}msr() to consolidate VXM/SVM code
Move RDMSR and WRMSR emulation into common x86 code to consolidate
nearly identical SVM and VMX code.

Note, consolidating RDMSR introduces an extra indirect call, i.e.
retpoline, due to reaching {svm,vmx}_get_msr() via kvm_x86_ops, but a
guest kernel likely has bigger problems if increasing the latency of
RDMSR VM-Exits by ~70 cycles has a measurable impact on overall VM
performance.  E.g. the only recurring RDMSR VM-Exits (after booting) on
my system running Linux 5.2 in the guest are for MSR_IA32_TSC_ADJUST via
arch_cpu_idle_enter().

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10 19:18:29 +02:00
Sean Christopherson
f20935d85a KVM: x86: Refactor up kvm_{g,s}et_msr() to simplify callers
Refactor the top-level MSR accessors to take/return the index and value
directly instead of requiring the caller to dump them into a msr_data
struct.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10 19:18:14 +02:00
Xiaoyao Li
b274a29081 doc: kvm: Fix return description of KVM_SET_MSRS
Userspace can use ioctl KVM_SET_MSRS to update a set of MSRs of guest.
This ioctl set specified MSRs one by one. If it fails to set an MSR,
e.g., due to setting reserved bits, the MSR is not supported/emulated by
KVM, etc..., it stops processing the MSR list and returns the number of
MSRs have been set successfully.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10 19:13:53 +02:00
Peter Xu
4f75bcc332 KVM: X86: Tune PLE Window tracepoint
The PLE window tracepoint triggers even if the window is not changed,
and the wording can be a bit confusing too.  One example line:

  kvm_ple_window: vcpu 0: ple_window 4096 (shrink 4096)

It easily let people think of "the window now is 4096 which is
shrinked", but the truth is the value actually didn't change (4096).

Let's only dump this message if the value really changed, and we make
the message even simpler like:

  kvm_ple_window: vcpu 4 old 4096 new 8192 (growed)

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10 19:13:21 +02:00
Peter Xu
c5c5d6fae0 KVM: VMX: Change ple_window type to unsigned int
The VMX ple_window is 32 bits wide, so logically it can overflow with
an int.  The module parameter is declared as unsigned int which is
good, however the dynamic variable is not.  Switching all the
ple_window references to use unsigned int.

The tracepoint changes will also affect SVM, but SVM is using an even
smaller width (16 bits) so it's always fine.

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10 19:13:20 +02:00
Peter Xu
13a7e370cb KVM: X86: Remove tailing newline for tracepoints
It's done by TP_printk() already.

Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10 19:13:19 +02:00
Peter Xu
d94fdcd7ea KVM: X86: Trace vcpu_id for vmexit
Tracing the ID helps to pair vmenters and vmexits for guests with
multiple vCPUs.

Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10 19:13:18 +02:00
Paolo Bonzini
32d1d15c52 KVM/arm updates for 5.4
- New ITS translation cache
 - Allow up to 512 CPUs to be supported with GICv3 (for real this time)
 - Now call kvm_arch_vcpu_blocking early in the blocking sequence
 - Tidy-up device mappings in S2 when DIC is available
 - Clean icache invalidation on VMID rollover
 - General cleanup
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl12QlAPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDXxUQAMd+GlOlmTXqEiuKudVApTkl6WIebfh0vkn6
 /1j8yNgJqRtZEY/YqE/XhAaqz1tx88VtzqSrNG4Pmrl9rDHMD9mDuk+w5UvEN2vy
 D5/nEe/wnzyVpuROBlHhsRbCRkT/6dNpnDnydwxCUqQPhfsAHnTNx6IygVzH9BHS
 D/1+KLI1imW8YziSSf6SGlIKJtk0eo5qo/aT6/mhb+e18Dobax3miItZL4mAqFPd
 tCV8fvOLb/phdSmOZuD/3XF9JOodk2ycvF9MW9Rp/FxDx9HULCXPv/3KnoHg9ca5
 QSGz1Chj0C2avaQJ4GbHZnZZjdvL2TmVxMpixocc/VZCqlO3ifRKf91t/rq4cElG
 HxLE9AX6kqW6UK66RHUQiHxjqRG8ynz8xEmlhwd7YhCLmtmJSXLTrmc2ABf64+BT
 RaexRa3h6D19fLBcMN5gpP8I48XaRpfxg6E/jCw5ZEr/8zhzLajFnE89ftgRR04f
 bSXOnj0kAhrBZ6jRTEata1MrFAt58wiaulxTxgMlnj1hHpqA3b+x6woRECAEVOlc
 6JJuzReJSBuCJL/rVtXGF31mXNnqUo+oTcDpQSle/fDtQ/44+xlYj6V/ZeFIRHAz
 nwUw9DHyZ/JMSwPNsqtdzCnLths1rNw34A7VgdVWiqiPYEcGGUnMzkRrXKMYjjJn
 LD4+Rh/e
 =0dD/
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm updates for 5.4

- New ITS translation cache
- Allow up to 512 CPUs to be supported with GICv3 (for real this time)
- Now call kvm_arch_vcpu_blocking early in the blocking sequence
- Tidy-up device mappings in S2 when DIC is available
- Clean icache invalidation on VMID rollover
- General cleanup
2019-09-10 19:09:14 +02:00
David S. Miller
074be7fd99 Merge branch 'nfp-implement-firmware-loading-policy'
Simon Horman says:

====================
nfp: implement firmware loading policy

Dirk says:

This series adds configuration capabilities to the firmware loading policy of
the NFP driver.

NFP firmware loading is controlled via three HWinfo keys which can be set per
device: 'abi_drv_reset', 'abi_drv_load_ifc' and 'app_fw_from_flash'.
Refer to patch #11 for more detail on how these control the firmware loading.

In order to configure the full extend of FW loading policy, a new devlink
parameter has been introduced, 'reset_dev_on_drv_probe', which controls if the
driver should reset the device when it's probed. This, in conjunction with the
existing 'fw_load_policy' (extended to include a 'disk' option) provides the
means to tweak the NFP HWinfo keys as required by users.

Patches 1 and 2 adds the devlink modifications and patches 3 through 9 adds the
support into the NFP driver. Furthermore, the last 2 patches are documentation
only.

v2:
  Renamed all 'reset_dev_on_drv_probe' defines the same as the devlink parameter
  name (Jiri)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 17:29:27 +01:00
Dirk van der Merwe
40a962beeb Documentation: nfp: add nfp driver specific notes
This adds the initial documentation for the NFP driver specific
documentation.

Right now, only basic information is provided about acquiring firmware
and configuring device firmware loading.

Original driver documentation can be found here:
https://github.com/Netronome/nfp-drv-kmods/blob/master/README.md

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 17:29:27 +01:00
Dirk van der Merwe
8fb822ce93 kdoc: fix nfp_fw_load documentation
Fixed the incorrect prefix for the 'nfp_fw_load' function.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 17:29:27 +01:00
Dirk van der Merwe
0fbee0ec1f nfp: devlink: add 'reset_dev_on_drv_probe' support
Add support for the 'reset_dev_on_drv_probe' devlink parameter. The
reset control policy is controlled by the 'abi_drv_reset' hwinfo key.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 17:29:27 +01:00
Dirk van der Merwe
ff04788c5b nfp: devlink: add 'fw_load_policy' support
Add support for the 'fw_load_policy' devlink parameter. The FW load
policy is controlled by the 'app_fw_from_flash' hwinfo key.

Remap the values from devlink to the hwinfo key and back.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 17:29:27 +01:00
Dirk van der Merwe
165c3c9f8c nfp: add devlink param infrastructure
Register devlink parameters for driver use. Subsequent patches will add
support for specific parameters.

In order to support devlink parameters, the management firmware needs to
be able to lookup and set hwinfo keys.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 17:29:27 +01:00
Dirk van der Merwe
f8921d7330 nfp: honor FW reset and loading policies
The firmware reset and loading policies can be controlled with the
combination of three hwinfo keys, 'abi_drv_reset', 'abi_drv_load_ifc'
and 'app_fw_from_flash'.

'app_fw_from_flash' defines which firmware should take precedence,
'Disk', 'Flash' or the 'Preferred' firmware. When 'Preferred'
is selected, the management firmware makes the decision on which
firmware will be loaded by comparing versions of the flash firmware
and the host supplied firmware.

'abi_drv_reset' defines when the driver should reset the firmware when
the driver is probed, either 'Disk' if firmware was found on disk,
'Always' reset or 'Never' reset. Note that the device is always reset
on driver unload if firmware was loaded when the driver was probed.

'abi_drv_load_ifc' defines a list of PF devices allowed to load FW on
the device.

Furthermore, we limit the cases to where the driver will unload firmware
again when the driver is removed to only when firmware was loaded by the
driver and only if this particular device was the only one that could
have loaded firmware. This is needed to avoid firmware being removed
while in use on multi-host platforms.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 17:29:27 +01:00
Dirk van der Merwe
e69e9db903 nfp: nsp: add support for hwinfo set operation
Add support for the NSP HWinfo set command. This closely follows the
HWinfo lookup command.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 17:29:26 +01:00
Dirk van der Merwe
74612cdaf5 nfp: nsp: add support for optional hwinfo lookup
There are cases where we want to read a hwinfo entry from the NFP, and
if it doesn't exist, use a default value instead.

To support this, we must silence warning/error messages when the hwinfo
entry doesn't exist since this is a valid use case. The NSP command
structure provides the ability to silence command errors, in which case
the caller should log any command errors appropriately. Protocol errors
are unaffected by this.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 17:29:26 +01:00
Dirk van der Merwe
1da16f0c84 nfp: nsp: add support for fw_loaded command
Add support for the simple command that indicates whether application
firmware is loaded.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 17:29:26 +01:00
Dirk van der Merwe
5bbd21df5a devlink: add 'reset_dev_on_drv_probe' param
Add the 'reset_dev_on_drv_probe' devlink parameter, controlling the
device reset policy on driver probe.

This parameter is useful in conjunction with the existing
'fw_load_policy' parameter.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 17:29:26 +01:00
Dirk van der Merwe
e019a3b17f devlink: extend 'fw_load_policy' values
Add the 'disk' value to the generic 'fw_load_policy' devlink parameter.
This value indicates that firmware should always be loaded from disk
only.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 17:29:26 +01:00
David S. Miller
4bb2f84a2a Merge branch 'net-dsa-mv88e6xxx-add-PCL-support'
Vivien Didelot says:

====================
net: dsa: mv88e6xxx: add PCL support

This small series implements the ethtool RXNFC operations in the
mv88e6xxx DSA driver to configure a port's Layer 2 Policy Control List
(PCL) supported by models such as 88E6352 and 88E6390 and equivalent.

This allows to configure a port to discard frames based on a configured
destination or source MAC address and an optional VLAN, with e.g.:

    # ethtool --config-nfc lan1 flow-type ether src 00:11:22:33:44:55 action -1
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 16:53:31 +01:00
Vivien Didelot
da7dc87553 net: dsa: mv88e6xxx: add RXNFC support
Implement the .get_rxnfc and .set_rxnfc DSA operations to configure
a port's Layer 2 Policy Control List (PCL) via ethtool.

Currently only dropping frames based on MAC Destination or Source
Address (including the option VLAN parameter) is supported.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 16:53:31 +01:00
Vivien Didelot
f3a2cd326e net: dsa: mv88e6xxx: introduce .port_set_policy
Introduce a new .port_set_policy operation to configure a port's
Policy Control List, based on mapping such as DA, SA, Etype and so on.

Models similar to 88E6352 and 88E6390 are supported at the moment.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 16:53:31 +01:00
Vivien Didelot
d8291a956a net: dsa: mv88e6xxx: complete ATU state definitions
Marvell has different values for the state of a MAC address,
depending on its multicast bit. This patch completes the definitions
for these states.

At the same time, use 0 which is intuitive enough and simplifies the
code a bit, instead of the UC or MC unused value.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10 16:53:31 +01:00
Jens Axboe
54a91f3bb9 io_uring: limit parallelism of buffered writes
All the popular filesystems need to grab the inode lock for buffered
writes. With io_uring punting buffered writes to async context, we
observe a lot of contention with all workers hamming this mutex.

For buffered writes, we generally don't need a lot of parallelism on
the submission side, as the flushing will take care of that for us.
Hence we don't need a deep queue on the write side, as long as we
can safely punt from the original submission context.

Add a workqueue with a limit of 2 that we can use for buffered writes.
This greatly improves the performance and efficiency of higher queue
depth buffered async writes with io_uring.

Reported-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10 09:49:35 -06:00
Jens Axboe
18d9be1a97 io_uring: add io_queue_async_work() helper
Add a helper for queueing a request for async execution, in preparation
for optimizing it.

No functional change in this patch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10 09:13:05 -06:00
Eric W. Biederman
821cc7b0b2
waitid: Add support for waiting for the current process group
It was recently discovered that the linux version of waitid is not a
superset of the other wait functions because it does not include support
for waiting for the current process group. This has two downsides:
1. An extra system call is needed to get the current process group.
2. After the current process group is received and before it is passed
   to waitid a signal could arrive causing the current process group to change.
   Inherent race-conditions as these make it impossible for userspace to
   emulate this functionaly and thus violate async-signal safety
   requirements for waitpid.

Arguments can be made for using a different choice of idtype and id
for this case but the BSDs already use this P_PGID and 0 to indicate
waiting for the current process's process group.  So be nice to user
space programmers and don't introduce an unnecessary incompatibility.

Some people have noted that the posix description is that
waitpid will wait for the current process group, and that in
the presence of pthreads that process group can change.  To get
clarity on this issue I looked at XNU, FreeBSD, and Luminos.  All of
those flavors of unix waited for the current process group at the
time of call and as written could not adapt to the process group
changing after the call.

At one point Linux did adapt to the current process group changing but
that stopped in 161550d74c ("pid: sys_wait... fixes").  It has been
over 11 years since Linux has that behavior, no programs that fail
with the change in behavior have been reported, and I could not
find any other unix that does this.  So I think it is safe to clarify
the definition of current process group, to current process group
at the time of the wait function.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Alistair Francis <alistair23@gmail.com>
Cc: Zong Li <zongbox@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: GNU C Library <libc-alpha@sourceware.org>
Link: https://lore.kernel.org/r/20190814154400.6371-2-christian.brauner@ubuntu.com
2019-09-10 17:05:46 +02:00
Paolo Bonzini
8146856b0a PPC KVM update for 5.4
- Some prep for extending the uses of the rmap array
 - Various minor fixes
 - Commits from the powerpc topic/ppc-kvm branch, which fix a problem
   with interrupts arriving after free_irq, causing host hangs and crashes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJdZwd7AAoJEJ2a6ncsY3GffDQH/2q+c2z56ZO2lzfk4Hy9piWn
 Z9PR9n72Z6TiMyVCl7CtLCyI+lRy3QVZnol14ugQNX4aFJiiwDGRHJF0wNxjeok4
 4DAIqBc60qD2dkp1LwtUM1YsLsr/n3tdrGU1b0VrHGoGTVhJDpbjhJsblXZ1ujGr
 KxQ1Uf4XsW5T7kovHuzj+FFlbB5nbEX5cBIU68maBGZSCl355wCOW35rKVITTIIv
 +VKkO2aNbk6bRmZmOi2v1D65eQa2+TKe/o48TneJv1WhL4h4hDyHdmVeWRNoAI6C
 ve8mwCAVs7IITjCJ1qcGnI8NzVxMlXgwVir7sQ1aslRLZfeRAm5FOIPNEz1ADXs=
 =3oLd
 -----END PGP SIGNATURE-----

Merge tag 'kvm-ppc-next-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD

PPC KVM update for 5.4

- Some prep for extending the uses of the rmap array
- Various minor fixes
- Commits from the powerpc topic/ppc-kvm branch, which fix a problem
  with interrupts arriving after free_irq, causing host hangs and crashes.
2019-09-10 16:51:17 +02:00
Masahiro Yamada
a0469f989f export.h: remove defined(__KERNEL__), which is no longer needed
The conditional, define(__KERNEL__), was added by commit f235541699
("export.h: allow for per-symbol configurable EXPORT_SYMBOL()").

It was needed at that time to avoid the build error of modpost
with CONFIG_TRIM_UNUSED_KSYMS=y.

Since commit b2c5cdcfd4 ("modpost: remove symbol prefix support"),
modpost no longer includes linux/export.h, thus the define(__KERNEL__)
is unneeded.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
2019-09-10 23:51:05 +09:00
Sean Christopherson
16cfacc808 KVM: x86: Manually calculate reserved bits when loading PDPTRS
Manually generate the PDPTR reserved bit mask when explicitly loading
PDPTRs.  The reserved bits that are being tracked by the MMU reflect the
current paging mode, which is unlikely to be PAE paging in the vast
majority of flows that use load_pdptrs(), e.g. CR0 and CR4 emulation,
__set_sregs(), etc...  This can cause KVM to incorrectly signal a bad
PDPTR, or more likely, miss a reserved bit check and subsequently fail
a VM-Enter due to a bad VMCS.GUEST_PDPTR.

Add a one off helper to generate the reserved bits instead of sharing
code across the MMU's calculations and the PDPTR emulation.  The PDPTR
reserved bits are basically set in stone, and pushing a helper into
the MMU's calculation adds unnecessary complexity without improving
readability.

Oppurtunistically fix/update the comment for load_pdptrs().

Note, the buggy commit also introduced a deliberate functional change,
"Also remove bit 5-6 from rsvd_bits_mask per latest SDM.", which was
effectively (and correctly) reverted by commit cd9ae5fe47 ("KVM: x86:
Fix page-tables reserved bits").  A bit of SDM archaeology shows that
the SDM from late 2008 had a bug (likely a copy+paste error) where it
listed bits 6:5 as AVL and A for PDPTEs used for 4k entries but reserved
for 2mb entries.  I.e. the SDM contradicted itself, and bits 6:5 are and
always have been reserved.

Fixes: 20c466b561 ("KVM: Use rsvd_bits_mask in load_pdptrs()")
Cc: stable@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Reported-by: Doug Reiland <doug.reiland@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10 16:41:50 +02:00
Alexander Graf
fdcf756213 KVM: x86: Disable posted interrupts for non-standard IRQs delivery modes
We can easily route hardware interrupts directly into VM context when
they target the "Fixed" or "LowPriority" delivery modes.

However, on modes such as "SMI" or "Init", we need to go via KVM code
to actually put the vCPU into a different mode of operation, so we can
not post the interrupt

Add code in the VMX and SVM PI logic to explicitly refuse to establish
posted mappings for advanced IRQ deliver modes. This reflects the logic
in __apic_accept_irq() which also only ever passes Fixed and LowPriority
interrupts as posted interrupts into the guest.

This fixes a bug I have with code which configures real hardware to
inject virtual SMIs into my guest.

Signed-off-by: Alexander Graf <graf@amazon.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10 16:39:34 +02:00
Miklos Szeredi
05ea48cc2b fuse: stop copying pages to fuse_req
The page array pointers are also duplicated across fuse_args_pages and
fuse_req.  Get rid of the fuse_req ones.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10 16:29:50 +02:00
Miklos Szeredi
d49937749f fuse: stop copying args to fuse_req
No need to duplicate the argument arrays in fuse_req, so just dereference
req->args instead of copying to the fuse_req internal ones.

This allows further cleanup of the fuse_req structure.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10 16:29:50 +02:00
Miklos Szeredi
145b673bd2 fuse: clean up fuse_req
Get rid of request specific fields in fuse_req that are not used anymore.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10 16:29:50 +02:00
Miklos Szeredi
7213394c4e fuse: simplify request allocation
Page arrays are not allocated together with the request anymore.  Get rid
of the dead code

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10 16:29:50 +02:00
Miklos Szeredi
66abc3599c fuse: unexport request ops
All requests are now sent with one of the fuse_simple_... helpers.  Get rid
of the old api from the fuse internal header.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10 16:29:50 +02:00
Miklos Szeredi
75b399dda5 fuse: convert retrieve to simple api
Rename fuse_request_send_notify_reply() to fuse_simple_notify_reply() and
convert to passing fuse_args instead of fuse_req.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10 16:29:50 +02:00
Miklos Szeredi
4cb548666e fuse: convert release to simple api
Since we cannot reserve the request structure up-front, make sure that the
request allocation doesn't fail using __GFP_NOFAIL.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10 16:29:50 +02:00
Miklos Szeredi
b50ef7c52a cuse: convert init to simple api
This is a straightforward conversion.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10 16:29:49 +02:00
Miklos Szeredi
615047eff1 fuse: convert init to simple api
Bypass the fc->initialized check by setting the force flag.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10 16:29:49 +02:00