-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmKEBFkUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vx3Qw/+LrVFKL6YvTJKgQf3dKiTkTXwoOiD
AWnLUEB3dSh1RSJh42o5NDFerqWR4uKIJReOZ1SFWC/AGBorxsYtmmKbTrs6CZ5H
x1jixWxI773XWiye6c8GlPwsdqhw6Zm6yjVGNuh9NX6ma0Mfw2wHWGwu0nrlTNeO
VEh6US2McrnhFlqVmIEGR6op14JP/10haLPW1uy4a3mDjGltpprXiFjLug/4amrU
rEl7H5bNl0GVDWl/WiCBB+ouS4QgK+krWENH63YyDlXdSPQwEjhNTiff/NDM/Hzc
MW7sJgcOxS6mU6TbKM+oQ3aoewph8TvC3wN56VXncfwALVpkp1qlDBeNYkapecBO
qO4nTPyL0nwHvLon62D+p2QFjlBq1FJM0jrCB4vXV4PwPux9TZG1UjaiMXrIuJMb
3gXYtxiiSZjWat+qhnRsNiyNu5Dtvcctd9YIUtx2LLvEgAB4CeC66mBIt6qh5QZv
j39ITKJGmDPUmGOnEbtvS6knmWOUQYGFgiUjSgX1ODQ9LYR9xJb3TheTkv257a2l
t3kgHzXZO5cebqAu444N1f6NeRu6rEU9OiR9HQBfEDfp9+cgDjDltHp2KlmS968o
xY2pn+neC+w4gMg3CjIdWu5x4yfZ0OFeCzXtWewqNX18pljwvKGjKCRUF5vosYdd
Q3KJu8/CuGkV9MU=
=5giZ
-----END PGP SIGNATURE-----
Merge tag 'pci-v5.18-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- Avoid putting Elo i2 PCIe Ports in D3cold because downstream devices
are inaccessible after going back to D0 (Rafael J. Wysocki)
- Qualcomm SM8250 has a ddrss_sf_tbu clock but SC8180X does not; make a
SC8180X-specific config without the clock so it probes correctly
(Bjorn Andersson)
- Revert aardvark chained IRQ handler rewrite because it broke
interrupt affinity (Pali Rohár)
* tag 'pci-v5.18-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
Revert "PCI: aardvark: Rewrite IRQ code to chained IRQ handler"
PCI: qcom: Remove ddrss_sf_tbu clock from SC8180X
PCI/PM: Avoid putting Elo i2 PCIe Ports in D3cold
Fix up a recent change in the int340x thermal driver that inadvertently
broke thermal zone handling on some systems (Srinivas Pandruvada).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmKDzeASHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxnJ8P/ipAdgNInjghu35HdHZYD6SlXxlePoHJ
TyRN6hEh6Z725SSAhbbIDZKi8bXInqNKDK3N8IXibpCjOq39ZkQngm96mMDeV0Qg
526MFCdtXIRExNH+9/ZQfJVzXbowewH4j7UybIcXSfDaS67n+k6zGYVxgc6O/4kO
rdlIwIcwlSFiUYkW3I+ECywzqDEN/Shriu1kyIrsQ+178WOZfTsn7xG1ADHcKFFL
eX0gKXGEmzWZwENQxGA4/xpHV1wRPHaqgBeFBkQdML56a6r4mBtOdSZ55k/I21RG
qayesFIsKsIn4h4nKwfg2AA/cUo+sNqPOz/S7pOsOUgFDBLd1EQ51NXf+/IS+pu4
iAwhPUJ2jizJAk+7CaU1+Q8MCGJFb1hjfpStLYSuV7F5G0nhQPJ2ebeJPL5cEcOD
aBqhfcDoynsa3gyA9LrRhYxH9qyRkrqRSmoUWyPPTvQfLasTvY4NQu7AwvyWU7oa
8snNOimP5Kt9FuVmg5aaZFYZb4gtwIE0r9qtR+VFBnwicgt99/NXeX3PnpZ/DEnD
MtgW5Hwov0rRxA/T1DDp42NHwZH3Q7f0iJTJoaUgjS9zuT/zjWvgcMeFYIjwSX6P
F4SPh05wLMUFJyI+a4tC+AvqOP9YZ4d0HKOAWAUvbN5nF0Gis1f6hUkq+2Eeg3wK
UtzCGYvit6/B
=41e0
-----END PGP SIGNATURE-----
Merge tag 'thermal-5.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fix from Rafael Wysocki:
"Fix up a recent change in the int340x thermal driver that
inadvertently broke thermal zone handling on some systems
(Srinivas Pandruvada)"
* tag 'thermal-5.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: int340x: Mode setting with new OS handshake
Implement compat_setup_rt_frame for sigcontext save & restore. The
main process is the same with signal, but the rv32 pt_regs' size
is different from rv64's, so we needs convert them.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20220405071314.3225832-19-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The code attempts to free the 'new' pointer using kmem_cache_free(),
which is wrong because this function isn't responsible of freeing it.
Instead, the function should free new->htable and clear the contents of
*new (to prevent double-free).
Cc: stable@vger.kernel.org
Fixes: c7c556f1e8 ("selinux: refactor changing booleans")
Reported-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
clock bindings.
-----BEGIN PGP SIGNATURE-----
iQFEBAABCAAuFiEE7v+35S2Q1vLNA3Lx86Z5yZzRHYEFAmKD9AQQHGhlaWtvQHNu
dGVjaC5kZQAKCRDzpnnJnNEdgY0aB/94ZFYX6dtuQkxs0XP7zVm5RMMQRAVTVFOl
VsaDsEJ3ZzOJrGvb/buUXrd7+iuhFTfnyVHIqbdritbMgu+RUn0XFkIVBYuo+u+J
WxsZUIFM2Clk/+wX7lzJgUWIAeyFLkw0g3NOlN0tYU5LJd+7C7OtUALyloTwNBND
XVt/GBuIzxcQSOXXkYn6U7yCWpzrGf9wEy0P8tItLJVt/1HkZBomETsPbMoY9i6b
P/jiT8whkLguFFYqujAXDHrqjMNUQg6oXSXoja+ul3X7cPCLDPLincojebrw3qc7
tkpYVJVdE1CuwPv4KKPIpG8WFvOVwbETMvlkzTG9oUWoxfVyZfv6
=aYRg
-----END PGP SIGNATURE-----
Merge tag 'v5.19-rockchip-clk2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into clk-rockchip
Pull Rockchip clk driver updates from Heiko Stuebner:
Conversion from txt to Yaml for a number of Rockchip
clock bindings.
Some fixes for recent yaml conversion of clock bindinds
and making the hclk_vo critical for rk3568.
* tag 'v5.19-rockchip-clk2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
dt-bindings: clock: convert rockchip,rk3368-cru.txt to YAML
dt-bindings: clock: convert rockchip,rk3228-cru.txt to YAML
dt-bindings: clock: convert rockchip,rk3036-cru.txt to YAML
dt-bindings: clock: convert rockchip,rk3308-cru.txt to YAML
dt-bindings: clock: convert rockchip,px30-cru.txt to YAML
dt-bindings: clock: convert rockchip,rk3188-cru.txt to YAML
dt-bindings: clock: convert rockchip,rk3288-cru.txt to YAML
dt-bindings: clock: convert rockchip,rv1108-cru.txt to YAML
dt-binding: clock: Add missing rk3568 cru bindings
clk: rockchip: Mark hclk_vo as critical on rk3568
dt-bindings: clock: fix rk3399 cru clock issues
dt-bindings: clock: use generic node name for pmucru example in rockchip,rk3399-cru.yaml
dt-bindings: clock: replace a maintainer for rockchip,rk3399-cru.yaml
dt-bindings: clock: fix some conversion style issues for rockchip,rk3399-cru.yaml
The PCA85073A RTC has the same programming model as the PCF85063A.
Add a compatible entry for it.
Tested on a custom i.MX6SX based board.
Signed-off-by: Fabio Estevam <festevam@denx.de>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20220419014445.341444-2-festevam@gmail.com
The PCA85073A RTC has the same programming model as the PCF85063A.
Add a compatible entry for it.
Signed-off-by: Fabio Estevam <festevam@denx.de>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20220419014445.341444-1-festevam@gmail.com
Add generic support for try_cmpxchg64{,_acquire,_release,_relaxed}
and their falbacks involving cmpxchg64.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220515184205.103089-2-ubizjak@gmail.com
There's two problems with the current amd_brs_adjust_period() code:
- it isn't in fact AMD specific and wil always adjust the period;
- it adjusts the period, while it should only adjust the event count,
resulting in repoting a short period.
Fix this by using x86_pmu.limit_period, this makes it specific to the
AMD BRS case and ensures only the event count is adjusted while the
reported period is unmodified.
Fixes: ba2fe75008 ("perf/x86/amd: Add AMD branch sampling period adjustment")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
GuC error capture blurts some debug messages about empty
register lists for certain register types on engines during
firmware initialization.
These are not errors or warnings, so get rid of them.
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220507045847.862261-2-alan.previn.teres.alexis@intel.com
With the kmalloc() size annotations, GCC is smart enough to realize that
LKDTM is intentionally writing past the end of the buffer. This is on
purpose, of course, so hide the buffer from the optimizer. Silences:
../drivers/misc/lkdtm/heap.c: In function 'lkdtm_SLAB_LINEAR_OVERFLOW':
../drivers/misc/lkdtm/heap.c:59:13: warning: array subscript 256 is outside array bounds of 'void[1020]' [-Warray-bounds]
59 | data[1024 / sizeof(u32)] = 0x12345678;
| ~~~~^~~~~~~~~~~~~~~~~~~~
In file included from ../drivers/misc/lkdtm/heap.c:7:
In function 'kmalloc',
inlined from 'lkdtm_SLAB_LINEAR_OVERFLOW' at ../drivers/misc/lkdtm/heap.c:54:14:
../include/linux/slab.h:581:24: note: at offset 1024 into object of size 1020 allocated by 'kmem_cache_alloc_trace'
581 | return kmem_cache_alloc_trace(
| ^~~~~~~~~~~~~~~~~~~~~~~
582 | kmalloc_caches[kmalloc_type(flags)][index],
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
583 | flags, size);
| ~~~~~~~~~~~~
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Add config options which are needed for LKDTM sub-tests:
STACKLEAK_ERASING test needs GCC_PLUGIN_STACKLEAK config.
READ_AFTER_FREE and READ_BUDDY_AFTER_FREE tests need
INIT_ON_FREE_DEFAULT_ON config.
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220517132932.1484719-1-usama.anjum@collabora.com
Add coverage for the recently added usercopy checks for vmalloc and
folios, via USERCOPY_VMALLOC and USERCOPY_FOLIO respectively.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
When a Root Port or Root Complex Event Collector receives an error Message
e.g., ERR_COR, it sets PCI_ERR_ROOT_COR_RCV in the Root Error Status
register and logs the Requester ID in the Error Source Identification
register. If it receives a second ERR_COR Message before software clears
PCI_ERR_ROOT_COR_RCV, hardware sets PCI_ERR_ROOT_MULTI_COR_RCV and the
Requester ID is lost.
In the following scenario, PCI_ERR_ROOT_MULTI_COR_RCV was never cleared:
- hardware receives ERR_COR message
- hardware sets PCI_ERR_ROOT_COR_RCV
- aer_irq() entered
- aer_irq(): status = pci_read_config_dword(PCI_ERR_ROOT_STATUS)
- aer_irq(): now status == PCI_ERR_ROOT_COR_RCV
- hardware receives second ERR_COR message
- hardware sets PCI_ERR_ROOT_MULTI_COR_RCV
- aer_irq(): pci_write_config_dword(PCI_ERR_ROOT_STATUS, status)
- PCI_ERR_ROOT_COR_RCV is cleared; PCI_ERR_ROOT_MULTI_COR_RCV is set
- aer_irq() entered again
- aer_irq(): status = pci_read_config_dword(PCI_ERR_ROOT_STATUS)
- aer_irq(): now status == PCI_ERR_ROOT_MULTI_COR_RCV
- aer_irq() exits because PCI_ERR_ROOT_COR_RCV not set
- PCI_ERR_ROOT_MULTI_COR_RCV is still set
The same problem occurred with ERR_NONFATAL/ERR_FATAL Messages and
PCI_ERR_ROOT_UNCOR_RCV and PCI_ERR_ROOT_MULTI_UNCOR_RCV.
Fix the problem by queueing an AER event and clearing the Root Error Status
bits when any of these bits are set:
PCI_ERR_ROOT_COR_RCV
PCI_ERR_ROOT_UNCOR_RCV
PCI_ERR_ROOT_MULTI_COR_RCV
PCI_ERR_ROOT_MULTI_UNCOR_RCV
See the bugzilla link for details from Eric about how to reproduce this
problem.
[bhelgaas: commit log, move repro details to bugzilla]
Fixes: e167bfcaa4 ("PCI: aerdrv: remove magical ROOT_ERR_STATUS_MASKS")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215992
Link: https://lore.kernel.org/r/20220418150237.1021519-1-sathyanarayanan.kuppuswamy@linux.intel.com
Reported-by: Eric Badger <ebadger@purestorage.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
The RZN1 RTC can compensate the imprecision of the oscillator up to
approximately 190ppm.
Seconds can last slightly shorter or longer depending on the
configuration.
Below ~65ppm of correction, we can change the time spent in a second
every minute, which is the most accurate compensation that the RTC can
offer. Above, the compensation will be active every 20s.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20220516082504.33913-5-miquel.raynal@bootlin.com
The RZN1 RTC can trigger an interrupt when reaching a particular date up
to 7 days ahead. Bring support for this alarm.
One drawback though, the granularity is about a minute.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20220516082504.33913-4-miquel.raynal@bootlin.com
The sun6i RTC provides 32 bytes of general-purpose data registers.
They can be used to save data in the always-on RTC power domain.
The registers are writable via 32-bit MMIO accesses only.
Expose them with a NVMEM provider so they can be used by other drivers.
Signed-off-by: Samuel Holland <samuel@sholland.org>
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20220413231731.56709-1-samuel@sholland.org
This change fixes the following:
1) The flags variable is not initialized. Always use raw_spin_lock_irqsave
and raw_spin_unlock_irqrestore to serialize patching.
2) flush_kernel_vmap_range is primarily intended for DMA flushes.
The whole cache flush in flush_kernel_vmap_range is only possible
when interrupts are enabled on SMP machines. Since __patch_text_multiple
calls flush_kernel_vmap_range with interrupts disabled, it is better
to directly call flush_kernel_dcache_range_asm and
flush_kernel_icache_range_asm.
3) The final call to flush_icache_range is unnecessary.
Tested with `[PATCH, V3] parisc: Rewrite cache flush code for
PA8800/PA8900' change on rp3440, c8000 and c3750 (32 and 64-bit).
Note by Helge:
This patch had been temporarily reverted shortly before v5.18-rc6 in order
to fix boot issues. Now it can be re-applied.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Originally, I was convinced that we needed to use tmpalias flushes
everwhere, for both user and kernel flushes. However, when I modified
flush_kernel_dcache_page_addr, to use a tmpalias flush, my c8000
would crash quite early when booting.
The PDC returns alias values of 0 for the icache and dcache. This
indicates that either the alias boundary is greater than 16MB or
equivalent aliasing doesn't work. I modified the tmpalias code to
make it easy to try alternate boundaries. I tried boundaries up to
128MB but still kernel tmpalias flushes didn't work on c8000.
This led me to conclude that tmpalias flushes don't work on PA8800
and PA8900 machines, and that we needed to flush directly using the
virtual address of user and kernel pages. This is likely the major
cause of instability on the c8000 and rp34xx machines.
Flushing user pages requires doing a temporary context switch as we
have to flush pages that don't belong to the current context. Further,
we have to deal with pages that aren't present. If a page isn't
present, the flush instructions fault on every line.
Other code has been rearranged and simplified based on testing. For
example, I introduced a flush_cache_dup_mm routine. flush_cache_mm
and flush_cache_dup_mm differ in that flush_cache_mm calls
purge_cache_pages and flush_cache_dup_mm calls flush_cache_pages.
In some implementations, pdc is more efficient than fdc. Based on
my testing, I don't believe there's any performance benefit on the
c8000.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Change the "BUG" to "WARNING" and disable the message because it triggers
occasionally in spite of the check in flush_cache_page_if_present.
The pte value extracted for the "from" page in copy_user_highpage is racy and
occasionally the pte is cleared before the flush is complete. I assume that
the page is simultaneously flushed by flush_cache_mm before the pte is cleared
as nullifying the fdc doesn't seem to cause problems.
I investigated various locking scenarios but I wasn't able to find a way to
sequence the flushes. This code is called for every COW break and locks impact
performance.
This patch is related to the bigger cache flush patch because we need the pte
on PA8800/PA8900 to flush using the vma context.
I have also seen this from copy_to_user_page and copy_from_user_page.
The messages appear infrequently when enabled.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
It turns out that networking.pdf has exceeded 100 chapters and
titles of chapters >= 100 collide with their counts in its table
of contents (TOC).
Increase relevant params by 0.6em in the preamble to avoid such
ugly collisions.
While at it, fix a typo in comment (subsection).
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
Link: https://lore.kernel.org/r/bdb60ba3-7813-47d0-74f9-7c31dd912d95@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
clk_generated_best_diff() helps in finding the parent and the divisor to
compute a rate closest to the required one. However, it doesn't take into
account the request's range for the new rate. Make sure the new rate
is within the required range.
Fixes: 8a8f4bf0c4 ("clk: at91: clk-generated: create function to find best_diff")
Signed-off-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com>
Link: https://lore.kernel.org/r/20220413071318.244912-1-codrin.ciubotariu@microchip.com
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Pass updated i_size in fscache_unuse_cookie() when called
from nfs_fscache_release_file(), which ensures the size of
an fscache object gets written to the cache storage. Failing
to do so results in unnessary reads from the NFS server, even
when the data is cached, due to a cachefiles object coherency
check failing with a trace similar to the following:
cachefiles_coherency: o=0000000e BAD osiz B=afbb3 c=0
This problem can be reproduced as follows:
#!/bin/bash
v=4.2; NFS_SERVER=127.0.0.1
set -e; trap cleanup EXIT; rc=1
function cleanup {
umount /mnt/nfs > /dev/null 2>&1
RC_STR="TEST PASS"
[ $rc -eq 1 ] && RC_STR="TEST FAIL"
echo "$RC_STR on $(uname -r) with NFSv$v and server $NFS_SERVER"
}
mount -o vers=$v,fsc $NFS_SERVER:/export /mnt/nfs
rm -f /mnt/nfs/file1.bin > /dev/null 2>&1
dd if=/dev/zero of=/mnt/nfs/file1.bin bs=4096 count=1 > /dev/null 2>&1
echo 3 > /proc/sys/vm/drop_caches
echo Read file 1st time from NFS server into fscache
dd if=/mnt/nfs/file1.bin of=/dev/null > /dev/null 2>&1
umount /mnt/nfs && mount -o vers=$v,fsc $NFS_SERVER:/export /mnt/nfs
echo 3 > /proc/sys/vm/drop_caches
echo Read file 2nd time from fscache
dd if=/mnt/nfs/file1.bin of=/dev/null > /dev/null 2>&1
echo Check mountstats for NFS read
grep -q "READ: 0" /proc/self/mountstats # (1st number) == 0
[ $? -eq 0 ] && rc=0
Fixes: a6b5a28eb5 "nfs: Convert to new fscache volume/cookie API"
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Tested-by: Daire Byrne <daire@dneg.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
cpufreq_offline() calls offline() and exit() under the policy rwsem
But they are called outside the rwsem in cpufreq_online().
Make cpufreq_online() call offline() and exit() as well as online() and
init() under the policy rwsem to achieve a clear lock relationship.
All of the init() and online() implementations in the tree only
initialize the policy object without attempting to acquire the policy
rwsem and they won't call cpufreq APIs attempting to acquire it.
Signed-off-by: Schspa Shi <schspa@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If policy initialization fails after the sysfs files are created,
there is a possibility to end up running show()/store() callbacks
for half-initialized policies, which may have unpredictable
outcomes.
Abort show()/store() in such a case by making sure the policy is active.
Also dectivate the policy on such failures.
Signed-off-by: Schspa Shi <schspa@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
To enable NFSv4 to work correctly, NFSv4 client identifiers have
to be globally unique and persistent over client reboots. We
believe that in many cases, a good default identifier can be
chosen and set when a client system is imaged.
Because there are many different ways a system can be imaged,
provide an explanation of how NFSv4 client identifiers and
principals can be set by install scripts and imaging tools.
Additional cases, such as NFSv4 clients running in containers, also
need unique and persistent identifiers. The Linux NFS community
sets forth this explanation to aid those who create and manage
container environments.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The documentation for nfs4_unique_id is out-of-date. In particular it
claim that when nfs4_unique_id is set, the host name is not used. since
Commit 55b592933b ("NFSv4: Fix nfs4_init_uniform_client_string for net
namespaces") both the unique_id AND the host name are used.
Update the documentation to match the code.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
NFSv4 can lose locks if, for example there is a network partition for
longer than the lease period. When this happens a warning message
NFS: __nfs4_reclaim_open_state: Lock reclaim failed!
is generated, possibly once for each lock (though rate limited).
This is potentially misleading as is can be read as suggesting that lock
reclaim was attempted. However the default behaviour is to not attempt
to recover locks (except due to server report).
This patch changes the reporting to produce at most one message for each
attempt to recover all state from a given server. The message reports
the server name and the number of locks lost if that number is non-zero.
It reports that locks were lost and give no suggestion as to whether
there was an attempt or not.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Now that everything is fully locked there is no need for container_users
to remain as an atomic, change it to an unsigned int.
Use 'if (group->container)' as the test to determine if the container is
present or not instead of using container_users.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/6-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Once userspace opens a group FD it is prevented from opening another
instance of that same group FD until all the prior group FDs and users of
the container are done.
The first is done trivially by checking the group->opened during group FD
open.
However, things get a little weird if userspace creates a device FD and
then closes the group FD. The group FD still cannot be re-opened, but this
time it is because the group->container is still set and container_users
is elevated by the device FD.
Due to this mismatched lifecycle we have the
vfio_group_try_dissolve_container() which tries to auto-free a container
after the group FD is closed but the device FD remains open.
Instead have the device FD hold onto a reference to the single group
FD. This directly prevents vfio_group_fops_release() from being called
when any device FD exists and makes the lifecycle model more
understandable.
vfio_group_try_dissolve_container() is removed as the only place a
container is auto-deleted is during vfio_group_fops_release(). At this
point the container_users is either 1 or 0 since all device FDs must be
closed.
Change group->opened to group->opened_file which points to the single
struct file * that is open for the group. If the group->open_file is
NULL then group->container == NULL.
If all device FDs have closed then the group's notifier list must be
empty.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/5-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This is necessary to avoid various user triggerable races, for instance
racing SET_CONTAINER/UNSET_CONTAINER:
ioctl(VFIO_GROUP_SET_CONTAINER)
ioctl(VFIO_GROUP_UNSET_CONTAINER)
vfio_group_unset_container
int users = atomic_cmpxchg(&group->container_users, 1, 0);
// users == 1 container_users == 0
__vfio_group_unset_container(group);
container = group->container;
vfio_group_set_container()
if (!atomic_read(&group->container_users))
down_write(&container->group_lock);
group->container = container;
up_write(&container->group_lock);
down_write(&container->group_lock);
group->container = NULL;
up_write(&container->group_lock);
vfio_container_put(container);
/* woops we lost/leaked the new container */
This can then go on to NULL pointer deref since container == 0 and
container_users == 1.
Wrap all touches of container, except those on a performance path with a
known open device, with the group_rwsem.
The only user of vfio_group_add_container_user() holds the user count for
a simple operation, change it to just hold the group_lock over the
operation and delete vfio_group_add_container_user(). Containers now only
gain a user when a device FD is opened.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/4-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>