mirror of
https://github.com/nxp-imx/linux-imx.git
synced 2025-07-15 13:49:36 +02:00
ef901b38d3
851 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
![]() |
856726396d |
- Fix DM discard regressions due to DM core switching over to using
queue_limits_set() without DM core and targets first being updated to set (and stack) discard limits in terms of max_hw_discard_sectors and not max_discard_sectors. - Fix stable@ DM integrity discard support to set device's discard_granularity limit to the device's logical block size. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmZMv3kACgkQxSPxCi2d A1pX2gf+NjaTP+6otMJ44sYwUGHuWtgwDy7NcoTiF4RvLQjjrUh8Lgm2p3i4evFs 4AzMNXy5V6/mEx/AZb7sjCtPkIGOykkBud3+jhStohQKvXEJQcYtwACNv151NW2c Fw2tcPZbPKH8P8UkDkegLaCu+4DotYjhuw44dfHFVZ95Wlhcm2UmcjWQEf/LtYW0 Si2FE0z2n1yi1mY6cExuL0bJ+LaMrXRQzHE3ZPaPRn4PFNUTY1juKsHYbgyL7cZ+ xQM1W/KBrKzDztCJGKH4Cl86L3kNPRkiQ7BQ/2Wtse20o10EGT0+lPvevCNcPNpi gbjAo7OPy9WPA9N/AR8Wfj06YQFaGA== =Pp+x -----END PGP SIGNATURE----- Merge tag 'for-6.10/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Fix DM discard regressions due to DM core switching over to using queue_limits_set() without DM core and targets first being updated to set (and stack) discard limits in terms of max_hw_discard_sectors and not max_discard_sectors - Fix stable@ DM integrity discard support to set device's discard_granularity limit to the device's logical block size * tag 'for-6.10/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm: always manage discard support in terms of max_hw_discard_sectors dm-integrity: set discard_granularity to logical block size |
||
![]() |
825d8bbd2f |
dm: always manage discard support in terms of max_hw_discard_sectors
Commit |
||
![]() |
0c9f4ac808 |
for-6.10/block-20240511
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmY/YgsQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpvi0EACwnFRtYioizBH0x7QUHTBcIr0IhACd5gfz bm+uwlDUtf6G6lupHdJT9gOVB2z2z1m2Pz//8RuUVWw3Eqw2+rfgG8iJd+yo7IaV DpX3WaM4NnBvB7FKOKHlMPvGuf7KgbZ3uPm3x8cbrn/axMmkZ6ljxTixJ3p5t4+s xRsef/lVdG71DkXIFgTKATB86yNRJNlRQTbL+sZW22vdXdtfyBbOgR1sBuFfp7Hd g/uocZM/z0ahM6JH/5R2IX2ttKXMIBZLA8HRkJdvYqg022cj4js2YyRCPU3N6jQN MtN4TpJV5I++8l6SPQOOhaDNrK/6zFtDQpwG0YBiKKj3nQDgVbWWb8ejYTIUv4MP SrEto4MVBEqg5N65VwYYhIf45rmueFyJp6z0Vqv6Owur5nuww/YIFknmoMa/WDMd V8dIU3zL72FZDbPjIBjxHeqAGz9OgzEVafled7pi0Xbw6wqiB4kZihlMGXlD+WBy Yd6xo8PX4i5+d2LLKKPxpW1X0eJlKYJ/4dnYCoFN8LmXSiPJnMx2pYrV+NqMxy4X Thr8lxswLQC7j9YBBuIeDl8NB9N5FZZLvaC6I25QKq045M2ckJ+VrounsQb3vGwJ 72nlxxBZL8wz3sasgX9Pc1Cez9AqYbM+UZahq8ezPY5y3Jh0QfRw/MOk1ZaDNC8V CNOHBH0E+Q== =HnjE -----END PGP SIGNATURE----- Merge tag 'for-6.10/block-20240511' of git://git.kernel.dk/linux Pull block updates from Jens Axboe: - Add a partscan attribute in sysfs, fixing an issue with systemd relying on an internal interface that went away. - Attempt #2 at making long running discards interruptible. The previous attempt went into 6.9, but we ended up mostly reverting it as it had issues. - Remove old ida_simple API in bcache - Support for zoned write plugging, greatly improving the performance on zoned devices. - Remove the old throttle low interface, which has been experimental since 2017 and never made it beyond that and isn't being used. - Remove page->index debugging checks in brd, as it hasn't caught anything and prepares us for removing in struct page. - MD pull request from Song - Don't schedule block workers on isolated CPUs * tag 'for-6.10/block-20240511' of git://git.kernel.dk/linux: (84 commits) blk-throttle: delay initialization until configuration blk-throttle: remove CONFIG_BLK_DEV_THROTTLING_LOW block: fix that util can be greater than 100% block: support to account io_ticks precisely block: add plug while submitting IO bcache: fix variable length array abuse in btree_iter bcache: Remove usage of the deprecated ida_simple_xx() API md: Revert "md: Fix overflow in is_mddev_idle" blk-lib: check for kill signal in ioctl BLKDISCARD block: add a bio_await_chain helper block: add a blk_alloc_discard_bio helper block: add a bio_chain_and_submit helper block: move discard checks into the ioctl handler block: remove the discard_granularity check in __blkdev_issue_discard block/ioctl: prefer different overflow check null_blk: Fix the WARNING: modpost: missing MODULE_DESCRIPTION() block: fix and simplify blkdevparts= cmdline parsing block: refine the EOF check in blkdev_iomap_begin block: add a partscan sysfs attribute for disks block: add a disk_has_partscan helper ... |
||
![]() |
f211268ed1 |
dm: Use the block layer zone append emulation
For targets requiring zone append operation emulation with regular writes (e.g. dm-crypt), we can use the block layer emulation provided by zone write plugging. Remove DM implemented zone append emulation and enable the block layer one. This is done by setting the max_zone_append_sectors limit of the mapped device queue to 0 for mapped devices that have a target table that cannot support native zone append operations (e.g. dm-crypt). Such mapped devices are flagged with the DMF_EMULATE_ZONE_APPEND flag. dm_split_and_process_bio() is modified to execute blk_zone_write_plug_bio() for such device to let the block layer transform zone append operations into regular writes. This is done after ensuring that the submitted BIO is split if it straddles zone boundaries. Both changes are implemented unsing the inline helpers dm_zone_write_plug_bio() and dm_zone_bio_needs_split() respectively. dm_revalidate_zones() is also modified to use the block layer provided function blk_revalidate_disk_zones() so that all zone resources needed for zone append emulation are initialized by the block layer without DM core needing to do anything. Since the device table is not yet live when dm_revalidate_zones() is executed, enabling the use of blk_revalidate_disk_zones() requires adding a pointer to the device table in struct mapped_device. This avoids errors in dm_blk_report_zones() trying to get the table with dm_get_live_table(). The mapped device table pointer is set to the table passed as argument to dm_revalidate_zones() before calling blk_revalidate_disk_zones() and reset to NULL after this function returns to restore the live table handling for user call of report zones. All the code related to zone append emulation is removed from dm-zone.c. This leads to simplifications of the functions __map_bio() and dm_zone_endio(). This later function now only needs to deal with completions of real zone append operations for targets that support it. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Tested-by: Hans Holmberg <hans.holmberg@wdc.com> Tested-by: Dennis Maisenbacher <dennis.maisenbacher@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240408014128.205141-13-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
![]() |
48ef0ba12e |
dm: restore synchronous close of device mapper block device
'dmsetup remove' and 'dmsetup remove_all' require synchronous bdev
release. Otherwise dm_lock_for_deletion() may return -EBUSY if the open
count is > 0, because the open count is dropped in dm_blk_close()
which occurs after fput() completes.
So if dm_blk_close() is delayed because of asynchronous fput(), this
device mapper device is skipped during remove, which is a regression.
Fix the issue by using __fput_sync().
Also, DM device removal has long supported being made asynchronous by
setting the DMF_DEFERRED_REMOVE flag on the DM device. So leverage
using async fput() in close_table_device() if DMF_DEFERRED_REMOVE flag
is set.
Reported-by: Zhong Changhui <czhong@redhat.com>
Fixes:
|
||
![]() |
902861e34c |
- Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
from hotplugged memory rather than only from main memory. Series "implement "memmap on memory" feature on s390". - More folio conversions from Matthew Wilcox in the series "Convert memcontrol charge moving to use folios" "mm: convert mm counter to take a folio" - Chengming Zhou has optimized zswap's rbtree locking, providing significant reductions in system time and modest but measurable reductions in overall runtimes. The series is "mm/zswap: optimize the scalability of zswap rb-tree". - Chengming Zhou has also provided the series "mm/zswap: optimize zswap lru list" which provides measurable runtime benefits in some swap-intensive situations. - And Chengming Zhou further optimizes zswap in the series "mm/zswap: optimize for dynamic zswap_pools". Measured improvements are modest. - zswap cleanups and simplifications from Yosry Ahmed in the series "mm: zswap: simplify zswap_swapoff()". - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has contributed several DAX cleanups as well as adding a sysfs tunable to control the memmap_on_memory setting when the dax device is hotplugged as system memory. - Johannes Weiner has added the large series "mm: zswap: cleanups", which does that. - More DAMON work from SeongJae Park in the series "mm/damon: make DAMON debugfs interface deprecation unignorable" "selftests/damon: add more tests for core functionalities and corner cases" "Docs/mm/damon: misc readability improvements" "mm/damon: let DAMOS feeds and tame/auto-tune itself" - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs extension" Rakie Kim has developed a new mempolicy interleaving policy wherein we allocate memory across nodes in a weighted fashion rather than uniformly. This is beneficial in heterogeneous memory environments appearing with CXL. - Christophe Leroy has contributed some cleanup and consolidation work against the ARM pagetable dumping code in the series "mm: ptdump: Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute". - Luis Chamberlain has added some additional xarray selftesting in the series "test_xarray: advanced API multi-index tests". - Muhammad Usama Anjum has reworked the selftest code to make its human-readable output conform to the TAP ("Test Anything Protocol") format. Amongst other things, this opens up the use of third-party tools to parse and process out selftesting results. - Ryan Roberts has added fork()-time PTE batching of THP ptes in the series "mm/memory: optimize fork() with PTE-mapped THP". Mainly targeted at arm64, this significantly speeds up fork() when the process has a large number of pte-mapped folios. - David Hildenbrand also gets in on the THP pte batching game in his series "mm/memory: optimize unmap/zap with PTE-mapped THP". It implements batching during munmap() and other pte teardown situations. The microbenchmark improvements are nice. - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte mappings"). Kernel build times on arm64 improved nicely. Ryan's series "Address some contpte nits" provides some followup work. - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has fixed an obscure hugetlb race which was causing unnecessary page faults. He has also added a reproducer under the selftest code. - In the series "selftests/mm: Output cleanups for the compaction test", Mark Brown did what the title claims. - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring". - Even more zswap material from Nhat Pham. The series "fix and extend zswap kselftests" does as claimed. - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in our handling of DAX on archiecctures which have virtually aliasing data caches. The arm architecture is the main beneficiary. - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic improvements in worst-case mmap_lock hold times during certain userfaultfd operations. - Some page_owner enhancements and maintenance work from Oscar Salvador in his series "page_owner: print stacks and their outstanding allocations" "page_owner: Fixup and cleanup" - Uladzislau Rezki has contributed some vmalloc scalability improvements in his series "Mitigate a vmap lock contention". It realizes a 12x improvement for a certain microbenchmark. - Some kexec/crash cleanup work from Baoquan He in the series "Split crash out from kexec and clean up related config items". - Some zsmalloc maintenance work from Chengming Zhou in the series "mm/zsmalloc: fix and optimize objects/page migration" "mm/zsmalloc: some cleanup for get/set_zspage_mapping()" - Zi Yan has taught the MM to perform compaction on folios larger than order=0. This a step along the path to implementaton of the merging of large anonymous folios. The series is named "Enable >0 order folio memory compaction". - Christoph Hellwig has done quite a lot of cleanup work in the pagecache writeback code in his series "convert write_cache_pages() to an iterator". - Some modest hugetlb cleanups and speedups in Vishal Moola's series "Handle hugetlb faults under the VMA lock". - Zi Yan has changed the page splitting code so we can split huge pages into sizes other than order-0 to better utilize large folios. The series is named "Split a folio to any lower order folios". - David Hildenbrand has contributed the series "mm: remove total_mapcount()", a cleanup. - Matthew Wilcox has sought to improve the performance of bulk memory freeing in his series "Rearrange batched folio freeing". - Gang Li's series "hugetlb: parallelize hugetlb page init on boot" provides large improvements in bootup times on large machines which are configured to use large numbers of hugetlb pages. - Matthew Wilcox's series "PageFlags cleanups" does that. - Qi Zheng's series "minor fixes and supplement for ptdesc" does that also. S390 is affected. - Cleanups to our pagemap utility functions from Peter Xu in his series "mm/treewide: Replace pXd_large() with pXd_leaf()". - Nico Pache has fixed a few things with our hugepage selftests in his series "selftests/mm: Improve Hugepage Test Handling in MM Selftests". - Also, of course, many singleton patches to many things. Please see the individual changelogs for details. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZfJpPQAKCRDdBJ7gKXxA joxeAP9TrcMEuHnLmBlhIXkWbIR4+ki+pA3v+gNTlJiBhnfVSgD9G55t1aBaRplx TMNhHfyiHYDTx/GAV9NXW84tasJSDgA= =TG55 -----END PGP SIGNATURE----- Merge tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames from hotplugged memory rather than only from main memory. Series "implement "memmap on memory" feature on s390". - More folio conversions from Matthew Wilcox in the series "Convert memcontrol charge moving to use folios" "mm: convert mm counter to take a folio" - Chengming Zhou has optimized zswap's rbtree locking, providing significant reductions in system time and modest but measurable reductions in overall runtimes. The series is "mm/zswap: optimize the scalability of zswap rb-tree". - Chengming Zhou has also provided the series "mm/zswap: optimize zswap lru list" which provides measurable runtime benefits in some swap-intensive situations. - And Chengming Zhou further optimizes zswap in the series "mm/zswap: optimize for dynamic zswap_pools". Measured improvements are modest. - zswap cleanups and simplifications from Yosry Ahmed in the series "mm: zswap: simplify zswap_swapoff()". - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has contributed several DAX cleanups as well as adding a sysfs tunable to control the memmap_on_memory setting when the dax device is hotplugged as system memory. - Johannes Weiner has added the large series "mm: zswap: cleanups", which does that. - More DAMON work from SeongJae Park in the series "mm/damon: make DAMON debugfs interface deprecation unignorable" "selftests/damon: add more tests for core functionalities and corner cases" "Docs/mm/damon: misc readability improvements" "mm/damon: let DAMOS feeds and tame/auto-tune itself" - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs extension" Rakie Kim has developed a new mempolicy interleaving policy wherein we allocate memory across nodes in a weighted fashion rather than uniformly. This is beneficial in heterogeneous memory environments appearing with CXL. - Christophe Leroy has contributed some cleanup and consolidation work against the ARM pagetable dumping code in the series "mm: ptdump: Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute". - Luis Chamberlain has added some additional xarray selftesting in the series "test_xarray: advanced API multi-index tests". - Muhammad Usama Anjum has reworked the selftest code to make its human-readable output conform to the TAP ("Test Anything Protocol") format. Amongst other things, this opens up the use of third-party tools to parse and process out selftesting results. - Ryan Roberts has added fork()-time PTE batching of THP ptes in the series "mm/memory: optimize fork() with PTE-mapped THP". Mainly targeted at arm64, this significantly speeds up fork() when the process has a large number of pte-mapped folios. - David Hildenbrand also gets in on the THP pte batching game in his series "mm/memory: optimize unmap/zap with PTE-mapped THP". It implements batching during munmap() and other pte teardown situations. The microbenchmark improvements are nice. - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte mappings"). Kernel build times on arm64 improved nicely. Ryan's series "Address some contpte nits" provides some followup work. - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has fixed an obscure hugetlb race which was causing unnecessary page faults. He has also added a reproducer under the selftest code. - In the series "selftests/mm: Output cleanups for the compaction test", Mark Brown did what the title claims. - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring". - Even more zswap material from Nhat Pham. The series "fix and extend zswap kselftests" does as claimed. - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in our handling of DAX on archiecctures which have virtually aliasing data caches. The arm architecture is the main beneficiary. - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic improvements in worst-case mmap_lock hold times during certain userfaultfd operations. - Some page_owner enhancements and maintenance work from Oscar Salvador in his series "page_owner: print stacks and their outstanding allocations" "page_owner: Fixup and cleanup" - Uladzislau Rezki has contributed some vmalloc scalability improvements in his series "Mitigate a vmap lock contention". It realizes a 12x improvement for a certain microbenchmark. - Some kexec/crash cleanup work from Baoquan He in the series "Split crash out from kexec and clean up related config items". - Some zsmalloc maintenance work from Chengming Zhou in the series "mm/zsmalloc: fix and optimize objects/page migration" "mm/zsmalloc: some cleanup for get/set_zspage_mapping()" - Zi Yan has taught the MM to perform compaction on folios larger than order=0. This a step along the path to implementaton of the merging of large anonymous folios. The series is named "Enable >0 order folio memory compaction". - Christoph Hellwig has done quite a lot of cleanup work in the pagecache writeback code in his series "convert write_cache_pages() to an iterator". - Some modest hugetlb cleanups and speedups in Vishal Moola's series "Handle hugetlb faults under the VMA lock". - Zi Yan has changed the page splitting code so we can split huge pages into sizes other than order-0 to better utilize large folios. The series is named "Split a folio to any lower order folios". - David Hildenbrand has contributed the series "mm: remove total_mapcount()", a cleanup. - Matthew Wilcox has sought to improve the performance of bulk memory freeing in his series "Rearrange batched folio freeing". - Gang Li's series "hugetlb: parallelize hugetlb page init on boot" provides large improvements in bootup times on large machines which are configured to use large numbers of hugetlb pages. - Matthew Wilcox's series "PageFlags cleanups" does that. - Qi Zheng's series "minor fixes and supplement for ptdesc" does that also. S390 is affected. - Cleanups to our pagemap utility functions from Peter Xu in his series "mm/treewide: Replace pXd_large() with pXd_leaf()". - Nico Pache has fixed a few things with our hugepage selftests in his series "selftests/mm: Improve Hugepage Test Handling in MM Selftests". - Also, of course, many singleton patches to many things. Please see the individual changelogs for details. * tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits) mm/zswap: remove the memcpy if acomp is not sleepable crypto: introduce: acomp_is_async to expose if comp drivers might sleep memtest: use {READ,WRITE}_ONCE in memory scanning mm: prohibit the last subpage from reusing the entire large folio mm: recover pud_leaf() definitions in nopmd case selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements selftests/mm: skip uffd hugetlb tests with insufficient hugepages selftests/mm: dont fail testsuite due to a lack of hugepages mm/huge_memory: skip invalid debugfs new_order input for folio split mm/huge_memory: check new folio order when split a folio mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure mm: add an explicit smp_wmb() to UFFDIO_CONTINUE mm: fix list corruption in put_pages_list mm: remove folio from deferred split list before uncharging it filemap: avoid unnecessary major faults in filemap_fault() mm,page_owner: drop unnecessary check mm,page_owner: check for null stack_record before bumping its refcount mm: swap: fix race between free_swap_and_cache() and swapoff() mm/treewide: align up pXd_leaf() retval across archs mm/treewide: drop pXd_large() ... |
||
![]() |
d2bac0823d |
- Fix DM core's IO submission (which include dm-io and dm-bufio) such
that a bio's IO priority is propagated. Work focused on enabling both DM crypt and verity targets to retain the appropriate IO priority. - Fix DM raid reshape logic to not allow an empty flush bio to be requeued due to false concern about the bio, which doesn't have a data payload, accessing beyond the end of the device. - Fix DM core's internal resume so that it properly calls both presume and resume methods, which fixes the potential for a postsuspend and resume imbalance. - Update DM verity target to set DM_TARGET_SINGLETON flag because it doesn't make sense to have a DM table with a mix of targets that include dm-verity. - Small cleanups in DM crypt, thin, and integrity targets. - Fix references to dm-devel mailing list to use latest list address. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmXwWHkACgkQxSPxCi2d A1oavggAtmftaoAXUJNdiYgIDm6/gKPa0CDyQEoD5cTyQ32R7c6SYNN7u2wrltHX af36lzoksfsOra8uMa4+Q46/XFeqlRzvfu29bhmfAWMkMT3MMq7iGtTFG/SluD7b NWjXhsUT8Fv2Q7BKglUc6cXUIbCZNwNUs5cxx2QobdrD57qxVKDz5HkH5/EggptA cA6dpD7DbpWQhHWm0UN6cOmYNh4kGLiQs4S50N5hc7zlbXfJhhVzflecsVPY+MVN wS/iv/hNenlLuJ7gzIPBwmxRgBbjHHbrbFNVa2yFrPQ8m/AuRbAUqYQO1sOXNOMZ Mkk2G0IB7sJtpnEm+6l0x7A4VmSWvg== =Eoxd -----END PGP SIGNATURE----- Merge tag 'for-6.9/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Fix DM core's IO submission (which include dm-io and dm-bufio) such that a bio's IO priority is propagated. Work focused on enabling both DM crypt and verity targets to retain the appropriate IO priority - Fix DM raid reshape logic to not allow an empty flush bio to be requeued due to false concern about the bio, which doesn't have a data payload, accessing beyond the end of the device - Fix DM core's internal resume so that it properly calls both presume and resume methods, which fixes the potential for a postsuspend and resume imbalance - Update DM verity target to set DM_TARGET_SINGLETON flag because it doesn't make sense to have a DM table with a mix of targets that include dm-verity - Small cleanups in DM crypt, thin, and integrity targets - Fix references to dm-devel mailing list to use latest list address * tag 'for-6.9/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm: call the resume method on internal suspend dm raid: fix false positive for requeue needed during reshape dm-integrity: set max_integrity_segments in dm_integrity_io_hints dm: update relevant MODULE_AUTHOR entries to latest dm-devel mailing list dm ioctl: update DM_DRIVER_EMAIL to new dm-devel mailing list dm verity: set DM_TARGET_SINGLETON feature flag dm crypt: Fix IO priority lost when queuing write bios dm verity: Fix IO priority lost when reading FEC and hash dm bufio: Support IO priority dm io: Support IO priority dm crypt: remove redundant state settings after waking up dm thin: add braces around conditional code that spans lines |
||
![]() |
65e8fbde64 |
dm: call the resume method on internal suspend
There is this reported crash when experimenting with the lvm2 testsuite.
The list corruption is caused by the fact that the postsuspend and resume
methods were not paired correctly; there were two consecutive calls to the
origin_postsuspend function. The second call attempts to remove the
"hash_list" entry from a list, while it was already removed by the first
call.
Fix __dm_internal_resume so that it calls the preresume and resume
methods of the table's targets.
If a preresume method of some target fails, we are in a tricky situation.
We can't return an error because dm_internal_resume isn't supposed to
return errors. We can't return success, because then the "resume" and
"postsuspend" methods would not be paired correctly. So, we set the
DMF_SUSPENDED flag and we fake normal suspend - it may confuse userspace
tools, but it won't cause a kernel crash.
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:56!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 8343 Comm: dmsetup Not tainted 6.8.0-rc6 #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
RIP: 0010:__list_del_entry_valid_or_report+0x77/0xc0
<snip>
RSP: 0018:ffff8881b831bcc0 EFLAGS: 00010282
RAX: 000000000000004e RBX: ffff888143b6eb80 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffffffff819053d0 RDI: 00000000ffffffff
RBP: ffff8881b83a3400 R08: 00000000fffeffff R09: 0000000000000058
R10: 0000000000000000 R11: ffffffff81a24080 R12: 0000000000000001
R13: ffff88814538e000 R14: ffff888143bc6dc0 R15: ffffffffa02e4bb0
FS: 00000000f7c0f780(0000) GS:ffff8893f0a40000(0000) knlGS:0000000000000000
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000057fb5000 CR3: 0000000143474000 CR4: 00000000000006b0
Call Trace:
<TASK>
? die+0x2d/0x80
? do_trap+0xeb/0xf0
? __list_del_entry_valid_or_report+0x77/0xc0
? do_error_trap+0x60/0x80
? __list_del_entry_valid_or_report+0x77/0xc0
? exc_invalid_op+0x49/0x60
? __list_del_entry_valid_or_report+0x77/0xc0
? asm_exc_invalid_op+0x16/0x20
? table_deps+0x1b0/0x1b0 [dm_mod]
? __list_del_entry_valid_or_report+0x77/0xc0
origin_postsuspend+0x1a/0x50 [dm_snapshot]
dm_table_postsuspend_targets+0x34/0x50 [dm_mod]
dm_suspend+0xd8/0xf0 [dm_mod]
dev_suspend+0x1f2/0x2f0 [dm_mod]
? table_deps+0x1b0/0x1b0 [dm_mod]
ctl_ioctl+0x300/0x5f0 [dm_mod]
dm_compat_ctl_ioctl+0x7/0x10 [dm_mod]
__x64_compat_sys_ioctl+0x104/0x170
do_syscall_64+0x184/0x1b0
entry_SYSCALL_64_after_hwframe+0x46/0x4e
RIP: 0033:0xf7e6aead
<snip>
---[ end trace 0000000000000000 ]---
Fixes:
|
||
![]() |
1ddeeb2a05 |
for-6.9/block-20240310
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmXuFO4QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpq33D/9hyNyBce2A9iyo026eK8EqLDoed6BPzuvB kLKj5tsGvX4YlfuswvP86M5dgibTASXclnfUK394TijW/JPOfJ3mNhi9gMnHzRoK ZaR1di0Lum56dY1FkpMmWiGmE4fB79PAtXYKtajOkuoIcNzylncEAAACUY4/Ouhg Cm+LMg2prcc+m9g8rKDNQ51pUFg4U21KAUTl35XLMUAaQk1ahW3EDEVYhweC/zwE V/5hJsv8UY72+oQGY2Dc/YgQk/Zj4ZDh7C+oHR9XeB/ro99kr3/Vopagu0gBMLZi Rq6qqz6PVMhVcuz8uN2rsTQKXmXhsBn9/adsl4AKtdxcW5D5moWb5BLq1P0WQylc nzMxa1d6cVcTKZpaUQQv3Rj6ZMrLuDwP277UYHfn5x1oPWYRZCG7FtHuOo1gNcpG DrSNwVG6BSDcbABqI+MIS2oD1JoUMyevjwT7e2hOXukZhc6GLO5F3ODWE5j3KnCR S/aGSAmcdR4fTcgavULqWdQVt7SYl4f1IxT8KrUirJGVhc2LgahaWj69ooklVHoU fPDFRiruwJ5YkH4RWCSDm9mi4kAz6eUf+f4yE06wZOFOb2fT8/1ZK2Snpz2KeXuZ INO0RejtFzT8L0OUlu7dBmF20y6rgAYt87lR8mIt71yuuATIrVhzlX1VdsvhdrAo VLHGV1Ncgw== =WlVL -----END PGP SIGNATURE----- Merge tag 'for-6.9/block-20240310' of git://git.kernel.dk/linux Pull block updates from Jens Axboe: - MD pull requests via Song: - Cleanup redundant checks (Yu Kuai) - Remove deprecated headers (Marc Zyngier, Song Liu) - Concurrency fixes (Li Lingfeng) - Memory leak fix (Li Nan) - Refactor raid1 read_balance (Yu Kuai, Paul Luse) - Clean up and fix for md_ioctl (Li Nan) - Other small fixes (Gui-Dong Han, Heming Zhao) - MD atomic limits (Christoph) - NVMe pull request via Keith: - RDMA target enhancements (Max) - Fabrics fixes (Max, Guixin, Hannes) - Atomic queue_limits usage (Christoph) - Const use for class_register (Ricardo) - Identification error handling fixes (Shin'ichiro, Keith) - Improvement and cleanup for cached request handling (Christoph) - Moving towards atomic queue limits. Core changes and driver bits so far (Christoph) - Fix UAF issues in aoeblk (Chun-Yi) - Zoned fix and cleanups (Damien) - s390 dasd cleanups and fixes (Jan, Miroslav) - Block issue timestamp caching (me) - noio scope guarding for zoned IO (Johannes) - block/nvme PI improvements (Kanchan) - Ability to terminate long running discard loop (Keith) - bdev revalidation fix (Li) - Get rid of old nr_queues hack for kdump kernels (Ming) - Support for async deletion of ublk (Ming) - Improve IRQ bio recycling (Pavel) - Factor in CPU capacity for remote vs local completion (Qais) - Add shared_tags configfs entry for null_blk (Shin'ichiro - Fix for a regression in page refcounts introduced by the folio unification (Tony) - Misc fixes and cleanups (Arnd, Colin, John, Kunwu, Li, Navid, Ricardo, Roman, Tang, Uwe) * tag 'for-6.9/block-20240310' of git://git.kernel.dk/linux: (221 commits) block: partitions: only define function mac_fix_string for CONFIG_PPC_PMAC block/swim: Convert to platform remove callback returning void cdrom: gdrom: Convert to platform remove callback returning void block: remove disk_stack_limits md: remove mddev->queue md: don't initialize queue limits md/raid10: use the atomic queue limit update APIs md/raid5: use the atomic queue limit update APIs md/raid1: use the atomic queue limit update APIs md/raid0: use the atomic queue limit update APIs md: add queue limit helpers md: add a mddev_is_dm helper md: add a mddev_add_trace_msg helper md: add a mddev_trace_remap helper bcache: move calculation of stripe_size and io_opt into bcache_device_init virtio_blk: Do not use disk_set_max_open/active_zones() aoe: fix the potential use-after-free problem in aoecmd_cfg_pkts block: move capacity validation to blkpg_do_ioctl() block: prevent division by zero in blk_rq_stat_sum() drbd: atomically update queue limits in drbd_reconsider_queue_parameters ... |
||
![]() |
a28d893eb3
|
md: port block device access to file
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-4-adbd023e19cc@kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
c29290728d |
dm: treat alloc_dax() -EOPNOTSUPP failure as non-fatal
In preparation for checking whether the architecture has data cache
aliasing within alloc_dax(), modify the error handling of dm alloc_dev()
to treat alloc_dax() -EOPNOTSUPP failure as non-fatal.
Link: https://lkml.kernel.org/r/20240215144633.96437-5-mathieu.desnoyers@efficios.com
Fixes:
|
||
![]() |
fa34e5893f |
dm: update relevant MODULE_AUTHOR entries to latest dm-devel mailing list
Signed-off-by: Mike Snitzer <snitzer@kernel.org> |
||
![]() |
74fa8f9c55 |
block: pass a queue_limits argument to blk_alloc_disk
Pass a queue_limits to blk_alloc_disk and apply it if non-NULL. This will allow allocating queues with valid queue limits instead of setting the values one at a time later. Also change blk_alloc_disk to return an ERR_PTR instead of just NULL which can't distinguish errors. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Link: https://lore.kernel.org/r/20240215071055.2201424-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
![]() |
982c3b3058
|
bdev: rename freeze and thaw helpers
We have bdev_mark_dead() etc and we're going to move block device freezing to holder ops in the next patch. Make the naming consistent: * freeze_bdev() -> bdev_freeze() * thaw_bdev() -> bdev_thaw() Also document the return code. Link: https://lore.kernel.org/r/20231024-vfs-super-freeze-v2-2-599c19f4faac@kernel.org Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
0364249d20 |
- Update DM core to directly call the map function for both the linear
and stripe targets; which are provided by DM core. - Various updates to use new safer string functions. - Update DM core to respect REQ_NOWAIT flag in normal bios so that memory allocations are always attempted with GFP_NOWAIT. - Add Mikulas Patocka to MAINTAINERS as a DM maintainer! - Improve DM delay target's handling of short delays (< 50ms) by using a kthread to check expiration of IOs rather than timers and a wq. - Update the DM error target so that it works with zoned storage. This helps xfstests to provide proper IO error handling coverage when testing a filesystem with native zoned storage support. - Update both DM crypt and integrity targets to improve performance by using crypto_shash_digest() rather than init+update+final sequence. - Fix DM crypt target by backfilling missing memory allocation accounting for compound pages. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmVCeG8ACgkQxSPxCi2d A1oXAAgAnT0mb1psEwDhKiJG26bUeJDJHIaNTPPw4UpCvKEWU7lTxzJUmthnwLZI D+JnkrenAfGppdsFkHh1ogTBz8r60aeBTY31SUoCUVS/clwHxRstLCkSE2655an3 WNgrGItta5BRtcgTMxU7bmqVOD4Xyw704L7BK5CM4zKBQuXJtkegylALhdXvYSjs mvHWoPczFW6Rv79CkF0uTkon7Ji041BvKVjbp+fYQf9Pul5d6S1v4Jrvlo2EqblU 6OM75OosNgziQhdw7faVOgFCVkQfLkEeYsJ47dBOjpC3+sbyKENbC4r+ZmkCZGyI uJERswpmAkW7pp+Ab6sqG582DXvPvA== =PPwQ -----END PGP SIGNATURE----- Merge tag 'for-6.7/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Update DM core to directly call the map function for both the linear and stripe targets; which are provided by DM core - Various updates to use new safer string functions - Update DM core to respect REQ_NOWAIT flag in normal bios so that memory allocations are always attempted with GFP_NOWAIT - Add Mikulas Patocka to MAINTAINERS as a DM maintainer! - Improve DM delay target's handling of short delays (< 50ms) by using a kthread to check expiration of IOs rather than timers and a wq - Update the DM error target so that it works with zoned storage. This helps xfstests to provide proper IO error handling coverage when testing a filesystem with native zoned storage support - Update both DM crypt and integrity targets to improve performance by using crypto_shash_digest() rather than init+update+final sequence - Fix DM crypt target by backfilling missing memory allocation accounting for compound pages * tag 'for-6.7/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm crypt: account large pages in cc->n_allocated_pages dm integrity: use crypto_shash_digest() in sb_mac() dm crypt: use crypto_shash_digest() in crypt_iv_tcw_whitening() dm error: Add support for zoned block devices dm delay: for short delays, use kthread instead of timers and wq MAINTAINERS: add Mikulas Patocka as a DM maintainer dm: respect REQ_NOWAIT flag in normal bios issued to DM dm: enhance alloc_multiple_bios() to be more versatile dm: make __send_duplicate_bios return unsigned int dm log userspace: replace deprecated strncpy with strscpy dm ioctl: replace deprecated strncpy with strscpy_pad dm crypt: replace open-coded kmemdup_nul dm cache metadata: replace deprecated strncpy with strscpy dm: shortcut the calls to linear_map and stripe_map |
||
![]() |
c2fce61fb2
|
dm: Convert to bdev_open_by_dev()
Convert device mapper to use bdev_open_by_dev() and pass the handle around. CC: Alasdair Kergon <agk@redhat.com> CC: Mike Snitzer <snitzer@kernel.org> CC: dm-devel@redhat.com Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-10-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
6f25dd1c57 |
dm: respect REQ_NOWAIT flag in normal bios issued to DM
Update DM core's normal IO submission to allocate required memory
using GFP_NOWAIT if REQ_NOWAIT is set.
Tested with simple test provided in commit
|
||
![]() |
4a2fe29608 |
dm: enhance alloc_multiple_bios() to be more versatile
alloc_multiple_bios() has the useful ability to try allocating bios with GFP_NOWAIT but will fallback to using GFP_NOIO. The callers service both empty flush bios and abnormal bios (e.g. discard). alloc_multiple_bios() enhancements offered in this commit: - don't require table_devices_lock if num_bios = 1 - allow caller to pass GFP_NOWAIT to do usual GFP_NOWAIT with GFP_NOIO fallback - allow caller to pass GFP_NOIO to _only_ allocate using GFP_NOIO Flush bios with data may be issued to DM with REQ_NOWAIT, as such it makes sense to attempt servicing them with GFP_NOWAIT allocations. But abnormal IO should never be issued using REQ_NOWAIT (if that changes in the future that's fine, but no sense supporting it now). While at it, rename __send_changing_extent_only() to __send_abnormal_io(). [Thanks to both Ming and Mikulas for help with translating known possible IO scenarios to requirements.] Suggested-by: Ming Lei <ming.lei@redhat.com> Suggested-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> |
||
![]() |
34dbaa88ca |
dm: make __send_duplicate_bios return unsigned int
All the callers cast the value returned by __send_duplicate_bios to unsigned int type, so we can return unsigned int as well. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> |
||
![]() |
f144503217 |
dm: shortcut the calls to linear_map and stripe_map
Shortcut the calls to linear_map and stripe_map, so that they don't suffer the overhead of retpolines used for indirect calls. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> |
||
![]() |
a9ce385344 |
dm: don't attempt to queue IO under RCU protection
dm looks up the table for IO based on the request type, with an
assumption that if the request is marked REQ_NOWAIT, it's fine to
attempt to submit that IO while under RCU read lock protection. This
is not OK, as REQ_NOWAIT just means that we should not be sleeping
waiting on other IO, it does not mean that we can't potentially
schedule.
A simple test case demonstrates this quite nicely:
int main(int argc, char *argv[])
{
struct iovec iov;
int fd;
fd = open("/dev/dm-0", O_RDONLY | O_DIRECT);
posix_memalign(&iov.iov_base, 4096, 4096);
iov.iov_len = 4096;
preadv2(fd, &iov, 1, 0, RWF_NOWAIT);
return 0;
}
which will instantly spew:
BUG: sleeping function called from invalid context at include/linux/sched/mm.h:306
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 5580, name: dm-nowait
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
INFO: lockdep is turned off.
CPU: 7 PID: 5580 Comm: dm-nowait Not tainted 6.6.0-rc1-g39956d2dcd81 #132
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x11d/0x1b0
__might_resched+0x3c3/0x5e0
? preempt_count_sub+0x150/0x150
mempool_alloc+0x1e2/0x390
? mempool_resize+0x7d0/0x7d0
? lock_sync+0x190/0x190
? lock_release+0x4b7/0x670
? internal_get_user_pages_fast+0x868/0x2d40
bio_alloc_bioset+0x417/0x8c0
? bvec_alloc+0x200/0x200
? internal_get_user_pages_fast+0xb8c/0x2d40
bio_alloc_clone+0x53/0x100
dm_submit_bio+0x27f/0x1a20
? lock_release+0x4b7/0x670
? blk_try_enter_queue+0x1a0/0x4d0
? dm_dax_direct_access+0x260/0x260
? rcu_is_watching+0x12/0xb0
? blk_try_enter_queue+0x1cc/0x4d0
__submit_bio+0x239/0x310
? __bio_queue_enter+0x700/0x700
? kvm_clock_get_cycles+0x40/0x60
? ktime_get+0x285/0x470
submit_bio_noacct_nocheck+0x4d9/0xb80
? should_fail_request+0x80/0x80
? preempt_count_sub+0x150/0x150
? lock_release+0x4b7/0x670
? __bio_add_page+0x143/0x2d0
? iov_iter_revert+0x27/0x360
submit_bio_noacct+0x53e/0x1b30
submit_bio_wait+0x10a/0x230
? submit_bio_wait_endio+0x40/0x40
__blkdev_direct_IO_simple+0x4f8/0x780
? blkdev_bio_end_io+0x4c0/0x4c0
? stack_trace_save+0x90/0xc0
? __bio_clone+0x3c0/0x3c0
? lock_release+0x4b7/0x670
? lock_sync+0x190/0x190
? atime_needs_update+0x3bf/0x7e0
? timestamp_truncate+0x21b/0x2d0
? inode_owner_or_capable+0x240/0x240
blkdev_direct_IO.part.0+0x84a/0x1810
? rcu_is_watching+0x12/0xb0
? lock_release+0x4b7/0x670
? blkdev_read_iter+0x40d/0x530
? reacquire_held_locks+0x4e0/0x4e0
? __blkdev_direct_IO_simple+0x780/0x780
? rcu_is_watching+0x12/0xb0
? __mark_inode_dirty+0x297/0xd50
? preempt_count_add+0x72/0x140
blkdev_read_iter+0x2a4/0x530
do_iter_readv_writev+0x2f2/0x3c0
? generic_copy_file_range+0x1d0/0x1d0
? fsnotify_perm.part.0+0x25d/0x630
? security_file_permission+0xd8/0x100
do_iter_read+0x31b/0x880
? import_iovec+0x10b/0x140
vfs_readv+0x12d/0x1a0
? vfs_iter_read+0xb0/0xb0
? rcu_is_watching+0x12/0xb0
? rcu_is_watching+0x12/0xb0
? lock_release+0x4b7/0x670
do_preadv+0x1b3/0x260
? do_readv+0x370/0x370
__x64_sys_preadv2+0xef/0x150
do_syscall_64+0x39/0xb0
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f5af41ad806
Code: 41 54 41 89 fc 55 44 89 c5 53 48 89 cb 48 83 ec 18 80 3d e4 dd 0d 00 00 74 7a 45 89 c1 49 89 ca 45 31 c0 b8 47 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 be 00 00 00 48 85 c0 79 4a 48 8b 0d da 55
RSP: 002b:00007ffd3145c7f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000147
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5af41ad806
RDX: 0000000000000001 RSI: 00007ffd3145c850 RDI: 0000000000000003
RBP: 0000000000000008 R08: 0000000000000000 R09: 0000000000000008
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
R13: 00007ffd3145c850 R14: 000055f5f0431dd8 R15: 0000000000000001
</TASK>
where in fact it is dm itself that attempts to allocate a bio clone with
GFP_NOIO under the rcu read lock, regardless of the request type.
Fix this by getting rid of the special casing for REQ_NOWAIT, and just
use the normal SRCU protected table lookup. Get rid of the bio based
table locking helpers at the same time, as they are now unused.
Cc: stable@vger.kernel.org
Fixes:
|
||
![]() |
6cdbb0907a |
- Update DM crypt to allocate compound pages if possible.
- Fix DM crypt target's crypt_ctr_cipher_new return value on invalid AEAD cipher. - Fix DM flakey testing target's write bio corruption feature to corrupt the data of a cloned bio instead of the original. - Add random_read_corrupt and random_write_corrupt features to DM flakey target. - Fix ABBA deadlock in DM thin metadata by resetting associated bufio client rather than destroying and recreating it. - A couple other small DM thinp cleanups. - Update DM core to support disabling block core IO stats accounting and optimize away code that isn't needed if stats are disabled. - Other small DM core cleanups. - Improve DM integrity target to not require so much memory on 32 bit systems. Also only allocate the recalculate buffer as needed (and increasingly reduce its size on allocation failure). - Update DM integrity to use %*ph for printing hexdump of a small buffer. Also update DM integrity documentation. - Various DM core ioctl interface hardening. Now more careful about alignment of structures and processing of input passed to the kernel from userspace. Also disallow the creation of DM devices named "control", "." or ".." - Eliminate GFP_NOIO workarounds for __vmalloc and kvmalloc in DM core's ioctl and bufio code. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmScyjAACgkQxSPxCi2d A1pilwgAucNIAB6uN4ke4WZrMVxSFntUkqDTkCs2Ycw+W4Tf1Mtrj/4WeBzFdJaA oZMK04LUGaFFXn+halsCDzB354yT9C7V/KfXW8pCM1c9BRz4e8272i2HSN4WwD5n BU4gVaOV5BwxynfF3Z5siRraad1AwmdoRGGsqzVRESAKaObXU//1tnO42UhxRVhn nzFqhIm0xRcLAd8xIBlMsZQGIloicdDP9wZdWzTEDspQiwR2dFRmH9bUF8OmsS+h KwhtDty7aZO+4gJ1ccBImijzQCmbAo7dmFhDfoLXaA5Jt6UwTXMeBHm4aUPMnvQe NVXoRZJodDemwM642Q/Tx1SpsX6QmA== =4R7u -----END PGP SIGNATURE----- Merge tag 'for-6.5/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Update DM crypt to allocate compound pages if possible - Fix DM crypt target's crypt_ctr_cipher_new return value on invalid AEAD cipher - Fix DM flakey testing target's write bio corruption feature to corrupt the data of a cloned bio instead of the original - Add random_read_corrupt and random_write_corrupt features to DM flakey target - Fix ABBA deadlock in DM thin metadata by resetting associated bufio client rather than destroying and recreating it - A couple other small DM thinp cleanups - Update DM core to support disabling block core IO stats accounting and optimize away code that isn't needed if stats are disabled - Other small DM core cleanups - Improve DM integrity target to not require so much memory on 32 bit systems. Also only allocate the recalculate buffer as needed (and increasingly reduce its size on allocation failure) - Update DM integrity to use %*ph for printing hexdump of a small buffer. Also update DM integrity documentation - Various DM core ioctl interface hardening. Now more careful about alignment of structures and processing of input passed to the kernel from userspace. Also disallow the creation of DM devices named "control", "." or ".." - Eliminate GFP_NOIO workarounds for __vmalloc and kvmalloc in DM core's ioctl and bufio code * tag 'for-6.5/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (28 commits) dm: get rid of GFP_NOIO workarounds for __vmalloc and kvmalloc dm integrity: scale down the recalculate buffer if memory allocation fails dm integrity: only allocate recalculate buffer when needed dm integrity: reduce vmalloc space footprint on 32-bit architectures dm ioctl: Refuse to create device named "." or ".." dm ioctl: Refuse to create device named "control" dm ioctl: Avoid double-fetch of version dm ioctl: structs and parameter strings must not overlap dm ioctl: Avoid pointer arithmetic overflow dm ioctl: Check dm_target_spec is sufficiently aligned Documentation: dm-integrity: Document an example of how the tunables relate. Documentation: dm-integrity: Document default values. Documentation: dm-integrity: Document the meaning of "buffer". Documentation: dm-integrity: Fix minor grammatical error. dm integrity: Use %*ph for printing hexdump of a small buffer dm thin: disable discards for thin-pool if no_discard_passdown dm: remove stale/redundant dm_internal_{suspend,resume} prototypes in dm.h dm: skip dm-stats work in alloc_io() unless needed dm: avoid needless dm_io access if all IO accounting is disabled dm: support turning off block-core's io stats accounting ... |
||
![]() |
ca7ce08d6a |
SCSI misc on 20230629
Updates to the usual drivers (ufs, pm80xx, libata-scsi, smartpqi, lpfc, qla2xxx). We have a couple of major core changes impacting other systems: Command Duration Limits, which spills into block and ATA and block level Persistent Reservation Operations, which touches block, nvme, target and dm (both of which are added with merge commits containing a cover letter explaining what's going on). Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZJ19cSYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishfZpAQCQBuWR ELcOhsaG5KzO6xLWcH8mjsOoxffKvazZjTKXlAD5ATEv7++E250oKS3t+yfjae5I Lc195MlDju85ItUQgfk= =U9ik -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "Updates to the usual drivers (ufs, pm80xx, libata-scsi, smartpqi, lpfc, qla2xxx). We have a couple of major core changes impacting other systems: - Command Duration Limits, which spills into block and ATA - block level Persistent Reservation Operations, which touches block, nvme, target and dm Both of these are added with merge commits containing a cover letter explaining what's going on" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (187 commits) scsi: core: Improve warning message in scsi_device_block() scsi: core: Replace scsi_target_block() with scsi_block_targets() scsi: core: Don't wait for quiesce in scsi_device_block() scsi: core: Don't wait for quiesce in scsi_stop_queue() scsi: core: Merge scsi_internal_device_block() and device_block() scsi: sg: Increase number of devices scsi: bsg: Increase number of devices scsi: qla2xxx: Remove unused nvme_ls_waitq wait queue scsi: ufs: ufs-pci: Add support for Intel Arrow Lake scsi: sd: sd_zbc: Use PAGE_SECTORS_SHIFT scsi: ufs: wb: Add explicit flush_threshold sysfs attribute scsi: ufs: ufs-qcom: Switch to the new ICE API scsi: ufs: dt-bindings: qcom: Add ICE phandle scsi: ufs: ufs-mediatek: Set UFSHCD_QUIRK_MCQ_BROKEN_RTC quirk scsi: ufs: ufs-mediatek: Set UFSHCD_QUIRK_MCQ_BROKEN_INTR quirk scsi: ufs: core: Add host quirk UFSHCD_QUIRK_MCQ_BROKEN_RTC scsi: ufs: core: Add host quirk UFSHCD_QUIRK_MCQ_BROKEN_INTR scsi: ufs: core: Remove dedicated hwq for dev command scsi: ufs: core: mcq: Fix the incorrect OCS value for the device command scsi: ufs: dt-bindings: samsung,exynos: Drop unneeded quotes ... |
||
![]() |
72dc6db7e3 |
workqueue: Ordered workqueue creation cleanups
For historical reasons, unbound workqueues with max concurrency limit of 1 are considered ordered, even though the concurrency limit hasn't been system-wide for a long time. This creates ambiguity around whether ordered execution is actually required for correctness, which was actually confusing for e.g. btrfs (btrfs updates are being routed through the btrfs tree). There aren't that many users in the tree which use the combination and there are pending improvements to unbound workqueue affinity handling which will make inadvertent use of ordered workqueue a bigger loss. This pull request clarifies the situation for most of them by updating the ones which require ordered execution to use alloc_ordered_workqueue(). There are some conversions being routed through subsystem-specific trees and likely a few stragglers. Once they're all converted, workqueue can trigger a warning on unbound + @max_active==1 usages and eventually drop the implicit ordered behavior. -----BEGIN PGP SIGNATURE----- iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZJoKnA4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGc5SAQDOtjML7Cx9AYzbY5+nYc0wTebRRTXGeOu7A3Xy j50rVgEAjHgvHLIdmeYmVhCeHOSN4q7Wn5AOwaIqZalOhfLyKQk= =hs79 -----END PGP SIGNATURE----- Merge tag 'wq-for-6.5-cleanup-ordered' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull ordered workqueue creation updates from Tejun Heo: "For historical reasons, unbound workqueues with max concurrency limit of 1 are considered ordered, even though the concurrency limit hasn't been system-wide for a long time. This creates ambiguity around whether ordered execution is actually required for correctness, which was actually confusing for e.g. btrfs (btrfs updates are being routed through the btrfs tree). There aren't that many users in the tree which use the combination and there are pending improvements to unbound workqueue affinity handling which will make inadvertent use of ordered workqueue a bigger loss. This clarifies the situation for most of them by updating the ones which require ordered execution to use alloc_ordered_workqueue(). There are some conversions being routed through subsystem-specific trees and likely a few stragglers. Once they're all converted, workqueue can trigger a warning on unbound + @max_active==1 usages and eventually drop the implicit ordered behavior" * tag 'wq-for-6.5-cleanup-ordered' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: rxrpc: Use alloc_ordered_workqueue() to create ordered workqueues net: qrtr: Use alloc_ordered_workqueue() to create ordered workqueues net: wwan: t7xx: Use alloc_ordered_workqueue() to create ordered workqueues dm integrity: Use alloc_ordered_workqueue() to create ordered workqueues media: amphion: Use alloc_ordered_workqueue() to create ordered workqueues scsi: NCR5380: Use default @max_active for hostdata->work_q media: coda: Use alloc_ordered_workqueue() to create ordered workqueues crypto: octeontx2: Use alloc_ordered_workqueue() to create ordered workqueues wifi: ath10/11/12k: Use alloc_ordered_workqueue() to create ordered workqueues wifi: mwifiex: Use default @max_active for workqueues wifi: iwlwifi: Use default @max_active for trans_pcie->rba.alloc_wq xen/pvcalls: Use alloc_ordered_workqueue() to create ordered workqueues virt: acrn: Use alloc_ordered_workqueue() to create ordered workqueues net: octeontx2: Use alloc_ordered_workqueue() to create ordered workqueues net: thunderx: Use alloc_ordered_workqueue() to create ordered workqueues greybus: Use alloc_ordered_workqueue() to create ordered workqueues powerpc, workqueue: Use alloc_ordered_workqueue() to create ordered workqueues |
||
![]() |
a0433f8cae |
for-6.5/block-2023-06-23
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmSV8dwQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpilGD/9Yys1oxIXJpRf00fzrylAlBthRxMjFQVWw zAut106hAQiBHvU8IkmGA3MvEFVHxtzwYhHI7IR8K3aZBIqscweCqmVI9JyogJw9 U9Twnzel47VmuKdM94FeoN+hbj1fP8EWTjzmy67/zEEfFCdmHvNlMi3lSrGYIpFy 39LxTB99Y4UarM5PtWbes37GYYljzMSWKuo4AfBkvq1eQa+sZ0Vq2xAABKq3UM7f apqhgHtkJooRePDP0eQp+kAyyVMgW2jIK+oIdJDxNF3CKTu2w40RzaYz6fp+jVSU H4R/xS59GW4/xql+VBJDh/qJg9K62DPPYjlW8BmSR8+IjvfFpsyH3/MacE50CD3P 20fs/Mnj49H79fDrQEHJI53cOOb2EmUitbwLbvOcColNTPpt8loBtdQxjF2RMU8R Nyort9DJPFclYCxky1LYg1CNEC2Ln4Zy/jD47wPvqRmOQphOoVlV/hPnOEqvjaZC 49Vn70W2DeE9cXvYI7ha+XIg6/oj+Gs3iusEbV08Ci7EAtXgI+ZUUsQ97K8UNiUh h2lqSJtuI7lBpYP9sf+BeCch5UCC+xGYyTdoM5f58lehWBBPtbs0g7S9RyRyOYxe n+yxEUo3dAGzJ/xsKAjinbZfeWIpr0b1TkAh4w3Cq/BKzRr9Bp8lBAxYuancbQ+Y 1ADPteUOTA== =zP4Y -----END PGP SIGNATURE----- Merge tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linux Pull block updates from Jens Axboe: - NVMe pull request via Keith: - Various cleanups all around (Irvin, Chaitanya, Christophe) - Better struct packing (Christophe JAILLET) - Reduce controller error logs for optional commands (Keith) - Support for >=64KiB block sizes (Daniel Gomez) - Fabrics fixes and code organization (Max, Chaitanya, Daniel Wagner) - bcache updates via Coly: - Fix a race at init time (Mingzhe Zou) - Misc fixes and cleanups (Andrea, Thomas, Zheng, Ye) - use page pinning in the block layer for dio (David) - convert old block dio code to page pinning (David, Christoph) - cleanups for pktcdvd (Andy) - cleanups for rnbd (Guoqing) - use the unchecked __bio_add_page() for the initial single page additions (Johannes) - fix overflows in the Amiga partition handling code (Michael) - improve mq-deadline zoned device support (Bart) - keep passthrough requests out of the IO schedulers (Christoph, Ming) - improve support for flush requests, making them less special to deal with (Christoph) - add bdev holder ops and shutdown methods (Christoph) - fix the name_to_dev_t() situation and use cases (Christoph) - decouple the block open flags from fmode_t (Christoph) - ublk updates and cleanups, including adding user copy support (Ming) - BFQ sanity checking (Bart) - convert brd from radix to xarray (Pankaj) - constify various structures (Thomas, Ivan) - more fine grained persistent reservation ioctl capability checks (Jingbo) - misc fixes and cleanups (Arnd, Azeem, Demi, Ed, Hengqi, Hou, Jan, Jordy, Li, Min, Yu, Zhong, Waiman) * tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linux: (266 commits) scsi/sg: don't grab scsi host module reference ext4: Fix warning in blkdev_put() block: don't return -EINVAL for not found names in devt_from_devname cdrom: Fix spectre-v1 gadget block: Improve kernel-doc headers blk-mq: don't insert passthrough request into sw queue bsg: make bsg_class a static const structure ublk: make ublk_chr_class a static const structure aoe: make aoe_class a static const structure block/rnbd: make all 'class' structures const block: fix the exclusive open mask in disk_scan_partitions block: add overflow checks for Amiga partition support block: change all __u32 annotations to __be32 in affs_hardblocks.h block: fix signed int overflow in Amiga partition support block: add capacity validation in bdev_add_partition() block: fine-granular CAP_SYS_ADMIN for Persistent Reservation block: disallow Persistent Reservation on partitions reiserfs: fix blkdev_put() warning from release_journal_dev() block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions() block: document the holder argument to blkdev_get_by_path ... |
||
![]() |
c4f512d255 |
dm: skip dm-stats work in alloc_io() unless needed
Don't dm_stats_record_start() if dm_stats_used() is false. Signed-off-by: Mike Snitzer <snitzer@kernel.org> |
||
![]() |
06eed768ea |
dm: avoid needless dm_io access if all IO accounting is disabled
Update dm_io_acct() to eliminate most dm_io struct accesses if both block core's IO stats and dm-stats are disabled. Signed-off-by: Mike Snitzer <snitzer@kernel.org> |
||
![]() |
526d10061b |
dm: support turning off block-core's io stats accounting
Commit
|
||
![]() |
be04c14a1b |
dm: use op specific max_sectors when splitting abnormal io
Split abnormal IO in terms of the corresponding operation specific max_sectors (max_discard_sectors, max_secure_erase_sectors or max_write_zeroes_sectors). This fixes a significant dm-thinp discard performance regression that was introduced with commit |
||
![]() |
2760904d89 |
dm: don't lock fs when the map is NULL during suspend or resume
As described in commit |
||
![]() |
05bdb99653 |
block: replace fmode_t with a block-specific type for block open flags
The only overlap between the block open flags mapped into the fmode_t and other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and ->ioctl and stop abusing fmode_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
![]() |
2736e8eeb0 |
block: use the holder as indication for exclusive opens
The current interface for exclusive opens is rather confusing as it requires both the FMODE_EXCL flag and a holder. Remove the need to pass FMODE_EXCL and just key off the exclusive open off a non-NULL holder. For blkdev_put this requires adding the holder argument, which provides better debug checking that only the holder actually releases the hold, but at the same time allows removing the now superfluous mode argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
![]() |
ae220766d8 |
block: remove the unused mode argument to ->release
The mode argument to the ->release block_device_operation is never used, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
![]() |
d32e2bf837 |
block: pass a gendisk to ->open
->open is only called on the whole device. Make that explicit by passing a gendisk instead of the block_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
![]() |
0718afd47f |
block: introduce holder ops
Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and installed in the block_device for exclusive claims. It will be used to allow the block layer to call back into the user of the block device for thing like notification of a removed device or a device resize. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
![]() |
57bbf99ce9 |
dm integrity: Use alloc_ordered_workqueue() to create ordered workqueues
BACKGROUND ========== When multiple work items are queued to a workqueue, their execution order doesn't match the queueing order. They may get executed in any order and simultaneously. When fully serialized execution - one by one in the queueing order - is needed, an ordered workqueue should be used which can be created with alloc_ordered_workqueue(). However, alloc_ordered_workqueue() was a later addition. Before it, an ordered workqueue could be obtained by creating an UNBOUND workqueue with @max_active==1. This originally was an implementation side-effect which was broken by |
||
![]() |
7907ad748b |
Merge patch series "Use block pr_ops in LIO"
Mike Christie <michael.christie@oracle.com> says: The patches in this thread allow us to use the block pr_ops with LIO's target_core_iblock module to support cluster applications in VMs. They were built over Linus's tree. They also apply over linux-next and Martin's tree and Jens's trees. Currently, to use windows clustering or linux clustering (pacemaker + cluster labs scsi fence agents) in VMs with LIO and vhost-scsi, you have to use tcmu or pscsi or use a cluster aware FS/framework for the LIO pr file. Setting up a cluster FS/framework is pain and waste when your real backend device is already a distributed device, and pscsi and tcmu are nice for specific use cases, but iblock gives you the best performance and allows you to use stacked devices like dm-multipath. So these patches allow iblock to work like pscsi/tcmu where they can pass a PR command to the backend module. And then iblock will use the pr_ops to pass the PR command to the real devices similar to what we do for unmap today. The patches are separated in the following groups: Patch 1 - 2: - Add block layer callouts for reading reservations and rename reservation error code. Patch 3 - 5: - SCSI support for new callouts. Patch 6: - DM support for new callouts. Patch 7 - 13: - NVMe support for new callouts. Patch 14 - 18: - LIO support for new callouts. This patchset has been tested with the libiscsi PGR ops and with window's failover cluster verification test. Note that for scsi backend devices we need this patchset: https://lore.kernel.org/linux-scsi/20230123221046.125483-1-michael.christie@oracle.com/T/#m4834a643ffb5bac2529d65d40906d3cfbdd9b1b7 to handle UAs. To reduce the size of this patchset that's being done separately to make reviewing easier. And to make merging easier this patchset and the one above do not have any conflicts so can be merged in different trees. Link: https://lore.kernel.org/r/20230407200551.12660-1-michael.christie@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> |
||
![]() |
f7995089c5 |
dm: unexport dm_get_queue_limits()
There are no dm_get_queue_limits() callers outside of DM core and there shouldn't be. Also, remove its BUG_ON(!atomic_read(&md->holders)) to micro-optimize __process_abnormal_io(). Signed-off-by: Mike Snitzer <snitzer@kernel.org> |
||
![]() |
13f6facf3f |
dm: allow targets to require splitting WRITE_ZEROES and SECURE_ERASE
Introduce max_write_zeroes_granularity and max_secure_erase_granularity flags in the dm_target struct. If a target sets these then DM core will split IO of these operation types accordingly (in terms of max_write_zeroes_sectors and max_secure_erase_sectors respectively). Signed-off-by: Mike Snitzer <snitzer@kernel.org> |
||
![]() |
8a8da082e9 |
dm: Add support for block PR read keys/reservation
This adds support in dm for the block PR read keys and read reservation callouts. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20230407200551.12660-7-michael.christie@oracle.com Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> |
||
![]() |
06961c487a |
dm: split discards further if target sets max_discard_granularity
The block core (bio_split_discard) will already split discards based on the 'discard_granularity' and 'max_discard_sectors' queue_limits. But the DM thin target also needs to ensure that it doesn't receive a discard that spans a 'max_discard_sectors' boundary. Introduce a dm_target 'max_discard_granularity' flag that if set will cause DM core to split discard bios relative to 'max_discard_sectors'. This treats 'discard_granularity' as a "min_discard_granularity" and 'max_discard_sectors' as a "max_discard_granularity". Requested-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> |
||
![]() |
666eed4676 |
dm: fix __send_duplicate_bios() to always allow for splitting IO
Commit |
||
![]() |
f7b58a69fa |
dm: fix improper splitting for abnormal bios
"Abnormal" bios include discards, write zeroes and secure erase. By no longer passing the calculated 'len' pointer, commit |
||
![]() |
5ad4fe9613 |
- Fix DM thin to work as a swap device by using 'limit_swap_bios' DM
target flag (initially added to allow swap to dm-crypt) to throttle the amount of outstanding swap bios. - Fix DM crypt soft lockup warnings by calling cond_resched() from the cpu intensive loop in dmcrypt_write(). - Fix DM crypt to not access an uninitialized tasklet. This fix allows for consistent handling of IO completion, by _not_ needlessly punting to a workqueue when tasklets are not needed. - Fix DM core's alloc_dev() initialization for DM stats to check for and propagate alloc_percpu() failure. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmQd2p0ACgkQxSPxCi2d A1qvXAf/WCMNXRbFhO35QqukBqS7sUOMfWl1hIEdABRu+3Ul1KHBWzXVYWuWgebw kr79V3LZG63cLvhreCy64X/0tXLZa0c0AGWZI6rJ/QAozSCs9R8BqOrJnB5GT1o9 /lvmOL31MloMnIKArWseIQViNM97gEHmFpuj0saqitcvNTjjipzxq/wOyhmDQwnE 8rxJpKSHBJXs9X/VyM9FTWxtijTQw3c8wxJJo7eV6TTuLyrErm46tyI1cBQ4vDoa ogMVWVrf51uTsqL6DqGenDc+kO7CH5lipIJij1bTtKgs3aBNlaiZQC1nPkMST9Ue hpH61ixAg+bsWi4/xLFafCl6QAGMlA== =71ya -----END PGP SIGNATURE----- Merge tag 'for-6.3/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Fix DM thin to work as a swap device by using 'limit_swap_bios' DM target flag (initially added to allow swap to dm-crypt) to throttle the amount of outstanding swap bios. - Fix DM crypt soft lockup warnings by calling cond_resched() from the cpu intensive loop in dmcrypt_write(). - Fix DM crypt to not access an uninitialized tasklet. This fix allows for consistent handling of IO completion, by _not_ needlessly punting to a workqueue when tasklets are not needed. - Fix DM core's alloc_dev() initialization for DM stats to check for and propagate alloc_percpu() failure. * tag 'for-6.3/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm stats: check for and propagate alloc_percpu failure dm crypt: avoid accessing uninitialized tasklet dm crypt: add cond_resched() to dmcrypt_write() dm thin: fix deadlock when swapping to thin device |
||
![]() |
d3aa3e060c |
dm stats: check for and propagate alloc_percpu failure
Check alloc_precpu()'s return value and return an error from
dm_stats_init() if it fails. Update alloc_dev() to fail if
dm_stats_init() does.
Otherwise, a NULL pointer dereference will occur in dm_stats_cleanup()
even if dm-stats isn't being actively used.
Fixes:
|
||
![]() |
5f27571382 |
block: count 'ios' and 'sectors' when io is done for bio-based device
While using iostat for raid, I observed very strange 'await'
occasionally, and turns out it's due to that 'ios' and 'sectors' is
counted in bdev_start_io_acct(), while 'nsecs' is counted in
bdev_end_io_acct(). I'm not sure why they are ccounted like that
but I think this behaviour is obviously wrong because user will get
wrong disk stats.
Fix the problem by counting 'ios' and 'sectors' when io is done, like
what rq-based device does.
Fixes:
|
||
![]() |
d695e44157 |
dm: remove unnecessary (void*) conversion in event_callback()
Pointer variables of void * type do not require type cast. Signed-off-by: XU pengfei <xupengfei@nfschina.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> |
||
![]() |
f77692d65d |
dm: add cond_resched() to dm_wq_requeue_work()
Otherwise the while() loop in dm_wq_requeue_work() can result in a
"dead loop" on systems that have preemption disabled. This is
particularly problematic on single cpu systems.
Fixes:
|
||
![]() |
0ca44fcef2 |
dm: add cond_resched() to dm_wq_work()
Otherwise the while() loop in dm_wq_work() can result in a "dead loop" on systems that have preemption disabled. This is particularly problematic on single cpu systems. Cc: stable@vger.kernel.org Signed-off-by: Pingfan Liu <piliu@redhat.com> Acked-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> |
||
![]() |
0b22ff5360 |
dm: remove flush_scheduled_work() during local_exit()
Commit |