linux-yocto/fs/btrfs
Josef Bacik 5111b14836 btrfs: adjust subpage bit start based on sectorsize
[ Upstream commit e08e49d986 ]

When running machines with 64k page size and a 16k nodesize we started
seeing tree log corruption in production.  This turned out to be because
we were not writing out dirty blocks sometimes, so this in fact affects
all metadata writes.

When writing out a subpage EB we scan the subpage bitmap for a dirty
range.  If the range isn't dirty we do

	bit_start++;

to move onto the next bit.  The problem is the bitmap is based on the
number of sectors that an EB has.  So in this case, we have a 64k
pagesize, 16k nodesize, but a 4k sectorsize.  This means our bitmap is 4
bits for every node.  With a 64k page size we end up with 4 nodes per
page.
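
As a rough sketch of the arithmetic above (the helper names here are
hypothetical and are not the kernel's own functions), the sizes relate
like this:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bitmap bits (sectors) covered by one node. */
static uint32_t calc_sectors_per_node(uint32_t nodesize, uint32_t sectorsize)
{
	assert(sectorsize != 0 && nodesize % sectorsize == 0);
	return nodesize / sectorsize;
}

/* Hypothetical helper: extent buffers that fit in one page. */
static uint32_t calc_nodes_per_page(uint32_t pagesize, uint32_t nodesize)
{
	assert(nodesize != 0 && pagesize % nodesize == 0);
	return pagesize / nodesize;
}

/* Hypothetical helper: total bitmap bits needed to cover one page. */
static uint32_t calc_bitmap_bits_per_page(uint32_t pagesize, uint32_t sectorsize)
{
	assert(sectorsize != 0 && pagesize % sectorsize == 0);
	return pagesize / sectorsize;
}
```

With a 64k page, 16k nodesize and 4k sectorsize this gives 4 bits per
node, 4 nodes per page, and 16 bits per page.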

To make this easier this is how everything looks

[0         16k       32k       48k     ] logical address
[0         4         8         12      ] radix tree offset
[               64k page               ] folio
[ 16k eb ][ 16k eb ][ 16k eb ][ 16k eb ] extent buffers
[ | | | |  | | | |   | | | |   | | | | ] bitmap

Now we use all of our addressing based on fs_info->sectorsize_bits, so,
as you can see above, our 16k eb->start turns into radix entry 4.

When we find a dirty range for our eb, we correctly do bit_start +=
sectors_per_node, because if we start at bit 0, the next bit for the
next eb is 4, to correspond to eb->start 16k.

However, if our range is clean, we will do bit_start++, which knocks us
out of alignment with our radix tree entries.

In our case, assume that the first time we check the bitmap the block is
not dirty, we increment bit_start so now it == 1, and then we loop
around and check again.  This time it is dirty, and we go to find that
start using the following equation

	start = folio_start + bit_start * fs_info->sectorsize;

so in the case above, eb->start 0 is now dirty, and we calculate start
as

	0 + 1 * fs_info->sectorsize = 4096
	4096 >> 12 = 1

Now we're looking up the radix tree for 1, and we won't find an eb.
What's worse is that we're now using bit_start == 1, so we do bit_start +=
sectors_per_node, which gives 5.  If that eb is dirty we run into the
same thing: we look at an offset that is not populated in the radix
tree, and so we skip the writeout of dirty extent buffers.
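
The misalignment can be reproduced with a small userspace model.  This
is only an illustrative sketch, not the kernel code: radix_index(),
eb_exists_at() and buggy_next() are made-up names, and the constants
assume the 4k sectorsize / 16k nodesize layout drawn above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTORSIZE_BITS   12u
#define SECTORSIZE        (1u << SECTORSIZE_BITS)
#define SECTORS_PER_NODE  4u

/* Radix tree index derived from a bitmap position, per the equation above. */
static uint64_t radix_index(uint64_t folio_start, unsigned int bit_start)
{
	uint64_t start = folio_start + (uint64_t)bit_start * SECTORSIZE;

	return start >> SECTORSIZE_BITS;
}

/* In this layout an eb is only present at multiples of SECTORS_PER_NODE. */
static bool eb_exists_at(uint64_t index)
{
	return index % SECTORS_PER_NODE == 0;
}

/* Pre-fix stepping: a clean range advances by one bit, a dirty one by a node. */
static unsigned int buggy_next(unsigned int bit_start, bool dirty)
{
	assert(bit_start < 16u);
	return dirty ? bit_start + SECTORS_PER_NODE : bit_start + 1;
}
```

Starting from bit 0 with a clean first check, buggy_next() yields 1;
radix_index(0, 1) is 1, where no eb exists.  A dirty result at that
point moves bit_start to 5, which is again between nodes.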

The best fix for this is to not use sectorsize_bits to address nodes,
but that's a larger change.  Since this is a fs corruption problem, fix
it simply by always using sectors_per_node to increment the start bit.
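
A minimal sketch of the stepping rule described here, again as a
userspace model and not the actual patch (fixed_next() and
misaligned_visits() are hypothetical names, and the constants assume
the 64k page / 16k node / 4k sector example):

```c
#include <assert.h>

#define SECTORS_PER_NODE  4u
#define BITS_PER_PAGE     16u

/* Post-fix stepping: always advance by a whole node, dirty or not. */
static unsigned int fixed_next(unsigned int bit_start)
{
	assert(bit_start % SECTORS_PER_NODE == 0);
	return bit_start + SECTORS_PER_NODE;
}

/* Count visited bitmap positions that do not sit on a node boundary. */
static unsigned int misaligned_visits(void)
{
	unsigned int misaligned = 0;

	for (unsigned int bit = 0; bit < BITS_PER_PAGE; bit = fixed_next(bit)) {
		if (bit % SECTORS_PER_NODE != 0)
			misaligned++;
	}
	return misaligned;
}
```

Because every step is a multiple of sectors_per_node, the scan only
ever visits bits 0, 4, 8 and 12, which are exactly the radix entries
that hold extent buffers in the diagram.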

Fixes: c4aec299fa ("btrfs: introduce submit_eb_subpage() to submit a subpage metadata page")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[ Adjust context ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-09 18:54:19 +02:00
tests btrfs: tests: allocate dummy fs_info and root in test_find_delalloc() 2024-08-29 17:30:37 +02:00
acl.c
async-thread.c
async-thread.h
backref.c btrfs: fix information leak in btrfs_ioctl_logical_to_ino() 2024-05-02 16:29:28 +02:00
backref.h
block-group.c btrfs: zoned: do not remove unwritten non-data block group 2025-08-28 16:26:04 +02:00
block-group.h btrfs: add and use helper to check if block group is used 2024-02-23 09:12:28 +01:00
block-rsv.c btrfs: calculate the right space for delayed refs when updating global reserve 2024-09-30 16:23:55 +02:00
block-rsv.h btrfs: calculate the right space for delayed refs when updating global reserve 2024-09-30 16:23:55 +02:00
btrfs_inode.h btrfs: fix race between setting last_dir_index_offset and inode logging 2025-09-09 18:54:12 +02:00
check-integrity.c
check-integrity.h
compression.c btrfs: fix extent map use-after-free when adding pages to compressed bio 2024-09-04 13:25:00 +02:00
compression.h
ctree.c btrfs: abort transaction on unexpected eb generation at btrfs_copy_root() 2025-08-28 16:26:11 +02:00
ctree.h btrfs: rename and export __btrfs_cow_block() 2025-01-09 13:30:03 +01:00
delalloc-space.c btrfs: don't reserve space for checksums when writing to nocow files 2024-02-23 09:12:29 +01:00
delalloc-space.h
delayed-inode.c btrfs: change BUG_ON to assertion when checking for delayed_node root 2024-08-29 17:30:37 +02:00
delayed-inode.h btrfs: fix infinite directory reads 2024-01-31 16:17:05 -08:00
delayed-ref.c btrfs: reinitialize delayed ref list after deleting it from the list 2024-11-14 13:15:17 +01:00
delayed-ref.h btrfs: calculate the right space for delayed refs when updating global reserve 2024-09-30 16:23:55 +02:00
dev-replace.c btrfs: dev-replace: properly validate device names 2024-03-06 14:45:10 +00:00
dev-replace.h
dir-item.c btrfs: fix passing 0 to ERR_PTR in btrfs_search_dir_index_item() 2024-11-01 01:56:06 +01:00
discard.c btrfs: make btrfs_discard_workfn() block_group ref explicit 2025-06-04 14:40:04 +02:00
discard.h
disk-io.c btrfs: handle csum tree error with rescue=ibadroots correctly 2025-07-06 10:57:56 +02:00
disk-io.h btrfs: fix double free of anonymous device after snapshot creation failure 2024-03-06 14:45:10 +00:00
export.c btrfs: export: handle invalid inode or root reference in btrfs_get_parent() 2024-04-13 13:05:01 +02:00
export.h
extent_io.c btrfs: adjust subpage bit start based on sectorsize 2025-09-09 18:54:19 +02:00
extent_io.h
extent_map.c btrfs: fix wrong block_start calculation for btrfs_drop_extent_map_range() 2024-06-21 14:35:38 +02:00
extent_map.h
extent-io-tree.c
extent-io-tree.h
extent-tree.c btrfs: don't BUG_ON() when 0 reference count at btrfs_lookup_extent_info() 2025-05-22 14:10:09 +02:00
file-item.c btrfs: mark the len field in struct btrfs_ordered_sum as unsigned 2024-01-10 17:10:35 +01:00
file.c btrfs: avoid page_lockend underflow in btrfs_punch_hole_lock_range() 2025-05-02 07:46:54 +02:00
free-space-cache.c btrfs: zoned: properly take lock to read/update block group's zoned variables 2024-08-29 17:30:15 +02:00
free-space-cache.h
free-space-tree.c btrfs: fix assertion when building free space tree 2025-07-17 18:32:14 +02:00
free-space-tree.h
inode-item.c btrfs: use struct fscrypt_str instead of struct qstr 2023-10-10 22:00:36 +02:00
inode-item.h btrfs: use struct fscrypt_str instead of struct qstr 2023-10-10 22:00:36 +02:00
inode.c btrfs: fix race between setting last_dir_index_offset and inode logging 2025-09-09 18:54:12 +02:00
ioctl.c btrfs: qgroup: fix qgroup prealloc rsv leak in subvolume operations 2024-12-14 19:53:56 +01:00
Kconfig
locking.c
locking.h
lzo.c
Makefile
misc.h
ordered-data.c btrfs: fix qgroup reservation leak on failure to allocate ordered extent 2025-08-28 16:26:10 +02:00
ordered-data.h btrfs: mark the len field in struct btrfs_ordered_sum as unsigned 2024-01-10 17:10:35 +01:00
orphan.c
print-tree.c btrfs: avoid using fixed char array size for tree names 2024-08-14 13:52:59 +02:00
print-tree.h
props.c
props.h
qgroup.c btrfs: qgroup: fix race between quota disable and quota rescan ioctl 2025-08-28 16:26:11 +02:00
qgroup.h btrfs: fix qgroup_free_reserved_data int overflow 2024-01-10 17:10:35 +01:00
raid56.c
raid56.h
rcu-string.h
ref-verify.c btrfs: ref-verify: fix use-after-free after invalid ref action 2024-12-14 19:54:10 +01:00
ref-verify.h
reflink.c btrfs: replace sb::s_blocksize by fs_info::sectorsize 2024-08-29 17:30:42 +02:00
reflink.h
relocation.c btrfs: do not allow relocation of partially dropped subvolumes 2025-08-28 16:26:04 +02:00
root-tree.c btrfs: qgroup: fix qgroup prealloc rsv leak in subvolume operations 2024-12-14 19:53:56 +01:00
scrub.c btrfs: scrub: initialize ret in scrub_simple_mirror() to fix compilation warning 2024-07-11 12:47:10 +02:00
send.c btrfs: send: use fallocate for hole punching with send stream v2 2025-08-28 16:26:11 +02:00
send.h
space-info.c btrfs: zoned: fix zone_unusable accounting on making block group read-write again 2024-08-11 12:36:00 +02:00
space-info.h btrfs: zoned: fix zone_unusable accounting on making block group read-write again 2024-08-11 12:36:00 +02:00
struct-funcs.c
subpage.c
subpage.h
super.c btrfs: correctly escape subvol in btrfs_show_options() 2025-04-25 10:43:53 +02:00
sysfs.c btrfs: sysfs: fix direct super block member reads 2025-01-02 10:30:55 +01:00
sysfs.h
transaction.c btrfs: fix use-after-free when attempting to join an aborted transaction 2025-02-21 13:49:29 +01:00
transaction.h btrfs: fix race between direct IO write and fsync when using same fd 2024-09-12 11:10:29 +02:00
tree-checker.c btrfs: tree-checker: reject inline extent items with 0 ref count 2024-12-27 13:52:59 +01:00
tree-checker.h
tree-defrag.c
tree-log.c btrfs: avoid load/store tearing races when checking if an inode was logged 2025-09-09 18:54:12 +02:00
tree-log.h btrfs: use struct fscrypt_str instead of struct qstr 2023-10-10 22:00:36 +02:00
tree-mod-log.c
tree-mod-log.h
ulist.c
ulist.h
uuid-tree.c
verity.c
volumes.c btrfs: update superblock's device bytes_used when dropping chunk 2025-07-06 10:58:01 +02:00
volumes.h btrfs: add a helper to read the superblock metadata_uuid 2023-09-23 11:11:08 +02:00
xattr.c
xattr.h
zlib.c
zoned.c btrfs: zoned: use filesystem size not disk size for reclaim decision 2025-08-28 16:26:04 +02:00
zoned.h
zstd.c