[PATCH 0/2] ext4: fix an infinite loop in do_writepages after online resizing

From: Baokun Li
Date: Wed Aug 17 2022 - 09:16:23 EST


We hit an issue: the ext4 writeback process got stuck in do_writepages,
which kept retrying. However, '-ENOMEM' was returned every time, even
though the machine still had free memory.

We found that the direct cause of this issue is that bg_inode_table_hi
in the group descriptor was overwritten with an incorrect value, which causes
the inode block found through the inode table to exceed the end block. Then
sb_getblk always returns NULL, __ext4_get_inode_loc returns '-ENOMEM',
and do_writepages keeps retrying.
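
The failure chain above can be modeled in user space. This is only a
sketch, not kernel code: every toy_* name is hypothetical, toy_getblk merely
mimics the "NULL for a block past the end of the device" behaviour described
above, and the retry loop is bounded so that the example terminates.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a mounted filesystem; only its size matters. */
struct toy_fs {
	uint64_t blocks_count;		/* number of blocks on the device */
};

/* Hypothetical stand-in for the two halves of the inode table location. */
struct toy_group_desc {
	uint32_t inode_table_lo;
	uint32_t inode_table_hi;	/* the field that gets corrupted */
};

/* Combine the lo/hi halves into one 64-bit block number. */
static uint64_t toy_inode_table(const struct toy_group_desc *gd)
{
	return ((uint64_t)gd->inode_table_hi << 32) | gd->inode_table_lo;
}

/* Mimics a buffer lookup that fails for blocks past the end of the device. */
static const void *toy_getblk(const struct toy_fs *fs, uint64_t block)
{
	static const char dummy_buffer[1];

	return block < fs->blocks_count ? dummy_buffer : NULL;
}

/* A failed lookup is reported as -ENOMEM, whatever the real reason was. */
static int toy_get_inode_loc(const struct toy_fs *fs,
			     const struct toy_group_desc *gd)
{
	if (!toy_getblk(fs, toy_inode_table(gd)))
		return -ENOMEM;
	return 0;
}

/* Writeback-style retry on -ENOMEM, bounded by max_attempts for the demo. */
static int toy_writeback(const struct toy_fs *fs,
			 const struct toy_group_desc *gd,
			 int max_attempts, int *attempts)
{
	int err;

	assert(max_attempts > 0);
	*attempts = 0;
	do {
		err = toy_get_inode_loc(fs, gd);
		(*attempts)++;
	} while (err == -ENOMEM && *attempts < max_attempts);
	return err;
}
```

With a bogus inode_table_hi the block number can never become valid, so
every attempt fails the same way; without the bound the loop would never end,
no matter how much memory is free.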

The root cause is that the GDT is overwritten when the backup superblock
is updated during online resizing. This only happens when the block size
of the filesystem is 1024, bigalloc and meta_bg are enabled, and
sparse_super is disabled.

Therefore, a check on the inode table is added to __ext4_get_inode_loc,
modeled on the inode bitmap check in ext4_read_inode_bitmap, to avoid
infinite loops in similar cases. In addition, the offset of the backup
superblock within the group is corrected for the above case, to avoid the
strange problems caused by the GDT being overwritten.
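
As a rough illustration of the kind of range check meant here, the
user-space sketch below rejects an inode table block that is not strictly
inside the data area. It is not the patch itself: the toy_* names are
hypothetical, the exact bounds are an assumption based on the bitmap check,
and TOY_EFSCORRUPTED is hard-coded to 117 (the value of EUCLEAN on Linux,
which ext4 uses for EFSCORRUPTED).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical error code; ext4 defines EFSCORRUPTED as EUCLEAN (117). */
#define TOY_EFSCORRUPTED 117

/* Hypothetical stand-in for the superblock fields used by the check. */
struct toy_sb {
	uint64_t first_data_block;	/* 1 is typical for 1024-byte blocks */
	uint64_t blocks_count;		/* total number of blocks */
};

/*
 * Validate an inode table block before using it. A block at or below
 * first_data_block, or at or past blocks_count, cannot hold an inode
 * table, so corruption is reported instead of a retryable error.
 */
static int toy_check_inode_table(const struct toy_sb *sb,
				 uint64_t inode_table_block)
{
	assert(sb->blocks_count > sb->first_data_block);

	if (inode_table_block <= sb->first_data_block ||
	    inode_table_block >= sb->blocks_count)
		return -TOY_EFSCORRUPTED;
	return 0;
}
```

Returning a corruption error rather than '-ENOMEM' is what lets the caller
stop instead of retrying forever.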

Baokun Li (2):
  ext4: fix GDT corruption after online resizing with bigalloc enable
    and blocksize is 1024
  ext4: add inode table check in __ext4_get_inode_loc to avoid possible
    infinite loop

 fs/ext4/inode.c  | 10 +++++++++-
 fs/ext4/resize.c |  6 +++++-
 2 files changed, 14 insertions(+), 2 deletions(-)

--
2.31.1