Re: [PATCH] ocfs2: fix stale extent map cache during COW operations

From: Heming Zhao
Date: Sat Oct 11 2025 - 00:50:34 EST


Hi Deepanshu and Joseph,

On 10/9/25 22:29, Deepanshu Kartikey wrote:

Hi Joseph,

Thank you for the review. You are absolutely right - the cache clearing at the end of ocfs2_refcount_cow_hunk() should handle the COW path correctly.

After further investigation with the syzbot reproducer and extensive debugging, I found the real issue is in the FITRIM/move_extents code path. The bug occurs when:

1. copy_file_range() creates a reflinked extent with flags=0x2 (OCFS2_EXT_REFCOUNTED)
2. ioctl(FITRIM) is called, which triggers ocfs2_move_extents()
3. In __ocfs2_move_extents_range(), the while loop:
- Calls ocfs2_get_clusters() which reads extent with flags=0x2 and caches it
- Then calls ocfs2_move_extent() or ocfs2_defrag_extent()
- Both eventually call __ocfs2_move_extent() which contains:
replace_rec.e_flags = ext_flags & ~OCFS2_EXT_REFCOUNTED;
- This clears the refcount flag and writes to disk with flags=0x0
4. However, the extent map cache is NOT cleared after the move operation
5. Cache still contains stale flags=0x2 while disk has flags=0x0
6. Later, when write() triggers COW, ocfs2_refcount_cal_cow_clusters() reads:
- From cache: flags=0x2 (stale)
- From disk extent tree: flags=0x0 (correct)
7. The mismatch triggers: BUG_ON(!(rec->e_flags & OCFS2_EXT_REFCOUNTED))

The proper fix should be in __ocfs2_move_extents_range() to clear the extent cache after each move/defrag operation completes. I will send a v2 patch with this fix.

Thanks,
Deepanshu


let's look at the syzbot page [1].
the following analysis is based on the c code from "2025/10/03 12:11" [2].
(btw, syzbot never calls __ocfs2_move_extents_range().)

The test code mainly involves 9 steps:
1. create img data and mount
2. open() the file (0x200000000080ul) once, returning fd r[0]
3. open() the file (0x200000000280ul) twice, returning fds r[1] and r[2]
4. call fcntl(F_SETFL, 0) on r[1]
5. write "0x0000000000000000" to r[2], len: 0xfea0ul // clears the file data
6. call r[3] = dup(r[1])
7. do copy_file_range(), copy from r[1] to r[3] len=0xd8c2ul
// creates an OCFS2_EXT_REFCOUNTED extent and populates the extent cache.
//check ocfs2_remap_file_range() => ocfs2_reflink_remap_extent()
8. trim r[0]
9. write on r[1] //crash.

the root cause is that, in step <9>, it calls ocfs2_refcount_cow():
- the input parameter di_bh is created by the caller via
ocfs2_prepare_inode_for_write() => ocfs2_inode_lock_for_extent_tree() =>
ocfs2_inode_lock_update(), which reads file data from disk.
The on-disk extent lacks the OCFS2_EXT_REFCOUNTED flag because r[1] and r[2]
point to the same file, and step <5> cleaned up the file data.
- ocfs2_refcount_cow() then calls ocfs2_get_clusters() to retrieve the extent
from cache, which does contain OCFS2_EXT_REFCOUNTED (set by step <7>).
- this mismatch leads it to call ocfs2_refcount_cow_hunk(), which
triggers a BUG_ON().
I suspect step <7> needs some time to write back the COW data, and syzbot
starts step <9> too quickly, before the write-back job starts.
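
To make the mismatch concrete, here is a minimal userspace sketch in C. It is a toy model, not kernel code: struct toy_extent and the toy_* helpers are hypothetical names made up for illustration, and only the 0x2 value for OCFS2_EXT_REFCOUNTED is taken from this thread. It assumes the lookup prefers the cached record and that the BUG_ON() condition is evaluated against the on-disk record.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value of OCFS2_EXT_REFCOUNTED as quoted in this thread (flags=0x2). */
#define EXT_REFCOUNTED 0x2u

/* Hypothetical toy model: one extent with an on-disk record and a cached copy. */
struct toy_extent {
    uint32_t disk_flags;    /* flags in the on-disk extent record */
    uint32_t cached_flags;  /* flags remembered by the extent map cache */
    bool     cache_valid;   /* false once the cached record was dropped */
};

/* Lookup in the style of ocfs2_get_clusters(): prefer the cache, else read disk. */
static uint32_t toy_get_flags(struct toy_extent *e)
{
    if (!e->cache_valid) {
        e->cached_flags = e->disk_flags;
        e->cache_valid = true;
    }
    return e->cached_flags;
}

/* Drop the cached record, in the spirit of ocfs2_extent_map_trunc(). */
static void toy_cache_trunc(struct toy_extent *e)
{
    e->cache_valid = false;
}

/*
 * Returns true when the COW path would hit the BUG_ON(): the lookup says
 * "refcounted", but the on-disk record does not carry the flag.
 */
static bool toy_cow_would_bug(struct toy_extent *e)
{
    uint32_t flags = toy_get_flags(e);

    if (!(flags & EXT_REFCOUNTED))
        return false;                          /* no COW needed at all */
    return !(e->disk_flags & EXT_REFCOUNTED);  /* cache and disk disagree */
}
```

With disk_flags = 0 and a valid cached EXT_REFCOUNTED record, toy_cow_would_bug() reports the mismatch. Calling toy_cache_trunc() first forces a re-read from disk, and the COW path is skipped.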

how to fix:
the v1 patch is reasonable, but the commit log needs to be revised.

for Joseph's question: (I copied here)
At the end of ocfs2_refcount_cow_hunk(), it has:

/*
* truncate the extent map here since no matter whether we meet with
* any error during the action, we shouldn't trust cached extent map
* any more.
*/
ocfs2_extent_map_trunc(inode, cow_start);

It seems the cached extent record has already been forgotten. So how
does the above step 3 happen?

my answer:
the crash only happens on the first call to ocfs2_refcount_cow_hunk().
ocfs2_extent_map_trunc() does the cleanup later, but the bad
extent record is already cached before ocfs2_refcount_cow_hunk() is called.
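
The ordering argument can be sketched as a small userspace model in C. This is only an illustration under assumptions: struct toy_inode, enum toy_result and the toy_* functions are hypothetical, and the model simply places the flag check at the start of the hunk and the cache truncation at its end.

```c
#include <stdbool.h>

/* Hypothetical toy state; these are not the kernel's data structures. */
struct toy_inode {
    bool disk_refcounted;   /* on-disk extent record has the refcounted flag */
    bool cache_refcounted;  /* cached extent record has the refcounted flag */
    bool cache_valid;       /* cleared when the cached record is truncated */
};

enum toy_result { TOY_NO_COW, TOY_COW_DONE, TOY_BUG };

/* Models the hunk: check the on-disk record first, truncate the cache last. */
static enum toy_result toy_cow_hunk(struct toy_inode *ino)
{
    if (!ino->disk_refcounted)
        return TOY_BUG;            /* stands in for the BUG_ON(); nothing below runs */

    ino->disk_refcounted = false;  /* toy COW: the extent is no longer shared */
    ino->cache_valid = false;      /* stands in for ocfs2_extent_map_trunc() */
    return TOY_COW_DONE;
}

/* Models the caller: the decision to COW is made from the cached lookup. */
static enum toy_result toy_refcount_cow(struct toy_inode *ino)
{
    bool refcounted = ino->cache_valid ? ino->cache_refcounted
                                       : ino->disk_refcounted;

    if (!refcounted)
        return TOY_NO_COW;
    return toy_cow_hunk(ino);
}
```

Starting from { disk_refcounted = false, cache_refcounted = true, cache_valid = true }, the first toy_refcount_cow() call returns TOY_BUG and the cache is still marked valid, so the truncation at the end of the hunk never gets a chance to help.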

[1] https://syzkaller.appspot.com/bug?extid=6fdd8fa3380730a4b22c
[2] https://syzkaller.appspot.com/text?tag=ReproC&x=163c9214580000

- Heming