Re: [PATCH] f2fs: avoid deadlock in gc thread under low memory

From: Michal Hocko
Date: Wed Apr 13 2022 - 07:35:11 EST


On Wed 13-04-22 19:20:06, Wu Yan wrote:
> On 4/13/22 17:46, Michal Hocko wrote:
> > On Wed 13-04-22 16:44:32, Rokudo Yan wrote:
> > > A potential deadlock may happen in the gc thread under low
> > > memory, as below:
> > >
> > > gc_thread_func
> > > -f2fs_gc
> > > -do_garbage_collect
> > > -gc_data_segment
> > > -move_data_block
> > > -set_page_writeback(fio.encrypted_page);
> > > -f2fs_submit_page_write
> > > As f2fs_submit_page_write tries to merge I/O when possible, the
> > > encrypted_page is marked PG_writeback but may not be submitted to the
> > > block layer immediately. If the system runs low on memory while the gc
> > > thread tries to move the next data block, it may do direct reclaim and
> > > enter the fs layer as below:
> > > -move_data_block
> > > -f2fs_grab_cache_page(index=?, for_write=false)
> > > -grab_cache_page
> > > -find_or_create_page
> > > -pagecache_get_page
> > > -__page_cache_alloc -- __GFP_FS is set
> > > -alloc_pages_node
> > > -__alloc_pages
> > > -__alloc_pages_slowpath
> > > -__alloc_pages_direct_reclaim
> > > -__perform_reclaim
> > > -try_to_free_pages
> > > -do_try_to_free_pages
> > > -shrink_zones
> > > -mem_cgroup_soft_limit_reclaim
> > > -mem_cgroup_soft_reclaim
> > > -mem_cgroup_shrink_node
> > > -shrink_node_memcg
> > > -shrink_list
> > > -shrink_inactive_list
> > > -shrink_page_list
> > > -wait_on_page_writeback -- the page was marked
> > > writeback during the previous move_data_block call
> >
> > This is a memcg reclaim path and you would have to have __GFP_ACCOUNT in
> > the gfp mask to hit it from the page allocator. I am not really familiar
> > with f2fs but I doubt it is using this flag.
> >
> > On the other hand the memory is charged to a memcg when the newly
> > allocated page is added to the page cache. That wouldn't trigger the
> > soft reclaim path but that is not really necessary because even the
> > regular memcg reclaim would trigger wait_on_page_writeback for cgroup
> > v1.
> >
> > Also are you sure that the mapping's gfp mask has __GFP_FS set for this
> > allocation? f2fs_iget uses GFP_NOFS like mask for some inode types.
> >
> > All that being said, you will need to change the above call chain, but it
> > would be worth double checking that the deadlock is real.
>
> Hi, Michal
>
> 1. The issue occurs during monkey testing on an Android device with 4GB RAM +
> 3GB zram and memory cgroup v1 enabled.
>
> 2. A full memory dump was captured when the issue occurred, and the deadlock
> was confirmed from the dump. We can see that mapping->gfp_mask is 0x14200ca,
> so both __GFP_ACCOUNT(0x1000000) and __GFP_FS(0x80) are set.

This is rather surprising, I have to say, because the page cache is charged
explicitly (__filemap_add_folio). Are you testing with the upstream
kernel, or could this possibly be a non-upstream change?
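
A quick way to sanity-check a raw mask like the one in the dump is to
decode it bit by bit. The sketch below does that in C. The bit table is an
assumption taken from a 4.10-4.14-era include/linux/gfp.h (the values have
moved between releases, so check it against the device kernel), and
decode_gfp() and print_gfp() are hypothetical helpers, not kernel or crash
APIs. If that table matches the kernel in question, 0x14200ca decodes
exactly to GFP_HIGHUSER_MOVABLE, the default mapping mask set by
inode_init_always(), with bit 0x1000000 being __GFP_KSWAPD_RECLAIM rather
than __GFP_ACCOUNT.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * ASSUMPTION: bit values from a 4.10-4.14-era include/linux/gfp.h.
 * The layout differs between kernel versions; verify before trusting it.
 */
struct gfp_bit {
	unsigned int value;
	const char *name;
};

static const struct gfp_bit gfp_bits[] = {
	{ 0x01u,      "__GFP_DMA" },
	{ 0x02u,      "__GFP_HIGHMEM" },
	{ 0x04u,      "__GFP_DMA32" },
	{ 0x08u,      "__GFP_MOVABLE" },
	{ 0x10u,      "__GFP_RECLAIMABLE" },
	{ 0x20u,      "__GFP_HIGH" },
	{ 0x40u,      "__GFP_IO" },
	{ 0x80u,      "__GFP_FS" },
	{ 0x20000u,   "__GFP_HARDWALL" },
	{ 0x100000u,  "__GFP_ACCOUNT" },
	{ 0x400000u,  "__GFP_DIRECT_RECLAIM" },
	{ 0x1000000u, "__GFP_KSWAPD_RECLAIM" },
};

/*
 * Write the names of the known bits set in @mask into @buf, joined by '|'.
 * Returns the bits of @mask not covered by the table (0 if all are known).
 */
unsigned int decode_gfp(unsigned int mask, char *buf, size_t len)
{
	unsigned int unknown = mask;
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(gfp_bits) / sizeof(gfp_bits[0]); i++) {
		if (!(mask & gfp_bits[i].value))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, gfp_bits[i].name, len - strlen(buf) - 1);
		unknown &= ~gfp_bits[i].value;
	}
	return unknown;
}

/* Print a mask, e.g. the mapping->gfp_mask value read from the dump. */
void print_gfp(unsigned int mask)
{
	char names[256];
	unsigned int rest = decode_gfp(mask, names, sizeof(names));

	printf("0x%x = %s (unknown bits: 0x%x)\n", mask, names, rest);
}
```

Calling print_gfp(0x14200ca) lists the flags in table order; any bits
reported as unknown are a hint that the table does not match the kernel
being debugged.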

> crash-arm64> struct inode.i_mapping 0xFFFFFFDFD578EEA0
> i_mapping = 0xffffffdfd578f028,
> crash-arm64> struct address_space.host,gfp_mask -x 0xffffffdfd578f028
> host = 0xffffffdfd578eea0,
> gfp_mask = 0x14200ca,

Anyway, if __GFP_FS is set, then the deadlock is possible even
without __GFP_ACCOUNT.
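
Why __GFP_FS is the deciding flag can be sketched with a simplified
user-space model of the writeback branch in shrink_page_list(), paraphrased
from 4.14-era mm/vmscan.c. Only sane_reclaim() and the flag names come from
the kernel; the structs, the MODEL_GFP_* constants and writeback_action()
are hypothetical, and the real code checks more than this. Memcg v1 reclaim
first tags a writeback page with PG_reclaim and skips it; on a later pass,
if the allocation may enter the fs, it blocks in wait_on_page_writeback().
In the f2fs case the page's bio would still be sitting in the gc thread's
own merge buffer, so that wait could never finish.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; __GFP_IO/__GFP_FS have long been 0x40/0x80. */
#define MODEL_GFP_IO 0x40u
#define MODEL_GFP_FS 0x80u

struct reclaim_ctx {
	unsigned int gfp_mask; /* mask of the allocation that entered reclaim */
	bool global_reclaim;   /* false for memcg-targeted reclaim */
	bool cgroup_v2;        /* memcg reclaim on the v2 hierarchy */
	bool is_kswapd;        /* reclaim running in kswapd */
};

struct page_state {
	bool writeback;      /* PG_writeback */
	bool reclaim;        /* PG_reclaim, set when a pass skipped the page */
	bool swapcache;      /* PG_swapcache */
	bool node_writeback; /* stands in for PGDAT_WRITEBACK */
};

enum wb_action { WB_NOT_WRITEBACK, WB_SKIP, WB_MARK_AND_SKIP, WB_WAIT };

/* Global reclaim and cgroup v2 can rely on dirty/writeback throttling. */
static bool sane_reclaim(const struct reclaim_ctx *sc)
{
	return sc->global_reclaim || sc->cgroup_v2;
}

enum wb_action writeback_action(const struct reclaim_ctx *sc,
				struct page_state *page)
{
	bool may_enter_fs = (sc->gfp_mask & MODEL_GFP_FS) ||
			    (page->swapcache && (sc->gfp_mask & MODEL_GFP_IO));

	if (!page->writeback)
		return WB_NOT_WRITEBACK;
	/* Case 1: kswapd skips tagged pages while the node is in writeback. */
	if (sc->is_kswapd && page->reclaim && page->node_writeback)
		return WB_SKIP;
	/* Case 2: tag the page with PG_reclaim and move on without blocking. */
	if (sane_reclaim(sc) || !page->reclaim || !may_enter_fs) {
		page->reclaim = true;
		return WB_MARK_AND_SKIP;
	}
	/* Case 3: legacy (v1) memcg reclaim that may enter the fs waits. */
	return WB_WAIT;
}
```

With a mask lacking __GFP_FS (and a non-swapcache page), the model never
reaches WB_WAIT, which matches the point that a GFP_NOFS-like mapping mask
would avoid the wait.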
--
Michal Hocko
SUSE Labs