Re: [PATCH] mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty()

From: Michal Hocko
Date: Mon Jan 30 2023 - 03:49:24 EST


On Mon 30-01-23 09:16:13, Kefeng Wang wrote:
>
>
> On 2023/1/30 5:48, Andrew Morton wrote:
> > On Sun, 29 Jan 2023 10:44:51 +0800 Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> wrote:
> >
> > > Since commit 18365225f044 ("hwpoison, memcg: forcibly uncharge LRU pages"),
> >
> > Merged in 2017.
> >
> > > hwpoison will forcibly uncharge an LRU hwpoisoned page, so folio_memcg
> > > could be NULL, and mem_cgroup_track_foreign_dirty_slowpath() could then
> > > hit a NULL pointer dereference. Do not record foreign writebacks when
> > > the folio memcg is NULL in mem_cgroup_track_foreign_dirty() to
> > > fix it.
> > >
> > > Reported-by: Ma Wupeng <mawupeng1@xxxxxxxxxx>
> > > Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing")
> >
> > Merged in 2019.
> >
> > > --- a/include/linux/memcontrol.h
> > > +++ b/include/linux/memcontrol.h
> > > @@ -1688,10 +1688,13 @@ void mem_cgroup_track_foreign_dirty_slowpath(struct folio *folio,
> > >  static inline void mem_cgroup_track_foreign_dirty(struct folio *folio,
> > >  					  struct bdi_writeback *wb)
> > >  {
> > > +	struct mem_cgroup *memcg;
> > > +
> > >  	if (mem_cgroup_disabled())
> > >  		return;
> > > -	if (unlikely(&folio_memcg(folio)->css != wb->memcg_css))
> > > +	memcg = folio_memcg(folio);
> > > +	if (unlikely(memcg && &memcg->css != wb->memcg_css))
> > >  		mem_cgroup_track_foreign_dirty_slowpath(folio, wb);
> > >  }
> >
> > Has this null deref actually been observed, or is this from code
> > inspection? (This is why it's nice to include the Link: after a
> > Reported-by!)
>
> It does occur in our internal testing and was reported by wupeng (based on v5.10):
>
> BUG: KASAN: user-memory-access in
> mem_cgroup_track_foreign_dirty_slowpath+0xc0/0x480 mm/memcontrol.c:4708
> Read of size 8 at addr 0000000000001000 by task syz-executor.2/28325
>
> CPU: 2 PID: 28325 Comm: syz-executor.2 Tainted: G W
> 5.10.0-03333-g48e46a146cbc-dirty #1
> Hardware name: linux,dummy-virt (DT)
> Call trace:
> ...
> mem_cgroup_track_foreign_dirty_slowpath+0xc0/0x480 mm/memcontrol.c:4708
> mem_cgroup_track_foreign_dirty include/linux/memcontrol.h:1880 [inline]
> account_page_dirtied+0x9a0/0xa90 mm/page-writeback.c:2436
> __set_page_dirty+0x1f8/0x4b0 fs/buffer.c:608
> __set_page_dirty_buffers+0x3d0/0x550 fs/buffer.c:668
> set_page_dirty+0x158/0x500 mm/page-writeback.c:2575
> filemap_page_mkwrite+0x3dc/0x490 mm/filemap.c:3224
> do_page_mkwrite+0xc4/0x3d0 mm/memory.c:2786
> wp_page_shared+0x14c/0x980 mm/memory.c:3118
> do_wp_page+0x930/0xbc0 mm/memory.c:3219
> handle_pte_fault+0x5e0/0x630 mm/memory.c:4570
> __handle_mm_fault+0x41c/0x910 mm/memory.c:4690
> handle_mm_fault+0x25c/0x484 mm/memory.c:4788
> __do_page_fault arch/arm64/mm/fault.c:440 [inline]
> do_page_fault+0x3ac/0x9d4 arch/arm64/mm/fault.c:539

Just to make sure I understand: the page has been hwpoisoned and uncharged,
but it stayed in the page cache, so the next page fault on that address
blew up?

Say we address the NULL memcg case. What is the resulting behavior?
Wouldn't userspace then access a poisoned page and get silent memory
corruption?
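
For reference, the guard under discussion can be modelled in user space as
below. This is a minimal sketch, not the kernel code: the struct layouts,
the slowpath_calls counter and the shortened function names are assumptions
made purely for illustration. It only shows that a folio with no memcg skips
the slow path; it says nothing about what happens to the poisoned page
afterwards, which is the open question.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct css { int id; };
struct mem_cgroup { struct css css; };
struct folio { struct mem_cgroup *memcg; };	/* NULL once uncharged */
struct bdi_writeback { struct css *memcg_css; };

/* Counts slow-path entries so the behaviour can be observed. */
static int slowpath_calls;

static void track_foreign_dirty_slowpath(struct folio *folio,
					 struct bdi_writeback *wb)
{
	(void)folio;
	(void)wb;
	slowpath_calls++;
}

/* Mirrors the patched check: skip folios that have no memcg at all. */
static void track_foreign_dirty(struct folio *folio,
				struct bdi_writeback *wb)
{
	struct mem_cgroup *memcg = folio->memcg;

	if (memcg && &memcg->css != wb->memcg_css)
		track_foreign_dirty_slowpath(folio, wb);
}
```

With this model, an uncharged folio (memcg == NULL) and a folio owned by the
writeback's own memcg both return without reaching the slow path; only a
folio charged to a different memcg is recorded as foreign.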
--
Michal Hocko
SUSE Labs