Re: dcache_readdir NULL inode oops

From: Jan Glauber
Date: Wed Nov 21 2018 - 08:19:13 EST


On Tue, Nov 20, 2018 at 07:03:17PM +0000, Will Deacon wrote:
> On Tue, Nov 20, 2018 at 06:28:54PM +0000, Will Deacon wrote:
> > On Sat, Nov 10, 2018 at 11:17:03AM +0000, Jan Glauber wrote:
> > > On Fri, Nov 09, 2018 at 03:58:56PM +0000, Will Deacon wrote:
> > > > On Fri, Nov 09, 2018 at 02:37:51PM +0000, Jan Glauber wrote:
> > > > > I'm seeing the following oops reproducibly with an upstream kernel on arm64
> > > > > (ThunderX2):
> > > >
> > > > [...]
> > > >
> > > > > It happens after 1-3 hours of running 'stress-ng --dev 128'. This testcase
> > > > > does a scandir of /dev and then calls random stuff like ioctl, lseek,
> > > > > open/close etc. on the entries. I assume no files are deleted under /dev
> > > > > during the testcase.
> > > > >
> > > > > The NULL pointer is the inode pointer of next. The next dentry->d_flags is
> > > > > DCACHE_RCUACCESS when this happens.
> > > > >
> > > > > Any hints on how to further debug this?
> > > >
> > > > Can you reproduce the issue with vanilla -rc1 and do you have a "known good"
> > > > kernel?
> > >
> > > I can try out -rc1, but IIRC this wasn't bisectable as the bug was present at
> > > least back to 4.14. I need to double check that as there were other issues
> > > that are resolved now so I may confuse things here. I've definitely seen
> > > the same bug with 4.18.
> > >
> > > Unfortunately I lost access to the machine as our data center seems to be
> > > moving currently so it might take some days until I can try -rc1.
> >
> > Ok, I've just managed to reproduce this in a KVM guest running v4.20-rc3 on
> > both the host and the guest, so if anybody has any ideas of things to try then
> > I'm happy to give them a shot. In the meantime, I'll try again with a bunch of
> > debug checks enabled.

Hi Will,

Good that you can reproduce the issue. I've verified that the issue is
indeed reproducible with 4.14.

>
> Weee, I eventually hit a use-after-free from KASAN. See below.

I ran KASAN (and all the other debug stuff) but didn't trigger anything
in the host.

--Jan