Re: [BUG] ext4/block null pointer crashes in linux-next

From: valdis . kletnieks
Date: Wed Oct 17 2018 - 11:47:48 EST


On Tue, 16 Oct 2018 14:25:13 -0400, Dennis Zhou said:

> > > grep execve /root/rpm-exec-strace
> > > execve("/usr/bin/rpm", ["rpm", "-Uvh", "--force", "dracut-049-4.git20181010.fc30.x8"...], 0x7ffc9d967d80 /* 33 vars */) = 0

> > Thanks for testing and reporting this! Do you mind sending me your
> > reproducer?

See above. An 'rpm' command blows it up....

> I've spent some time thinking about this, and this is my guess at what
> is happening without seeing your reproducer. The system is under memory
> pressure and a new cgroup is being created. The cgroup allocation fails
> causing the request_list code to fallback and walk up the blkg tree.
> There is special handling for the root cgroup, but I missed that case
> and it fails there I believe.
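
For anyone following along, the failure mode guessed at above can be
modelled in user space roughly like this. It is only a toy sketch of
"walk up the tree until something usable turns up, and treat the root
specially"; it is not the kernel code. The names toy_blkg, usable and
toy_pick_rl() are made up for illustration, and 'usable == 0' stands in
for a failed allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-cgroup node; NOT the kernel's struct. */
struct toy_blkg {
	const char *name;
	struct toy_blkg *parent;	/* NULL for the root */
	int usable;			/* 0 models a failed allocation */
};

/*
 * Walk from 'blkg' toward the root and return the first usable node.
 * The root is the special case: it has no parent to fall back to, so the
 * walk stops there and returns it, rather than following a NULL parent
 * pointer off the top of the tree.
 */
static struct toy_blkg *toy_pick_rl(struct toy_blkg *blkg,
				    struct toy_blkg *root)
{
	assert(root != NULL);

	while (blkg != NULL && blkg != root && !blkg->usable)
		blkg = blkg->parent;

	/* Either a usable ancestor was found, or we ended up at the root. */
	return blkg != NULL ? blkg : root;
}
```

Dropping the 'blkg != root' test and the final fallback gives a walk that
can hand NULL back to a caller that does not expect it, which is the sort
of thing the guess above describes.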

Hmm... I boot to single-user, do a cd, and run 'rpm -Uvh --force' on an RPM
that was already installed. (I originally hit this with 'dnf', but running 'dnf
update' wouldn't trigger a crash if the system was up to date. To make a
bisect workable, I ended up using rpm to re-install an already installed
package or 3, which triggered it as well.)

That's a consistent reproducer for me. rpm does an execve() (actually,
it does 5), and one of them goes kablam. I've also managed to hit it
once doing an 'rm'.

And my laptop has 16G of RAM. There shouldn't be any memory pressure at all in
single-user mode. So it looks like you fixed a bug, but not the one I was hitting.

> In addition to sending me the reproducer and your config, can you please
> try the patch below?

Tried the patch, didn't make a difference. So there's at least one more bug
out there to find. :)

Config attached.

Attachment: config-next-20181016.gz
Description: config-next-20181016.gz
