Re: [git pull] vfs.git sysv pile

From: Fabio M. De Francesco
Date: Thu Mar 16 2023 - 06:30:48 EST


On Thursday, 16 March 2023 10:00:35 CET Jan Kara wrote:
> On Wed 15-03-23 19:08:57, Fabio M. De Francesco wrote:
> > On Wednesday, 1 March 2023 15:14:16 CET Al Viro wrote:
> > > On Wed, Mar 01, 2023 at 02:00:18PM +0100, Jan Kara wrote:
> > > > On Wed 01-03-23 12:20:56, Fabio M. De Francesco wrote:
> > > > > > On Friday, 24 February 2023 04:26:57 CET Al Viro wrote:
> > > > > > > Fabio's "switch to kmap_local_page()" patchset (originally after
> > > > > > > the ext2 counterpart, with a lot of cleaning up done to it; as
> > > > > > > the matter of fact, ext2 side is in need of similar cleanups -
> > > > > > > calling conventions there are bloody awful).
> >
> > [snip]
> >
> > > I think I've pushed a demo patchset to vfs.git at some point back in
> > > January... Yep - see #work.ext2 in there; completely untested, though.
> >
> > The following commits from the VFS tree, #work.ext2 look good to me.
> >
> > f5b399373756 ("ext2: use offset_in_page() instead of open-coding it as
> > subtraction")
> > c7248e221fb5 ("ext2_get_page(): saner type")
> > 470e54a09898 ("ext2_put_page(): accept any pointer within the page")
> > 15abcc147cf7 ("ext2_{set_link,delete_entry}(): don't bother with
> > page_addr")
> > 16a5ee2027b7 ("ext2_find_entry()/ext2_dotdot(): callers don't need
> > page_addr anymore")
> >
> > Reviewed-by: Fabio M. De Francesco <fmdefrancesco@xxxxxxxxx>
>
> Thanks!
>
> > I could only read the code but I could not test it in the same QEMU/KVM
> > x86_32 VM where I test all my HIGHMEM related work.
> >
> > Btrfs as well as all the other filesystems I converted to
> > kmap_local_page() don't make the processes in the VM to crash, whereas
> > the xfstests on ext2 trigger the OOM killer at random tests (only
> > sometimes they exit gracefully).
> >
> > FYI, I tried to run the tests with 6GB of RAM, booting a kernel with
> > HIGHMEM64GB enabled. I cannot add my "Tested-by" tag.
>
> Hum, interesting. Reading your previous emails this didn't seem to happen
> before applying this series, did it?
>
I wrote too many messages but was probably not able to explain the facts
properly. Please let me summarize...

1) When testing ext2 with "./check -g quick" in a QEMU/KVM x86_32 VM with 6GB
of RAM, booting a vanilla 6.3.0-rc1 kernel with HIGHMEM64GB enabled, the OOM
killer kicks in at random tests both _with_ and _without_ Al's patches.

2) The only case that never triggers the OOM killer is running the tests on
ext2-formatted filesystems on loop devices with the stock openSUSE kernel,
which is 6.2.1-1-pae.

3) The same "./check -g quick" on 6.3.0-rc1 always runs to completion with
other filesystems. I ran xfstests several times on Btrfs and had no problems.

4) I cannot git-bisect this issue with ext2 because I cannot trust the results
on any particular kernel version. I mean that I cannot mark any specific
version as either "good" or "bad", because the same "good" version may make
xfstests crash on the next run.
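
A common workaround for intermittent failures like the one in point 4 is to
repeat the test several times per revision and only call a revision "good"
after N clean runs, while a single failure marks it "bad". The sketch below is
only an illustration of that idea, not something that was run for this report.
It relies on the exit codes that "git bisect run" understands (0 = good,
125 = skip, any other code from 1 to 127 = bad). The command it runs is a
hypothetical wrapper that would have to build the kernel, boot the VM and run
"./check -g quick"; the function names are made up for the example.

```python
import subprocess
import sys

# Exit codes understood by "git bisect run".
GOOD, BAD, SKIP = 0, 1, 125


def classify(outcomes, min_runs):
    """Map repeated test outcomes (True = passed) to a bisect exit code."""
    if any(not ok for ok in outcomes):
        return BAD   # a single failure is enough to call the revision bad
    if len(outcomes) < min_runs:
        return SKIP  # too few clean runs to trust a "good" verdict
    return GOOD


def run_once(cmd):
    """Run the (hypothetical) test wrapper; True if it exits with status 0."""
    return subprocess.run(cmd).returncode == 0


def main(argv):
    # Usage: flaky_bisect.py RUNS COMMAND [ARGS...]
    runs, cmd = int(argv[1]), argv[2:]
    outcomes = []
    for _ in range(runs):
        outcomes.append(run_once(cmd))
        if not outcomes[-1]:
            break    # stop early on the first failure
    return classify(outcomes, runs)


if __name__ == "__main__" and len(sys.argv) > 2:
    sys.exit(main(sys.argv))
```

It would be invoked as something like "git bisect run ./flaky_bisect.py 10
./build-boot-and-check.sh", where both script names are assumptions. The more
rarely the failure shows up, the more runs are needed before a "good" verdict
means anything, so this only helps if the failure rate is reasonably high.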

My conclusion is that we probably have some kind of race that makes random
tests crash on random runs, across kernel versions between (at least) SUSE
6.2.1 and current vanilla.

But it may very well be that I'm doing something stupid (e.g., with the QEMU
configuration, with setup_disks, or with something else I can't even imagine)
and that I'm unable to see where my mistake is. After all, I'm still a
newcomer with little experience :-)

Therefore, I'd suggest that someone else try to test ext2 in an x86_32 VM.
However, I'm 99.5% sure that Al's patches are good, from mere inspection of
his code.

I hope that this summary contains everything that may help.

However, I remain available to provide any further information and to help
with any specific tasks you ask of me.

For my part, I have no idea how to investigate what is happening. Over the
past months I have run the VM hundreds of times on a wide variety of
filesystems to test my conversions to kmap_local_page(), and I have never
seen anything like this happen.

Thanks,

Fabio

>
> Honza
> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR