Re: [git pull] vfs.git sysv pile

From: Jan Kara
Date: Mon Mar 20 2023 - 08:47:56 EST


On Mon 20-03-23 12:18:38, Fabio M. De Francesco wrote:
> On giovedì 16 marzo 2023 11:30:21 CET Fabio M. De Francesco wrote:
> > On giovedì 16 marzo 2023 10:00:35 CET Jan Kara wrote:
> > > On Wed 15-03-23 19:08:57, Fabio M. De Francesco wrote:
> > > > On mercoledì 1 marzo 2023 15:14:16 CET Al Viro wrote:
> > > > > On Wed, Mar 01, 2023 at 02:00:18PM +0100, Jan Kara wrote:
> > > > > > On Wed 01-03-23 12:20:56, Fabio M. De Francesco wrote:
> > > > > > > On venerdì 24 febbraio 2023 04:26:57 CET Al Viro wrote:
> > > > > > > > Fabio's "switch to kmap_local_page()" patchset (originally
> >
> > after
> >
> > > > > > > > the
> > > > > > > >
> > > > > > > > ext2 counterpart, with a lot of cleaning up done to it; as the
> > > > > > > > matter
> > > >
> > > > of
> > > >
> > > > > > > > fact, ext2 side is in need of similar cleanups - calling
> >
> > conventions
> >
> > > > > > > > there
> > > > > > > > are bloody awful).
> > > >
> > > > [snip]
> > > >
> > > > > I think I've pushed a demo patchset to vfs.git at some point back in
> > > > > January... Yep - see #work.ext2 in there; completely untested, though.
> > > >
> > > > The following commits from the VFS tree, #work.ext2 look good to me.
> > > >
> > > > f5b399373756 ("ext2: use offset_in_page() instead of open-coding it as
> > > > subtraction")
> > > > c7248e221fb5 ("ext2_get_page(): saner type")
> > > > 470e54a09898 ("ext2_put_page(): accept any pointer within the page")
> > > > 15abcc147cf7 ("ext2_{set_link,delete_entry}(): don't bother with
> >
> > page_addr")
> >
> > > > 16a5ee2027b7 ("ext2_find_entry()/ext2_dotdot(): callers don't need
> >
> > page_addr
> >
> > > > anymore")
> > > >
> > > > Reviewed-by: Fabio M. De Francesco <fmdefrancesco@xxxxxxxxx>
> > >
> > > Thanks!
> > >
> > > > I could only read the code but I could not test it in the same QEMU/KVM
> > > > x86_32 VM where I test all my HIGHMEM related work.
> > > >
> > > > Btrfs as well as all the other filesystems I converted to
> >
> > kmap_local_page()
> >
> > > > don't make the processes in the VM to crash, whereas the xfstests on
> ext2
> > > > trigger the OOM killer at random tests (only sometimes they exit
> > > > gracefully).
> > > >
> > > > FYI, I tried to run the tests with 6GB of RAM, booting a kernel with
> > > > HIGHMEM64GB enabled. I cannot add my "Tested-by" tag.
> > >
> > > Hum, interesting. Reading your previous emails this didn't seem to happen
> > > before applying this series, did it?
> >
> > I wrote too many messages but was probably not able to explain the facts
> > properly. Please let me summarize...
> >
> > 1) When testing ext2 with "./check -g quick" in a QEMU/KVM x86_32 VM, 6GB
> RAM,
> > booting a Vanilla kernel 6.3.0-rc1 with HIGHMEM64GB enabled, the OOM Killer
> > kicks in at random tests _with_ and _without_ Al's patches.
> >
> > 2) The only case which does never trigger the OOM Killer is running the
> tests
> > on ext2 formatted filesystems in loop disks with the stock openSUSE kernel
> > which is the 6.2.1-1-pae.
> >
> > 3) The same "./check -g quick" on 6.3.0-rc1 runs always to completion with
> > other filesystems. I ran xfstests several times on Btrfs and I had no
> > problems.
> >
> > 4) I cannot git-bisect this issue with ext2 because I cannot trust the
> results
> > on any particular Kernel version. I mean that I cannot mark any specific
> > version neither "good" or "bad" because it happens that the same "good"
> > version instead make xfstests crash at the next run.
> >
> > My conclusion is that we probably have some kind of race that makes the
> random
> > tests crash at random runs of random Kernel versions between (at least) SUSE
> > 6.2.1 and Vanilla current.
> >
> > But it may be very well the case that I'm doing something stupid (e.g., with
> > QEMU configuration or setup_disks or I can't imagine whatever else) and that
> > I'm unable to see where I make mistakes. After all, I'm still a newcomer
> with
> > little experience :-)
> >
> > Therefore, I'd suggest that someone else try to test ext2 in an x86_32 VM.
> > However, I'm 99.5% sure that Al's patches are good by the mere inspection of
> > his code.
> >
> > I hope that this summary contains everything that may help.
> >
> > However, I remain available to provide any further information and to give
> my
> > contribution if you ask me for specific tasks.
> >
> > For my part I have no idea how to investigate what is happening. In these
> > months I have run the VM hundreds of times on the most disparate filesystems
> > to test my conversions to kmap_local_page() and I have never seen anything
> > like this happen.
> >
> > Thanks,
> >
> > Fabio
> >
> >
> > Honza
> >
> > > --
> > > Jan Kara <jack@xxxxxxxx>
> > > SUSE Labs, CR
>
> I can't yet figure out which conditions lead to trigger the OOM Killer to kill
> the XFCE Desktop Environment, and the xfstests (which I usually run into the
> latter). After all, reserving 6GB of main memory to a QEMU/KVM x86_32 VM had
> always been more than adequate.
>
> So, I thought I'd better ignore that 6GB for a 32 bit architecture are a
> notable amount of RAM and squeezed some more from the host until I went to
> reserve 8GB. I know that this is not what who is able to find out what
> consumes so much main memory would do, but wanted to get the output from the
> tests, one way or the other... :-(
>
> OK, I could finally run my tests to completion and had no crashes at all. I
> ran "./check -g quick" on one "test" + three "scratch" loop devices formatted
> with "mkfs.ext2 -c". I ran three times _with_ and then three times _without_
> Al's following patches cloned from his vfs tree, #work.ext2 branch:
>
> f5b399373756 ("ext2: use offset_in_page() instead of open-coding it as
> subtraction")
> c7248e221fb5 ("ext2_get_page(): saner type")
> 470e54a09898 ("ext2_put_page(): accept any pointer within the page")
> 15abcc147cf7 ("ext2_{set_link,delete_entry}(): don't bother with page_addr")
> 16a5ee2027b7 ("ext2_find_entry()/ext2_dotdot(): callers don't need
>
> All the six tests were no longer killed by the Kernel :-)
>
> I got 144 failures on 597 tests, regardless of the above listed patches.
>
> My final conclusion is that these patches don't introduce regressions. I see
> several tests that produce memory leaks but, I want to stress it again, the
> failing tests are always the same with and without the patches.
>
> therefore, I think that now I can safely add my tag to all five patches listed
> above...
>
> Tested-by: Fabio M. De Francesco <fmdefrancesco@xxxxxxxxx>

Thanks for the effort! Al, will you submit these patches or should I just
pull your branch into my tree?

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR