Re: [PATCH 1/7] vfs - merge path_is_mountpoint() and path_is_mountpoint_rcu()

From: Ian Kent
Date: Sun Dec 04 2016 - 21:50:39 EST


On Sun, 2016-12-04 at 10:18 +0800, Ian Kent wrote:
> On Sat, 2016-12-03 at 23:29 +0000, Al Viro wrote:
> >
> > On Sat, Dec 03, 2016 at 05:13:22AM +0000, Al Viro wrote:
> > >
> > >
> > > * path_has_submounts() is broken.ÂÂAt the very least, it's
> > > AB-BA between mount_lock and rename_lock.ÂÂI would suggest trying to
> > > put read_seqlock_excl(&mount_lock) around the call of d_walk() in there,
> > > and using __lookup_mnt() in the callback (without retries on the
> > > mount_lock,
> > > of course - read_seqlock_excl done on the outside is enough).ÂÂI'm not
> > > sure
> > > if it won't cause trouble with contention, though; that needs testing.ÂÂAs
> > > it is, that function is broken in #work.autofs, same as it is in -mm and
> > > -next.
> > Fix for path_has_submounts() (as above) force-pushed.ÂÂIt does
> > need testing and profiling, obviously.
> I'll run my tests against it and re-run with oprofile if all goes well.
>
> The submount-test I use should show contention if it's a problem but I'm not
> sure the number of mounts used will be sufficient to show up scale problems.
>
> Basically each case of the test (there are two) runs for 100 iterations using
> 10
> processes with timing set to promote expire to mount contention, mainly to
> test
> for expire to mount races.
>
> If I don't see contention I might need to analyse further whether the test has
> adequate coverage.

I have run my usual tests on a build of vfs.get#work.autofs.

Both tests ran without problem multiple times (even though they should be
deterministic experience had shown they sometimes aren't).

I also did a system wide oprofile run of an additional run of the submount-test.

I hadn't noticed that the submount-test runs are shorter that I have come to
expect. This might be due to changes to the behaviour of the expire and mount
timing over time. ÂBut I always check that I'm seeing expire and mount behaviour
that should lead to contention during the test run and that looked as expected.

The run time was about 55 minutes for each of the two cases I test whereas I had
come to expect a run time of around 70 minutes each. It's been quite a while
since I've actually paid attention to this and a lot has changed in the VFS
since.

It might be due to Neil Browns' changes to satisfy path walks in rcu-walk mode
where possible rather than always dropping into ref-walk mode as I didn't
profile those changes when they were implemented because I didn't notice
unexpectedly different run times when testing the changes.

I was going to talk about why the autofs usage of path_has_submounts() should
not be problematic but that would be tedious for everyone, instead I'll just say
that, clearly it is possible to abuse path_has_submounts() by calling it on
mount points that have a large number of directories, but autofs shouldn't do
that and the profile verifies this.

I'm not sure how accurate the profile is (I don't do it very often). There were
about 400000 out of 22000000 samples missed.

And hopefully the report output is readable enough to be useful after posting
....

Anyway, a system wide opreport (such as it is) showed this:

CPU: Intel Haswell microarchitecture, speed 3700 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samplesÂÂ%ÂÂÂÂÂÂÂÂimage nameÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂapp nameÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂsymbol name
3897186ÂÂ17.6585ÂÂlibc-2.22.soÂÂÂÂÂÂÂÂÂÂÂÂÂoperfÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ__strcmp_sse2_unaligned
3125714ÂÂ14.1629ÂÂlibstdc++.so.6.0.21ÂÂÂÂÂÂoperfÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂstd::_Rb_tree_increment(std::_Rb_tree_node_base const*)
1500262ÂÂÂ6.7978ÂÂlibsqlite3.so.0.8.6ÂÂÂÂÂÂevolutionÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ/usr/lib64/libsqlite3.so.0.8.6
856262ÂÂÂÂ3.8798ÂÂoperfÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂoperfÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ/usr/bin/operf
749603ÂÂÂÂ3.3965ÂÂlibglib-2.0.so.0.4600.2ÂÂevolutionÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ/usr/lib64/libglib-2.0.so.0.4600.2
560812ÂÂÂÂ2.5411ÂÂlibdbus-1.so.3.14.8ÂÂÂÂÂÂdbus-daemonÂÂÂÂÂÂÂÂÂÂÂÂÂÂ/usr/lib64/libdbus-1.so.3.14.8
378227ÂÂÂÂ1.7138ÂÂsystemdÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂsystemdÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ/usr/lib/systemd/systemd
301175ÂÂÂÂ1.3647ÂÂlibcamel-1.2.so.54.0.0ÂÂÂevolutionÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ/usr/lib64/libcamel-1.2.so.54.0.0
223545ÂÂÂÂ1.0129ÂÂdbus-daemonÂÂÂÂÂÂÂÂÂÂÂÂÂÂdbus-daemonÂÂÂÂÂÂÂÂÂÂÂÂÂÂ/usr/bin/dbus-daemon
187783ÂÂÂÂ0.8509ÂÂlibc-2.22.soÂÂÂÂÂÂÂÂÂÂÂÂÂdbus-daemonÂÂÂÂÂÂÂÂÂÂÂÂÂÂ__strcmp_sse2_unaligned
157564ÂÂÂÂ0.7139ÂÂlibc-2.22.soÂÂÂÂÂÂÂÂÂÂÂÂÂevolutionÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ__strcmp_sse2_unaligned
157218ÂÂÂÂ0.7124ÂÂlibc-2.22.soÂÂÂÂÂÂÂÂÂÂÂÂÂevolutionÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ_int_malloc
156089ÂÂÂÂ0.7073ÂÂlibgobject-2.0.so.0.4600.2 evolutionÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ/usr/lib64/libgobject-2.0.so.0.4600.2
116277ÂÂÂÂ0.5269ÂÂlibc-2.22.soÂÂÂÂÂÂÂÂÂÂÂÂÂevolutionÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ_int_free
116275ÂÂÂÂ0.5269ÂÂlibxul.soÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂfirefoxÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ/usr/lib64/firefox/libxul.so
104278ÂÂÂÂ0.4725ÂÂlibc-2.22.soÂÂÂÂÂÂÂÂÂÂÂÂÂevolutionÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ__GI_____strtoull_l_internal
92937ÂÂÂÂÂ0.4211ÂÂlibc-2.22.soÂÂÂÂÂÂÂÂÂÂÂÂÂpsÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ_IO_vfscanf
...
101ÂÂÂÂÂÂ4.6e-04ÂÂvmlinux-4.9.0-rc7ÂÂÂÂÂÂÂÂopendirÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂpath_is_mountpoint
...
23ÂÂÂÂÂÂÂ1.0e-04ÂÂvmlinux-4.9.0-rc7ÂÂÂÂÂÂÂÂopendirÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂpath_has_submounts
...

where opendir is a program the test uses to trigger a mount and do various
checks on the result.

Not sure if this is enough or is what's needed, let me know if there's something
else specific I should do.

Ian