Re: [PATCH v2] fuse: In fuse_flush only wait if someone wants the return code

From: Miklos Szeredi
Date: Wed Oct 26 2022 - 05:02:11 EST


On Fri, 30 Sept 2022 at 18:10, Tycho Andersen <tycho@tycho.pizza> wrote:
>
> On Fri, Sep 30, 2022 at 04:41:37PM +0200, Miklos Szeredi wrote:
> > On Fri, 30 Sept 2022 at 16:01, Tycho Andersen <tycho@tycho.pizza> wrote:
> > >
> > > On Fri, Sep 30, 2022 at 03:35:16PM +0200, Miklos Szeredi wrote:
> > > > On Thu, 29 Sept 2022 at 18:40, Tycho Andersen <tycho@tycho.pizza> wrote:
> > > > >
> > > > > If a fuse filesystem is mounted inside a container, there is a problem
> > > > > during pid namespace destruction. The scenario is:
> > > > >
> > > > > 1. task (a thread in the fuse server, with a fuse file open) starts
> > > > > exiting, does exit_signals(), goes into fuse_flush() -> wait
> > > >
> > > > Can't the same happen through
> > > >
> > > > fuse_flush -> fuse_sync_writes -> fuse_set_nowrite -> wait
> > > >
> > > > ?
> > >
> > > Looks like yes, though I haven't seen this in the wild, I guess
> > > because there aren't multiple writers most of the time in the user
> > > code that causes this.
> > >
> > > I'm not exactly sure how to fix this. Reading through 3be5a52b30aa
> > > ("fuse: support writable mmap"), we don't want to allow multiple
> > > writes since that may do allocations, which could cause deadlocks. But
> > > in this case we have no reliable way to wait (besides a busy loop, I
> > > suppose).
> > >
> > > Maybe just a check for PF_EXITING and a pr_warn() with "echo 1 >
> > > /sys/fs/fuse/connections/$N/abort" or something?
> >
> > AFAICS it should be perfectly normal (and trivial to trigger) for an
> > exiting process to have its dirty pages flushed through fuse_flush().
>
> Agreed.
>
> > We could do that asynchronously as well; generally there are no
> > promises about dirty pages being synced as part of the process
> > exiting. But ordering between dirty page flushing and sending the
> > FUSE_FLUSH request should be kept, which needs more complexity,
> > unfortunately.
>
> How can we wait in fuse_set_nowrite()? Or are you suggesting we just
> do a fuse_flush_writepages() in the async part and hope for the best?

I was thinking along the lines of calling schedule_work() in the
exiting case to do the flush.
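
To illustrate the shape of that idea, here is a toy userspace model, not
kernel code and not a patch. Every name in it (model_flush(),
run_pending_work(), the event log) is made up for illustration, and the
"work queue" is just a flag that a later call drains. The only point it
tries to show is that an exiting task hands the flush off instead of
waiting, and that the deferred work still does the dirty page writeback
before sending the flush request.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: none of these names are kernel or FUSE APIs. */
#define LOG_MAX 8
static const char *event_log[LOG_MAX];
static size_t event_count;

static void log_event(const char *what)
{
	if (event_count < LOG_MAX)
		event_log[event_count++] = what;
}

/* Stand-in for a work item queued with something like schedule_work(). */
static struct {
	bool pending;
} deferred_flush;

/* The flush itself: writeback first, then the flush request, in order. */
static void do_flush(void)
{
	log_event("sync_writes");
	log_event("send_FLUSH");
}

/*
 * Model of the flush entry point. An exiting task cannot usefully wait,
 * so it only queues the work and returns; everyone else flushes inline.
 * Returns true if the flush completed synchronously.
 */
static bool model_flush(bool task_exiting)
{
	if (task_exiting) {
		deferred_flush.pending = true;
		log_event("deferred");
		return false;
	}
	do_flush();
	return true;
}

/* Stand-in for the worker thread running the queued item later. */
static void run_pending_work(void)
{
	if (deferred_flush.pending) {
		deferred_flush.pending = false;
		do_flush();
	}
}
```

In this model the exiting caller never blocks, and nobody sees the
return code of the deferred flush, which matches the case where no one
is left to want it.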

Thanks,
Miklos