Re: [PATCH v4 1/3] FUSE: Implement atomic lookup + create

From: Dharmendra Hans
Date: Fri May 06 2022 - 01:34:23 EST


On Thu, May 5, 2022 at 7:56 PM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
>
> On Thu, May 05, 2022 at 10:21:21AM +0530, Dharmendra Hans wrote:
> > On Wed, May 4, 2022 at 8:17 PM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> > >
> > > On Wed, May 04, 2022 at 09:56:49AM +0530, Dharmendra Hans wrote:
> > > > On Wed, May 4, 2022 at 1:24 AM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Mon, May 02, 2022 at 03:55:19PM +0530, Dharmendra Singh wrote:
> > > > > > From: Dharmendra Singh <dsingh@xxxxxxx>
> > > > > >
> > > > > > When we go for creating a file (O_CREAT), we trigger
> > > > > > a lookup to FUSE USER SPACE. It is very much likely
> > > > > > that file does not exist yet as O_CREAT is passed to
> > > > > > open(). This lookup can be avoided and can be performed
> > > > > > as part of create call into libfuse.
> > > > > >
> > > > > > This lookup + create in single call to libfuse and finally
> > > > > > to USER SPACE has been named as atomic create. It is expected
> > > > > > that USER SPACE create the file, open it and fills in the
> > > > > > attributes which are then used to make inode stand/revalidate
> > > > > > in the kernel cache. Also if file was newly created(does not
> > > > > > exist yet by this time) in USER SPACE then it should be indicated
> > > > > > in `struct fuse_file_info` by setting a bit which is again used by
> > > > > > libfuse to send some flags back to fuse kernel to indicate that
> > > > > > that file was newly created. These flags are used by kernel to
> > > > > > indicate changes in parent directory.
> > > > >
> > > > > Reading the existing code a little bit more and trying to understand
> > > > > existing semantics. And that will help me unerstand what new is being
> > > > > done.
> > > > >
> > > > > So current fuse_atomic_open() does following.
> > > > >
> > > > > A. Looks up dentry (if d_in_lookup() is set).
> > > > > B. If dentry is positive or O_CREAT is not set, return.
> > > > > C. If server supports atomic create + open, use that to create file and
> > > > > open it as well.
> > > > > D. If server does not support atomic create + open, just create file
> > > > > using "mknod" and return. VFS will take care of opening the file.
> > > > >
> > > > > Now with this patch, new flow is.
> > > > >
> > > > > A. Look up dentry if d_in_lookup() is set as well as either file is not
> > > > > being created or fc->no_atomic_create is set. This basiclally means
> > > > > skip lookup if atomic_create is supported and file is being created.
> > > > >
> > > > > B. Remains same. if dentry is positive or O_CREATE is not set, return.
> > > > >
> > > > > C. If server supports new atomic_create(), use that.
> > > > >
> > > > > D. If not, if server supports atomic create + open, use that
> > > > >
> > > > > E. If not, fall back to mknod and do not open file.
> > > > >
> > > > > So to me this new functionality is basically atomic "lookup + create +
> > > > > open"?
> > > > >
> > > > > Or may be not. I see we check "fc->no_create" and fallback to mknod.
> > > > >
> > > > > if (fc->no_create)
> > > > > goto mknod;
> > > > >
> > > > > So fc->no_create is representing both old atomic "create + open" as well
> > > > > as new "lookup + create + open" ?
> > > > >
> > > > > It might be obvious to you, but it is not to me. So will be great if
> > > > > you shed some light on this.
> > > > >
> > > >
> > > > I think you got it right now. New atomic create does what you
> > > > mentioned as new flow. It does lookup + create + open in single call
> > > > (being called as atomic create) to USER SPACE.mknod is a special case
> > >
> > > Ok, naming is little confusing. I think we will have to put it in
> > > commit message and where you define FUSE_ATOMIC_CREATE that what's
> > > the difference between FUSE_CREATE and FUSE_ATOMIC_CREATE. This is
> > > ATOMIC w.r.t what?
> >
> > Sure, I would update the commit message to make the distinction clear
> > between the two. This operation is atomic w.r.t to USER SPACE FUSE
> > implementations. i.e USER SPACE would be performing all these
> > operations in a single call to it.
>
> I think even FUSE_CREAT is doing same thing. Creating file, opening and
> doing lookup and sending all the data. So that's not the difference
> between the two, IMHO. And that's why I am getting confused with the
> naming.
>
> From user space file server perspective, only extra operation seems
> to be that it sends a flag in response telling the client whether
> file was actually created or it already existed. So to me it just
> sounds little extension of existing FUSE_CREATE command and that's
> why I thought calling it FUSE_CREATE_EXT is probably better naming.

The difference between the two is of atomicity from top to bottom i.e
single call from fuse kernel to user space file server. Generally
FUSE_CREATE goes upto libfuse low level API as atomic (check
fuse_lib_create()) and it gets separated there into two calls (create
+ getattr). This is not what we want as it results in two network
trips each for create and lookup to the file server.

>
> >
> >
> > > May be atomic here means that "lookup + create + open" is a single operation.
> > > But then even FUSE_CREATE is atomic because "creat + open" is a single
> > > operation.
> > >
> > > In fact FUSE_CREATE does lookup anyway and returns all the information
> > > in fuse_entry_out.
> > >
> > > IIUC, only difference between FUSE_CREATE and FUSE_ATOMIC_CREATE is that
> > > later also carries information in reply whether file was actually created
> > > or not (FOPEN_FILE_CREATED). This will be set if file did not exist
> > > already and it was created indeed. Is that right?
> >
> > FUSE_CREATE is atomic but upto level of libfuse. Libfuse separates it
> > into two calls, create and lookup separately into USER SPACE FUSE
> > implementation.
>
> I am not sure what do you mean by "libfuse separates it into two calls,
> create and lookup separately". I guess you are referring to lo_create()
> in example/passthrough_ll.c which first creates and opens file and
> then looks it up and replies.
>
> fd = openat(lo_fd(req, parent), name,
> (fi->flags | O_CREAT) & ~O_NOFOLLOW, mode);
> err = lo_do_lookup(req, parent, name, &e);
> fuse_reply_create(req, &e, fi);
>
> I am looking at your proposal for atomic_create implementation here.
>
> https://github.com/libfuse/libfuse/pull/673/commits/88cd25b2857f2bb213d01afbcfd666787d1e6893#diff-a36385ec8fb753d6f4492a5f0d3c6a5750bd370b50df6ef0610efdcd3f8880ffR787
>
> It is doing exactly same thing as lo_create(), except one difference that
> it is checking first if file exists. It essentially is doing this.
>
> A. newfd = openat(lo_fd(req, parent), name, O_PATH | O_NOFOLLOW);
> B. fd = openat(lo_fd(req, parent), name,
> (fi->flags | O_CREAT) & ~O_NOFOLLOW, mode);
> C. err = lo_do_lookup(req, parent, name, &e);
> D. fuse_reply_create(req, &e, fi);
>
> So what do you mean by libfuse makes it two calls.
>
> And I think above implementation is racy. What if filesystem is
> shared and another client creates the file between calls to
> A and B. You will think you created the file but some other
> client created it.
>
> So if intent is to know whether we created the file or not, then
> you should probably do openat() with O_EXCL flag. If that succeeds
> you know you created the file. If it fails with -EEXIST, then you
> know file is already there. That's what virtiofs does.
>
> Anyway, coming back to the point. IMHO, from server perspective,
> there is no atomicity difference between FUSE_CREATE and
> FUSE_ATOMIC_CREATE. Only difference seems to be to send addditional
> information back to the client to tell it whether file was created
> or not.
>
> In fact for shared filesystem this is probably a problem. What if
> guest's cache is stale and it does not know about the file. A client
> B creates the file and we think we did not create the file. And we
> will return with FOPEN_FILE_CREATED = 0. And in that case client
> will not call fuse_dir_changed(). But that seems wrong in case
> of shared filesystems. I am concerned about virtiofs which can
> be shared between different guests.
>
> Miklos, WDYT?
>
> May be it is not a huge concern. If one guest drops a file, other guest
> will not invalidate its dir attrs till timeout happens. Case of shared
> filesystem is very tricky with fuse. And sometimes it is not clear
> to me what kind of coherency matters.
>
> So in this case say I am booted with cache=auto, if guest B drops a file
> bar/foo.txt and guest A does open(bar/foo.txt, O_CREAT), then should
> guest A invalidate the attrs of bar/ right away or it will be invalidated
> anyway after a second.
>
> Anyway.., my core point is that difference between FUSE_CREATE and
> FUSE_ATOMIC_CREATE is just one flag FOPEN_FILE_CREATED which tells
> client whether file was actually created or not. And that is used
> to determine whether to invalidate parent dir attributes or not. It
> does not have anything extra in terms of ATOMICITY as far as I can
> see and that's what confuses me.

You checked the wrong code. pasthrough_ll is not production code, we
do not use it at all, just using it to test these patches here. There
is a main patch which implements atomic create in libfuse for low
level api(). It is in same pull request, check it here
https://github.com/libfuse/libfuse/pull/673/commits/f86fe92bef7bb529ef1617e077d69a39eb26bc9f

What this FUSE_ATOMIC_CREATE(fuse_lib_atomic_create()) does in libfuse
is it make a single call into fuse_operations where file server would
be doing all 'lookup + create + open' in one call itself whereas
FUSE_CREATE gets separated into two calls 'create + lookup' which
eventually are two rpcs to the user space file server for a single
operation. Now you can see FUSE_CREATE is not atomic from file
systems point of view(i.e from user space file server perspective)
but from fuse kernel point of view only. We can make existing libfuse
code i.e fuse_lib_create() to call new atomic create but then the
interface becomes messy and it might start a trend to twist libfuse.
We want to keep libfuse APIs neat and clean. And do not break existing
systems. No flag from fuse kernel to libfuse to decide which code to
traverse. Therefore a new call for all this work.

>
>
>
> > This FUSE_ATOMIC_CREATE does all these ops in a single call to FUSE
> > implementations. We do not want to break any existing FUSE
> > implementations, therefore it is being introduced as a new feature. I
> > forgot to include links to libfuse patches as well. That would have
> > made it much clearer. Here is the link to libfuse patch for this call
> > https://github.com/libfuse/libfuse/pull/673.
> >
> > >
> > > I see FOPEN_FILE_CREATED is being used to avoid calling
> > > fuse_dir_changed(). That sounds like a separate optimization and probably
> > > should be in a separate patch.
> >
> > FUSE_ATOMIC_CREATE needs to send back info about if file was actually
> > created or not (This is suggestion from Miklos) to correctly convey if
> > the parent dir is really changing or not. I included this as part of
> > this patch itself instead of having it as a separate patch.
>
> This needs little more thought w.r.t shared filesystems.
>
> >
> > > IOW, I think this patch should be broken in to multiple pieces. First
> > > piece seems to be avoiding lookup() and given the way it is implemented,
> > > looks like we can avoid lookup() even by using existing FUSE_CREATE
> > > command. We don't necessarily need FUSE_ATOMIC_CREATE. Is that right?
> >
> > Its not only about changing fuse kernel code but USER SPACE
> > implementations also. If we change the you are suggesting we would be
> > required to twist many things at libfuse and FUSE low level API. So to
> > keep things simple and not to break any existing implementations we
> > have kept it as new call (we now pass `struct stat` to USER SPACE FUSE
> > to filled in).
> >
> > > And once that is done, a separate patch should probably should explain
> > > the problem and say fuse_dir_changed() call can be avoided if we knew
> > > if file was actually created or it was already existing there. And that's
> > > when one need to introduce a new command. Given this is just an extension
> > > of existing FUSE_CREATE command and returns additiona info about
> > > FOPEN_FILE_CREATED, we probably should simply call it FUSE_CREATE_EXT
> > > and explain how this operation is different from FUSE_CREATE.
> >
> > As explained above, we are not doing this way as we have kept in mind
> > all existing libfuse APIs as well.
> >
>
> I am not seeing what will break in existing passthrough_ll.c if I
> simply used FUSE_CREATE for a file which already exists.
>
> fd = openat(lo_fd(req, parent), name,
> (fi->flags | O_CREAT) & ~O_NOFOLLOW, mode);
>
> This call should still succeed even if file already exists. (until and
> unless called it with O_EXCL and in that case failing is the correct
> behavior).

It's not passthrogh_ll but low level libfuse API. Passthrough_ll is
just another user of low level API. It has been used just to test
these patches. We do not use passthrough_ll at all for any other
purpose.