Re: [PATCH v4 1/3] FUSE: Implement atomic lookup + create

From: Vivek Goyal
Date: Fri May 06 2022 - 13:07:51 EST


On Fri, May 06, 2022 at 06:41:17PM +0200, Bernd Schubert wrote:
>
>
> On 5/6/22 16:12, Vivek Goyal wrote:
>
> [...]
>
> > On Fri, May 06, 2022 at 11:04:05AM +0530, Dharmendra Hans wrote:
>
> >
> > Ok, looks like your fuse file server is talking to another file
> > server on network and that's why you are mentioning two network trips.
> >
> > Let us differentiate between two things first.
> >
> > A. FUSE protocol semantics
> > B. Implementation of FUSE protocol by libfuse.
> >
> > I think I am stressing on A and you are stressing on B. I just want
> > to see what's the difference between FUSE_CREATE and FUSE_ATOMIC_CREATE
> > from fuse protocol point of view. Again, look at it from kernel's point of
> > view and don't worry about how libfuse is going to implement it.
> > Implementations can vary.
>
> Agreed, I don't think we need to bring in network for the kernel to libfuse
> API.
>
> >
> > From kernel's perspective FUSE_CREATE is supposed to create + open a
> > file. It is possible file already exists. Look at include/fuse_lowlevel.h
> > description for create().
> >
> > /**
> > * Create and open a file
> > *
> > * If the file does not exist, first create it with the specified
> > * mode, and then open it.
> > */
> >
> > I notice that fuse is offering a high level API as well as low level
> > API. I primarily know about low level API. To me these are just two
> > different implementation but things don't change how kernel sends
> > fuse messages and what it expects from server in return.
> >
> > Now with FUSE_ATOMIC_CREATE, from kernel's perspective, only difference
> > is that in reply message file server will also indicate if file was
> > actually created or not. Is that right?
> >
> > And I am focussing on this FUSE API aspect. I am least concerned at
> > this point of time how libfuse decides to actually implement FUSE_CREATE
> > or FUSE_ATOMIC_CREATE etc. You might make a single call in libfuse
> > server (instead of two) and that's performance optimization in libfuse.
> > Kernel does not care how many calls you made in file server to
> > implement FUSE_CREATE or FUSE_ATOMIC_CREATE. All it cares about is that
> > the file gets created and opened.
> >
> > So while you might do things in more atomic manner in file server and
> > cut down on network traffic, kernel fuse API does not care. All it cares
> > about is create + open a file.
> >
> > Anyway, from kernel's perspective, I think you should be able to
> > just use FUSE_CREATE and still be able to do "lookup + create + open".
> > FUSE_ATOMIC_CREATE just allows one additional optimization so
> > that you know whether to invalidate parent dir's attrs or not.
> >
> > In fact kernel is not putting any atomicity requirements as well on
> > file server. And that's why I think this new command should probably
> > be called FUSE_CREATE_EXT because it just sends back additional
> > info.
> >
> > All the atomicity stuff you have been describing is that you are
> > trying to do some optimizations in libfuse implementation to implement
> > FUSE_ATOMIC_CREATE so that you send fewer commands over
> > network. That's a good idea but fuse kernel API does not require you
> > do these atomically, AFAICS.
> >
> > Given I know little bit of fuse low level API, If I were to implement
> > this in virtiofs/passthrough_ll.c, I probably will do following.
> >
> > A. Check if caller provided O_EXCL flag.
> > B. openat(O_CREAT | O_EXCL)
> > C. If success, we created the file. Set file_created = 1.
> >
> > D. If error and error != -EEXIST, send error back to client.
> > E. If error and error == -EEXIST, if caller did provide O_EXCL flag,
> > return error.
> > F. openat() returned -EEXIST and caller did not provide O_EXCL flag,
> > that means file already exists. Set file_created = 0.
> > G. Do lookup() etc to create internal lo_inode and stat() of file.
> > H. Send response back to client using fuse_reply_create().
> >
> > This is one sample implementation for fuse lowlevel API. There could
> > be other ways to implement. But all that is libfuse + filesystem
> > specific and kernel does not care how many operations you use to
> > complete and what's the atomicity etc. Of course less number of
> > operations you do better it is.
> >
> > Anyway, I think I have said enough on this topic. IMHO, FUSE_CREATE
> > description (fuse_lowlevel.h) already mentions that "If the file does not
> > exist, first create it with the specified mode and then open it". That
> > means intent of protocol is that file could already be there as well.
> > So I think we probably should implement this optimization (in kernel)
> > using FUSE_CREATE command and then add FUSE_CREATE_EXT to add optimization
> > about knowing whether file was actually created or not.
> >
> > W.r.t libfuse optimizations, I am not sure why can't you do optimizations
> > with FUSE_CREATE and why do you need FUSE_CREATE_EXT necessarily. If
> > you are worried that some existing filesystems will break, I think
> > you can create an internal helper say fuse_create_atomic() and then
> > use that if filesystem offers it. IOW, libfuse will have two
> > ways to implement FUSE_CREATE. And if filesystem offers a new way which
> > cuts down on network traffic, libfuse uses more efficient method. We
> > should not have to change kernel FUSE API just because libfuse can
> > do create + open operation more efficiently.
>
> Ah right, I like this. As I had written before, the first patch version was
> using FUSE_CREATE and I was worried to break something. Yes, it should be
> possible split into lookup+create on the libfuse side. That being said,
> libfuse will need to know which version it is - there might be an old kernel
> sending the non-optimized version - libfuse should not do another lookup
> then.

I am confused about one thing. For FUSE_CREATE command, how does it
matter whether kernel has done lookup() before sending FUSE_CREATE? All
FUSE_CREATE seems to say is to create a file (if it does not exist already)
and then open it and return file handle as well as inode attributes. It
does not say anything about whether a LOOKUP has already been done
by kernel or not.

It looks like you are assuming that if FUSE_CREATE is coming, that
means client has already done FUSE_LOOKUP. So there is something we
are not on same page about.

I looked at fuse_lowlevel API and passthrough_ll.c and there is no
assumption whether FUSE_LOOKUP has already been called by client
before calling FUSE_CREATE. Similarly, I looked at virtiofs code
and I can't see any such assumption there as well.

https://github.com/qemu/qemu/blob/master/tools/virtiofsd/passthrough_ll.c

So I am sort of lost. Maybe you can help me understand this.

> Now there is 'fi.flags = arg->flags', but these are already taken by
> open/fcntl flags - I would not feel comfortable to overload these. At best,
> struct fuse_create_in currently had a padding field, we could convert these
> to something like 'ext_fuse_open_flags' and then use it for fuse internal
> things. Difficulty here is that I don't know if all kernel implementations
> zero the struct (BSD, MacOS), so I guess we would need to negotiate at
> startup/init time and would need another main feature flag? And with that
> I'm not sure anymore if the result would be actually more simple than
> what we have right now for the first patch.

If FUSE_CREATE indeed has a dependency on FUSE_LOOKUP having been called
before that, then I agree that we can't implement new semantics with
FUSE_CREATE and we will have to introduce a new op say
FUSE_ATOMIC_CREATE/FUSE_LOOKUP_CREATE/FUSE_CREATE_EXT.

But looking at fuse API, I don't see FUSE_CREATE ever guaranteeing that
a FUSE_LOOKUP has been done before this.
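
To make this concrete, steps A-H quoted above can be sketched roughly as
below, using plain POSIX calls and no prior lookup. This is only an
illustration under assumptions: create_or_open() and its file_created
out-parameter are hypothetical names, not part of libfuse or
passthrough_ll.c, and a real server would build its internal inode and
call fuse_reply_create() where the sketch simply returns the fd.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a libfuse or virtiofsd function): create and/or
 * open "name" relative to dirfd without any prior lookup.
 * Returns an open fd and sets *file_created, or returns -errno on failure.
 */
static int create_or_open(int dirfd, const char *name, int flags, mode_t mode,
                          int *file_created, struct stat *st)
{
    int caller_excl = flags & O_EXCL;   /* A: did the caller ask for O_EXCL? */
    int fd;

    assert(file_created != NULL && st != NULL);

    /* B: always try an exclusive create first. */
    fd = openat(dirfd, name, flags | O_CREAT | O_EXCL, mode);
    if (fd >= 0) {
        *file_created = 1;              /* C: we created the file. */
    } else {
        if (errno != EEXIST)
            return -errno;              /* D: any other error goes back. */
        if (caller_excl)
            return -EEXIST;             /* E: caller wanted exclusivity. */

        /*
         * F: file already exists, so open it without creating. If it was
         * removed in between, this fails with ENOENT; a real server might
         * retry from step B instead of returning the error.
         */
        fd = openat(dirfd, name, flags & ~(O_CREAT | O_EXCL));
        if (fd < 0)
            return -errno;
        *file_created = 0;
    }

    /* G: fetch attributes for the reply. */
    if (fstat(fd, st) < 0) {
        int err = errno;
        close(fd);
        return -err;
    }

    /* H: a real server would now send the reply, e.g. fuse_reply_create(). */
    return fd;
}
```

Nothing in this path depends on whether the kernel sent a FUSE_LOOKUP
first; the only extra information it produces is file_created.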

Thanks
Vivek