Re: Please add the zuf tree to linux-next

From: Boaz Harrosh
Date: Thu Nov 14 2019 - 11:04:33 EST

Next message: Guenter Roeck: "Re: [PATCH v3 1/3] watchdog: sama5d4_wdt: cleanup the bit definitions"
Previous message: Valentin Schneider: "Re: [PATCH v2] sched/topology, cpuset: Account for housekeeping CPUs to avoid empty cpumasks"
In reply to: Miklos Szeredi: "Re: Please add the zuf tree to linux-next"
Next in thread: Miklos Szeredi: "Re: Please add the zuf tree to linux-next"

On 14/11/2019 16:56, Miklos Szeredi wrote:
> On Thu, Nov 14, 2019 at 3:02 PM Boaz Harrosh <boaz@xxxxxxxxxxxxx> wrote:
>
>> At the last LSF, Steven from Red Hat asked me to talk with Miklos about fuse vs zufs.
>> We had a long talk in which I explained to him in detail how we do the mounting, how
>> the Kernel owns the multi-devices, how we do the PMEM API and our IO API in general, how
>> we do piggy-back operations to minimize latencies, and how we do DAX and mmap. At the
>> end of the talk he said to me that he understands how this is very different from FUSE
>> and he wished me "good luck".
>>
>> Miklos - you have seen both projects; do you think that all these new subsystems from ZUFS
>> can have a comfortable place under FUSE, including the new IO API?
>
> It is quite true that ZUFS includes a lot of innovative ideas to
> improve the performance of a certain class of userspace filesystems.
> I think most, if not all of those ideas could be applied to the fuse
> implementation as well,

This is not so:

- The way we do the mount is very different. It is not the Server that does
  the mount but the Kernel. So auto bind mount works (same device, different dir).
- The way zuf owns the devices in the Kernel, and supports multi-devices.
  It has support for pmem devices as well as what we call t2 (regular) block
  devices, and the whole API for transfer between them (the whole md.* thing),
  with proper locking of devices.
- The way we are true zero-copy, both pmem and t2.
- The way we are DAX, both pwrite and mmap.
- The way we are NUMA aware, both Kernel and Server.
- The way we use shared memory pools that are deep in the protocol between
  Server and Kernel, for zero copy of meta-data as well as protocol buffers.
- The way we do piggy-back of operations to save round-trips.
- The way we keep cookies in the Kernel for all Server objects, so there are no
  i_ino hash tables or look-ups.
- The way we use a single Server with loadable FS modules. The ZUSD comes
  with the distro and only the FS plugin comes from the Vendor, so the
  Kernel<=>Server API is in sync.
- The way ZUFS supports a root filesystem.
- The way ZUFS supports a VM-FS SHARING the same p-memory as the HOST-FS.
- The way we do zero-copy IO, both pmem and bdevs.
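
To illustrate the cookie point in the list above, here is a minimal sketch. It
is not the actual zuf/zus code: the struct names, the function names, and the
choice of using the Server object's address as the cookie are all assumptions
made for illustration. The idea is only that the Kernel stores an opaque value
per object and hands it back on every request, so the Server can reach its
object directly instead of searching a table keyed by inode number.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Server-side object (not a real zus structure). */
struct srv_inode {
	uint64_t ino;
	uint64_t size;
};

/* Hypothetical Kernel-side per-inode info that caches the Server cookie. */
struct krn_inode {
	uint64_t ino;
	uint64_t srv_cookie;	/* opaque to the Kernel, never dereferenced there */
};

/* Server: produce a cookie when the object is first created or looked up.
 * Assumption: the cookie is simply the object's address. */
static uint64_t srv_make_cookie(struct srv_inode *si)
{
	return (uint64_t)(uintptr_t)si;
}

/* Server: resolve a cookie back to its object in O(1), no hash look-up. */
static struct srv_inode *srv_from_cookie(uint64_t cookie)
{
	assert(cookie != 0);
	return (struct srv_inode *)(uintptr_t)cookie;
}

/* Server: handler for a hypothetical "get size" operation. */
static uint64_t srv_op_get_size(uint64_t cookie)
{
	return srv_from_cookie(cookie)->size;
}
```

The Kernel side would fill krn_inode.srv_cookie once, from the reply to the
first look-up, and then pass it along with every later operation on that inode.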

> but I can understand why this hasn't been
> done. Fuse is in serious need of a cleanup, which I've started to do,
> but it's not there yet...
>

That would not be wise. It would be a complete, FULL zuf code drop into the
current fuse code base (fuse is BTW bigger than zuf). I think this is the
last thing fuse needs.

I know for a fact that the code of fuse+zuf will be bigger and slower than
the two kept separate.

zufs is built from the ground up, built on all those subsystems as
building blocks. Putting all these things into fuse will actually be like
putting a pyramid on its head.

> One of the major issues that I brought up when originally reviewing
> ZUFS (but forgot to discuss at LSF) is about the userspace API. I
> think it would make sense to reuse FUSE protocol definition and extend
> it where needed. That does not mean ZUFS would need to be 100%
> backward compatible with FUSE, it would just mean that we'd have a
> common userspace API and each implementation could implement a subset
> of features.

This is easy to say. But believe me, it is not possible. The shared structures
are maybe 20%, and not the 80% one might assume in theory. The projects are
really structured differently.

I have looked at it long and hard, many times. I do not know how to do this.
If I knew how, I would.

These codes and systems do very different things. It would need tons of
if()s and operation changes. Sometimes you do a copy/paste of ext4 into
ffs2 and so on, because combining them is not always the best or the
easiest option.

> I think this would be an immediate and significant
> boon for ZUFS, since it would give it an already existing user/tester
> base that it otherwise needs to build up. It would also allow
> filesystem implementation to be more easily switchable between the
> kernel frameworks in case that's necessary.
>

Thanks Miklos for your input. I have looked at this problem many times.
This is not something that is interesting for me, because these two projects
set out to solve different things.

And it is not as easy to do as it sounds. There are fundamental differences
between the projects. For example, in fuse main() belongs to the FS, which needs
to supply its own mount application. In ZUFS we use the regular /sbin/mount and
the Kernel does the mount.
Also, the ZUS user-mode server has a huge facility for allocating pages, mlocking,
per-cpu counters, per-cpu variables, NUMA memory management, and thread management.
The API with zuf is very, very particular about tons of things, involving threads,
special files, mmap calls, and shared memory with the Kernel. This will not be so
easily interchangeable.

> Thanks,
> Miklos
>

Sometimes fresh new code is much easier, more maintainable, and faster / more capable
than a do-it-all blob of code.
I am not sure if you actually looked at the code, both Kernel and Server. This is not as
easy as it sounds, even after a deep fuse cleanup.

Yes, perhaps we could share some core code, like what sits in zuf-core.c and the relay
object, but not more than that.

Thanks
Boaz
