Re: [PATCH 10/16] fuse: Implement writepages callback

From: Maxim Patlasov
Date: Fri Aug 09 2013 - 11:02:12 EST

Hi Miklos,

08/06/2013 08:25 PM, Miklos Szeredi wrote:
On Fri, Aug 2, 2013 at 5:40 PM, Maxim Patlasov <mpatlasov@xxxxxxxxxxxxx> wrote:
07/19/2013 08:50 PM, Miklos Szeredi wrote:

On Sat, Jun 29, 2013 at 09:45:29PM +0400, Maxim Patlasov wrote:
From: Pavel Emelyanov <xemul@xxxxxxxxxx>

The .writepages one is required to make each writeback request carry more than
one page on it. The patch enables optimized behaviour unconditionally,
i.e. mmap-ed writes will benefit from the patch even if
I rewrote this a bit, so we won't have to do the thing in two passes,
makes it simpler and more robust. Waiting for page writeback here is
anyway, see comment above fuse_page_mkwrite(). BTW we had a race there
fuse_page_mkwrite() didn't take the page lock. I've also fixed that up
pushed a series containing these patches up to implementing ->writepages()


Passed some trivial testing but more is needed.

Thanks a lot for efforts. The approach you implemented looks promising, but
it introduces the following assumption: a page cannot become dirty before we
have a chance to wait on fuse writeback holding the page locked. This is
already true for mmap-ed writes (due to your fixes) and it seems doable for
cached writes as well (like we do in fuse_perform_write). But the assumption
seems to be broken in case of direct read from local fs (e.g. ext4) to a
memory region mmap-ed to a file on fuse fs. See how dio_bio_submit() marks
pages dirty by bio_set_pages_dirty(). I can't see any solution for this
use-case. Do you?
Hmm. Direct IO on an mmaped file will do get_user_pages() which will
do the necessary page fault magic and ->page_mkwrite() will be called.
At least AFAICS.

Yes, I agree.

The page cannot become dirty through a memory mapping without first
switching the pte from read-only to read-write. Page accounting
logic relies on this too. The other way the page can become dirty is
through write(2) on the fs. But we do get notified about that too.

Yes, that's correct, but I don't understand why you disregard two other cases of marking a page dirty (both related to a direct AIO read from a file into a memory region mmap-ed to a fuse file):

1. dio_bio_submit() -->
bio_set_pages_dirty() -->

2. dio_bio_complete() -->
bio_check_pages_dirty() -->
bio_dirty_fn() -->
bio_set_pages_dirty() -->

As soon as a page becomes dirty through a memory mapping (exactly as you explained), nothing prevents it from being written back. And fuse will call end_page_writeback() almost immediately after copying the real page to a temporary one. Then dio_bio_submit() may re-dirty the page speculatively without notifying fuse. And again, from then on nothing prevents it from being written back once more. Hence we can end up with more than one temporary page in fuse writeback. A similar concern applies to the re-dirty in dio_bio_complete().

This makes me think that we do need fuse_page_is_writeback() in fuse_writepages_fill(). It shouldn't be harmful, because it will practically always be a no-op thanks to waiting for fuse writeback in ->page_mkwrite() and in the course of handling write(2).
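
To make the scenario concrete, here is a toy userspace model of it. This is only a sketch, not kernel code: struct toy_page and every toy_* function are hypothetical names invented for illustration, and the busy loop merely stands in for waiting until fuse_page_is_writeback() reports the page as idle. It counts how many temporary copies of one page are in flight to the userspace daemon, with and without that wait in the writeback path.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one page cache page (hypothetical, not a kernel struct). */
struct toy_page {
    bool dirty;         /* page is marked dirty                         */
    int  fuse_inflight; /* temporary copies sent to the daemon, unacked */
};

/* Models the speculative re-dirty done by direct IO: fuse is not told. */
static void toy_dio_redirty(struct toy_page *p)
{
    p->dirty = true;
}

/* Models the userspace daemon completing one WRITE request. */
static void toy_daemon_ack(struct toy_page *p)
{
    assert(p->fuse_inflight > 0);
    p->fuse_inflight--;
}

/*
 * Models one writeback pass over the page. The data is copied to a
 * temporary page and page writeback ends right away, so the only trace
 * of the pending request is fuse_inflight. Returns the number of
 * temporary copies in flight after the pass.
 */
static int toy_writepage(struct toy_page *p, bool check_fuse_writeback)
{
    if (!p->dirty)
        return p->fuse_inflight;

    if (check_fuse_writeback) {
        /* Stand-in for waiting on fuse writeback of this page. */
        while (p->fuse_inflight > 0)
            toy_daemon_ack(p);
    }

    p->dirty = false;
    p->fuse_inflight++;
    return p->fuse_inflight;
}

/* Dirty via mmap, write back, re-dirty via dio, write back again. */
static int toy_scenario(bool check_fuse_writeback)
{
    struct toy_page p = { .dirty = true, .fuse_inflight = 0 };
    int peak = toy_writepage(&p, check_fuse_writeback);

    toy_dio_redirty(&p);

    int n = toy_writepage(&p, check_fuse_writeback);
    return n > peak ? n : peak;
}
```

In this model toy_scenario(false) peaks at two temporary copies of the same page in flight, while toy_scenario(true) never exceeds one. When nothing re-dirties the page behind fuse's back, the wait loop never runs, which matches the expectation that the check is almost always a no-op.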
