Re: [PATCH] mm: save current->journal_info before calling fault/page_mkwrite

From: Andrew Morton
Date: Wed Dec 13 2017 - 21:30:41 EST


On Thu, 14 Dec 2017 10:20:18 +0800 "Yan, Zheng" <zyan@xxxxxxxxxx> wrote:

> >> + /*
> >> + * If the fault happens while write_iter() copies data from
> >> + * userspace, the filesystem may have set current->journal_info.
> >> + * If the userspace memory is mapped to a file on another
> >> + * filesystem, the fault handler of the latter filesystem may want
> >> + * to access/modify current->journal_info.
> >> + */
> >> + current->journal_info = NULL;
> >> ret = vma->vm_ops->fault(vmf);
> >> + /* Restore original journal_info */
> >> + current->journal_info = old_journal_info;
> >> if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
> >> VM_FAULT_DONE_COW)))
> >> return ret;
> >
> > Can you explain why you chose these two sites? Rather than, for
> > example, way up in handle_mm_fault()?
>
> I think they are the only two places where the code can enter another filesystem.

hm. Maybe. At this point in time. I'm feeling that doing the
save/restore at the highest level is better. It's cheap.
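
To illustrate what I mean, here is a rough userspace sketch, not real mm
code. struct task, current_task, other_fs_fault() and handle_fault_top()
are all made-up stand-ins (the last one for wherever the top-level entry
point ends up being, e.g. handle_mm_fault()); only the void *journal_info
field mirrors the real thing.

```c
#include <stddef.h>

/* Hypothetical stand-in for task_struct: only the field under discussion. */
struct task {
	void *journal_info;
};

/* Hypothetical stand-in for `current`. */
static struct task current_task;

typedef int (*fault_fn)(struct task *t);

/*
 * Hypothetical fault handler of a second filesystem.  It installs its
 * own handle in journal_info and never puts the old value back.
 */
static int other_fs_fault(struct task *t)
{
	static int other_fs_handle;

	t->journal_info = &other_fs_handle;
	/* ... the actual fault handling would happen here ... */
	return 0;
}

/*
 * Top-level wrapper: save the caller's journal_info, clear it so the
 * handler starts from a clean slate, then restore it on the way out.
 */
static int handle_fault_top(struct task *t, fault_fn handler)
{
	void *old_journal_info = t->journal_info;
	int ret;

	t->journal_info = NULL;
	ret = handler(t);
	t->journal_info = old_journal_info;
	return ret;
}
```

With this shape every handler reached from the top-level entry point is
covered by one save/restore pair, so a new call site added later does not
need its own copy.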

> >
> > It's hard to believe that a fault handler will alter ->journal_info if
> > it is handling a read fault, so perhaps we only need to do this for a
> > write fault? Although such an optimization probably isn't worthwhile.
> > The whole thing is only about three instructions.
>
> ceph uses current->journal_info for both read and write operations. I think btrfs also reads current->journal_info during read-only operations. (I mentioned this in my previous reply.)

Quite a lot of filesystems use ->journal_info. Arguably it should be
the fs's responsibility to restore the old journal_info value after
having used it. But that's a ton of changes :(
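
For comparison, the per-filesystem alternative would look roughly like
the sketch below. Again this is userspace illustration only: struct
task_sketch, struct fs_handle and fs_fault_nested() are invented names,
and no existing filesystem is claimed to do exactly this.

```c
#include <stddef.h>

/* Hypothetical stand-in for task_struct: only the field under discussion. */
struct task_sketch {
	void *journal_info;
};

/* Hypothetical per-filesystem transaction handle. */
struct fs_handle {
	int id;
};

/*
 * A fault handler that cleans up after itself: it remembers whatever
 * journal_info it found on entry, uses the field for its own handle,
 * and puts the old value back before returning.
 */
static int fs_fault_nested(struct task_sketch *t)
{
	struct fs_handle handle = { 1 };
	void *saved = t->journal_info;
	int saw_own_handle;

	t->journal_info = &handle;
	/* ... fs-internal code would look up its handle here ... */
	saw_own_handle = (t->journal_info == &handle);
	t->journal_info = saved;

	return saw_own_handle ? 0 : -1;
}
```

Every filesystem that touches ->journal_info would need the same
save/restore pair in each of its entry points, which is where the ton of
changes comes from.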