Re: [PATCH 2/2] Release mmap_sem when page fault blocks on disk transfer.

From: Michel Lespinasse
Date: Fri Oct 01 2010 - 19:06:44 EST

Next message: Peter Zijlstra: "Re: [PATCH 3/7] Add IRQ_TIME_ACCOUNTING, finer accounting of irqtime -v3"
Previous message: Mark Brown: "Re: [MeeGo-Dev][PATCH v3] Topcliff: Update PCH_GPIO driver to2.6.35"
In reply to: Linus Torvalds: "Re: [PATCH 2/2] Release mmap_sem when page fault blocks on disk transfer."
Next in thread: Linus Torvalds: "Re: [PATCH 2/2] Release mmap_sem when page fault blocks on disk transfer."
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, Oct 1, 2010 at 8:31 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> Also, I think the "RELEASE" naming is too much about the
> implementation, not about the context. I think it would be more
> sensible to call it "ALLOW_RETRY" or "ATOMIC" or something like this,
> and not make it about releasing the page lock so much as about what
> you want to happen.
>
> Because quite frankly, I could imagine other reasons to allow page fault retry.
>
> (Similarly, I would rename VM_FAULT_RELEASED to VM_FAULT_RETRY. Again:
> name things for the _concept_, not for some odd implementation issue)

All right, I switched to your names and I think they do help. There is
still one annoyance though (and this is why I had not made this purely
about retry in the first iteration): the up_read(mmap_sem) and the
wait_on_page_locked(page) still happen within filemap_fault(). I think
ideally we would prefer to move this into do_page_fault so that the
interface could *really* be about retry; however we can't easily do
that because the struct page is not exposed at that level.
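As a rough illustration of that layering problem, here is a minimal user-space sketch of the caller-side contract. It is not kernel code: every sk_ name, the flag values, and the one-retry limit are assumptions made for the example, with mmap_sem modeled as a reader count and the disk transfer modeled as a boolean. What it tries to show is that the handler, not the caller, has to drop the lock before reporting a retry, because only the handler can see the page it needs to wait on.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the values are illustrative, not the kernel's. */
#define SK_FLAG_ALLOW_RETRY 0x1u
#define SK_FAULT_RETRY      0x2u

struct sk_fault_mm {
	int readers;     /* models read-holders of mmap_sem */
	bool page_busy;  /* models a page still being read from disk */
	int attempts;    /* how many times the handler ran */
};

/* Models the handler: it alone knows about the page, so it drops the lock. */
static unsigned int sk_handle_fault(struct sk_fault_mm *mm, unsigned int flags)
{
	mm->attempts++;
	if (mm->page_busy && (flags & SK_FLAG_ALLOW_RETRY)) {
		mm->readers--;         /* models up_read(mmap_sem) */
		mm->page_busy = false; /* models wait_on_page_locked() returning */
		return SK_FAULT_RETRY;
	}
	mm->page_busy = false;     /* models waiting for the page with the lock held */
	return 0;
}

/* Models the arch fault path: retake the lock and retry, at most once. */
static unsigned int sk_do_page_fault(struct sk_fault_mm *mm)
{
	unsigned int flags = SK_FLAG_ALLOW_RETRY;
	unsigned int fault;

	for (;;) {
		mm->readers++;         /* models down_read(mmap_sem) */
		fault = sk_handle_fault(mm, flags);
		if (!(fault & SK_FAULT_RETRY))
			break;
		/* The handler already dropped the lock; disallow a second retry. */
		flags &= ~SK_FLAG_ALLOW_RETRY;
	}
	mm->readers--;             /* models up_read(mmap_sem) on the normal path */
	return fault;
}
```

In this model the caller never touches the page; it only reacts to the retry bit, which is the part of the interface that is purely about retry.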

>
>> -        if (fault & VM_FAULT_MAJOR) {
>> -                tsk->maj_flt++;
>> -                perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, 0,
>> -                              regs, address);
>> -        } else {
>> -                tsk->min_flt++;
>> -                perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, 0,
>> -                              regs, address);
>> +        if (release_flag) {     /* Did not go through a retry */
>> +                if (fault & VM_FAULT_MAJOR) {
>
> I really don't know if this is correct. What if you have two major
> faults due to the retry? What if the first one is a minor fault, but
> when we retry it's a major fault because the page got released? The
> nesting of the conditionals doesn't seem to make conceptual sense.
>
> I dunno. I can see what you're doing ("only do statistics for the
> first return"), but at the same time it just feels a bit icky.

In a way filemap_fault() already has that problem - during a minor
fault, the page could go away before we have a chance to lock it, and
the fault would still be counted as minor. So I just took that
property (first find_get_page() determines if we call the fault minor
or major) and extended it into the retry case.

One reasonable alternative, I think, would be to always count the
fault as major if we had to go through the retry path. The main
difference this would make, I think, is if two threads hit the exact
same page before we get a chance to load it from disk - in which case
they would both get counted as major faults, vs the current accounting
that would charge one as major and the other one as minor.
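The two accounting choices can be compared with a small user-space sketch. This is only a model under stated assumptions: the sk_ names, the flag value, and the counter struct are hypothetical stand-ins for tsk->maj_flt and tsk->min_flt, and the "retried" argument stands for whether the fault went through the retry path.

```c
#include <stdbool.h>

/* Hypothetical stand-in for VM_FAULT_MAJOR; the value is illustrative. */
#define SK_FAULT_MAJOR 0x1u

struct sk_counters {
	unsigned long maj_flt;
	unsigned long min_flt;
};

/* Policy 1: the result of the first attempt decides, retry or not. */
static void sk_account_first_attempt(struct sk_counters *c,
				     unsigned int first_fault, bool retried)
{
	(void)retried; /* deliberately ignored under this policy */
	if (first_fault & SK_FAULT_MAJOR)
		c->maj_flt++;
	else
		c->min_flt++;
}

/* Policy 2: any fault that went through the retry path counts as major. */
static void sk_account_retry_is_major(struct sk_counters *c,
				      unsigned int first_fault, bool retried)
{
	if (retried || (first_fault & SK_FAULT_MAJOR))
		c->maj_flt++;
	else
		c->min_flt++;
}
```

For the two-thread case described above, the first thread misses the page cache (major, then retries) and the second finds the page already present but locked (minor, then retries). Policy 1 charges one major and one minor fault; policy 2 charges two major faults.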

>> -        lock_page(page);
>> +        /* Lock the page. */
>> +        if (!trylock_page(page)) {
>> +                if (!(vmf->flags & FAULT_FLAG_RELEASE))
>> +                        __lock_page(page);
>> +                else {
>> +                        /*
>> +                         * Caller passed FAULT_FLAG_RELEASE flag.
>> +                         * This indicates it has read-acquired mmap_sem,
>> +                         * and requests that it be released if we have to
>> +                         * wait for the page to be transferred from disk.
>> +                         * Caller will then retry starting with the
>> +                         * mmap_sem read-acquire.
>> +                         */
>> +                        up_read(&vma->vm_mm->mmap_sem);
>> +                        wait_on_page_locked(page);
>> +                        page_cache_release(page);
>> +                        return ret | VM_FAULT_RELEASED;
>> +                }
>> +        }
>
> I'd much rather see this abstracted out (preferably together with the
> "did it get truncated" logic) into a small helper function of its own.
> The main reason I say that is because I hate your propensity for
> putting the comments deep inside the code. I think any code that needs
> big comments at a deep indentation is fundamentally flawed.

To be clear, is it about the helper function or about the comment
location? I think the code block is actually short and simple, so
maybe if I just moved the comment up to the /* Lock the page */
location it would also look that way?
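For comparison, here is one possible shape for such a helper with the explanation hoisted above the function, written as a user-space model rather than kernel code. The helper name, the sk_ types, and the flag value are all hypothetical; page locking and mmap_sem are modeled with plain fields, and blocking waits are modeled as immediate state changes.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the retry flag; the value is illustrative. */
#define SK_FLAG_ALLOW_RETRY 0x1u

struct sk_page {
	bool locked;    /* models the page lock bit */
	int refcount;   /* models the page reference taken by the lookup */
};

struct sk_sem {
	int readers;    /* models read-holders of mmap_sem */
};

/*
 * Lock the page, or tell the caller to retry.
 *
 * If the page is busy and the caller set SK_FLAG_ALLOW_RETRY, drop the
 * caller's read lock, wait for the page, drop the page reference, and
 * return false so the caller restarts from its lock acquisition.
 * Otherwise wait with the read lock held and return true with the page
 * locked.
 */
static bool sk_lock_page_or_retry(struct sk_page *page, struct sk_sem *sem,
				  unsigned int flags)
{
	if (!page->locked) {
		page->locked = true;   /* models trylock_page() succeeding */
		return true;
	}
	if (!(flags & SK_FLAG_ALLOW_RETRY)) {
		page->locked = true;   /* models __lock_page(): wait, then own the lock */
		return true;
	}
	sem->readers--;                /* models up_read(&mm->mmap_sem) */
	page->locked = false;          /* models the I/O holder unlocking while we wait */
	page->refcount--;              /* models page_cache_release(page) */
	return false;
}
```

With this shape the call site reduces to a single test of the return value, and the comment describes what is going to happen before any conditional is entered.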

> You had the same thing in the x86 fault path. I really think it's
> wrong. Needing a comment _inside_ a conditional is just nasty. You
> shouldn't explain what just happened, you should explain what is
> _going_ to happen, and why you do a test in the first place.

That's probably a habit I picked up on another project. Thanks for
pointing it out, I'll try to avoid doing this in Linux code.

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
