Re: [RFC/PATCH] Use mmu_gather for fork() instead of flush_tlb_mm()

From: Benjamin Herrenschmidt
Date: Mon Jul 09 2007 - 08:32:20 EST


On Mon, 2007-07-09 at 20:12 +1000, Nick Piggin wrote:
> Benjamin Herrenschmidt wrote:
> > On Mon, 2007-07-09 at 19:29 +1000, Nick Piggin wrote:
> >
> >>They could just #define one to the other though, there are only a
> >>small
> >>number of them. Is there a downside to not making them distinct? i386
> >>for example probably would just keep doing a tlb flush for fork and
> >>not
> >>want to worry about touching the tlb gather stuff.
> >
> >
> > But the tlb gather stuff just does ... a flush_tlb_mm() on x86 :-)
>
> But it still does the get_cpu of the mmu gather data structure and
> has to look in there and touch the cacheline. You're also having to
> do more work when unlocking/relocking the ptl etc.

Hrm... true. I forgot about the cost of get_cpu. Do you think it will be
measurable at all in practice? I doubt it, but heh...

The place where I see a possible issue is indeed when dropping the lock:
in things like copy_pte_range(), we would want to flush the batch in order
to be able to schedule.

That means we would probably end up doing flush_tlb_mm() once for every
lock drop instead of just once on x86, unless there's a smart way to deal
with that case... After all, when we do such lock dropping, we don't
actually need to dismiss the batch; the only reason we do so is to
re-enable preemption, because we may be migrated to another CPU.
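
To make that concrete, here's roughly the shape the copy_pte_range() loop
takes with my patch. This is only a sketch from memory -- the identifiers
and the break-out condition are approximate, not lifted verbatim from the
patch:

	/*
	 * Sketch only -- approximates the copy_pte_range() flow once
	 * fork() builds an mmu_gather batch; not the exact patch.
	 */
	struct mmu_gather *tlb;

	tlb = tlb_gather_mmu(dst_mm, 0);	/* get_cpu(), preempt off */
again:
	/* take src_ptl/dst_ptl, copy a run of ptes, mark flushes needed */

	if (need_resched() /* or the ptl is contended */) {
		pte_unmap_unlock(dst_pte - 1, dst_ptl);
		/*
		 * To be allowed to schedule we have to complete the
		 * batch, which on x86 is just flush_tlb_mm() -- i.e.
		 * one full flush per lock drop instead of a single one
		 * at the end of dup_mmap().
		 */
		tlb_finish_mmu(tlb, addr, end);
		cond_resched();
		tlb = tlb_gather_mmu(dst_mm, 0);
		goto again;
	}

	/* ... finish the range, then tlb_finish_mmu(tlb, addr, end) */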

But I wonder if it's worth bothering... we drop the lock when we have
need_resched() or there is contention on the lock. In both of these
cases, I doubt the added flush will matter noticeably...

If you think it will, then we could probably make the implementation a
bit more subtle, and allow "putting" the current batch (re-enabling
preemption), only actually completing/flushing it if a context switch
happens. It's not totally trivial to do with the current APIs though,
mostly because of the passing of start/end when completing the batch.

Technically, on x86, I believe we don't even need to do anything but the
-last- flush in fact. So we could just add a pair of tlb_pause/resume
for the lock dropping :-)
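
Something along these lines maybe -- entirely hypothetical, neither helper
exists, and it hand-waves over the stale state left in the old CPU's
per-cpu mmu_gather (which at worst means a spurious flush later):

/* Hypothetical sketch only, nothing like this exists today. */
static inline int tlb_pause(struct mmu_gather *tlb)
{
	int cpu = smp_processor_id();	/* where the batch was built */

	put_cpu();			/* re-enable preemption */
	return cpu;
}

static inline struct mmu_gather *tlb_resume(struct mm_struct *mm, int cpu)
{
	struct mmu_gather *tlb = tlb_gather_mmu(mm, 0);

	if (smp_processor_id() != cpu)
		flush_tlb_mm(mm);	/* migrated: can't defer any more */

	return tlb;
}

The lock-drop path would then just do cpu = tlb_pause(tlb);
cond_resched(); tlb = tlb_resume(mm, cpu); and in the common non-migrated
case x86 is back to a single flush_tlb_mm() at the end of dup_mmap().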

But if we're going to do a separate API, then what would you have it look
like? It would have all of the same issues, no?

I suppose the best thing is to run a few tests to see if there's any
measurable performance regression with my patch on x86 (btw, it may not
build with hugetlbfs, I forgot a #include). Do you have some test gear
around? I lack x86 hardware myself...

I'm also interested in the possible impact on ia64. I wonder if they can
benefit from more targeted flushing in fork().

Ben.

