Re: [RFC] high system time & lock contention running large mixed workload

From: KOSAKI Motohiro
Date: Tue Dec 01 2009 - 07:23:45 EST


(cc'ing some related people)

> The cause was determined to be the unconditional call to
> page_referenced() for every mapped page encountered in
> shrink_active_list(). page_referenced() takes the anon_vma->lock and
> calls page_referenced_one() for each vma. page_referenced_one() then
> calls page_check_address() which takes the pte_lockptr spinlock. If
> several CPUs are doing this at the same time there is a lot of
> pte_lockptr spinlock contention with the anon_vma->lock held. This
> causes contention on the anon_vma->lock, stalling in the fo and very
> high system time.
>
> Before the splitLRU patch shrink_active_list() would only call
> page_referenced() when reclaim_mapped got set. reclaim_mapped only got
> set when the priority worked its way from 12 all the way to 7. This
> prevented page_referenced() from being called from shrink_active_list()
> until the system was really struggling to reclaim memory.
>
> One way to prevent this is to change page_check_address() to execute a
> spin_trylock(ptl) when it was called by shrink_active_list() and simply
> fail if it could not get the pte_lockptr spinlock. This will make
> shrink_active_list() consider the page not referenced and allow the
> anon_vma->lock to be dropped much quicker.
>
> The attached patch does just that, thoughts???

At first look,

- We certainly have to fix this issue.
- But your patch is a bit risky.

Your patch treats a trylock(pte-lock) failure as "not accessed". But
lock contention generally implies there is a contending peer. IOW, the
page typically does have its reference bit set. Then the next
shrink_inactive_list() moves it back to the active list. That's a
suboptimal result.

However, we can't simply treat lock contention as page-is-referenced
either. If we did, the system would easily go into OOM.
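
To make the trade-off concrete, here is a minimal user-space sketch (not
kernel code). It assumes a hypothetical fake_pte guarded by a pthread
mutex, and a hypothetical three-way result so that the caller, not the
lock helper, decides what a contended lock means. All names below are
made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a pte protected by its own lock. */
struct fake_pte {
    pthread_mutex_t lock;
    bool young;             /* plays the role of the reference bit */
};

/* Hypothetical result type: contention is reported separately. */
enum ref_result { REF_NO, REF_YES, REF_CONTENDED };

/* Test and clear the reference bit without waiting for the lock. */
static enum ref_result referenced_trylock(struct fake_pte *pte)
{
    enum ref_result res;

    if (pthread_mutex_trylock(&pte->lock) != 0)
        return REF_CONTENDED;   /* somebody else holds the lock */

    res = pte->young ? REF_YES : REF_NO;
    pte->young = false;
    pthread_mutex_unlock(&pte->lock);
    return res;
}
```

Mapping REF_CONTENDED to REF_NO is what the posted patch does in effect;
mapping it to REF_YES is the variant that risks OOM.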

So,
	if (priority < DEF_PRIORITY - 2)
		page_referenced()
	else
		page_referenced_trylock()

is better?
On typical workloads, vmscan almost always runs at DEF_PRIORITY only. So
if the priority==DEF_PRIORITY case doesn't cause heavy lock contention,
the system doesn't need to worry about the contention. Anyway, we can't
avoid contention if the system is under heavy memory pressure.
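
The priority check above could be sketched roughly as follows. This is a
standalone illustration, not a patch: choose_ref_check() and the enum are
hypothetical names, and DEF_PRIORITY is taken to be 12, matching the
starting priority mentioned in the quoted text. Lower priority values
mean heavier reclaim pressure.

```c
#include <assert.h>

/* Assumed value; reclaim starts here and counts down under pressure. */
#define DEF_PRIORITY 12

/* Hypothetical names for illustration; not kernel API. */
enum ref_check_mode {
    REF_CHECK_BLOCKING,     /* page_referenced(): wait for the pte lock */
    REF_CHECK_TRYLOCK       /* trylock variant: give up when contended */
};

/* Pick the reference check to use for a given scan priority. */
static enum ref_check_mode choose_ref_check(int priority)
{
    assert(priority >= 0 && priority <= DEF_PRIORITY);

    /* Heavy pressure: accept the lock contention to get an exact answer. */
    if (priority < DEF_PRIORITY - 2)
        return REF_CHECK_BLOCKING;

    /* Light pressure (priority 12, 11 or 10): never spin on the lock. */
    return REF_CHECK_TRYLOCK;
}
```

With these assumptions, priorities 12 down to 10 take the trylock path
and 9 or lower fall back to the blocking check.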

btw, the current shrink_active_list() has an unnecessary
page_mapping_inuse() call. It prevents dropping the page reference bit
from unmapped cache pages. That means we protect unmapped cache pages
more than mapped pages. That is strange.

Unfortunately, I don't have enough development time today. I'll
work on it tomorrow.



