Re: [PATCH v3 3/7] mm/lru: replace pgdat lru_lock with lruvec lock

From: Matthew Wilcox
Date: Fri Nov 15 2019 - 23:38:32 EST


On Sat, Nov 16, 2019 at 11:15:02AM +0800, Alex Shi wrote:
> This is the main patch to replace per node lru_lock with per memcg
> lruvec lock. It also fold the irqsave flags into lruvec.

I have to say, I don't love the part where we fold the irqsave flags
into the lruvec. I know it saves us an argument, but it opens up the
possibility of mismatched expectations. eg we currently have:

static void __split_huge_page(struct page *page, struct list_head *list,
struct lruvec *lruvec, pgoff_t end)
{
...
spin_unlock_irqrestore(&lruvec->lru_lock, lruvec->irqflags);

so if we introduce a new caller, we have to be certain that this caller
is also using lock_page_lruvec_irqsave() and not lock_page_lruvec_irq().
I can't think of a way to make the compiler enforce that, and if we don't,
then we can get some odd crashes with interrupts being unexpectedly
enabled or disabled, depending on how ->irqflags was used last.

So it makes the code more subtle. And that's not a good thing.

> +static inline struct lruvec *lock_page_lruvec_irq(struct page *page,
> + struct pglist_data *pgdat)
> +{
> + struct lruvec *lruvec = mem_cgroup_page_lruvec(page, pgdat);
> +
> + spin_lock_irq(&lruvec->lru_lock);
> +
> + return lruvec;
> +}

...

> +static struct lruvec *lock_page_lru(struct page *page, int *isolated)
> {
> pg_data_t *pgdat = page_pgdat(page);
> + struct lruvec *lruvec = lock_page_lruvec_irq(page, pgdat);
>
> - spin_lock_irq(&pgdat->lru_lock);
> if (PageLRU(page)) {
> - struct lruvec *lruvec;
>
> - lruvec = mem_cgroup_page_lruvec(page, pgdat);
> ClearPageLRU(page);
> del_page_from_lru_list(page, lruvec, page_lru(page));
> *isolated = 1;
> } else
> *isolated = 0;
> +
> + return lruvec;
> }

But what if the page is !PageLRU? What lruvec did we just lock?
According to the comments on mem_cgroup_page_lruvec(),

* This function is only safe when following the LRU page isolation
* and putback protocol: the LRU lock must be held, and the page must
* either be PageLRU() or the caller must have isolated/allocated it.

and now it's being called in order to find out which LRU lock to take.
So this comment needs to be updated, if it's wrong, or this patch has
a race.