Re: Memory hotplug softlock issue

From: Hugh Dickins
Date: Tue Nov 20 2018 - 20:21:44 EST


On Tue, 20 Nov 2018, Baoquan He wrote:
> On 11/20/18 at 02:38pm, Vlastimil Babka wrote:
> > On 11/20/18 6:44 AM, Hugh Dickins wrote:
> > > [PATCH] mm: put_and_wait_on_page_locked() while page is migrated
> > >
> > > We have all assumed that it is essential to hold a page reference while
> > > waiting on a page lock: partly to guarantee that there is still a struct
> > > page when MEMORY_HOTREMOVE is configured, but also to protect against
> > > reuse of the struct page going to someone who then holds the page locked
> > > indefinitely, when the waiter can reasonably expect timely unlocking.
> > >
> > > But in fact, so long as wait_on_page_bit_common() does the put_page(),
> > > and is careful not to rely on struct page contents thereafter, there is
> > > no need to hold a reference to the page while waiting on it. That does
> >
> > So there's still a moment where refcount is elevated, but hopefully
> > short enough, right? Let's see if it survives Baoquan's stress testing.
>
> Yes, I applied Hugh's patch 8 hours ago, then our QE Ping operated on
> that machine; after many rounds of hot removing/adding, the endless
> looping during migration is no longer seen. The test result for
> Hugh's patch is positive. I even suggested that Ping increase the memory
> pressure to "stress -m 250"; it still succeeded in offlining and removing.
>
> So I think this patch works to solve the issue. Thanks a lot for your
> help, all of you.

Very good to hear, thanks a lot for your quick feedback.

>
> Hugh, will you post a formal patch in a separate thread?

Yes, I promise that I shall do so in the next few days, but not today:
some other things have to take priority.

And Vlastimil has raised an excellent point about the interaction with
PSI "thrashing": I need to read up and decide which way to go on that
(and add Johannes to the Cc when I post).

I think I shall probably post it directly to Linus (lists and other
people Cc'ed of course): not because I think it should be rushed in
too quickly, nor to sidestep Andrew, but because Linus was very closely
involved in both the PG_waiters and WQ_FLAG_BOOKMARK discussions:
it is an area of special interest to him.

Hugh