Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics

From: Xishi Qiu
Date: Tue Dec 11 2012 - 01:50:09 EST


>>> Hi Simon,

>>>
>>> If we use "/sys/devices/system/memory/soft_offline_page" to offline a
>>> free page, the value of mce_bad_pages will be added. Then the page is marked
>>> HWPoison, but it is still managed by page buddy alocator.
>>>
>>> So if we offline it again, the value of mce_bad_pages will be added again.
>>> Assume the page is not allocated during this short time.
>>>
>>> soft_offline_page()
>>> get_any_page()
>>> "else if (is_free_buddy_page(p))" branch return 0
>>> "goto done";
>>> "atomic_long_add(1, &mce_bad_pages);"
>>>
>>> I think it would be better to move "if(PageHWPoison(page))" at the beginning of
>>> soft_offline_page(). However I don't know what do these words mean,
>>> "Synchronized using the page lock with memory_failure()"
>
> Hi Xishi,
>
> Unpoison will clear PG_hwpoison flag after hold page lock, memory_failure() and
> soft_offline_page() take the lock to avoid unpoison clear the flag behind them.
>
> Regards,
> Wanpeng Li
>

Hi Wanpeng,

As you mean, it is the necessary to get the page lock first when we check the
HWPoison flag every time, this is in order to avoid conflict, right?

So why not use a globe lock here? For example lock_memory_hotplug() is used in
online_pages() and offline_pages()?

Thanks,
Xishi Qiu

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/