Re: [PATCH 1/1] mm/page_alloc: add a warning about high order allocations

From: Konstantin Khorenko
Date: Fri Dec 28 2018 - 09:45:42 EST


On 12/27/2018 07:50 PM, Michal Hocko wrote:
> On Thu 27-12-18 16:05:18, Konstantin Khorenko wrote:
>> On 12/26/2018 11:40 AM, Michal Hocko wrote:
>>> Appart from general comments as a reply to the cover (btw. this all
>>> should be in the changelog because this is the _why_ part of the
>>> justification which should be _always_ part of the changelog).
>>
>> Thank you, will add in the next version of the patch alltogether
>> with other changes if any.
>>
>>> On Tue 25-12-18 18:39:27, Konstantin Khorenko wrote:
>>> [...]
>>>> +config WARN_HIGH_ORDER
>>>> + bool "Enable complains about high order memory allocations"
>>>> + depends on !LOCKDEP
>>>
>>> Why?
>>
>> LOCKDEP makes structures big, so if we see a high order allocation warning
>> on a debug kernel with lockdep, it does not give us a lot - lockdep enabled
>> kernel performance is not our target.
>> i can remove !LOCKDEP dependence here, but then need to adjust default
>> warning level i think, or logs will be spammed.
>
> OK, I see but this just points to how this is not really a suitable
> solution for the problem you are looking for.

i have to admit, yes, it is inconvenient to have such a dependency.
i would be really glad to hear alternatives...

Just to mention: tracepoints are for issues investigation (when we already have any),
and i'm thinking about issues prevention. Gathering places which might
lead to problem and rework to prevent possible issues.

>>>> +static __always_inline void warn_high_order(int order, gfp_t gfp_mask)
>>>> +{
>>>> + static atomic_t warn_count = ATOMIC_INIT(32);
>>>> +
>>>> + if (order >= warn_order && !(gfp_mask & __GFP_NOWARN))
>>>> + WARN(atomic_dec_if_positive(&warn_count) >= 0,
>>>> + "order %d >= %d, gfp 0x%x\n",
>>>> + order, warn_order, gfp_mask);
>>>> +}
>>>
>>> We do have ratelimit functionality, so why cannot you use it?
>>
>> Well, my idea was to really shut up the warning after some number of messages
>> (if a node is in production and its uptime, say, a year, i don't want to see
>> many warnings in logs, first several is enough - let's fix them first).
>
> OK, but it is quite likely that the system is perfectly healthy and
> unfragmented after fresh boot when doing a large order allocations is
> perfectly fine.

You are right again, Michal. Thus just switch those early allocations to
kvmalloc() and on boot on an unfragmented system same kmalloc() will be used.
And no new warning for that place. And no any chance this place ever trigger
compaction (in case we did not notice some way this allocation might also
happen later, not on a fresh node).

> Note that it is smaller order allocations that generate
> fragmentation in general.

And again - yes, that's true. Still i don't know how to avoid memory fragmentation,
but can avoid (ok, significantly decrease the number of) big allocations which
might trigger compaction because of that fragmentation.


--
Best regards,

Konstantin Khorenko,
Virtuozzo Linux Kernel Team