Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

From: Andrew Lutomirski
Date: Tue May 24 2011 - 07:55:48 EST


On Tue, May 24, 2011 at 7:24 AM, Andrew Lutomirski <luto@xxxxxxx> wrote:
> On Mon, May 23, 2011 at 9:34 PM, Minchan Kim <minchan.kim@xxxxxxxxx> wrote:
>> On Tue, May 24, 2011 at 10:19 AM, Andrew Lutomirski <luto@xxxxxxx> wrote:
>>> On Sun, May 22, 2011 at 7:12 PM, Minchan Kim <minchan.kim@xxxxxxxxx> wrote:
>>>> Could you test below patch based on vanilla 2.6.38.6?
>>>> The expected result is that the system should never hang.
>>>> I hope this is the last test for the hang.
>>>>
>>>> Thanks.
>>>>
>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>> index 292582c..1663d24 100644
>>>> --- a/mm/vmscan.c
>>>> +++ b/mm/vmscan.c
>>>> @@ -231,8 +231,11 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>>>>       if (scanned == 0)
>>>>               scanned = SWAP_CLUSTER_MAX;
>>>>
>>>> -       if (!down_read_trylock(&shrinker_rwsem))
>>>> -               return 1;       /* Assume we'll be able to shrink next time */
>>>> +       if (!down_read_trylock(&shrinker_rwsem)) {
>>>> +               /* Assume we'll be able to shrink next time */
>>>> +               ret = 1;
>>>> +               goto out;
>>>> +       }
>>>>
>>>>       list_for_each_entry(shrinker, &shrinker_list, list) {
>>>>               unsigned long long delta;
>>>> @@ -286,6 +289,8 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>>>>               shrinker->nr += total_scan;
>>>>       }
>>>>       up_read(&shrinker_rwsem);
>>>> +out:
>>>> +       cond_resched();
>>>>       return ret;
>>>>  }
>>>>
>>>> @@ -2331,7 +2336,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
>>>>        * must be balanced
>>>>        */
>>>>       if (order)
>>>> -               return pgdat_balanced(pgdat, balanced, classzone_idx);
>>>> +               return !pgdat_balanced(pgdat, balanced, classzone_idx);
>>>>       else
>>>>               return !all_zones_ok;
>>>>  }
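The two behavioral changes in the test patch quoted above can be modeled in a small userspace C sketch. This is a hypothetical model, not kernel code: `down_read_trylock_model`, `up_read_model`, `cond_resched_model`, `shrink_slab_model`, and the two `sleeping_prematurely_*` variants are invented names, `sched_yield()` stands in for `cond_resched()`, and a plain `bool` stands in for the result of `pgdat_balanced()`. It shows that (1) the contended trylock path in `shrink_slab()` now also passes through `cond_resched()` instead of returning immediately, and (2) for high-order requests the patched `sleeping_prematurely()` reports "premature" only while the node is not balanced, whereas the old check reported it once the node was balanced.

```c
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
static bool writer_holds_rwsem; /* models another task holding shrinker_rwsem for write */
static int readers;             /* current read holders of the modeled rwsem */
static int resched_calls;       /* how many times the modeled cond_resched() ran */

static bool down_read_trylock_model(void)
{
    if (writer_holds_rwsem)
        return false;           /* contended: fail immediately, never block */
    readers++;
    return true;
}

static void up_read_model(void)
{
    assert(readers > 0);        /* only release a lock we actually took */
    readers--;
}

/* Stand-in for cond_resched(): offer the CPU to other runnable threads. */
static void cond_resched_model(void)
{
    resched_calls++;
    sched_yield();
}

/* Patched control flow: both the contended and normal paths reach "out:". */
static unsigned long shrink_slab_model(void)
{
    unsigned long ret = 0;

    if (!down_read_trylock_model()) {
        /* Assume we'll be able to shrink next time */
        ret = 1;
        goto out;
    }
    /* ... the real function walks shrinker_list here ... */
    up_read_model();
out:
    cond_resched_model();
    return ret;
}

/*
 * "true" means kswapd would be going to sleep too early and should keep
 * reclaiming. node_balanced stands in for the result of pgdat_balanced().
 */
static bool sleeping_prematurely_old(int order, bool node_balanced, bool all_zones_ok)
{
    if (order)
        return node_balanced;   /* inverted: "premature" once the node IS balanced */
    return !all_zones_ok;
}

static bool sleeping_prematurely_new(int order, bool node_balanced, bool all_zones_ok)
{
    if (order)
        return !node_balanced;  /* "premature" only while the node is NOT balanced */
    return !all_zones_ok;
}
```

In this model, calling `shrink_slab_model()` with `writer_holds_rwsem` set returns 1 and still increments `resched_calls`, and for a balanced node with an order-3 request `sleeping_prematurely_old(3, true, true)` returns true while `sleeping_prematurely_new(3, true, true)` returns false.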
>>>
>>> So far with this patch I can't reproduce the hang or the bogus OOM.
>>>
>>> To be completely clear, I have COMPACTION, MIGRATION, and THP off, I'm
>>> running 2.6.38.6, and I have exactly two patches applied.  One is the
>>> attached patch and the other is the fpu.ko/aesni_intel.ko merger
>>> which I need to get dracut to boot my box.
>>>
>>> For fun, I also upgraded to 8GB of RAM and it still works.
>>>
>>
>> Hmm. Could you test it with THP enabled and 2G of RAM?
>> Isn't that the original test environment?
>> Please don't change the test environment. :)
>
> The test that passed last night was run in an environment (hardware and
> config) that I had confirmed earlier as failing without the patch.
>
> I just re-tested my original config (from a backup -- migration,
> compaction, and thp "always" are enabled).  I get bogus OOMs but not a
> hang.  (I'm running with mem=2G right now -- I'll swap the DIMMs back
> out later on if you want.)
>
> I attached the bogus OOM log (actually several OOMs that happened in sequence).
> They look readahead-related.  There was plenty of free swap space.

Now with log actually attached.

>
> --Andy
>

Attachment: bogus_oom.txt.xz
Description: application/xz