Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)

From: Trevor Cordes
Date: Fri Feb 03 2017 - 19:37:21 EST


On 2017-02-01 Michal Hocko wrote:
> On Wed 01-02-17 03:29:28, Trevor Cordes wrote:
> > On 2017-01-30 Michal Hocko wrote:
> [...]
> > > Testing with Valinall rc6 released just yesterday would be a good
> > > fit. There are some more fixes sitting on mmotm on top and maybe
> > > we want some of them in finall 4.10. Anyway all those pending
> > > changes should be merged in the next merge window - aka 4.11
> >
> > After 30 hours of running vanilla 4.10.0-rc6, the box started to go
> > bonkers at 3am, so vanilla does not fix the bug :-( But, the bug
> > hit differently this time, the box just bogged down like crazy and
> > gave really weird top output. Starting nano would take 10s, then
> > would run full speed, then when saving a file would take 5s.
> > Starting any prog not in cache took equally as long.
>
> Could you try with to_test/linus-tree/oom_hickups branch on the same
> git tree? I have cherry-picked "mm, vmscan: consider eligible zones in
> get_scan_count" which might be the missing part.

I ran to_test/linus-tree/oom_hickups branch (4.10.0-rc6+) for 50 hours
and it does NOT have the bug! No problems at all so far.

So I think whatever to_test/linus-tree/oom_hickups has that since-4.9
has that vanilla 4.10-rc6 does *not* have is indeed the fix.

For my reference, and I know you guys aren't distro-specific, what is
the best way to get this fix into Fedora 24 (currently 4.9)? Can it be
backported or made as a patch they can apply to 4.9? Or 4.10? If this
fix only goes into 4.11 then I fear we'll never see it in Fedora and us
rhbz guys will not have a stock-Fedora fix for this until F25 or F26.
Again, I'm not trying to force this out of scope, I'm just wondering
about the logistics in these situations.

Once again, thanks to all for your great work and help! P.S. I'll try
a couple of the other ideas Mel had about ramping the RAM back up, etc.