Re: [RFC] Per file OOM badness

From: Michal Hocko
Date: Fri Jan 19 2018 - 03:21:01 EST


On Thu 18-01-18 12:01:32, Eric Anholt wrote:
> Michal Hocko <mhocko@xxxxxxxxxx> writes:
>
> > On Thu 18-01-18 18:00:06, Michal Hocko wrote:
> >> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
> >> > Hi, this series is a revised version of an RFC sent by Christian König
> >> > a few years ago. The original RFC can be found at
> >> > https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
> >> >
> >> > This is the same idea and I've just adressed his concern from the original RFC
> >> > and switched to a callback into file_ops instead of a new member in struct file.
> >>
> >> Please add the full description to the cover letter and do not make
> >> people hunt links.
> >>
> >> Here is the origin cover letter text
> >> : I'm currently working on the issue that when device drivers allocate memory on
> >> : behalf of an application the OOM killer usually doesn't knew about that unless
> >> : the application also get this memory mapped into their address space.
> >> :
> >> : This is especially annoying for graphics drivers where a lot of the VRAM
> >> : usually isn't CPU accessible and so doesn't make sense to map into the
> >> : address space of the process using it.
> >> :
> >> : The problem now is that when an application starts to use a lot of VRAM those
> >> : buffers objects sooner or later get swapped out to system memory, but when we
> >> : now run into an out of memory situation the OOM killer obviously doesn't knew
> >> : anything about that memory and so usually kills the wrong process.
> >
> > OK, but how do you attribute that memory to a particular OOM killable
> > entity? And how do you actually enforce that those resources get freed
> > on the oom killer action?
> >
> >> : The following set of patches tries to address this problem by introducing a per
> >> : file OOM badness score, which device drivers can use to give the OOM killer a
> >> : hint how many resources are bound to a file descriptor so that it can make
> >> : better decisions which process to kill.
> >
> > But files are not killable, they can be shared... In other words this
> > doesn't help the oom killer to make an educated guess at all.
>
> Maybe some more context would help the discussion?
>
> The struct file in patch 3 is the DRM fd. That's effectively "my
> process's interface to talking to the GPU" not "a single GPU resource".
> Once that file is closed, all of the process's private, idle GPU buffers
> will be immediately freed (this will be most of their allocations), and
> some will be freed once the GPU completes some work (this will be most
> of the rest of their allocations).
>
> Some GEM BOs won't be freed just by closing the fd, if they've been
> shared between processes. Those are usually about 8-24MB total in a
> process, rather than the GBs that modern apps use (or that our testcases
> like to allocate and thus trigger oomkilling of the test harness instead
> of the offending testcase...)
>
> Even if we just had the private+idle buffers being accounted in OOM
> badness, that would be a huge step forward in system reliability.

OK, in that case I would propose a different approach. We already
have rss_stat. So why do not we simply add a new counter there
MM_KERNELPAGES and consider those in oom_badness? The rule would be
that such a memory is bound to the process life time. I guess we will
find more users for this later.
--
Michal Hocko
SUSE Labs