Re: [RESEND RFC PATCH 0/3] Provide fast access to thread specific data

From: Peter Oskolkov
Date: Fri Sep 10 2021 - 15:36:53 EST


On Fri, Sep 10, 2021 at 12:12 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
>
> On Fri, Sep 10, 2021 at 6:28 PM Peter Oskolkov <posk@xxxxxxxxxx> wrote:
> > On Fri, Sep 10, 2021 at 9:13 AM Prakash Sangappa
> > <prakash.sangappa@xxxxxxxxxx> wrote:
> > > > Do you think your sys_task_getshared can be tweaked to return an
> > > > arbitrarily-sized block of memory (subject to overall constraints)
> > > > rather than a fixed number of "options"?
> > >
> > > I suppose it could. How big of a size? We don’t want to hold on to
> > > arbitrarily large amount of pinned memory. The preference would
> > > be for the kernel to decide what is going to be shared based on
> > > what functionality/data sharing is supported. In that sense the size
> > > is pre defined not something the userspace/application can ask.
> >
> > There could be a sysctl or some other mechanism that limits the amount
> > of memory pinned per mm (or per task). Having "options" hardcoded for
> > such a generally useful feature seems limiting...
>
> That seems like it'll just create trouble a few years down the line
> when the arbitrarily-chosen limit that nobody is monitoring blows up
> in someone's production environment.
>
> If this area is used for specific per-thread items, then the kernel
> should be able to enforce that you only allocate as much space as is
> needed for all threads of the process (based on the maximum number
> that have ever been running in parallel in the process), right? Which
> would probably work best if the kernel managed those allocations.

This sounds, again, as if the kernel should be aware of the kind of
items being allocated; having a more generic mechanism of allocating
pinned memory for the userspace to use at its discretion would be more
generally useful, I think. But how then the kernel/system should be
protected from a buggy or malicious process trying to grab too much?

One option would be to have a generic in-kernel mechanism for this,
but expose it to the userspace via domain-specific syscalls that do
the accounting you hint at. This sounds a bit like an over-engineered
solution, though...