Re: [DISCUSSION] proposed mctl() API

From: Matthew Wilcox
Date: Tue Jun 10 2025 - 12:26:52 EST


On Tue, Jun 10, 2025 at 05:00:47PM +0100, Usama Arif wrote:
> On 10/06/2025 16:46, Matthew Wilcox wrote:
> > On Tue, Jun 10, 2025 at 04:30:43PM +0100, Usama Arif wrote:
> >> If we have two workloads on the same server, e.g. one is a database where
> >> THPs just don't do well and the other is an AI workload where THPs do really
> >> well, how will the kernel monitor that the database workload is performing
> >> worse and the AI one isn't?
> >
> > It can monitor the allocation/access patterns and see who's getting
> > the benefit. The two workloads are in competition for memory, and
> > we can tell which pages are hot and which cold.
> >
> > And I don't believe it's a binary anyway. I bet there are some
> > allocations where the database benefits from having THPs (I mean, I know
> > a database which invented the entire hugetlbfs subsystem so it could
> > use PMD entries and avoid one layer of TLB misses!)
> >
>
> Sure, but this is just an example. Workload owners are not going to spend time
> working out how each allocation behaves and, if it's hot, putting it in hugetlbfs.

No, they're not. It should be automatic. There are many deficiencies
in the kernel; this is one of them.

> Of course, hugetlbfs has its own drawback of having to reserve pages.

Drawback or advantage? It's a feature. You're being very strange about
this. First you want to reserve THPs for some workloads only, then, when
given a way to do that, you complain that ... you have to reserve hugetlb
pages. You can't possibly mean both of these things sincerely.