Re: [PATCH 3/3] mm: add vmstat statistics for madvise_[cold|pageout]

From: Michal Hocko
Date: Wed Jan 18 2023 - 16:15:08 EST


On Wed 18-01-23 09:55:38, Minchan Kim wrote:
> On Wed, Jan 18, 2023 at 06:27:02PM +0100, Michal Hocko wrote:
> > On Wed 18-01-23 09:15:34, Minchan Kim wrote:
> > > On Wed, Jan 18, 2023 at 10:11:46AM +0100, Michal Hocko wrote:
> > > > On Tue 17-01-23 15:16:32, Minchan Kim wrote:
> > > > > madvise LRU manipulation APIs need to scan address ranges to find
> > > > > pages present in the page tables and provide advice hints for them.
> > > > >
> > > > > Like the pg[scan/steal] counts in vmstat, madvise_pg[scanned/hinted]
> > > > > show the proactive reclaim efficiency, so this patch adds those
> > > > > two statistics to vmstat.
> > > >
> > > > Please describe the usecase for those new counters.
> > >
> > > I wanted to know the proactive reclaim efficiency of MADV_COLD/MADV_PAGEOUT.
> > > Userspace has several policies for when/which vmas need to be hinted by the call,
> > > and they are evolving. I needed to know how effectively those policies work, since
> > > the vma ranges are huge (i.e., nr_hinted/nr_scanned).
> >
> > I can see how that can be an interesting information but is there
> > anything actionable about that beyond debugging purposes? In other words
> > isn't this something that could be done by tracing instead?
>
> Those statistics are for telemetry. We are collecting various vmstat
> fields (e.g., pgsteal/pgscan) from real field devices, and I thought
> those two stats would be a good fit alongside the other reclaim
> statistics in vmstat, since they tell us how much the proactive madvise
> policy makes the system healthier (e.g., fewer kswapd scans, fewer
> allocstalls and so on).
>
> >
> > Also how are you going to identify specific madvise calls when they can
> > interleave arbitrarily?
>
> I guess you are asking how we could separate MADV_PAGEOUT and
> MADV_COLD in vmstat. That's a valid question. I thought that, for a
> start, we could add just an umbrella stat like this, and if we want a
> breakdown, we would need to introduce a sysfs interface, as slab does.

No, not really. MADV_COLD is about aging. There is no actual reclaim
going on, so pgscan/steal metrics do not make any sense. I am asking
about potentially different concurrent MADV_PAGEOUT calls. From what
you've said earlier (how effectively the policy works), I understood
that you want to find out how effective a specific MADV_PAGEOUT is. But
there may be different callers applying it to all sorts of different
memory mappings, and their efficiency might therefore differ a lot. As
there is no clear way to tell one from the other, I am really
questioning whether this global stat is actually useful.

--
Michal Hocko
SUSE Labs