Re: [PATCH v13 3/6] mm/vmstat: manage per-CPU stats from CPU context when NOHZ full

From: Frederic Weisbecker
Date: Tue Jan 10 2023 - 11:12:37 EST


On Tue, Jan 10, 2023 at 11:19:01PM +0800, Hillf Danton wrote:
> On Tue, 10 Jan 2023 08:50:28 -0300 Marcelo Tosatti <mtosatti@xxxxxxxxxx>
> > On Tue, Jan 10, 2023 at 10:43:56AM +0800, Hillf Danton wrote:
> > > On 9 Jan 2023 11:12:49 -0300 Marcelo Tosatti <mtosatti@xxxxxxxxxx>
> > > >
> > > > Yes, but if you do not return to userspace, then the per-CPU vm
> > > > statistics can be dirty indefinitely.
> > >
> > > Could you specify the reasons for failing to return to userspace,
> > > given it is undesired interference for the shepherd to queue work
> > > on the isolated CPUs.
> >
> > Any system call that takes longer than the threshold to sync vmstats.
>
> Which ones?
>
> If schedule() occurs during syscall because of acquiring mutex for instance
> then anything on the isolated runqueue, including workqueue worker shepherd
> wakes up, can burn CPU cycles without undesired interference produced.

The above confuses me. How would other tasks help with syscalls that take too
long to service?

> >
> > Or a long running kernel thread, for example:
>
> It is a buggyyyy example.
> >
> > https://stackoverflow.com/questions/65111483/long-running-kthread-and-synchronize-net

I can imagine a CPU spending most of its time processing networking packets
through interrupts/softirq within ksoftirqd/NAPI while another CPU processes
these packets in userspace.

In this case the CPU handling the kernel part can theoretically never go to
idle/user. nohz_full isn't optimized for such a job, but there is nothing
to prevent it from doing one.

Thanks.