Re: [PATCH v2 0/5] Defer throttle when task exits to user

From: Aaron Lu
Date: Thu Jul 03 2025 - 02:35:31 EST


Hi Benjamin,

Thanks for taking a look.

On Wed, Jul 02, 2025 at 03:00:56PM -0700, Benjamin Segall wrote:
> Aaron Lu <ziqianlu@xxxxxxxxxxxxx> writes:
>
> > For pelt clock, I chose to keep the current behavior to freeze it on
> > cfs_rq's throttle time. The assumption is that tasks running in kernel
> > mode should not last too long, freezing the cfs_rq's pelt clock can keep
> > its load and its corresponding sched_entity's weight. Hopefully, this can
> > result in a stable situation for the remaining running tasks to quickly
> > finish their jobs in kernel mode.
>
> I suppose the way that this would go wrong would be CPU 1 using up all
> of the quota, and then a task waking up on CPU 2 and trying to run in
> the kernel for a while. I suspect pelt time needs to also keep running
> until all the tasks are asleep (and that's what we have been running at
> google with the version based on separate accounting, so we haven't
> accidentally done a large scale test of letting it pause).
>

Got it, I'll rework this part to keep pelt clock ticking and only stop
it when all tasks of a throttled cfs_rq get either throttled or blocked.
I will paste a diff on top of this series and if the diff looks OK, I'll
fold it to patch3 in next version.

> Otherwise it does look ok, so long as we're ok with increasing distribute
> time again.

Good to know this!

About not strictly limiting quota, I suppose that is a trade off and
having a system that is operating properly is better than a system with
task hung or even deadlock.

Best regards,
Aaron