Re: [PATCH] schedstats additions

From: Nick Piggin
Date: Wed Sep 08 2004 - 19:07:59 EST


Rick Lindsley wrote:
I have a patch here to provide more useful statistics for me. Basically
it moves a lot more of the balancing information into the domains instead
of the runqueue, where it is nearly useless on multi-domain setups (eg.
SMT+SMP, SMP+NUMA).
It requires a version number bump, but that isn't much of an issue because
I think we're about the only two using it at the moment. But your tools
will need a little bit of work.
What do you think?

The idea of moving some counters from runqueues to domains is fine in
general, but I've some questions about a couple of specific changes in
your patch.

It looks to me like there are some changes in try_to_wake_up() that
aren't schedstats related, although schedstats code is among some
that is moved around. Is there some code there that should be
broken out separately?


There is, yes. I'll be sure to seperate it.

alb_cnt
by moving this, we won't get an accurate look at the number of
times we called active_load_balance and returned immediately
because nr_running had slipped to 0 or 1. how about we add
another counter to count that too, and/or change the name of
this one?


OK.

lb_balanced
are you sure lb_balanced[idle] can't be deduced from lb_cnt[idle]
and lb_failed[idle]?


I don't think so, because you also have the success case, which is
!balanced && !failed.

ttwu_attempts
ttwu_moved
removing these makes it harder to determine how successful
try_to_wake_up() was at moving a process. What counters would
I use to get this information if these were removed?


ttwu_cnt in the rq stats, and ttwu_wake_affine / ttwu_wake_balance
in the domain stats.

ttwu_remote
ttwu_wake_remote
so what's the one line description of what these count now?


ttwu_remote/ttwu_wake_remote are the number of times a runqueue has
woken a remote task / a remote task within that domain, respectively.
Regardless of whether or not it gets pulled onto the local CPU.

smt_cnt
sbe_cnt
how might I see how often sched_migrate_task() and sched_exec()
were called if these were deleted?


sbe_pushed should basically be the same as smt_cnt, barring rare
races with the cpus_allowed mask. I guess sbe_cnt doesn't have to
go.

lb_pulled
Rather than add another counter here, would it be as effective
to make pt_gained a domain counter? Looks like you're collecting

Yeah removing the runqueue counters for these would be good.

the same information. pt_lost would have to remain a runqueue
counter, though, since losing a task has nothing to do with a
particular domain.


Whatever domain that the pulling CPU was in, is also a fair candidate
for pt_lost. Remember, all the domains are per-CPU so any information
you can get from a per-runqueue counter you can also get from a domain
counter.

I'll make a few changes and give you another look. Thanks for the comments.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/