Re: [PATCH] cpuset semaphore depth check optimize

From: Paul Jackson
Date: Thu Sep 15 2005 - 12:46:28 EST


Roman wrote:
> Sorry, I thought it was more obvious... :)

Likely it was obvious. On the other hand, I wouldn't be in this mess
if I understood everything I should here.

Most of what you wrote this time actually made sense to me as I read
it. Thanks for taking the time.

One part left to discuss and then I will put this aside for a few
days while I focus on my day job.


Roman, replying to pj:
> > > I don't think that works at all. Consider the following sequence:
> > > 1) ...
> > > 4) Oops - we just missed doing a release.
> >
> > If you want to do it like this, just add an else before the last
> > atomic_dec().
> > The main point I was trying to make is that clearing tsk->cpuset doesn't
> > require the spinlock, as long as you check for dead tasks in
> > attach_task(). Ignore the rest if it's too confusing.

I don't think confusion is the problem here. I think your proposal was
wrong.

There are *three* ways to get to a cpuset:
1) via some task->cpuset pointer,
2) via that cpuset's path below /dev/cpuset, and
3) walking up the parent chain from a child.

In the first half of your line:

> > if (atomic_read(&cs->count) == 1 && notify_on_release(cs)) {

if per chance the cs->count was one, then for an instant no other task
was using this cpuset, and it had no children. But you can still get
to the cpuset via its /dev/cpuset path.

So by the time we get to the second half of this line where we check
for "notify_on_release(cs)", all hell could have broken loose, and
there might be 17 tasks using this self same cpuset, and 19 child
cpusets of this cpuset. These interlopers. could have arrived by
accessing the cpuset using its path below /dev/cpuset.

The flip side is just as plausible. We cannot, in any case, execute an
unguarded atomic_dec on cpuset->count, if that cpuset has been marked
notify_on_release, and if that cpuset is accessible by any of the
above three possible ways, due to the risk the decrement will put the
count to zero, and we'd miss issuing a release notifier.

Actually, even the single check that is in the code now:

if (notify_on_release(cs)) {

would be bogus if we required instruction level synchronization, but it
is racing on an attribute set by some poor schlob user, so the kernel
need only preserve ordering at the system call level, not at the machine
instruction level.

Putting that part aside, why would you make a point of stating that
"that clearing tsk->cpuset doesn't require the spinlock"? I don't
take cpuset_sem when I clear tsk->cpuset, so why would you think I'd
take this new spinlock instead?

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@xxxxxxx> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/