Re: [PATCH 2.6.16-mm1 2/2] sched_domains: Allocate sched_groups dynamically

From: Paul Jackson
Date: Thu Jul 06 2006 - 20:00:30 EST


Several months ago, Srivatsa wrote:
> I couldn't test this on a multi-core machine, since I don't think we
> have one in our lab.
>
> Suresh, would you mind testing the patch on a multi-core machine, in case you
> have access to one?
>
> Basically you would need to create an exclusive CPUset with one CPU in it
> (ensure that its sibling in the same core is not part of the same
> CPUset). As soon as you make the CPUset exclusive, you would hit some
> kind of hang. With this patch, the hang should go away.


Summary: Where do we stand with multi-core and this bug?


I don't see a reply from Suresh on whether he could test on multi-core.

I finally happened to be running on a hyper-threaded box last week,
and stumbled over this bug that Srivatsa's patch fixes. Hawkes
remembered Srivatsa's patch, I tried it, and it worked. Thanks!

But now I'm quite confused as to the situation with multi-core.

Details of my confusions, for the bored:

From Srivatsa's remark, I would have guessed that multi-core was
at risk for this bug too, and that Srivatsa was hopeful his patch
would fix it there as well.

Early this week, a couple of people who shall remain anonymous here
raised the question of whether we had the same problem with multi-core.
One of them believed that multi-core did have the same problem.

I got a little time on a multi-core system this morning to test it,
and while running what I -thought- was a kernel -without- Srivatsa's
patch, I could not find any problem. I made a cpuset with just a
single logical cpu in it, and marked it cpu_exclusive, and the
system did not hang.
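
For anyone wanting to repeat that test, the steps can be sketched
roughly as below. This is only a sketch: it assumes the cpuset
filesystem is already mounted (conventionally with
"mount -t cpuset none /dev/cpuset"), that the per-cpuset files are
named cpus, mems and cpu_exclusive as in Documentation/cpusets.txt,
and that memory node 0 exists. The helper name make_exclusive_cpuset
and its arguments are made up for illustration.

```shell
#!/bin/sh
# Sketch: create a cpuset holding one logical CPU and mark it
# cpu_exclusive. make_exclusive_cpuset is a hypothetical helper.
#   $1 = cpuset mount point (assumed: /dev/cpuset)
#   $2 = name of the new cpuset
#   $3 = logical CPU number to place in it
#   $4 = memory node to place in it (assumed: 0)
make_exclusive_cpuset() {
    root=$1
    name=$2
    cpu=$3
    mem=$4

    # A new cpuset is created by making a directory in the cpuset fs.
    mkdir -p "$root/$name" || return 1

    # Give the cpuset exactly one logical CPU and one memory node.
    echo "$cpu" > "$root/$name/cpus" || return 1
    echo "$mem" > "$root/$name/mems" || return 1

    # Marking the cpuset cpu_exclusive is the step at which the
    # hang was reported on an affected kernel.
    echo 1 > "$root/$name/cpu_exclusive" || return 1
}
```

A call such as "make_exclusive_cpuset /dev/cpuset test 1 0" (as root)
would then set up the one-CPU exclusive cpuset. Pick a CPU whose
sibling is left outside the new cpuset, as Srivatsa described.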

It will be another day before I can get on that multi-core system
again to verify my findings.

I was hoping that someone could actually -read- this code and state
with confidence that one of the following held:
* it was already working ok on multi-core (a one CPU cpu_exclusive cpuset),
* it was broken, but Srivatsa's patch fixes it, or
* it's still broken, even with Srivatsa's patch.

I tried a couple of times to read the code myself, but could not
make any headway there.

So ... what's up with multi-core and this bug?

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@xxxxxxx> 1.925.600.0401