Re: linux SMP stability or lack thereof

Ricardo Galli Granada (gallir@atlas-iap.es)
Wed, 30 Sep 1998 16:38:51 +0200 (MET)


On Wed, 30 Sep 1998, Doug Ledford wrote:

> You aren't paying attention to what I said. You're writing this up like the
> problem has to be a driver issue somewhere, either NIC or SCSI or whatever.
> The exact EIP traces I have from this 2.0.35 SMP problem point not to any
> driver but to the core kernel code. Skip testing the drivers, it's a waste
> of time as long as the kernel can go into a deadlock with a fork(). There
> is a race in the core kernel code, and if you are getting hit by it, then
> you might as well recompile your kernel without SMP.

Please Doug, do not misunderstand me. I wanted to be polite and tried to
avoid saying "kernel 2.0.xx is SMP unsafe", which could bring tons of flame
messages to my mailbox. I am a Linux fanatic; in fact, in my company we
use only Linux systems (UP...) for 24x7 servers. Windows NT machines are
relegated to internal desktop users (mainly for MS Office and design
software).

We support the Linux community as much as we can: we have always bought
different Linux distributions (two Slackware, two Redhat and two SUSE,
plus tons of Linux books) and we developed some ISP accounting software
that is supported only on Linux.

Unfortunately, I/we are not kernel hackers, so we cannot just complain to
Linus and the developers and expect them to solve our SMP problem; the only
thing we can do is help them. But again, my reports did not get an answer
from any hacker except you.

At least I can say that, given MY configuration on MY motherboard, Linux is
SMP unstable. People can say "it's the aic7xxx driver"; although I doubt
it, I cannot be sure (I teach Operating Systems at my university, so you
can understand my ignorance ;-)

>
> My system can reproduce the problem quite easily, but I don't pretend to be
> the least bit expert on the fork() portions of the kernel. All I can say is
> I can reliably reproduce this problem and the EIP traces say it happens
> during a fork() and that it was very likely aggravated by the changes
> between 2.0.33 and 2.0.35 that enabled the swapping of shared-COW pages.
> But, as Stephen has pointed out, if that made the problem worse (which it
> did) then the real issue was just being hidden when we weren't swapping
> those pages out. Any experts out there on the fork() code are more than
> welcome to get with me and see if we can't track the problem down, I'll put
> my machine into test mode to try and get rid of the problem.

I read some of the discussion about this problem (swapping COW pages),
especially about Linus "forcing" hackers to have minimal (although
suboptimal) bug-free code working. I thought that was just for 2.1.xxx and
2.2.xx. Isn't that so?

Thanks for your cooperation and help.

--
Ricardo Galli
University of Balearic Islands
ATLAS Internet Access Provider
mailto:gallir@atlas-iap.es
mailto:rgalli@acm.org
http://www.atlas-iap.es
