Re: kernel BUG at kernel/workqueue.c:291

From: Carsten Aulbert
Date: Tue Mar 03 2009 - 02:36:50 EST


Hi Andrew,

Andrew Morton wrote:
>> in the meantime 43 of our nodes were struck with this error. It seems
>> that the jobs of a certain user can trigger this bug, however I have no
>> clue how to really trigger it manually.
>
> That's a lot of nodes.
Quite, at least some percentage of the whole system.
>
> Let's cc the NFS developers, see if this rpciod crash is familiar to them?

Good idea, I should have done that myself, sorry.

I think we have traced this to at least one user's jobs, which seem to
"generate" it, but I still need to ask him which NFS access patterns they use.

The systems are running Debian Etch:

dpkg -l | awk '/(nfs|portmap)/ {print $2 "\t\t" $3}'
libnfsidmap2        0.18-0
mountnfs            1.1.3-2
nfs-common          1.0.10-6+etch.1
nfs-kernel-server   1.0.10-6+etch.1
portmap             5-26


If you need more, please let me know! So far the machines are 'on hold',
i.e. we have not rebooted them yet, so that we can still find out a little
more. If you (or anyone else) think we can reboot them and put them back into
our scheduling queue, please let me know; the users are waiting for more cycles.

Thanks a lot

Carsten