Re: kernel panics at google

From: kuznet@ms2.inr.ac.ru
Date: Fri Jan 21 2000 - 13:01:30 EST


Hello!

> All of this code is protected by asynchronous behavior either
> by virtue of running in BH or from the user within' BH atomic
> backlog processing.

BTW one hint. Look at chunks surrounded by TCP_DEBUG in vger.
I saw this phenomenon first time with softnet and was confused a bit,
because I found no obstacles for the same thing to occur in plain 2.3.
(Well, luckily, recently I saw crash report from sim@stormix.COM
(Simon Kirby) 8))

Moreover, it is possible even in 2.2, when timer is fired,
while socket is locked, and to the time, when it checks
for socket lock, socket is already released with different state.
Note, that poor process context code seriously believes, that
it successfully deleted this timer and has no means to know,
that timer hided on another cpu 8)8) Well, it means that timer handlers
must be fixed to rely only on true socket state, rather than
of assumption that, if timer is started it has some work to do.
I.e. packets_out in retransmit timer and (send_head!=NULL && packets_out==0)
in probe0 timer.

Fast workaround in 2.2 is to clear sock_readers under start_bh_atomic()
to catch running timers. Grr...

It is just hypothesis, seems, race of this kind cannot corrupt queue,
it will just oops with null skb.

Alexey

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sun Jan 23 2000 - 21:00:25 EST