Re: [Real fix] Re: Kernel panic: can't push onto full stack

Andrea Arcangeli (andrea@e-mind.com)
Mon, 1 Mar 1999 12:15:53 +0100 (CET)


On Mon, 1 Mar 1999, Alexander Viro wrote:

>Michael sent me the list of oopsen for the second patch. All of them
>are in the same place and it looks, uhm, interesting. Fragment in
>question:
>fs/proc/array.c::get_stat()
>static int get_stat(int pid, char * buffer)
>{
>        struct task_struct *tsk;
>        unsigned long vsize, eip, esp, wchan;
>        long priority, nice;
>        int tty_pgrp;
>        sigset_t sigign, sigcatch;
>        char state;
>
>        read_lock(&tasklist_lock);
>        tsk = find_task_by_pid(pid);
>        read_unlock(&tasklist_lock); /* FIXME!! This should be done after the last use */
>        if (!tsk)
>                return 0;
>        state = *get_task_state(tsk);
>        vsize = eip = esp = 0;
>        if (tsk->mm && tsk->mm != &init_mm) {
>                struct vm_area_struct *vma = tsk->mm->mmap;
>                while (vma) {
>                        vsize += vma->vm_end - vma->vm_start;
>                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>At that point vma is 08010000, with the obvious results.
>As an aside - shouldn't we grab the semaphore on tsk->mm here?

Look at ftp://e-mind.com/pub/linux/arca-tree/2.2.2_arca3.gz

I fixed all the races in array.c a few days after 2.2.1 was
released.

>ISTR that there were problems in proc/array.c. Andrea, could you comment
>on that? I don't think that it's the cause of problem - it rather looks

I really doubt it's the cause of the problem. I think instead that some
(SMP) race in af_unix.c or garbage.c is corrupting memory (or that Michael
is hitting hw problems...). Once I'm able to reproduce the crash here
I'll be able to tell you more... ;). If somebody has a test suite that
reproduces the crash, please let me know.

>initially I've misinterpreted it). It came from inet_recvmsg(). AFAICS in
>unix_gc() we never mess with queues of AF_INET sockets, so it sounds, erm,
>interesting.

unix_gc should be called only at unix-socket destruction, if I understand
the code correctly...

> Michael, could you grep through your logs for EIP: and send me the
>stuff around it (gzipped, if possible ;-)?
> Alan, you wrote about other reports - were they on l-k too? When
>did it happen? I couldn't find them in archives ;-/

AFAIK Tomasz was able to reproduce the panic in 2.2.2, and he was the
first one to report the problem to me.

Tomasz, could you try out the patches on the list (you can find them in
the thread with this subject), specifically the second patch from Alexander
plus my cleanup on top of it, and check whether you get the machine lockups
that the other guy is frequently seeing? The patch should fix the panic
problem in the unix garbage collector.

Thanks.

Andrea Arcangeli
