Re: Help: vfs problem?

From: Russell King (rmk@arm.linux.org.uk)
Date: Sun May 21 2000 - 08:10:40 EST


Russell King writes:
> One of the more common oopsen (cut down to the bare essentials) is
> from within __remove_from_free_list:
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000024
> pc : [<c0068638>] lr : [<c0068fd4>] within __remove_from_free_list
> Process hdparm (pid: 126, stackpage=c3cdb000)
> Function entered at [<c0068604>] from [<c0068fd4>] __remove_from_free_list
> Function entered at [<c0068f7c>] from [<c006fedc>] getblk
> Function entered at [<c006fbb0>] from [<c0066810>] block_read
> Function entered at [<c00666d0>] from [<c002de00>] sys_read
>
> The NULL pointer in question is bh->b_next_free, which is part of the
> following line in fs/buffer.c:__remove_from_free_list:
>
> bh->b_next_free->b_prev_free = bh->b_prev_free;
>
> Is this a known problem with 2.3.99-pre8, or could it be a possible problem?
> Is anyone willing to say outright that this could not be a generic kernel bug?
> I'd like to know this just so that I don't spend the rest of today chasing
> an already known about-to-be-fixed bug.

Further to this, I can now confirm that the bh free lists are getting
corrupted outside the time frame in which the bus-master DMA is active. I
discovered this by adding a "check the free lists" function, which basically
does the following (note that this is not an SMP machine):

        disable all interrupts
        for each size of list
          if list is non-empty
            for each bh on the list
              if prev pointer is null, then complain
              if prev->next pointer is not this bh, then complain
              if next pointer is null, then complain
              if next->prev pointer is not this bh, then complain
        restore interrupt state

This code is called at several points:
1. in ide_build_dmatable, just after ide_build_sglist returns
2. just before ide_build_dmatable returns
3. at the end of ide_destroy_dmatable

This starts to pick up bad buffer head free-list links at point 1. Since the
lists were still correct the last time we visited point 3, i.e. after the
previous BM-DMA transfer had finished, the corruption is happening between
transfers.

This leads me to think that some rare kernel bug is being triggered, rather
than a hardware problem (unless the BM-DMA is continuing after it should have
ended).
Russell King  rmk@arm.linux.org.uk
http://www.arm.linux.org.uk/~rmk/aboutme.html
THE developer of ARM Linux

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/