kiobuf / vm bug

From: Gerd Knorr (kraxel@bytesex.org)
Date: Thu Nov 15 2001 - 11:55:31 EST


  Hi,

I think I have found a kiobuf-related bug in the VM of recent linux
kernels. 2.4.13 is fine, 2.4.14-pre1 doesn't boot my machine,
2.4.14-pre2 + newer kernels are broken.

/me runs a kernel with a few v4l-related patches and my current 0.8.x
bttv version (available from http://bytesex.org/patches/ +
http://bytesex.org/bttv/).

With this kernel I can trigger the following BUG():

ksymoops 2.4.3 on i686 2.4.15-pre4. Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.15-pre4/ (default)
     -m /boot/System.map-2.4.15-pre4 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Error (expand_objects): cannot stat(/modules/ide-disk.o) for ide-disk
Error (expand_objects): cannot stat(/modules/ide-probe-mod.o) for ide-probe-mod
Error (expand_objects): cannot stat(/modules/ide-mod.o) for ide-mod
Error (expand_objects): cannot stat(/modules/ext2.o) for ext2
Warning (map_ksym_to_module): cannot match loaded module ext2 to a unique module object. Trace may not be reliable.
kernel BUG at page_alloc.c:84!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c0129e5b>] Tainted: PF
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010282
eax: 0000001f ebx: c1f1e300 ecx: c01e22a0 edx: 00003528
esi: c1f1e300 edi: f94e2000 ebp: 00000000 esp: f3da7ebc
ds: 0018 es: 0018 ss: 0018
Process setiathome (pid: 738, stackpage=f3da7000)
Stack: c01c0ee7 00000054 c1f1e300 00000000 f94e2000 00000000 f2ee69a0 f2ee69c4
       00000000 f94e2060 f94e2000 c012a6f3 c0121120 f2ee69ec f2ee69a0 f94cdfa0
       f94bd6d8 f94e2000 f2ee69ec f94c6b4e f2ee69ec f7ffb800 f2ee69ec f7ffb800
Call Trace: [<c012a6f3>] [<c0121120>] [<f94cdfa0>] [<f94bd6d8>] [<f94c6b4e>]
   [<f94cdfa0>] [<f94c3b3d>] [<f94cdfa0>] [<f94cdfa0>] [<f94c3c7c>] [<f94cdfa0>]
   [<c0107f9d>] [<f94cdfa0>] [<c0108106>]
Code: 0f 0b 83 c4 08 8b 46 18 a8 80 74 11 6a 56 68 e7 0e 1c c0 e8

>>EIP; c0129e5a <__free_pages_ok+aa/29c> <=====
Trace; c012a6f2 <__free_pages+1a/1c>
Trace; c0121120 <unmap_kiobuf+34/48>
Trace; f94cdfa0 <END_OF_CODE+156e2/????>
Trace; f94bd6d8 <.bss.end+4e1a/????>
Trace; f94c6b4e <END_OF_CODE+e290/????>
Trace; f94cdfa0 <END_OF_CODE+156e2/????>
Trace; f94c3b3c <END_OF_CODE+b27e/????>
Trace; f94cdfa0 <END_OF_CODE+156e2/????>
Trace; f94cdfa0 <END_OF_CODE+156e2/????>
Trace; f94c3c7c <END_OF_CODE+b3be/????>
Trace; f94cdfa0 <END_OF_CODE+156e2/????>
Trace; c0107f9c <handle_IRQ_event+30/5c>
Trace; f94cdfa0 <END_OF_CODE+156e2/????>
Trace; c0108106 <do_IRQ+6a/a8>
Code; c0129e5a <__free_pages_ok+aa/29c>
00000000 <_EIP>:
Code; c0129e5a <__free_pages_ok+aa/29c> <=====
   0: 0f 0b ud2a <=====
Code; c0129e5c <__free_pages_ok+ac/29c>
   2: 83 c4 08 add $0x8,%esp
Code; c0129e5e <__free_pages_ok+ae/29c>
   5: 8b 46 18 mov 0x18(%esi),%eax
Code; c0129e62 <__free_pages_ok+b2/29c>
   8: a8 80 test $0x80,%al
Code; c0129e64 <__free_pages_ok+b4/29c>
   a: 74 11 je 1d <_EIP+0x1d> c0129e76 <__free_pages_ok+c6/29c>
Code; c0129e66 <__free_pages_ok+b6/29c>
   c: 6a 56 push $0x56
Code; c0129e68 <__free_pages_ok+b8/29c>
   e: 68 e7 0e 1c c0 push $0xc01c0ee7
Code; c0129e6c <__free_pages_ok+bc/29c>
  13: e8 00 00 00 00 call 18 <_EIP+0x18> c0129e72 <__free_pages_ok+c2/29c>

 <0>Kernel panic: Aiee, killing interrupt handler!
cpu: 0, clocks: 1002240, slice: 501120

2 warnings and 4 errors issued. Results may not be reliable.

The Oops seems to be triggered by the following actions:

(1) the application maps /dev/video0. bttv 0.8.x simply returns some
    shared anonymous memory to as mapping.
(2) the application asks the driver to capture a frame. bttv will lock
    down the anonymous memory using kiobufs for I/O and prepare
    everything for DMA xfer.
(3) The applications exits for some reason, i.e. the anonymous memory
    will be unmapped while the DMA transfer is active and the pages are
    locked down for I/O.
(4) The DMA xfer is done and bttv's irq handler cleans up everything.
    This includes calling unlock_kiovec+unmap_kiobuf for the locked
    pages. The unmap_kiobuf call at this point triggeres the Oops
    listed above ...

Anyone has a idea what is going wrong here?

  Gerd

-- 
Gerd Knorr <kraxel@bytesex.org>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Nov 15 2001 - 21:00:43 EST