Re: [patch] ramdisk blocksize

Bradley D. LaRonde (brad@ltc.com)
Sat, 21 Aug 1999 23:32:51 -0400


Posting this for Mike:

> Please test the ramdisk driver heavily with only this patch applied
> against 2.3.15-pre1. Try to generate ramdisk images using the ramdisk
> itself. Make sure to always unmount before accessing the ramdisk via
> /dev/ram*. If you run into trouble, please give me a way to reproduce it ;).
> <patch snipped>

Both initrd and regular ramdisk usage appear to work with the patch applied,
but booting with initrd gives the following error (then goes on, apparently
working fine):

kernel BUG at buffer.c:696!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c01236de>]
EFLAGS: 00010092
eax: 0000001c ebx: c7ceb060 ecx: c018dc68 edx: c018dc70
esi: c01a9a90 edi: 00000001 ebp: 00000000 esp: c009df7c
ds: 0018 es: 0018 ss: 0018
Process kflushd (pid: 3, stackpage=c009d000)
Stack: 000002b8 c0144230 c7ceb060 00000001 c01a9a90 ffffffff c018f38c c014ee07
       c01a9a90 00000001 c0182d9b 00000001 c014eee3 00000001 00000246 c018f38c
       c01435f4 00000000 c0125fe6 c01b3190 00000f00 c0e9dfb4 c037d0cc 00000912
Call Trace: [<c0144230>] [<c014ee07>] [<c0182d9b>] [<c014eee3>] [<c01435f4>] [<c0125fe6>] [<c01065b7>]
Code: 0f 0b 83 c4 0c c3 57 56 53 8b 7c 24 10 8b 4c 24 14 85 c9 74

ksymoops says:
EIP: c01236de <end_buffer_io_bad+42/48>
Trace: c0144230 <end_that_request_first+80/c4>
Trace: c014ee07 <end_request+17/34>
Trace: c0182d9b <head_vals.697+21af/35f4>
Trace: c014eee3 <rd_request+bf/cc>
Trace: c01435f4 <unplug_device+38/3c>
Trace: c0125fe6 <bdflush+19e/1fc>
Trace: c01065b7 <kernel_thread+23/30>
Code: c01236de <end_buffer_io_bad+42/48> 00000000 <_EIP>: <===
Code: c01236de <end_buffer_io_bad+42/48> 0: 0f 0b ud2a <===
Code: c01236e0 <end_buffer_io_bad+44/48> 2: 83 c4 0c addl $0xc,%esp
Code: c01236e3 <end_buffer_io_bad+47/48> 5: c3 ret
Code: c01236e4 <end_buffer_io_async+0/144> 6: 57 pushl %edi
Code: c01236e5 <end_buffer_io_async+1/144> 7: 56 pushl %esi
Code: c01236e6 <end_buffer_io_async+2/144> 8: 53 pushl %ebx
Code: c01236e7 <end_buffer_io_async+3/144> 9: 8b 7c 24 10 movl 0x10(%esp,1),%edi
Code: c01236eb <end_buffer_io_async+7/144> d: 8b 4c 24 14 movl 0x14(%esp,1),%ecx
Code: c01236ef <end_buffer_io_async+b/144> 11: 85 c9 testl %ecx,%ecx
Code: c01236f1 <end_buffer_io_async+d/144> 13: 74 00 je c01236f3 <end_buffer_io_async+f/144>
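
For reading the ksymoops annotations above: each <symbol+offset/size> gives
the hex offset of the address into the symbol and the symbol's total size,
so the symbol's start is the address minus the offset. A minimal sketch in
Python (the regex and the decode() helper are illustrative names, not part
of ksymoops):

```python
import re

# Matches ksymoops-style annotations such as
# "c01236de <end_buffer_io_bad+42/48>". All numbers are hexadecimal.
SYM_RE = re.compile(r"([0-9a-f]{8}) <([^+>]+)\+([0-9a-f]+)/([0-9a-f]+)>")


def decode(entry):
    """Return the symbol name and its address range for one annotation."""
    match = SYM_RE.search(entry)
    if match is None:
        raise ValueError(f"no symbol annotation in: {entry!r}")
    address = int(match.group(1), 16)
    offset = int(match.group(3), 16)
    size = int(match.group(4), 16)
    start = address - offset  # where the symbol begins
    return {
        "symbol": match.group(2),
        "start": start,
        "end": start + size,  # first address past the symbol
        "offset": offset,
        "size": size,
    }


if __name__ == "__main__":
    info = decode("EIP: c01236de <end_buffer_io_bad+42/48>")
    print(info["symbol"], hex(info["start"]), hex(info["end"]))
```

Applied to the EIP line, this places end_buffer_io_bad at c012369c through
c01236e4, which agrees with end_buffer_io_async+0 starting at c01236e4 in
the disassembly.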

The error message occurs about 20 seconds after the command shell comes up,
even if no commands are issued. The filesystem appears to be fine, both
before and after the message. Here are the details of what's booted:

Clean copy of 2.3.15-pre1 with only your second patch applied. Without the
patch, the kernel still dies with "No init found." Minimal options
configured, compiled without error. Booted with loadlin, using the SuSE
rescue disk image as initrd:
loadlin testkern root=/dev/ram initrd=rescue init=/bin/sh

gives the above behavior. Booting the same kernel with a hard drive as root:
loadlin testkern root=/dev/hdb2 init=/bin/sh

does not produce the error. dd'ing an image to /dev/ram* and mounting it
works as expected: the files are there and there is no error. Creating a new
filesystem on /dev/ram*, mounting it, copying files to it, unmounting, and
dd'ing from /dev/ram* to create an image file appears to work properly, with
no error message.

In both cases, I do get some odd kernel messages during boot in the IDE
device detection:
hdm: probing with STATUS(0xa1) instead of ALTSTATUS(0xff)
hdm: probing with STATUS(0xa1) instead of ALTSTATUS(0xff)
hdm: no response (status = 0xa1), resetting drive
hdm: probing with STATUS(0xa1) instead of ALTSTATUS(0xff)
hdm: no response (status = 0xa1)

but that happens without the patch, too, so I'm assuming it's not related.
And no, I don't have 7 IDE ports, so I don't know why it's probing hdm.
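
For reference, legacy IDE device names map to interfaces two drives at a
time (hda/hdb on ide0, hdc/hdd on ide1, and so on), which is why hdm implies
a seventh interface. A small sketch of that arithmetic in Python (ide_slot()
is a hypothetical helper name, not a kernel function):

```python
def ide_slot(name):
    """Map a legacy IDE device name such as 'hdm' to (interface, unit)."""
    if len(name) != 3 or not name.startswith("hd") or not "a" <= name[2] <= "z":
        raise ValueError(f"not an hdX device name: {name!r}")
    index = ord(name[2]) - ord("a")  # hda -> 0, hdb -> 1, ...
    interface = f"ide{index // 2}"   # two drives per interface
    unit = "master" if index % 2 == 0 else "slave"
    return interface, unit


if __name__ == "__main__":
    print(ide_slot("hdm"))
```

With this mapping, hdm is the master on ide6, the seventh interface counting
from ide0.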

Mike K.

> OK, I have a preliminary patch that tries to fix the potential data
> corruption that can happen in 2.3.15-pre1 and previous 2.3.x kernels (and
> that will automagically fix the ramdisk driver without changing its
> internals).
>
> The corruption bug (which has nothing to do with the ramdisk driver) is the
> use of truncate_inode_pages() to shrink the icache. If an inode is not
> in use and is hashed in the icache, it can still have dirty or protected
> pages allocated in its page cache.
>
> So when we shrink the icache and need to release all the page-cache
> pages that belong to such an inode, we can't simply mark all the
> page-cache-overlapped-buffers as clean in flushpage. Otherwise we'll lose
> data writes, and this will lead to data corruption on disk.
>
> Previously (in 2.2.x) it was possible to use truncate_inode_pages for both
> cases (shrinking the icache and truncate(2)/unlink), since the page cache
> was only there for reads, and both writes and protected buffers were
> placed in the buffer cache. This is no longer possible, since the page
> cache now holds dirty or protected data.
>
> NOTE: with my patch applied, blockdevice writes are still not
> synchronized with filesystem writes (this saves us from having to hash the
> page-cache-overlapped-buffers in the buffer hashtable). So if you read from
> the blockdevice layer you shouldn't expect to read the latest data, and if
> you write to the blockdevice layer your writes can be lost. So just choose
> whether to use a blockdevice in raw mode or with a filesystem on top of it
> before you start using it ;).
>
> But with the patch applied it should be guaranteed that if you unmount a
> ramdisk, and _then_ you read the ramdisk from the blockdevice layer,
> you'll read the right data (the page cache will be correctly converted to
> regular buffers and not to orphaned, lost buffers).
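
The failure mode described in the quoted note can be illustrated with a toy
model. This is a sketch in Python, not kernel code: ToyInode and its methods
are hypothetical names, and handing dirty data to "buffers" for a later
flush is an assumption standing in for the conversion to regular buffers
that the note mentions.

```python
class ToyInode:
    """Toy stand-in for an unused inode that still has cached pages."""

    def __init__(self):
        self.pages = {}    # page index -> (data, dirty flag)
        self.buffers = {}  # page index -> data queued for writeback
        self.disk = {}     # page index -> data that reached the device

    def write(self, index, data):
        # A filesystem write lands in the page cache and is marked dirty.
        self.pages[index] = (data, True)

    def shrink_discarding(self):
        # Treats every cached page as clean: dirty data is simply dropped.
        self.pages.clear()

    def shrink_preserving(self):
        # Hands dirty data to ordinary buffers before the pages go away.
        for index, (data, dirty) in self.pages.items():
            if dirty:
                self.buffers[index] = data
        self.pages.clear()

    def flush(self):
        # Later writeback of whatever is queued in the buffers.
        self.disk.update(self.buffers)
        self.buffers.clear()


def run(preserve):
    """Write one page, shrink the cache, flush, and return the disk state."""
    inode = ToyInode()
    inode.write(0, b"new data")
    if preserve:
        inode.shrink_preserving()
    else:
        inode.shrink_discarding()
    inode.flush()
    return inode.disk


if __name__ == "__main__":
    print("discarding:", run(False))
    print("preserving:", run(True))
```

In the discarding case the write never reaches the toy disk, which is the
lost-write scenario; in the preserving case it does.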
