Bug in 2.4 kernel?: UFS file read problem

From: Clay Claiborne (cjc@cosmoseng.com)
Date: Fri Jun 09 2000 - 01:23:20 EST

Next message: Kai Harrekilde-Petersen: "Re: [uPatch] Graceful failure?"
Previous message: Christoph Hellwig: "[PATCH] clean up Config.in files"
Next in thread: Alan Cox: "Re: Bug in 2.4 kernel?: UFS file read problem"
Reply: Alan Cox: "Re: Bug in 2.4 kernel?: UFS file read problem"

Perhaps you have seen my earlier postings on this problem. I have been
tasked with mounting DEC OSF partitions read/write from a Linux box. The
problem that I ran into was the failure to read files past the direct
blocks (96K), and scrambled data on writes.

The good news is that I have solved my immediate problem.
The bad news is that I may have uncovered a bug in the 2.4.0 test kernel.

UFS support is available in the 2.2 kernel but I have been working with
the 2.3.99preX, and more recently the 2.4.0-test1 kernel because they
had the necessary support for the OSF partition type. I am neither a C
programmer nor a kernel hacker, and this has hindered me in figuring out
what is going on. At first I assumed that there was something weird about
the DEC OSF file system that the Linux implementation of ufs wasn't
getting. But there is nothing weird about the way DEC is doing things.
It's all very straightforward inode organization. The more I looked into
it, the more it seemed like the ufs code was just broken. Now I am
convinced it is.

I think I've got the problem narrowed down to the bh->b_data pointer in
fs/ufs/inode.c moving around when it shouldn't. I think there is a
"failure to communicate" between the ufs fs presentation of the inode
and the VFS reading of it and something is fouling the reading of the
1st indirect block. Then I see that fs/inode.c has been extensively
reworked in the 2.3.X series, so I wonder ...

I patched the 2.2.15 tree with support for the OSF partition (just
dropped the code in from 2.4.0-test1) and built a kernel. Voila, my
problem disappeared! I can read a 97MB file right off the DEC OSF.

Conclusion: Something is broken that was working, and it may affect more
than the ufs filesystem. I will investigate further but some other
people might want to look at this.

On another subject I've also experienced 'unresolved symbol' problems
with the ne2k-pci module in 2.3.99preX and 2.4.0-test1 kernels.

I have copied extensive materials from my earlier correspondence on
the ufs problem below. I hope they can be useful in helping to sort this
out.

--
Clay J. Claiborne, Jr., President
Cosmos Engineering Company
1550 South Dunsmuir Ave.
Los Angeles, CA 90019
(323) 930-2540   (323) 930-1393 Fax
http://www.CosmosEng.com
Email: cjc@cosmoseng.com
====================================================
General Dynamics has given me a contract to build a Linux server that
can mount a Digital Unix SCSI II drive and make the files available to
Windows workstations for reading and deleting via samba. They have
provided me with three sample drives, all 9.1 GB Quantum Atlas II’s with
50 pin SCSI interface.
This is a commercial contract so there is money for you if you can help
me solve this. Beyond that I think it would be good for Linux. General
Dynamics is looking for Linux solutions because the government has
mandated it. I want to show them that Linux can deliver.
That being said, on to the technical details and problems.
My workstation is a basic Pentium III w/ 128MB RAM & 18GB LVD system
drive. It is running RH6.2 and for this job I am working with the
2.3.99pre8 kernel.
The sample drives appear to have the OSF partition type and the UFS file
system (DEC OSF/1?). With CONFIG_UFS_FS, CONFIG_UFS_FS_WRITE and
CONFIG_OSF_PARTITION set in the kernel,
I can mount the disk with a command like:
 mount -t ufs -o ufstype=sun /dev/sdb3    /du1
Then I can read the dir & inode table (ls -l) and can read short files
okay, but long files get truncated at 98304 bytes (192 * 512 = 98304).
This has been the case on two different drives and two different
files, which are the only samples I have that are longer than 98304
bytes.
The command:
 cp -v /du2/oilstock.tar .
returns the error:
cp: /du2/oilstock.tar: Input/output error
and in /var/log/messages:
May 30 05:43:46 GD-DU kernel: attempt to access beyond end of device
May 30 05:43:46 GD-DU kernel: 08:23: rw=0, want=536934401, limit=8890760
May 30 05:43:46 GD-DU kernel: attempt to access beyond end of device
May 30 05:43:46 GD-DU kernel: 08:23: rw=0, want=536934402, limit=8890760
..
Those ‘want’ numbers look way out of line to me. Is something getting
screwed up in the way the block numbers are being read?
I can remount the drive rw. This doesn’t change the above read problem.
However I can write to the ufs drive, and write and read back long
files, like my sample linux-2.3.99pre8.tar.gz (20MB)
So I have a problem reading long files that are native (but not long
files written by Linux).
Looking at ufs_fs.h it appears UFS_BLOCK_SIZE = 8192. What does
UFS_NDADDR = 12 mean? Does it mean 12 block addresses are held directly
in the inode? 12 * 8192 = 98304, which allows the rather tidy answer
that the system is getting the direct addressing right but mis-reading
the indirect addresses written on the DEC system, while reading its own
writing of those addresses fine.
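
The arithmetic behind that guess can be sketched in a few lines of C. This is only an illustration: the two constants are the values quoted from ufs_fs.h above, while the function names are hypothetical helpers, not anything taken from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted from ufs_fs.h in the message. */
#define UFS_BLOCK_SIZE 8192u
#define UFS_NDADDR     12u

/* Bytes reachable through the direct block pointers alone. */
uint32_t direct_limit_bytes(void)
{
    return UFS_NDADDR * UFS_BLOCK_SIZE;
}

/* Nonzero when reading byte `offset` requires an indirect block. */
int needs_indirect(uint64_t offset)
{
    return offset >= direct_limit_bytes();
}

/* Which direct pointer (0..11) covers `offset`; only valid below the limit. */
unsigned direct_index(uint64_t offset)
{
    assert(!needs_indirect(offset));
    return (unsigned)(offset / UFS_BLOCK_SIZE);
}
```

The limit comes out at 98304 bytes, the same figure as the 192 * 512 truncation point, so a read that stops there has used exactly the direct pointers and nothing else.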
Could this problem be related to the fact that the fs and files were
written on a 64 bit system (Alpha, I assume) whereas I am working on a
32 bit system (i386)? If that is the case how do I fix it?
Any light you can throw on this problem will be much appreciated.
--
I added the observations above on UFS_BLOCK_SIZE, UFS_NDADDR and the
64 bit question to my earlier letter and posted it to linux-kernel,
asking:
Does this sound like I'm on the right track?
Clay
-----
Thanks again. I did a grep of the source tree for UFS_NDADDR and turned
up inode.c. 12 block addresses are listed in the inode, with 3 indirect
addresses. So I figure I'm on the right track too. The indirect
addressing is being mishandled somehow, like "byte-swapped filesystems"
or maybe a signed/unsigned integer problem?
Next clue to investigate:
May 30 05:43:46 GD-DU kernel: attempt to access beyond end of device
May 30 05:43:46 GD-DU kernel: 08:23: rw=0, want=536934401, limit=8890760
May 30 05:43:46 GD-DU kernel: attempt to access beyond end of device
May 30 05:43:46 GD-DU kernel: 08:23: rw=0, want=536934402, limit=8890760
Those numbers, 536934401, 536934402, etc. Obviously they don't refer to
any blocks on this disk, but where do they come from? I think that if I
can understand how they were generated, I will know exactly what the
problem is.
536934401  = 2000 F801 hex
           = 100000000000001111100000000001 binary
1308676867 = 4E00 D303 hex
           = 1001110000000001101001100000011 binary
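
Hand conversions like these are easy to get wrong, so here is a small C sketch that redoes them. The function names are made up for this illustration, and it only checks the arithmetic; it says nothing about where the kernel gets the numbers from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format v as two groups of four hex digits, e.g. "2000 F801". */
void to_hex_groups(uint32_t v, char *buf, size_t len)
{
    assert(buf != NULL && len >= 10);   /* 9 characters plus terminator */
    snprintf(buf, len, "%04X %04X",
             (unsigned)(v >> 16), (unsigned)(v & 0xFFFFu));
}

/* Write the binary digits of v, without leading zeros, into buf. */
void to_binary(uint32_t v, char *buf, size_t len)
{
    char digits[33];
    const char *p = digits;
    int i;

    assert(buf != NULL && len >= sizeof digits);
    for (i = 0; i < 32; i++)
        digits[i] = ((v >> (31 - i)) & 1u) ? '1' : '0';
    digits[32] = '\0';

    /* Skip leading zeros but always keep at least one digit. */
    while (*p == '0' && p[1] != '\0')
        p++;
    snprintf(buf, len, "%s", p);
}

/* Return 1 when hand-written hex and binary strings both match v. */
int conversions_match(uint32_t v, const char *hex, const char *bin)
{
    char h[16], b[33];

    to_hex_groups(v, h, sizeof h);
    to_binary(v, b, sizeof b);
    return strcmp(h, hex) == 0 && strcmp(b, bin) == 0;
}
```

Calling conversions_match() with each decimal value and the two strings written next to it is enough to confirm the conversions before reading anything into the bit patterns.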
Clay
---------------
More info on my problem:
Upon further study, my original declaration that I could write files to
the ufs partition and read them back was in error. It is an artifact of
some level of buffering in RAM, because such correct read back does not
survive a reboot.
i.e.:
       cp linux-2.3.99pre8.tar.gz  /du2
       cp -v /du2/linux-2.3.99pre8.tar.gz temp1.tgz
       tar -tzvf temp1.tgz   - okay
reboot
       cp -v /du2/linux-2.3.99pre8.tar.gz temp2.tgz
       tar -tzvf temp2.tgz   - NOT OKAY
       tar -tzvf temp1.tgz   - okay
In fact it appears that a file written to the ufs and then read back
will have the first 2KB truncated - which is to say that the first
direct addr in the inode points to a block of data that was 2KB down in
the original file, and only the first 1KB of each 8KB block contains
valid data; the other 7KB is nulls.
How is a proper read back possible before a reboot? This is a 20MB file.
Is it possible for a file that large to remain in cache on a 128MB
machine? Or is it just a proper inode table that is being cached?
I've developed a technique for isolating a specific inode on the raw
partition, and this has helped considerably. I realized that since I
could write to the partition, I could make changes and look for the
change. I know that the uid is 4th from the start of the inode, so I
change the file's owner and look for the change. The commands look like
this:
    od -w4  -Ax -x -j 0 -N 10000 /dev/sdc3 >DU2-help-root
    chown xfs /du2/Configure.help
    sync
    od -w4  -Ax -x -j 0 -N 10000 /dev/sdc3 >DU2-help-xfs
    diff  DU2-help-root DU2-help-xfs
From the output of the diff I determine the inode's offset and then:
       od -w4  -Ax -x -j 33792 -N 128 /dev/sdc3|less
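
The same change-and-compare idea can be sketched in C for two raw dumps already read into memory. This is an illustration only: the function names are hypothetical, and the 128-byte inode size is an assumption taken from the -N 128 used with od, as is the idea that inodes sit on 128-byte boundaries within the dump.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: on-disk inode size, matching the -N 128 passed to od. */
#define INODE_SIZE 128u

/* Offset of the first byte that differs, or len if the dumps are equal. */
size_t first_difference(const unsigned char *a, const unsigned char *b,
                        size_t len)
{
    size_t i;

    assert(a != NULL && b != NULL);
    for (i = 0; i < len; i++)
        if (a[i] != b[i])
            return i;
    return len;
}

/* Round an offset down to the start of the inode assumed to contain it. */
size_t inode_start(size_t offset)
{
    return offset - (offset % INODE_SIZE);
}
```

Feeding the "before chown" and "after chown" dumps to first_difference() and rounding the result with inode_start() gives a candidate offset to pass to od -j, in place of reading it off the diff output by eye.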
Here's my inode. Up until now I only knew inodes as elusive structures
that lived on the disk somewhere. I knew vaguely that "the inode is the
file" as SUN might say, but I never knew quite what they did or how they
worked. Well, all that is changing now. Now I'm printing inodes out,
saving them to files (I'll note here that if you save all your inodes to
files you won't have room for anything else) and coloring them in.
Anyway, from a study of its inode, this is what I've discovered about
the big file, oilstock.tar, that won't copy more than 96K:
1) Fortunately the first 12 blocks are sequential. These are the direct
address blocks. If the offset of the first block is 800 then
dd if=/dev/sdc3 of=crudeoil.tar bs=1k count=96 skip=800
will create a file that is identical to one produced by
cp /du2/oilstock.tar .
2) The 13th addr in the inode is the 1st indirect block. I can go to
that block by using that address as a simple 32 bit offset of 1K blocks
from the beginning of the dev. I can take the first 32 bit address I
find there and use it as the offset of 1K blocks from the beginning of
dev and pick up my data where it left off. All very simple,
straightforward stuff. No byte swaps, no shifts, no new math.
Fortunately also oilstock.tar is a ball of text files, so it's easy to
see if the puzzle pieces are all there and in the right order.
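
The two manual steps above (turn an address into a byte offset, then pull a 32 bit address out of the indirect block) can be sketched in C as follows. The helper names are hypothetical, the 1K unit is the one used with dd above, and reading the pointers as little-endian is an assumption based on the "no byte swaps" observation on an i386 machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The message treats block addresses as counts of 1K units. */
#define FRAG_SIZE 1024u

/* Byte offset on the device of the 1K unit numbered `frag`. */
uint64_t frag_to_byte_offset(uint32_t frag)
{
    return (uint64_t)frag * FRAG_SIZE;
}

/* Read the n-th 32 bit pointer from a raw indirect block.
 * Assumption: pointers are stored little-endian. */
uint32_t indirect_entry(const unsigned char *block, size_t block_len,
                        size_t n)
{
    const unsigned char *p;

    assert(block != NULL && (n + 1) * 4 <= block_len);
    p = block + n * 4;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

With a first block at address 800, frag_to_byte_offset(800) gives 819200, the same place dd reaches with bs=1k skip=800. Reading the indirect block from the device and passing it to indirect_entry() with n = 0 yields the address where the data continues.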
So now I can read more of the unreadable file, at least with dd and od.
My problem is that I can't read the source code well enough to
understand what Linux is doing and where it is getting it wrong.
This is where I need your help.
--------------------------------------
I modified fs/ufs/inode.c as follows to print out some variables - this
section starts around line 80:
  /* Look up nr in the inode's own block pointer array. */
  #define ufs_inode_bmap(inode, nr) \
          (SWAB32((inode)->u.ufs_i.i_u1.i_data[(nr) \
           >> uspi->s_fpbshift]) + ((nr) & uspi->s_fpbmask))
  /* Look up nr in the pointer block held by bh; releases bh on exit. */
  static inline unsigned int ufs_block_bmap (struct buffer_head * bh,
          unsigned nr, struct ufs_sb_private_info * uspi, unsigned swab)
  {
          unsigned int tmp, d1, d2, d3, d4;
          UFSD(("ENTER, nr %u\n", nr))
          if (!bh)
                  return 0;
          /* index = nr >> s_fpbshift, fragment offset = nr & s_fpbmask */
          tmp = SWAB32(((u32 *) bh->b_data)[nr >> uspi->s_fpbshift])
                  + (nr & uspi->s_fpbmask);
          /* Debug: d1..d4 are the lookups for nr = 0, 8, 16 and 24 */
          d1 = SWAB32(((u32 *) bh->b_data)[0 >> uspi->s_fpbshift])
                  + (0 & uspi->s_fpbmask);
          d2 = SWAB32(((u32 *) bh->b_data)[8 >> uspi->s_fpbshift])
                  + (8 & uspi->s_fpbmask);
          d3 = SWAB32(((u32 *) bh->b_data)[16 >> uspi->s_fpbshift])
                  + (16 & uspi->s_fpbmask);
          d4 = SWAB32(((u32 *) bh->b_data)[24 >> uspi->s_fpbshift])
                  + (24 & uspi->s_fpbmask);
          /* b_data is a pointer; cast so the format matches (same output on i386) */
          printk("bh->b_data = %ld \n", (long) bh->b_data);
          printk("d1=0  %d d2=8 %d d3=16 %d d4=24 %d \n", d1, d2, d3, d4);
          printk("  s_fpbshift %d  _fpbmask %u \n",
                  SWAB32(uspi->s_fpbshift), SWAB32(uspi->s_fpbmask));
          UFSD(("EXIT, result %u\n", tmp))
          brelse (bh);
          return tmp;
  }
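
For anyone who wants to try the index arithmetic outside the kernel, here is a minimal user-space model of it in C. It is a sketch, not the kernel code: the struct and function names are invented for the example, the pointer table is assumed to be already in host byte order (so there is no SWAB32 step), and there is no buffer_head involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two uspi fields the function uses. */
struct frag_geometry {
    unsigned fpbshift;  /* log2 of fragments per block */
    unsigned fpbmask;   /* fragments per block minus one */
};

/* Same arithmetic as ufs_block_bmap, on a plain array of pointers. */
uint32_t model_block_bmap(const uint32_t *ptrs, size_t nptrs, unsigned nr,
                          const struct frag_geometry *g)
{
    size_t index = nr >> g->fpbshift;   /* which block pointer */
    unsigned frag = nr & g->fpbmask;    /* fragment inside that block */

    assert(ptrs != NULL && g != NULL && index < nptrs);
    return ptrs[index] + frag;
}
```

With fpbshift 3 and fpbmask 7, the values printed in the log output in this message, nr 79 selects pointer 9 and adds fragment 7. A logged result of 24399 for nr 79 therefore implies that pointer 9 holds 24392.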
This is what I got trying to read oilstock.tar:
  Jun  8 02:20:59 GD-DU kernel: ino 5  mode 0100644  nlink 1  uid 0  gid 15  size 91555840 blocks 0
  Jun  8 02:20:59 GD-DU kernel:   db <800 808 816 824 832 840 848 856 864 872 880 888>
  Jun  8 02:20:59 GD-DU kernel:   gen 951759770 ib <24328 48648 0>
  Jun  8 02:20:59 GD-DU kernel: (inode.c, 686), ufs_read_inode: EXIT
  Jun  8 02:21:10 GD-DU kernel: (inode.c, 92), ufs_block_bmap: ENTER, nr 0
  Jun  8 02:21:10 GD-DU kernel: bh->b_data = -1054302208
  Jun  8 02:21:10 GD-DU kernel: d1=0  24320 d2=8 24400 d3=16 24408 d4=24 24344  <- these are the first four indirect blocks for the file!
  Jun  8 02:21:10 GD-DU kernel:   s_fpbshift 3  _fpbmask 7
  Jun  8 02:21:10 GD-DU kernel: (inode.c, 113), ufs_block_bmap: EXIT, result 24320
  Jun  8 02:21:10 GD-DU kernel: (inode.c, 92), ufs_block_bmap: ENTER, nr 2040
  Jun  8 02:21:10 GD-DU kernel: bh->b_data = -1054301184
  Jun  8 02:21:10 GD-DU kernel: d1=0  39440 d2=8 39448 d3=16 39456 d4=24 39464
  Jun  8 02:21:10 GD-DU kernel:   s_fpbshift 3  _fpbmask 7
  Jun  8 02:21:10 GD-DU kernel: (inode.c, 113), ufs_block_bmap: EXIT, result 41480
  Jun  8 02:21:10 GD-DU kernel: TER, nr 2040
  Jun  8 02:21:10 GD-DU kernel: bh->b_data = -1054301184
  Jun  8 02:21:10 GD-DU kernel: d1=0  39440 d2=8 39448 d3=16 39456 d4=24 39464
  Jun  8 02:21:10 GD-DU kernel:   s_fpbshift 3  _fpbmask 7
  Jun  8 02:21:10 GD-DU kernel: (inode.c, 113), ufs_block_bmap: EXIT, result 41480
  Jun  8 02:21:10 GD-DU kernel: (inode.c, 92), ufs_block_bmap: ENTER, nr 78
  Jun  8 02:21:10 GD-DU kernel: bh->b_data = -1054300160
  Jun  8 02:21:10 GD-DU kernel: d1=0  1308676863 d2=8 654368511 d3=16 1090560511 d4=24 956342015
  Jun  8 02:21:10 GD-DU kernel:   s_fpbshift 3  _fpbmask 7
  Jun  8 02:21:10 GD-DU kernel: (inode.c, 113), ufs_block_bmap: EXIT, result 352322054
  Jun  8 02:21:10 GD-DU kernel: (inode.c, 92), ufs_block_bmap: ENTER, nr 79
  Jun  8 02:21:10 GD-DU kernel: bh->b_data = -1054302208
  Jun  8 02:21:10 GD-DU kernel: d1=0  24320 d2=8 24400 d3=16 24408 d4=24 24344
  Jun  8 02:21:10 GD-DU kernel:   s_fpbshift 3  _fpbmask 7
  Jun  8 02:21:10 GD-DU kernel: (inode.c, 113), ufs_block_bmap: EXIT, result 24399
  Jun  8 02:21:10 GD-DU
Why does bh->b_data change? Shouldn't it stay the same?
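
One way to look at the logged bh->b_data values is to treat them as unsigned 32 bit addresses and measure how far apart they are. The C sketch below does only that arithmetic, with hypothetical helper names; it does not explain why the kernel hands back different buffers.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a pointer that was printed with %d on a 32 bit machine. */
uint32_t as_address(int32_t printed)
{
    return (uint32_t)printed;
}

/* Distance in bytes from one logged buffer address to another. */
int64_t address_delta(int32_t from, int32_t to)
{
    return (int64_t)as_address(to) - (int64_t)as_address(from);
}

/* How many buffers of `size` bytes separate the two addresses. */
int64_t buffers_apart(int32_t from, int32_t to, uint32_t size)
{
    assert(size != 0);
    return address_delta(from, to) / (int64_t)size;
}
```

For the three distinct values in the log (-1054302208, -1054301184 and -1054300160) this gives the addresses 0xC128A000, 0xC128A400 and 0xC128A800, each 1024 bytes after the previous one. That spacing is the same as the 1K unit used for block addresses elsewhere in this message.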
Clay


This archive was generated by hypermail 2b29 : Thu Jun 15 2000 - 21:00:17 EST