Hi,
I've compiled and installed linux v2.2.15pre13 using egcs v1.0.3
and libc6 v2.0.7 with make v3.77. I have a Dell optiplex 333mhz
with 128megs and a 2940U2W SCSI card with AIC7895 and a 9.1Gig
Western digital disk drive. I have applied all the aic7xxx patches
up to v5.1.28.
Here are the following problems I'm encountering
on a UMSDOS filesystem. I don't know if the ext2 filesystem will have the
same problem. BTW, this occurs whether SMP, raid0-5, and dquota is
selected or not, although I only have a uniprocessor system.
PROBLEM #1:
a). when I do a "ls -l /DOS" I get an "Umsdos: Oops releasing
pseudo root, Notify .... " message due
to the pseudo root inode equaling the real root. If "ls -d /DOS"
or "ls -lt" is done in "/", there is no Oops since it
doesn't look inside the /DOS subdirectory.
This wasn't a problem in linux 2.2.1 kernels.
Kernels 2.2.2-12 I don't know about since I had problems running
these.
Kernels 2.2.13, 2.2.14, 2.2.15pre3/pre5/pre7/pre9/pre10/pre13 have
this Oops problem.
I'm not sure if ext2 which may have a /DOS on it, if it has
this problem too. Can somebody verify this?
Partial Solution:
To track this down, I started putting printk(KERN_INFO...
messages into every UMSDOS function call upon entry into
the function. When I ran this, the Oops disappeared!!!!
Thus there is a timing problem somewhere in the filesystem
or VFS filesystem/kernel. There has been very few
changes to the UMSDOS filesystem since kernel 2.2.1, thus
the UMSDOS filesystem shouldn't be at fault.
I tried narrowing down the search by reducing the number
of printk(KERN_INFO.... in each file, but it's
indeterminate as to which one eliminates the problem.
b). Another problem that hampers finding item (a) above, is
that when printk's try to record via klogd and syslog.conf
to /var/log/kernel, not all printk's are recorded. Some
printk's will have a partial message written or a
printk will overwrite the previous printk in the middle
of the first printk message.
Thus, there seems to be a race condition whereby the message
isn't completely "flushed" to the output file before the
next one comes in. If klogd isn't started and there is
alot of printk's being queued up, waiting to be output, the
kernel will hang. This is true on bootup if tons of printk's
are output for recording vs. just a few. If klogd is started,
and there are alot of printk statements in the code for
tracing purposes to see what the code is doing, the various
printks may not even be recorded at all. This was the
case when I put alot of printk's in the dcache.c file and at
the same time in the functions for the UMSDOS filesystem
recording which functions were entered, and when files where
unlinked or rmdir'd.
Somebody suggested that I send klogd output to another
Unix machine which is stable and thus there wouldn't be
any race condition to the file on the disk - assuming
the kernel itself didn't lose it. Although this is
a good idea, the SA's I work with won't let me redirect
the 500k output file to the Sun machines to verify this
assumption. :(
I tried the following to bypass syslogd but it only
made it worse. :( :(
Does a system call instead of reading from /proc/kmsg
and writes directly to the file bypassing syslogd.
klogd -p -s -f /var/log/kernel
Doesn't do a system call but writes /proc/kmsg
directly to a file bypassing syslogd.
klogd -p -s
Does a system call instead of reading /proc/kmsg
and write via syslogd. This records a little more
but "pg kernel" thinks it's a binary file???
This uses syslog.
klogd -p -f /var/log/kernel
No expected change although this wasn't tried.
klogd -p
This one seems to write the MOST information but
still drops alot and uses syslog
NOTE: my syslog.conf has this in it
kern.* /var/log/kernel
PROBLEM #2:
c). Another race condition occurs (and which item(b) applies),
is when the "make mrproper" is run after doing a kernel build.
This is the only place it has occurred reliably for me and
where I have noticed the problem. The problem is that
I get a "Device or resource Busy" error message due to -EBUSY being
returned in file namei.c function UMSDOS_rmdir. The
"make mrproper" gets to "rm -rf include/config" and then
gives the following errors,
rm: cannot remove directory `include/config/bsd/process': Device or resource busy
rm: cannot remove directory `include/config/bsd': Device or resource busy
rm: cannot remove directory `include/config/blk/cpq': Device or resource busy
rm: cannot remove directory `include/config/blk': Device or resource busy
rm: cannot remove directory `include/config/paride': Device or resource busy
rm: cannot remove directory `include/config/chr/dev': Device or resource busy
rm: cannot remove directory `include/config/chr': Device or resource busy
rm: cannot remove directory `include/config/scsi/generic': Device or resource busy
rm: cannot remove directory `include/config/scsi': Device or resource busy
rm: cannot remove directory `include/config/ms': Device or resource busy
rm: cannot remove directory `include/config/vga': Device or resource busy
rm: cannot remove directory `include/config/mda': Device or resource busy
rm: cannot remove directory `include/config/sound': Device or resource busy
rm: cannot remove directory `include/config': Device or resource busy
make: *** [mrproper] Error 1
Now if "make mrproper" is run a second time, there is no
problem. There is no one in any of the subdirectories
which may cause this too. If upon the first "make mrproper"
one goes to these directories to see what's in them, they
are all empty since the "rm" statement removed the files.
Somehow, the dentry tree inodes *aren't* equal.
Since the function "list_empty()" is returning FALSE,
the above -EBUSY is returned. This would then point
to the dentry in /usr/src/linux/fs in the file
dcache.c. This code has changed alot and I'm running
into the same problem as identified in item (b) -
missing printks when they are inserted here.
If "rm -rf include/config" is run manually, there is
no problem. However, when run within "make" using
"make mrproper" it gives these errors. Now if the
Makefile is modified so that only this one "rm" is
done and the rest commented out, it still has this
error. Thus, the other rm's aren't influencing it
or causing a dependency.
When I try to specify a printk when "pruning" is
occuring. I do see that dentry prunes seem to be
occuring before the actual unlink of the directory:
after that particular subdirectory has been emptied out.
But, in some cases the pruning may be delayed till later
and they may prune a couple of subdirs all at once.
I'm not sure under what circumstance one or the other
is selected - the immediate one or the delayed one.
I've also noticed that some subdirs don't get pruned
nor unlinked although they are deleted from the filesystem.
I attribute this to skipping/losage of printk's being
output to the /var/log/kernel due to klogd being slow
or the kernel losing track of these messages, particularily
since the UMSDOS filesystem executes the same path for
all files/directories being removed.
QUESTION #1:
I've noticed that when files get unlinked in a directory the
directory the file is in is locked then unlocked; however,
when it comes time to rmdir the subdirectory name (after being
emptied) the parent diretory isn't locked/unlocked. Should it be???
UMSDOS filesystem maintainer QUESTION here,
Another, possible problem is that UMSDOS uses the EMD file
which is named --linux-.---. When it comes time to remove the
subdir, it also removes this file. However, the subdir isn't
locked whils't the EMD file is being removed, then the subdir
is unlocked after the parent has been locked so that the subdir
name can then be removed. This should prevent other processes
from putting files into the subdir while at the same time
the EMD file and subdir name is being removed. Possible
filesystem corruption problem here I think!!!
When symbolic/hardlinks are performed, then a semphore lock
using the function down() / up() is used instead of directory
lock/unlock. Is this the only scenario this is used for????
It's unclear as to when semaphores or createlock should be used....
Perhaps it's because a file/directory can be linked whils't
at the same time another process it trying to remove the file/subdir
and thus one needs two different forms of locking to tell them
apart. I assume the order above can be reversed and thus the
locking method(s) - whether by semaphore or createlock - are
considered equal in the way they are used so that one doesn't
have precedence over the other.
The up() and down() semaphore locks are assembly code and I
can't tell what's listening to the semaphore flag.
The sleep_on() and wake_up() semaphore locks too. There is
another one called wake_up_interruptible() which the filesystem
doesn't use. When to use this one properly vs. the other?
The same applies to interruptible_sleep_on() too.
I seem to not be able to determine if this is assembly code also.
I did fine ./kernel/sched.c which contains these functions. I
assume that when called they loop till they are done before
returning so that the calling routine (in the case the filesystem)
doesn't have to loop checking a returned flag!
So what is the basic difference between the two?
Anyway I'm out of my league at those lower levels.
QUESTION #2:
Are there any tricks/techniques to isolate the above problems
that may be faster or easier to do?
Note: the UMSDOS maintainer has been notified of the above problems
but there is nothing he can do, since it all seems to be pointing
towards the VFS/kernel code IMHO.
Thanks for any help :)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Wed Mar 15 2000 - 21:00:16 EST