If the last file in a directory (not the last one in sorted ls output, but
the last one in the raw directory list you get with opendir/readdir or
find -maxdepth 1) has a filename exactly 21 characters long, smbfs returns
an empty list. If the last filename has 22 or more characters, a varying
number of garbage entries is returned AFTER the proper directory listing.
I'm using a Samba 1.9.14 server, and I've seen this under Linux 1.2.13,
1.3.25, and 1.3.32.
The key thing here is that this only happens if the LAST file in the
directory has a name 21 characters or longer; other files can have names of
varying lengths without causing any problems.
If anyone can help with this, please do! The bug seems incredibly elusive.
Here's what I've tried so far...
===
Virtually any change I can think of fixes the problem (or at least hides
it)!
As soon as I turned on debug messages (#define DEBUG_SMB 1 in
include/linux/smb.h), it started working perfectly. My guess was that one
of the DPRINTK's was also calling some function that was changing a
variable... or something.
However, I managed to narrow things down to the following code in proc.c:
    if (total_count < fpos) {
            p = smb_decode_long_dirent(p, NULL,
                                       info_level);
            DPRINTK("smb_proc_readdir: skipped entry.\n");
            DDPRINTK(" total_count = %d\n"
                     " i = %d, fpos = %d\n",
                     total_count, i, fpos);
    }
With debug messages DISABLED, the problem of course persisted. However, if
I changed the single DPRINTK above (a macro that expands to nothing at all
when debug messages are disabled) to a printk, everything magically worked.
The same happens if I change the DDPRINTK (for debug level 2) to a printk,
or if I change both of them. Changing other DPRINTKs in proc.c does not
have this effect!
What's worse is that even with this debug message enabled, IT IS NEVER
PRINTED IN MY TEST CASE (a simple program that uses opendir/readdir to print
the exact contents of a directory). This leads me to believe that the code
inside the above "if" statement is never executed at all - and if this is
so, why should adding a printk there change anything?
The problem with this, of course, is that it's very hard to track down the
bug since inserting a printk in the critical location to help debugging will
fix the problem!
At this point I tried enabling all debug messages in all smbfs source files
EXCEPT for the two statements above. The stupid thing worked perfectly
again! I gave up this avenue of testing in frustration.
Odd. So, I tried turning down the compiler optimization. I found that
changing -O2 to -O in the gcc line while compiling proc.c (and ONLY proc.c;
all others still use -O2) also fixes the problem!
I tried making 'p' volatile, to keep it from becoming a register variable
in case that was the problem. This caused lots of "discards volatile"
warnings that I didn't bother to fix, and the 21-character bug was still
there.
All I can assume is that gcc is somehow miscompiling that function with
optimizations enabled, or that something else weird is happening. Either
that, or an awfully strange data corruption bug has worked its way into
smbfs. I'm not an expert on kernel hacking or especially smbfs, so I can't
really say for sure.
Since this is looking more and more like a compiler bug, I should mention
that I use gcc 2.7.0, binutils 2.5.2l.17, both the HJL compiled versions on
sunsite, and compile the kernel as ELF.
I would look at the assembly output of gcc, but unfortunately I don't
understand "GNU-style" (AT&T syntax) assembly language very well. I still
don't quite understand why we can't use the Intel-standard stuff like in DOS.
I'm forwarding this to linux-kernel and linux-gcc as well as the smbfs
maintainer (who says he's rather busy at the moment and can't help much) to
see if anyone can help.
My apologies if this is something simple; but I wanted to email about it
before I forgot all the gory details :)
Thanks in advance for any help...
Avery