Re: Stale NFS handles on 2.4.1

From: Jakob Østergaard (jakob@unthought.net)
Date: Tue Feb 13 2001 - 18:43:55 EST


On Tue, Feb 13, 2001 at 11:31:50PM +0000, Alan Cox wrote:
> > The NFS clients are getting
> > "Stale NFS handle"
> > messages every once in a while which will make a "touch somefile.o"
> > fail.
>
> If they have the previous .o handle cached and it was removed on another
> client thats quite reasonable behaviour. NFS isnt coherent

So a solution would be to
 <local node> rm -f output.o
 <remote node> g++ .... -o output.o
 <local node> touch output.o

I do the touch in the first place because a subsequent link job on
another remote node used to fail if I didn't. NFS caching magic I guess...

>
> > It's quite annoying and I didn't see it on 2.2 even after the NFS
> > patches were integrated.
>
> I wonder if its because 2.4 runs faster and caches better 8). You can
> tune the attribute cache times that may help. Are we talking 30 second
> intervals here or stuff being cached for far too long (which would imply a bug)

It runs faster indeed, and it makes the work more fun and makes the
internet go faster ! (uhhh... or maybe the internet speedup is because
of the Intel CPUs - I forgot...)

Usually how a compile goes is:

a make -j10 spawns concurrent compile jobs. Each job consists of
  "spawn on remote node" g++ ... -o somefile.o
  "on local node" touch somefile.o

After a truckload of .o files have been generated, it will start
up link jobs, on other remote nodes. I haven't tried this without
the touch trick for a long time.

Each g++ job takes from a few seconds to several minutes depending
on the file and optimization options etc. I think I mainly see this
with the fast jobs... Most .o files are ~1-4 MB and I have about 200
of them.

~200 compilers and ~12 linkers take about 4-5 minutes to complete on
the cluster of three dual machines and two-three single cpus. Producing
about 1.5G of object code in total (yes C++ symbols are HUGE).

So, the touch is _immediately_ after a compile completion. But the
.o file has not been in use on any other machine than the one running
the compiler for hours or at least many minutes (a different compile).

I'll try this without the touch trick and see how things fare...

-- 
................................................................
:   jakob@unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............:
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Feb 15 2001 - 21:00:23 EST