Re: Negative scalability by removal of lock_kernel()?(Was: Strange performance behavior of 2.4.0-test9)

From: Jeff V. Merkey (jmerkey@vger.timpanogas.org)
Date: Fri Oct 27 2000 - 03:17:50 EST

Next message: Jeff V. Merkey: "Re: test10-pre6"
Previous message: Matti Aarnio: "Re: 2.4.0-test9 + LFS"
In reply to: Alexander Viro: "Re: Negative scalability by removal of lock_kernel()?(Was: Strange performance behavior of 2.4.0-test9)"
Next in thread: kumon@flab.fujitsu.co.jp: "Re: Negative scalability by removal of lock_kernel()?(Was: Strange performance behavior of 2.4.0-test9)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, Oct 27, 2000 at 03:13:33AM -0400, Alexander Viro wrote:
>
>
> On Fri, 27 Oct 2000, Jeff V. Merkey wrote:
>
> >
> > Linux has lots of n-squared linear list searches all over the place, and
> > there's a ton of spots I've seen it go linear by doing fine grained
> > manipulation of lock_kernel() [like in BLOCK.C in NWFS for sending async
> > IO to ll_rw_block()]. I could see where there would be many spots
> > where playing with this would cause problems.
> >
> > 2.5 will be better.
>
> fs/locks.c is one hell of a sick puppy. Nothing new about that. I'm kinda
> curious about "n-squared" searches in other places, though - mind showing
> them?

I've noticed in 2.2.X that if I hold lock_kernel() over multiple calls to
ll_rw_block() while feeding it chains of more than 100 buffer heads at a
time (one would think this would help the elevator get better numbers,
since it sees a bunch of buffer heads at once rather than one at a
time), performance gets slower rather than faster. I did some
profiling, and ll_rw_block() and the functions beneath it were using a
lot of cycles. I think there may be cases where the code that does the
merging and sorting is hitting O(N**2) instead of O(N). When I compare
the profile numbers to the size of the buffer head window between calls
to ll_rw_block(), the relationship is not O(N).
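
To make the suspicion concrete, here is a minimal user-space sketch of
how a sorted request list can go quadratic. It is an illustration only
and is NOT the 2.2 elevator code: struct req, sorted_insert() and
total_scan_steps() are hypothetical names made up for this example. It
assumes each new request is placed by scanning a singly linked list from
the head, which is one plausible way to end up with the non-linear
numbers described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a queued I/O request; not a kernel structure. */
struct req {
    unsigned long sector;
    struct req *next;
};

/*
 * Insert r into a list kept sorted by sector, scanning from the head.
 * Returns the number of nodes walked past, i.e. the cost of this insert.
 */
unsigned long sorted_insert(struct req **head, struct req *r)
{
    unsigned long steps = 0;
    struct req **pp = head;

    while (*pp != NULL && (*pp)->sector <= r->sector) {
        pp = &(*pp)->next;
        steps++;
    }
    r->next = *pp;
    *pp = r;
    return steps;
}

/*
 * Queue n requests with ascending sectors (the worst case for a
 * head-first scan) and return the total number of nodes walked.
 * The i-th insert walks i nodes, so the total is n * (n - 1) / 2.
 */
unsigned long total_scan_steps(unsigned long n)
{
    struct req *pool;
    struct req *head = NULL;
    unsigned long i, total = 0;

    if (n == 0)
        return 0;

    pool = calloc(n, sizeof(*pool));
    assert(pool != NULL);

    for (i = 0; i < n; i++) {
        pool[i].sector = i;
        total += sorted_insert(&head, &pool[i]);
    }

    free(pool);
    return total;
}
```

Under these assumptions, calling total_scan_steps() with twice the window
size roughly quadruples the work rather than doubling it, which is the
shape of curve that would show up in a profile as "not O(N)".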

By default I just hold lock_kernel() over the calls to ll_rw_block()
for each buffer head in 2.2.X.

2.4 no longer needs lock_kernel() here, since ll_rw_block() is now
multi-threaded. The overall model of interaction has not changed much
from 2.2 to 2.4. I just no longer have to provide the serialization
for ll_rw_block() because it's provided underneath now, but what I saw
indicates that as the request list gets full, some of the logic does
not scale as O(N) in the number of requests vs. utilization.

This would indicate that somewhere down in that code, there's a spot
where we end up spinning around a lot mucking with lists or something.

:-)

Jeff

>
> BTW, what spinlocks get contention in variant without BKL? And what about
> comparison between the BKL and non-BKL versions? If it's something like
>              BKL    no BKL
>   4-way       50        20
>   8-way       30        30
> - something is certainly wrong, but restoring the BKL is _not_ a win.
>
> I didn't look into recent changes in fs/locks.c, but I have quite a problem
> inventing a scenario when _adding_ BKL (without reverting other changes)
> might give an absolute improvement. Well, I see a couple of really perverted
> scenarios, but... Seriously, folks, could you compare the 4 variants above
> and gather the contention data for the -test9 on your loads? That would help
> a lot.
>


This archive was generated by hypermail 2b29 : Tue Oct 31 2000 - 21:00:20 EST