Re: [Lse-tech] Re: ext3 performance bottleneck as the number of spindles gets large

From: Dave Hansen (haveblue@us.ibm.com)
Date: Wed Jun 19 2002 - 23:09:07 EST


Andrew Morton wrote:
> mgross wrote:
>>Has anyone done any work looking into the I/O scaling of Linux / ext3 per
>>spindle or per adapter? We would like to compare notes.
>
> No. ext3 scalability is very poor, I'm afraid. The fs really wasn't
> up and running until kernel 2.4.5 and we just didn't have time to
> address that issue.

Ick. That takes the prize for the highest BKL contention I've ever
seen, except for some horribly contrived torture tests of mine. I've
had data like this sent to me a few times to analyze and the only
thing I've been able to suggest up to this point is not to use ext3.

>>I've only just started to look at the ext3 code but it seems to me that replacing the
>>BKL with a per-filesystem ext3 lock could remove some of the contention that's
>>getting measured. What data is the BKL protecting in these ext3 functions? Could a
>>lock per FS approach work?
>
> The vague plan there is to replace lock_kernel with lock_journal
> where appropriate. But ext3 scalability work of this nature
> will be targeted at the 2.5 kernel, most probably.

I really doubt that dropping in lock_journal will help this case very
much. Every single kernel_flag entry in the lockmeter output where
Util > 0.00% is caused by ext3. The schedule entry is probably caused
by something in ext3 grabbing the BKL, getting scheduled out for some
reason, then having the BKL implicitly released in schedule(). The
schedule() contention then comes from reacquire_kernel_lock(), which
has to take the BKL back before the task can continue.
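
To make the release/reacquire behaviour concrete, here is a rough
single-threaded model in C. It is only a sketch, not kernel source:
the struct, the function names (bkl_lock, bkl_unlock,
bkl_release_on_schedule, bkl_reacquire_on_schedule) and the "owner"
variable are all made up for illustration, and a return value of 0
stands in for the place where a real CPU would spin on kernel_flag.

```c
#include <assert.h>

#define NO_OWNER (-1)

/* Hypothetical stand-in for a task; lock_depth < 0 means "no BKL held". */
struct task {
    int id;
    int lock_depth;
};

/* Models the single global kernel_flag spinlock: who holds it right now. */
int kernel_flag_owner = NO_OWNER;

/* Take the BKL. Returns 1 on success, 0 where a real kernel would spin. */
int bkl_lock(struct task *t)
{
    if (t->lock_depth >= 0) {        /* nested acquire by the holder */
        t->lock_depth++;
        return 1;
    }
    if (kernel_flag_owner != NO_OWNER)
        return 0;                    /* contention on the global lock */
    kernel_flag_owner = t->id;
    t->lock_depth = 0;
    return 1;
}

/* Drop one nesting level; the outermost unlock frees the global lock. */
void bkl_unlock(struct task *t)
{
    assert(t->lock_depth >= 0 && kernel_flag_owner == t->id);
    if (--t->lock_depth < 0)
        kernel_flag_owner = NO_OWNER;
}

/* Scheduled out while holding the BKL: the lock is dropped implicitly,
 * but lock_depth is kept so it can be taken back later. */
void bkl_release_on_schedule(struct task *t)
{
    if (t->lock_depth >= 0)
        kernel_flag_owner = NO_OWNER;
}

/* Scheduled back in: the task must get the BKL back before running.
 * Returns 0 if somebody else holds it, which is the contention that
 * gets charged to schedule(). */
int bkl_reacquire_on_schedule(struct task *t)
{
    if (t->lock_depth < 0)
        return 1;                    /* never held it, nothing to do */
    if (kernel_flag_owner != NO_OWNER)
        return 0;
    kernel_flag_owner = t->id;
    return 1;
}
```

In this model, if task A sleeps while holding the BKL and task B takes
it in the meantime, A's reacquire fails until B unlocks. That wait
shows up under schedule() even though the lock was originally taken
somewhere else.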

We used to see plenty of ext2 BKL contention, but Al Viro did a good
job fixing that early in 2.5 using a per-inode rwlock. I think that
this is the required level of lock granularity; another global lock
just won't cut it.
http://lse.sourceforge.net/lockhier/bkl_rollup.html#getblock
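
As a back-of-the-envelope illustration of why granularity matters,
here is a toy C model that counts how many pairs of concurrent
operations would have to wait on each other under one global lock
versus a per-inode rwlock. It is a sketch under my own assumptions:
the types and function names are hypothetical, and it models nothing
about the real ext2 or ext3 code beyond "same inode" and "reader or
writer".

```c
#include <assert.h>
#include <stddef.h>

enum scheme { GLOBAL_LOCK, PER_INODE_RWLOCK };

/* One in-flight filesystem operation (hypothetical model type). */
struct op {
    unsigned long ino;   /* inode the operation touches */
    int is_write;        /* nonzero if it needs exclusive access */
};

/* Would these two operations serialize against each other? */
int must_serialize(enum scheme s, const struct op *a, const struct op *b)
{
    if (s == GLOBAL_LOCK)
        return 1;        /* everything funnels through one lock */
    /* rwlock per inode: only same-inode pairs with a writer conflict */
    return a->ino == b->ino && (a->is_write || b->is_write);
}

/* Count the pairs of operations that cannot run in parallel. */
size_t count_serialized_pairs(enum scheme s, const struct op *ops, size_t n)
{
    size_t pairs = 0;

    assert(ops != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            pairs += (size_t)must_serialize(s, &ops[i], &ops[j]);
    return pairs;
}
```

With writers spread across different inodes, the global lock
serializes every pair while the per-inode scheme serializes none. A
per-filesystem lock behaves like the global case whenever all of the
I/O lands on one filesystem.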

-- 
Dave Hansen
haveblue@us.ibm.com



