Re: [performance bug] kernel building regression on 64 LCPUs machine

From: Alex Shi
Date: Sun Feb 13 2011 - 21:25:49 EST


On Sun, 2011-02-13 at 02:25 +0800, Corrado Zoccolo wrote:
> > On Sat, Feb 12, 2011 at 10:21 AM, Alex Shi <alex.shi@xxxxxxxxx> wrote:
> > On Wed, 2011-01-26 at 16:15 +0800, Li, Shaohua wrote:
> >> On Thu, Jan 20, 2011 at 11:16:56PM +0800, Vivek Goyal wrote:
> >> > On Wed, Jan 19, 2011 at 10:03:26AM +0800, Shaohua Li wrote:
> >> > > add Jan and Theodore to the loop.
> >> > >
> >> > > On Wed, 2011-01-19 at 09:55 +0800, Shi, Alex wrote:
> >> > > > Shaohua and I tested kernel building performance on the latest kernel and
> >> > > > found it drops by about 15% on our 64-LCPU NHM-EX machine on the ext4 file
> >> > > > system. We found this performance drop is due to commit
> >> > > > 749ef9f8423054e326f. If we revert this patch, or just change
> >> > > > WRITE_SYNC back to WRITE in the jbd2/commit.c file, the performance is
> >> > > > recovered.
> >> > > >
> >> > > > The iostat report shows that with the commit, the read request merge count
> >> > > > increased and the write request merge count dropped. The total request size
> >> > > > increased and the queue length dropped. So we tested another patch that only
> >> > > > changes WRITE_SYNC to WRITE_SYNC_PLUG in jbd2/commit.c, but it had no effect.
> >> > > Since WRITE_SYNC_PLUG doesn't help, this isn't a simple no-write-merge issue.
> >> > >
> >> >
> >> > Yep, it does sound like reduced write merging. But if we move journal commits
> >> > back to WRITE, then fsync performance will drop, as there will be idling
> >> > introduced between the fsync thread and the journalling thread. So that does
> >> > not sound like a good idea either.
> >> >
> >> > Secondly, in the presence of a mixed workload (some other sync reads
> >> > happening), WRITEs can get less bandwidth and the sync workload much more.
> >> > So by marking journal commits as WRITEs you might increase their completion
> >> > delay in the presence of another sync workload.
> >> >
> >> > So Jan Kara's approach makes sense: if somebody is waiting on the
> >> > commit then make it WRITE_SYNC, otherwise make it WRITE. Not sure why
> >> > it did not work for you. Is it possible to run some traces and do
> >> > more debugging to figure out what's happening?
> >> Sorry for the long delay.
> >>
> >> It looks like Fedora enables ccache by default. Our kbuild test runs on an
> >> ext4 disk, but the rootfs, where the ccache cache files live, is ext3.
> >> Jan's patch only covers ext4; maybe this is the reason.
> >> I changed jbd to use WRITE for journal_commit_transaction. With that change
> >> and Jan's patch, the test looks fine.
> > Let me clarify the bug situation again.
> > The regression is clear in the following scenario:
> > 1, the ccache dir is set up on the rootfs, which is ext3 on /dev/sda1;
> > 2, kbuild runs on /dev/sdb1 with ext4.
> > But if we disable ccache and only do kbuild on sdb1 with ext4, there is
> > no regression, with or without Jan's patch.
> > So the problem centers on the ccache scenario (since Fedora 11, ccache
> > is enabled by default).
> >
> > If we compare the vmstat output with and without ccache, there are many
> > more writes when ccache is enabled. According to these results, some
> > tuning should be done on the ext3 fs.
> Is ext3 configured with data=ordered or data=writeback?

The ext3 fs on sda and the ext4 fs on sdb are both mounted in 'ordered' mode.
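For reference, the journaling mode actually in effect can be double-checked from /proc/mounts. A minimal shell sketch of that check, run against a sample mounts line rather than the live file (the device and mount point are illustrative, not taken from the machines above):

```shell
# On a live system you would read /proc/mounts directly, e.g.:
#   grep ' /mnt/build ' /proc/mounts
# Here a sample line stands in for it.
line='/dev/sdb1 /mnt/build ext4 rw,relatime,data=ordered 0 0'

opts=$(echo "$line" | awk '{print $4}')            # fourth field: mount options
mode=$(echo "$opts" | tr ',' '\n' | sed -n 's/^data=//p')

echo "journaling mode: $mode"
```

Note that if no data= option is printed, the fs is simply using its compiled-in default (ordered, for both ext3 and ext4 of this era).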

> I think ccache might be performing fsyncs, and this is a bad workload
> for ext3, especially in ordered mode.
> It might be that my patch introduced a regression in ext3 fsync
> performance, but I don't understand how reverting only the change in
> jbd2 (that is, the ext4-specific journaling daemon) could restore it.
> The two partitions are on different disks, so each one should be
> isolated from the I/O perspective (do they share a single
> controller?).

No, sda and sdb use separate controllers.
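Corrado's hypothesis, that ccache performs fsyncs and that this is a bad workload for ext3 in ordered mode, can be exercised with a write-then-fsync loop like the one below. This is only a sketch of the suspected pattern (the file name and size are made up for illustration); on ext3 data=ordered, each fsync can drag unrelated dirty data into the journal commit, which is what makes the pattern expensive:

```shell
# Reproduce one "cache store": write a small object, then force it to
# stable storage.  conv=fsync makes GNU dd call fsync() before exiting.
f=ccache-object.tmp
dd if=/dev/zero of="$f" bs=4096 count=1 conv=fsync 2>/dev/null

size=$(wc -c < "$f" | tr -d ' ')
rm -f "$f"
echo "wrote and fsynced $size bytes"
```

Whether ccache really issues fsyncs on this setup could be confirmed on a live run with something like `strace -f -e trace=fsync,fdatasync` wrapped around a ccache'd compile.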

> The only interaction I see happens at the VM level,
> since changing performance of any of the two changes the rate at which
> pages can be cleaned.
>
> Corrado
> >
> >
> > vmstat average output per 10 seconds, without ccache
> > procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
> > r b swpd free buff cache si so bi bo in cs us sy id wa st
> > 26.8 0.5 0.0 63930192.3 9677.0 96544.9 0.0 0.0 2486.9 337.9 17729.9 4496.4 17.5 2.5 79.8 0.2 0.0
> >
> > vmstat average output per 10 seconds, with ccache
> > procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
> > r b swpd free buff cache si so bi bo in cs us sy id wa st
> > 2.4 40.7 0.0 64316231.0 17260.6 119533.8 0.0 0.0 2477.6 1493.1 8606.4 3565.2 2.5 1.1 83.0 13.5 0.0
> >
> >
> >>
> >> Jan,
> >> can you send a patch with similar change for ext3? So we can do more tests.
> >>
> >> Thanks,
> >> Shaohua


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/