Re: writeout stalls in current -git

From: Torsten Kaiser
Date: Tue Nov 06 2007 - 02:10:36 EST


On 11/6/07, David Chinner <dgc@xxxxxxx> wrote:
> On Mon, Nov 05, 2007 at 07:27:16PM +0100, Torsten Kaiser wrote:
> > On 11/5/07, David Chinner <dgc@xxxxxxx> wrote:
> > > Ok, so it's probably a side effect of the writeback changes.
> > >
> > > Attached are two patches (two because one was in a separate patchset as
> > > a standalone change) that should prevent async writeback from blocking
> > > on locked inode cluster buffers. Apply the xfs-factor-inotobp patch first.
> > > Can you see if this fixes the problem?
> >
> > Now testing v2.6.24-rc1-650-gb55d1b1+ the fix for the missapplied raid5-patch
> > Applying your two patches ontop of that does not fix the stalls.
>
> So you are having RAID5 problems as well?

The first 2.6.24-rc1-git-kernel that I patched with your patches did
not boot for me. (Oops send in one of my previous mails) But given
that the stacktrace was not xfs related and I had seen this patch on
the lkml, I tried to fix this Oops this way.
I did not have troubles with the RAID5 otherwise.

> I'm struggling to understand what possible changed in XFS or writeback that
> would lead to stalls like this, esp. as you appear to be removing files when
> the stalls occur. Rather than vmstat, can you use something like iostat to
> show how busy your disks are? i.e. are we seeing RMW cycles in the raid5 or
> some such issue.

Will do this this evening.

> OOC, what is the 'xfs_info <mtpt>' output for your filesystem?

meta-data=/dev/mapper/root isize=256 agcount=32, agsize=4731132 blks
= sectsz=512 attr=1
data = bsize=4096 blocks=151396224, imaxpct=25
= sunit=0 swidth=0 blks, unwritten=1
naming =version 2 bsize=4096
log =internal bsize=4096 blocks=32768, version=1
= sectsz=512 sunit=0 blks, lazy-count=0
realtime =none extsz=4096 blocks=0, rtextents=0


> > vmstat 10 output from unmerging (uninstalling) a kernel:
> > 1 0 0 3512188 332 192644 0 0 185 12 368 735 10 3 85 1
> > -> emerge starts to remove the kernel source files
> > 3 0 0 3506624 332 192836 0 0 15 9825 2458 8307 7 12 81 0
> > 0 0 0 3507212 332 192836 0 0 0 554 630 1233 0 1 99 0
> > 0 0 0 3507292 332 192836 0 0 0 537 580 1328 0 1 99 0
> > 0 0 0 3507168 332 192836 0 0 0 633 626 1380 0 1 99 0
> > 0 0 0 3507116 332 192836 0 0 0 1510 768 2030 1 2 97 0
> > 0 0 0 3507596 332 192836 0 0 0 524 540 1544 0 0 99 0
> > procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> > r b swpd free buff cache si so bi bo in cs us sy id wa
> > 0 0 0 3507540 332 192836 0 0 0 489 551 1293 0 0 99 0
> > 0 0 0 3507528 332 192836 0 0 0 527 510 1432 1 1 99 0
> > 0 0 0 3508052 332 192840 0 0 0 2088 910 2964 2 3 95 0
> > 0 0 0 3507888 332 192840 0 0 0 442 565 1383 1 1 99 0
> > 0 0 0 3508704 332 192840 0 0 0 497 529 1479 0 0 99 0
> > 0 0 0 3508704 332 192840 0 0 0 594 595 1458 0 0 99 0
> > 0 0 0 3511492 332 192840 0 0 0 2381 1028 2941 2 3 95 0
> > 0 0 0 3510684 332 192840 0 0 0 699 600 1390 0 0 99 0
> > 0 0 0 3511636 332 192840 0 0 0 741 661 1641 0 0 100 0
> > 0 0 0 3524020 332 192840 0 0 0 2452 1080 3910 2 3 95 0
> > 0 0 0 3524040 332 192844 0 0 0 530 617 1297 0 0 99 0
> > 0 0 0 3524128 332 192844 0 0 0 812 674 1667 0 1 99 0
> > 0 0 0 3527000 332 193672 0 0 339 721 754 1681 3 2 93 1
> > -> emerge is finished, no dirty or writeback data in /proc/meminfo
>
> At this point, can you run a "sync" and see how long that takes to
> complete?

Already tried that: http://lkml.org/lkml/2007/11/2/178
See the logs from the second unmerge in the second half of the mail.

The sync did not stop this writeout, but returned immediately.

> The only thing I can think that woul dbe written out after
> this point is inodes, but even then it seems to go on for a long,
> long time and it really doesn't seem like XFS is holding up the
> inode writes.

Yes, I completly agree that this is much to long. Thats why I included
the after-emerge-finished parts of the logs. But I still partly
suspect xfs, because the xfssyncd shows up when I hip SysRq+W.

> Another option is to use blktrace/blkparse to determine which process is
> issuing this I/O.
>
> > 0 0 0 3583780 332 195060 0 0 0 494 555 1080 0 1 99 0
> > 0 0 0 3584352 332 195060 0 0 0 99 347 559 0 0 99 0
> > 0 0 0 3585232 332 195060 0 0 0 11 301 621 0 0 99 0
> > -> disks go idle.
> >
> > So these patches do not seem to be the source of these excessive disk writes...
>
> Well, the patches I posted should prevent blocking in the places that it
> was seen, so if that does not stop the slowdowns then either the writeback
> code is not feeding us inodes fast enough or the block device below is
> having some kind of problem....

I don't think its the block device, because reading/writing larger
files do not seem to be troubled. It looks much more like an inode
problem. For example both installing and uninstalling kernel source
trees show these stalls, but during uninstalling this is much more
noticeable.

But I agree that this might not be xfs specific, as this showed up at
the same time as other people started reporting about the 100% iowait
bug. Could be that this is the same bug and the differences between
reiserfs and xfs might explain the iowait vs. idle. Or that I don't
see the 100% iowait is something else on my system...

Torsten
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/