Re: Excessive stall times on ext4 in 3.9-rc2

From: Theodore Ts'o
Date: Thu Apr 11 2013 - 22:57:59 EST


On Thu, Apr 11, 2013 at 11:33:35PM +0200, Jan Kara wrote:
> I think it might be more enlightening if Mel traced which process in
> which function is holding the buffer lock. I suspect we'll find out that
> the flusher thread has submitted the buffer for IO as an async write and
> thus it takes a long time to complete in presence of reads which have
> higher priority.

That's an interesting theory. If the workload is one which is very
heavy on reads and writes, that could explain the high latency. That
would explain why those of us who are using primarily SSDs are not
seeing the problems, because reads would be nice and fast.

If that is the case, one possible solution that comes to mind would be
to mark buffer_heads that contain metadata with a flag, so that the
flusher thread can write them back at the same priority as reads.
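
As a rough illustration of the flag idea, here is a userspace toy model,
not a kernel patch. Every name in it (toy_buffer_head, TOY_BH_META_PRIO,
toy_writeback_class, and the two I/O classes) is hypothetical and made
up for this sketch; none of them is an existing kernel interface. It
only shows the decision the flusher would make, assuming the I/O
scheduler treats the "sync" class the same way it treats reads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state bits; these are NOT real kernel BH_* flags. */
enum toy_bh_flag {
	TOY_BH_DIRTY     = 1 << 0,	/* buffer needs writeback */
	TOY_BH_META_PRIO = 1 << 1,	/* buffer holds filesystem metadata */
};

/* Toy I/O classes; assumed to map onto scheduler priorities. */
enum toy_io_class {
	TOY_IO_ASYNC_WRITE,	/* ordinary background writeback */
	TOY_IO_SYNC,		/* assumed to be scheduled like a read */
};

/* Minimal stand-in for a buffer_head: only the state word matters here. */
struct toy_buffer_head {
	unsigned int state;
};

/* The filesystem would tag metadata buffers when it dirties them. */
static void toy_mark_metadata(struct toy_buffer_head *bh)
{
	assert(bh != NULL);
	bh->state |= TOY_BH_META_PRIO;
}

/*
 * The flusher would consult the tag at submission time: tagged buffers
 * go out in the same class as reads, everything else stays async.
 */
static enum toy_io_class toy_writeback_class(const struct toy_buffer_head *bh)
{
	assert(bh != NULL);
	return (bh->state & TOY_BH_META_PRIO) ? TOY_IO_SYNC
					      : TOY_IO_ASYNC_WRITE;
}
```

The point of tagging the buffer rather than changing the flusher
globally is that plain data writeback keeps its low priority; only the
buffers other tasks are likely to block on get promoted.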

The only problem I can see with this hypothesis is that if this is the
explanation for what Mel and Jiri are seeing, it's something that
would have been around for a long time, and would affect ext3 as well
as ext4. That isn't quite consistent, however, with Mel's observation
that this is a problem which has gotten worse relatively
recently.

- Ted