Re: Terrible performance of sequential O_DIRECT 4k writes in SAN environment. ~3 times slower than Solaris 10 with the same HBA/Storage.

From: Jan Kara
Date: Tue Jan 07 2014 - 20:17:30 EST


On Tue 07-01-14 07:58:30, Christoph Hellwig wrote:
> On Mon, Jan 06, 2014 at 09:10:32PM +0100, Jan Kara wrote:
> > This is likely a problem of Linux direct IO implementation. The thing is
> > that in Linux when you are doing appending direct IO (i.e., direct IO which
> > changes file size), the IO is performed synchronously to make our life
> > simpler with the inode size update etc. (and frankly our current locking
> > rules make inode size update on IO completion almost impossible). Since
> > appending direct IO isn't very common, we seem to get away with this
> > simplification just fine...
>
> Shouldn't be too much of a problem at least for XFS and maybe even ext4
> with the workqueue-based I/O end handler. For XFS we protect size
> updates by the ilock, which we have already taken in that handler, not sure
> what ext4 would do there.
Well, I was specifically worried about i_mutex locking. In particular:
Before we report appending IO completion we need to update i_size.
To update i_size we need to grab i_mutex.

Now this is unpleasant because inode_dio_wait() happens under i_mutex, so
the above would create a lock inversion. And we cannot really call
inode_dio_done() before grabbing i_mutex, as that would open interesting
races between truncate decreasing i_size and DIO increasing it.
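
To make the ordering problem concrete, here is a toy user-space model in
Python. It is only a sketch and not kernel code: ToyInode,
simulate_inversion() and the dio_count field are names made up for
illustration, and a non-blocking lock attempt stands in for what would be a
blocking wait in the kernel. It assumes one appending DIO is in flight when
truncate takes i_mutex.

```python
import threading


class ToyInode:
    """Hypothetical stand-in for the inode state discussed above."""

    def __init__(self, i_size):
        self.i_mutex = threading.Lock()  # models i_mutex
        self.i_size = i_size             # models i_size
        self.dio_count = 0               # models the count of in-flight DIOs


def simulate_inversion():
    inode = ToyInode(i_size=4096)
    inode.dio_count = 1  # one appending DIO submitted, not yet completed

    # Truncate path: takes i_mutex, then would wait for in-flight DIO to
    # drain (the inode_dio_wait() step). Here we only test the condition.
    inode.i_mutex.acquire()
    truncate_can_proceed = inode.dio_count == 0

    # DIO completion path: wants i_mutex to update i_size before it drops
    # its count (the inode_dio_done() step). The non-blocking attempt lets
    # the demo terminate; in the kernel this would simply block.
    completion_got_mutex = inode.i_mutex.acquire(blocking=False)
    if completion_got_mutex:
        inode.i_size = max(inode.i_size, 8192)
        inode.i_mutex.release()
        inode.dio_count -= 1

    inode.i_mutex.release()  # undo truncate's acquire so the demo exits cleanly
    return truncate_can_proceed, completion_got_mutex, inode.dio_count


print(simulate_inversion())  # (False, False, 1)
```

Neither side can make progress: truncate waits for the count to reach zero
while holding the mutex, and the completion cannot drop the count because it
cannot get the mutex.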

Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR