Re: O_DIRECT reads appear to be cached on block device partition file?

From: Dave Chinner
Date: Tue Sep 14 2010 - 03:36:54 EST


On Mon, Sep 13, 2010 at 11:49:32PM -0400, Brett Russ wrote:
> Running a 2.6.31 kernel on a blade chassis system with multiple
> blades sharing common JBOD storage. The application intelligently
> divides the drives up among the blades, but one blade in particular
> is charged with monitoring. As part of this, this one monitoring
> blade can perform reads of a certain 512B sector of all disks in the
> system. This sector is often written by other blades, and these writes
> are sync'd to disk. To work around the lack of cache coherency
> between the distinct blades, I'm using O_DIRECT on the monitoring
> blade such that it always reads from the media to get the latest
> copy of this sector. The basic steps are:
>
> # grab a 512B buffer, aligned to the page size (4KB) to be safe
> posix_memalign(&ptr, getpagesize(), 512)
> fd = open("/dev/sdX3", O_RDONLY|O_DIRECT)
> lseek(fd, offset, SEEK_SET)
> read(fd, ptr, 512)
>
> If I run the above on the monitoring blade, then sync an update to
> the sector in question from another blade, then re-run the above
> code on the monitoring blade, believe it or not I appear to be
> reading stale data. If I use dd with iflag=direct, reading the same
> sector offset at the /dev/sdX3 partition file, I see the same stale
> data as seen from the code above. If, however, I instead access
> this sector offset from the /dev/sdX device file using the (offset
> of partition 3 + offset of the sector) I see the intended data,
> which makes me believe some caching occurred locally for /dev/sdX3.
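
For anyone wanting to reproduce this while tracing, the steps quoted above
could be sketched roughly as follows. This is only an illustration, not the
original poster's code: the helper names read_sector_direct() and
whole_disk_offset() are made up here, pread() stands in for the
lseek()+read() pair, a 512-byte logical sector is assumed, and the partition
start sector is assumed to be obtained separately (for example from
/sys/block/sdX/sdX3/start).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR_SIZE 512         /* assumption: 512-byte logical sectors */

/*
 * Hypothetical helper: byte offset on the whole disk (/dev/sdX) of a byte
 * offset given relative to a partition (/dev/sdX3). part_start_sector is
 * the partition's first sector, counted in 512-byte units.
 */
static uint64_t whole_disk_offset(uint64_t part_start_sector,
                                  uint64_t offset_in_part)
{
    return part_start_sector * SECTOR_SIZE + offset_in_part;
}

/*
 * Hypothetical helper: read one sector at byte `offset` from `path` using
 * O_DIRECT. Returns 0 on success or a negative errno value on failure.
 */
static int read_sector_direct(const char *path, off_t offset,
                              unsigned char out[SECTOR_SIZE])
{
    void *buf = NULL;
    ssize_t n;
    int fd, rc;

    assert(path != NULL && out != NULL);

    /* O_DIRECT needs the file offset aligned to the logical block size. */
    if (offset < 0 || offset % SECTOR_SIZE != 0)
        return -EINVAL;

    /* Page alignment is stricter than required, but always sufficient. */
    rc = posix_memalign(&buf, (size_t)sysconf(_SC_PAGESIZE), SECTOR_SIZE);
    if (rc != 0)
        return -rc;

    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) {
        rc = -errno;
        free(buf);
        return rc;
    }

    n = pread(fd, buf, SECTOR_SIZE, offset);
    if (n == SECTOR_SIZE) {
        memcpy(out, buf, SECTOR_SIZE);
        rc = 0;
    } else {
        rc = (n < 0) ? -errno : -EIO;   /* treat a short read as an I/O error */
    }

    close(fd);
    free(buf);
    return rc;
}
```

Calling read_sector_direct("/dev/sdX3", offset, a) and then
read_sector_direct("/dev/sdX", whole_disk_offset(start, offset), b) and
comparing the two buffers with memcmp() would show whether the partition
node and the whole-disk node return different data for the same sector.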

What does blktrace tell you?

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx