Re: Wrong DIF guard tag on ext2 write

From: Boaz Harrosh
Date: Tue Jun 01 2010 - 06:50:08 EST


On 06/01/2010 01:30 PM, Christof Schmitt wrote:
> On Mon, May 31, 2010 at 06:30:05PM +0300, Boaz Harrosh wrote:
>> On 05/31/2010 06:01 PM, James Bottomley wrote:
>>> On Mon, 2010-05-31 at 10:20 -0400, Martin K. Petersen wrote:
>>>>>>>>> "Christof" == Christof Schmitt <christof.schmitt@xxxxxxxxxx> writes:
>>>>
>>>> Christof> Since the guard tags are created in Linux, it seems that the
>>>> Christof> data attached to the write request changes between the
>>>> Christof> generation in bio_integrity_generate and the call to
>>>> Christof> sd_prep_fn.
>>>>
>>>> Yep, known bug. Page writeback locking is messed up for buffer_head
>>>> users. The extNfs folks volunteered to look into this a while back but
>>>> I don't think they have found the time yet.
>>>>
>>>>
>>>> Christof> Using ext3 or ext4 instead of ext2 does not show the problem.
>>>>
>>>> Last I looked there were still code paths in ext3 and ext4 that
>>>> permitted pages to be changed during flight. I guess you've just been
>>>> lucky.
>>>
>>> Pages have always been modifiable in flight. The OS guarantees they'll
>>> be rewritten, so the drivers can drop them if it detects the problem.
>>> This is identical to the iscsi checksum issue (iscsi adds a checksum
>>> because it doesn't trust TCP/IP and if the checksum is generated in
>>> software, there's time between generation and page transmission for the
>>> alteration to occur). The solution in the iscsi case was not to
>>> complain if the page is still marked dirty.
>>>
>>
>> And also why RAID1 and RAID4/5/6 need the data bounced. I wish VFS
>> would prevent data writing given a device queue flag that requests
>> it. So all these devices and modes could just flag the VFS/filesystems
>> that: "please don't allow concurrent writes, otherwise I need to copy data"
>>
>> From what Chris Mason has said before, all the mechanics are there, and it's
>> what btrfs is doing. Though I don't know how myself?
>
> I also tested with btrfs and invalid guard tags in writes have been
> encountered as well (again in 2.6.34). The only difference is that no
> error was reported to userspace, although this might be a
> configuration issue.
>

I think in btrfs you need a raid1/5 multi-device configuration for this
to work. If you use a single device then it is just like ext4.

BTW: you could use DM or MD, and it will protect your DIF guard tags by
copying the data before I/O.
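
To make the copy-before-I/O point concrete, here is a minimal userspace
sketch, not kernel code. It assumes the T10 DIF guard is a CRC16 with
polynomial 0x8BB7 and initial value 0; the function names and the
single-sector buffer are made up for illustration. It shows a guard tag
going stale when the page changes after the tag is generated, and staying
valid when the tag is generated over a private bounce copy.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Bitwise CRC-16/T10-DIF: polynomial 0x8BB7, init 0, no reflection. */
static uint16_t crc_t10dif_sketch(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(buf[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x8BB7);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/*
 * Guard generated over the live page, which is then modified before
 * "submission". Returns 1 if the guard still matches, 0 otherwise.
 */
int guard_valid_without_bounce(void)
{
    uint8_t page[SECTOR_SIZE];
    uint16_t guard;

    memset(page, 0xAA, sizeof(page));
    guard = crc_t10dif_sketch(page, sizeof(page));
    page[17] ^= 0x01;                 /* simulated in-flight modification */
    return guard == crc_t10dif_sketch(page, sizeof(page));
}

/*
 * Guard generated over a bounce copy; the copy is what gets submitted,
 * so later changes to the live page cannot invalidate the guard.
 */
int guard_valid_with_bounce(void)
{
    uint8_t page[SECTOR_SIZE];
    uint8_t bounce[SECTOR_SIZE];
    uint16_t guard;

    memset(page, 0xAA, sizeof(page));
    memcpy(bounce, page, sizeof(bounce));
    guard = crc_t10dif_sketch(bounce, sizeof(bounce));
    page[17] ^= 0x01;                 /* live page changes, copy does not */
    return guard == crc_t10dif_sketch(bounce, sizeof(bounce));
}
```

The first function reports a mismatch and the second a match. The price is
one memcpy per sector, which is exactly the cost a "don't modify pages in
flight" guarantee from the filesystem would let us avoid.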

> What is the best strategy to continue with the invalid guard tags on
> write requests? Should this be fixed in the filesystems?
>
> Another idea would be to pass invalid guard tags on write requests
> down to the hardware, expect an "invalid guard tag" error and report
> it to the block layer where a new checksum is generated and the
> request is issued again. Basically implement a retry through the whole
> I/O stack. But this also sounds complicated.
>
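
For reference, the retry idea could look roughly like the userspace sketch
below. Everything here is hypothetical: device_write() stands in for the
hardware guard check, MAX_RETRIES is an arbitrary bound, and the "writer"
is simulated by flipping one bit between guard generation and submission.
The guard is again assumed to be CRC16 with polynomial 0x8BB7.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512
#define MAX_RETRIES 3

/* Bitwise CRC-16/T10-DIF: polynomial 0x8BB7, init 0, no reflection. */
static uint16_t crc_t10dif_sketch(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(buf[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x8BB7);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Hypothetical device check: 0 on success, -1 on "invalid guard tag". */
static int device_write(const uint8_t *data, size_t len, uint16_t guard)
{
    return crc_t10dif_sketch(data, len) == guard ? 0 : -1;
}

/*
 * Regenerate the guard and reissue the write until the device accepts it
 * or MAX_RETRIES attempts have been made. `modifications` is how many
 * times the simulated writer dirties the page after guard generation.
 * Returns the number of attempts used, or -1 if every attempt failed.
 */
int attempts_needed(int modifications)
{
    uint8_t page[SECTOR_SIZE];

    memset(page, 0, sizeof(page));
    for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
        uint16_t guard = crc_t10dif_sketch(page, sizeof(page));

        if (modifications > 0) {
            page[0] ^= 0x01;          /* page changes while in flight */
            modifications--;
        }
        if (device_write(page, sizeof(page), guard) == 0)
            return attempt;
    }
    return -1;
}
```

Note that a page which keeps getting redirtied can exhaust the retries, so
the bound has to end in either an error or a fallback to copying.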

I suggest we talk about this issue at the upcoming LSF, because it affects
not only DIF but any checksum subsystem. Fixing it could also improve Linux
RAID performance, since RAID would no longer need to bounce the data.

> --
> Christof Schmitt

Boaz