Re: [PATCH 0/3] mm: Swap checksum

From: Cesar Eduardo Barros
Date: Mon May 24 2010 - 06:50:50 EST

Next message: Avi Kivity: "Re: [PATCH 3/3] mm: Swap checksum"
Previous message: Dave Chinner: "Re: [RFC] new ->perform_write fop"
Next in thread: Minchan Kim: "Re: [PATCH 0/3] mm: Swap checksum"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Em 23-05-2010 23:05, Minchan Kim escreveu:

On Mon, May 24, 2010 at 9:57 AM, Cesar Eduardo Barros<cesarb@xxxxxxxxxx> wrote:
The internal ECC of the disk will not save you - a quick Google search found
an instance of someone with silent data corruption caused by a faulty *power
supply*.[1]

And if it is silent corruption, without the checksums you will not notice it
- it will just be dismissed as "oh, Firefox just crashed again" or similar
(the same as bit flips on RAM without ECC).

When I read your comment, suddenly some thought occurred to me.
If we can't believe ECC of the disk, why do we separate error
detection logic between file system and swap disk?

I mean it make sense that put crc detection into block layer?
It can make sure any block I/O.

There are differences as to where the checksum will be stored.

For the filesystem (and for the software suspend image), you have to store the checksums in the disk itself, and the correct place (and ordering requirements) depends on the filesystem. Also, most filesystems do not currently have on-disk checksums.

For the swap, it is much simpler; the checksums can be stored in memory (since they do not matter after a reboot; the swap contents are simply discarded). This also gives better performance, since the checksums do not have to be separately written, and more flexibility, since the kernel can use whatever kind of checksum it wants and can store the checksum in whatever data structure it choses, without worrying about compatibility.

And, in fact, there is a CRC code in the block layer; it is CONFIG_BLK_DEV_INTEGRITY. However, it is not a generic solution; it needs some extra prerequisites (like a disk low-level formatted with sectors with >512 bytes).

And what's BER of disk?
Is it usual to meet the problem?

It is unusual enough that most people who meet it will not notice.

And the filesystem developers (who understand about these things more than me) seem to be trending towards adding checksums to their filesystems. The point of this patch is to meet the same level of safety as btrfs (thus the choice of crc32c, which is what btrfs uses).

In normal desktop, some app killed are not critical. If the
application is critical, maybe app have to logic fault handling.
Firefox has session restore feature and Office program has temporal
save feature.

In fact, crashing is the "best" outcome here. The worst outcome is your application silently corrupting the data you then save to disk.

On the other hand, in server, does it designed well to use swap disk
until we meet bit error of disk?

Servers are the ones which would benefit the most, as their RAM is usually very reliable (they tend to use ECC memory). Their disk subsystems, however, are also more reliable.

Desktop systems (especially "no-name" brand ones) do not gain as much, since their RAM is usually unprotected; however, they are also the ones which have better chance of having low-quality power supplies, uncommon storage media (USB flash drives as the root disk is an example), and problematic I/O subsystems.

My feel is that it seem to be rather overkill.

Yes, it is a bit overkill, except when you are using software suspend. While the software suspend image is not protected by this patch (I am already thinking of a separate patch to add checksums to it), the swapped out pages are (software suspend uses both a memory image saved to the swap partition and the normal swapped out pages).

But you do not have to use it if you think it is overkill - I even added a kernel parameter to easily disable it.

The swap checksum only protects the page against being silently corrupted
while on the disk and at least to some degree on the I/O path between the
memory and the disk. It does not protect against broken kernel-mode code
writing to the wrong address, nor against broken hardware (or hardware
misconfigured by broken drivers) doing DMA to wrong addresses. It also does
not protect against hardware errors in the RAM itself (you have ECC memory
for that).

That is, the code assumes the memory containing the checksums will not be
corrupted, because if it is, you have worse problems (and the CRC error here
would be a *good* thing, since it would make you notice something is not
quite right).

Which is high between BER of RAM and disk?
It's a just question. :)

I have no idea.

However, we can do nothing in software against RAM errors; it would kill performance too much. Against disk errors, however, we can do a lot (software RAID-1 is just one of the simplest examples).

--
Cesar Eduardo Barros
cesarb@xxxxxxxxxx
cesar.barros@xxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Avi Kivity: "Re: [PATCH 3/3] mm: Swap checksum"
Previous message: Dave Chinner: "Re: [RFC] new ->perform_write fop"
Next in thread: Minchan Kim: "Re: [PATCH 0/3] mm: Swap checksum"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]