Reproducible data corruption with sendfile+vsftp - splice regression?

From: Holger Hoffstaette
Date: Thu Nov 29 2007 - 18:55:35 EST



Hi -

This regular Linux user and lkml lurker just noticed data corruption in
ftp'ed files and narrowed it down to vsftpd using sendfile(). So far this
has never caused problems in the past; I have not noticed this with
2.6.22.x but may have missed it. I do remember reading about some changes
to the underlying splice stuff since .23 so that may have something to do
with it.

The scenario:

- created a file with known bit pattern on Linux server
- ftp-got this file to Windows client: file has bad crc (yes, binary)
- verified with another client: same result

I have thus far eliminated (to the best of my knowledge) NICs, switches,
cables, the Windows FTP clients, the hard disk in the server (SATA, ext3):
nothing suspicious in any logs. Box is an AMD Sempron 2600+ with 1.5 GB
RAM, added rt8169 card, Gentoo, vsftpd stable 2.0.5 - nothing fancy.
Transferring the file with samba (interestingly with sendfile enabled) and
via ftp but from /dev/shm repeatably works fine; pulling from disk creates
bad crc, every time. The file is readable and can be copied, verified etc.
over and over so I'm sure that I'm not falling prey to a false positive.
ifconfig indicates no dropped or otherwise corrupted packets.
I noticed this first with 2.6.4-rc3, but also just tried the latest stable
2.6.23.9 with the same config, with no change in behaviour. After setting
vsftpd to use_sendfile=NO, gigs can be transferred without corruption.

The data corruption is sporadic, but absolutely repeatable. The file with
the known good pattern just contains multiple lines of:

012345678901234567890123456789012345678901234567890
012345678901234567890123456789012345678901234567890
012345678901234567890123456789012345678901234567890
..etc..

A corrupted file is missing random characters, so that the corrupted lines
looks like this (line numbers added by me):

19785: 012345678901234567890123456789012345678901234567890
19786: 01234567890123456789012345678901234567890123678901234567890
19787: 012345678901234567890123456789012345678901234567890

or:

20074: 012345678901234567890123456789012345678901234567890
20075: 01234567890123456789012345678901234567890123012345678901234567890123456789012345678901234567890
20076: 012345678901234567890123456789012345678901234567890

Again, other network or hd traffic shows no signs of gremlins; the box is
perfectly stable, and turning sendfile on or off triggers/untriggers the
corruption reliably. I will try 2.6.22.x over the weekend, and before I
bother lkml with dmesg/.config etc. I wanted to fish for initial thoughts.

thanks
Holger


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/