possible nfsv3 write corruption
From: Pallissard, Matthew
Date: Thu Feb 27 2020 - 11:28:49 EST
Forgive me if this is the wrong list.
Ok, I have this super infrequent data corruption on write that seems to be limited to nfsv3 async mounts. I have not tested nfsv4 yet. I _think_ I've narrowed it down to kernels in the range 5.5.0 > X >= 5.1.4 (maybe earlier). Some users reported random data corruption. A bit of testing shows that it's reproducible and that the corruption is nearly identical every time.
I'd like to get to the bottom of this so I can guarantee that a kernel upgrade will resolve the issue.
What winds up happening is that every several hundred GiB[ish] we end up with the first half of a 64-bit segment corrupted. My test writes a few GiB, alternating between 64 bits of `0`'s and 64 bits of `1`'s. I then read it in and check the contents. Re-reading the file gives the same result, which shows that it's corrupted on write, not read. Here is some example output from a test:
> 2020-02-14 11:04:34 crit found mis-match on word segment 11911168 / 33554432!
> 2020-02-14 11:04:34 crit found mis-match on byte 7, 188 != 255
> 2020-02-14 11:04:34 crit found mis-match on byte 6, 0 != 255
> 2020-02-14 11:04:34 crit found mis-match on byte 5, 16 != 255
> 2020-02-14 11:04:34 crit found mis-match on byte 4, 128 != 255
> 2020-02-14 11:04:34 crit 1011110000000000000100001000000011111111111111111111111111111111
> 2020-02-14 13:38:11 crit found mis-match on word segment 1982464 / 33554432!
> 2020-02-14 13:38:11 crit found mis-match on byte 7, 188 != 255
> 2020-02-14 13:38:11 crit found mis-match on byte 6, 0 != 255
> 2020-02-14 13:38:11 crit found mis-match on byte 5, 16 != 255
> 2020-02-14 13:38:11 crit found mis-match on byte 4, 128 != 255
> 2020-02-14 13:38:11 crit 1011110000000000000100001000000011111111111111111111111111111111
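To make the log above easier to read, here is a small sketch that decodes the 64-character bit string from the `crit` line into byte values. It assumes the string is printed with byte 7 leftmost, which agrees with the per-byte lines (byte 7 is 188, and the first eight bits are `10111100`). The function name `decode_word` is made up for this illustration and is not part of the test program that produced the log.

```python
# Bit string copied verbatim from the log line above.
WORD_BITS = "1011110000000000000100001000000011111111111111111111111111111111"


def decode_word(bits):
    """Return (integer value, byte values listed from byte 7 down to byte 0).

    Assumption: the leftmost character of the string is the top bit of byte 7.
    """
    if len(bits) != 64:
        raise ValueError("expected exactly 64 bits")
    value = int(bits, 2)
    return value, list(value.to_bytes(8, "big"))


value, byte_values = decode_word(WORD_BITS)

# Print each byte next to the value the test expected (255, all bits set).
for index, byte in zip(range(7, -1, -1), byte_values):
    status = "ok" if byte == 255 else "corrupt"
    print(f"byte {index}: {byte:3d} (0x{byte:02x}) {status}")

print(f"word: 0x{value:016x}")
```

Read this way, bytes 7 through 4 hold 0xbc, 0x00, 0x10, 0x80 and bytes 3 through 0 are untouched, so the word as a whole is 0xbc001080ffffffff in both reported failures.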
Knowns:
* does not appear to happen on CentOS/EL 3.10 series kernel
* does not appear to happen on a 5.5 series kernel
  * I'm re-running all my tests now to confirm this.
* not hardware dependent
* not processor dependent
  * I tested 3 different Intel processors
* appears to only happen on NFS v3 async mounts
  * local disk and `-o sync` NFS v3 mounts have been tested
* It happens on random 64-bit segments
* It's *always* the same 4 bytes that are corrupted
* While often identical, the corrupted bytes are not always identical
* the identical corruption pattern can appear on separate computers.
* It's *always* on words that are written with `1`'s <- this is the part I find most interesting
* whether or not I explicitly call `fflush` and `sync` has no effect on the results.
* usually takes ~80-2000GiB to reproduce; sometimes more or less, but that is infrequent.
  * I've been writing 2GiB files
* sometimes I never hit the corruption case.
* I've yet to see more than one corrupted segment in a file.
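For anyone who wants to try something similar, the write-then-verify test described above could be sketched roughly as follows. This is not the actual test program (which is not included in this post); the function names, the chunk size, and the choice to start the pattern with the all-`0` word are assumptions made for the illustration. The demo at the bottom uses a local temporary file, so to exercise NFS the path would need to point at a file on the async NFS v3 mount.

```python
import os
import tempfile

ZEROS = b"\x00" * 8  # one 64-bit word with every bit clear
ONES = b"\xff" * 8   # one 64-bit word with every bit set


def write_pattern(path, n_words, chunk_words=1 << 17):
    """Write n_words 64-bit words, alternating all-0 and all-1 words.

    chunk_words must be even so every chunk starts with an all-0 word.
    """
    chunk = (ZEROS + ONES) * (chunk_words // 2)
    remaining = n_words * 8
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(remaining, len(chunk))
            f.write(chunk[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())


def find_mismatches(path, chunk_words=1 << 17):
    """Return a list of (word_index, expected, actual) for every bad word."""
    chunk = (ZEROS + ONES) * (chunk_words // 2)
    bad = []
    base = 0  # index of the first word in the current chunk
    with open(path, "rb") as f:
        while True:
            data = f.read(len(chunk))
            if not data:
                break
            if data != chunk[:len(data)]:
                # Only scan word by word when the chunk as a whole differs.
                for off in range(0, len(data), 8):
                    expected = chunk[off:off + 8]
                    actual = data[off:off + 8]
                    if actual != expected:
                        bad.append((base + off // 8, expected, actual))
            base += len(data) // 8
    return bad


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "pattern.bin")
        write_pattern(path, 1 << 20)  # 8 MiB here; the real runs used 2GiB files
        for index, expected, actual in find_mismatches(path):
            print(f"mismatch on word {index}: {actual.hex()} != {expected.hex()}")
```

Comparing whole chunks first keeps the common no-corruption case cheap, which matters when hundreds of GiB have to be written before a single bad word shows up.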
A little bit about the build/run environments:
CentOS 7
CentOS glibc 2.17
clang 9 / lld
the hardware
Dell PowerEdge R620
Dell PowerEdge C6320
Dell PowerEdge C6420
Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz
Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
* I did compile locally on every box. I also tested every compiled binary on every box. It didn't seem to affect the results.
* I don't have a tcpdump of this yet. I'm hoping to get that started before the end of the week.
* I read and write to the same file every time, unlinking it before writing again
* I have not tried dropping the cache between any of the steps.
* I have engaged our storage vendor to see what they have to say. They're pretty good at getting useful metrics and insight so if there is anything I should have them gather server-side please let me know.
If anyone has any insight, or can suggest additional testing I can perform, I would *greatly* appreciate it. I would be thrilled if this turned out to be some dumb configuration option or other operational thing performed incorrectly.
Thank you for your time.
Matt Pallissard