another nfs puzzle

From: Kenny Simpson
Date: Tue Dec 06 2005 - 17:04:18 EST


Hi again,
I am seeing some odd behavior with O_DIRECT. If a file opened with O_DIRECT has a page mmap'd
and the file is then extended via pwrite, the mmap'd region seems to get lost: it neither
takes up system memory nor gets written out.

The attached file demonstrates this.
Compile with -DABUSE to trigger the bad case.
This behavior does not happen on an ext3 partition.

ethereal shows a large number of block reads, with single-byte writes every so often. Viewing
the resulting file from other hosts, or even from the original host, shows that the file has
grown but is zero-filled, not 'a'-filled.
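
One way to confirm the zero-fill without reading a hex dump is to count the bytes in the resulting file. The sketch below is not part of the attached test case; the names fill_counts, count_fill and FILL_CHECK_MAIN are made up for illustration. It only assumes the file should contain 'a' bytes wherever a mapping was filled.

```c
#define _FILE_OFFSET_BITS 64  /* let fopen handle files over 2GB on 32-bit hosts */

#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type: tallies of what the file contains. */
struct fill_counts {
    unsigned long long filled;  /* bytes equal to 'a' */
    unsigned long long zero;    /* bytes equal to 0 */
    unsigned long long other;   /* anything else */
};

/* Count byte classes in path. Returns 0 on success, -1 on open/read error. */
int count_fill(const char *path, struct fill_counts *out)
{
    unsigned char buf[65536];
    size_t n, i;
    FILE *f;

    assert(path != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    f = fopen(path, "rb");
    if (!f)
        return -1;

    while ((n = fread(buf, 1, sizeof(buf), f)) > 0) {
        for (i = 0; i < n; i++) {
            if (buf[i] == 'a')
                out->filled++;
            else if (buf[i] == 0)
                out->zero++;
            else
                out->other++;
        }
    }

    if (ferror(f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return 0;
}

#ifdef FILL_CHECK_MAIN
int main(int argc, char *argv[])
{
    struct fill_counts c;

    if (argc != 2 || count_fill(argv[1], &c) != 0) {
        fprintf(stderr, "usage: %s <filename>\n", argv[0]);
        return 1;
    }
    printf("'a': %llu  zero: %llu  other: %llu\n", c.filled, c.zero, c.other);
    return 0;
}
#endif
```

Built with -DFILL_CHECK_MAIN and run against the test file, a large zero count next to a small 'a' count would match what is described above.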

-Kenny




#define _GNU_SOURCE

#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#include <stdio.h>
#include <string.h>

/*
 * the 'abusive' case is where we
 *   grow the file via pwrite
 *   unmap the previous mapping
 *   do the new mapping
 *
 * the non-abusive case we
 *   unmap the previous mapping
 *   grow the file
 *   do the new mapping
 *
 * the difference being that in the abusive case,
 * we maintain a mapping during the pwrite.
 *
 * If we grow the file via pwrite and maintain a
 * mapping, the mapped pages are never written out.
 *
 * If we either grow the file via ftruncate or remove
 * old mappings before growing the file, then all is
 * fine.
 */

#ifdef ABUSE
#define UNMAP_AFTER() munmap(mapping, size)
#define UNMAP_BEFORE() struct eat_semi
#else
#define UNMAP_AFTER() struct eat_semi
#define UNMAP_BEFORE() munmap(mapping, size)
#endif

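int main(int argc, char* argv[])
{
    int fd;
    unsigned int const size = 4096 * 1024;
    /* 64-bit so that offset + size does not wrap once the file passes 4GB */
    unsigned long long offset = 0;

    if (argc != 2) {
        printf("usage: %s <filename>\n", argv[0]);
        return 1;
    }

    fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC | O_LARGEFILE | O_DIRECT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* grow the file by writing one byte just past the region to be mapped */
    if (pwrite64(fd, "", 1, offset + size) != 1) {
        perror("pwrite64");
        return 1;
    }
    //ftruncate64(fd, offset + size);
    char* mapping = (char*)mmap64(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
    if (mapping == MAP_FAILED) {
        perror("mmap64");
        return 1;
    }
    memset(mapping, 'a', size);
    UNMAP_BEFORE();

    /* runs until a call fails (e.g. the disk fills up) or we are interrupted */
    for (;;) {
        offset += size;

        if (pwrite64(fd, "", 1, offset + size) != 1) {
            perror("pwrite64");
            break;
        }
        //ftruncate64(fd, offset + size);
        UNMAP_AFTER();
        mapping = (char*)mmap64(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
        if (mapping == MAP_FAILED) {
            perror("mmap64");
            break;
        }
        memset(mapping, 'a', size);
        UNMAP_BEFORE();
    }

    close(fd);

    /* the loop only exits on error */
    return 1;
}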
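
For comparison, the ordering that the comment at the top describes as working (no mapping held while the file grows, growth via ftruncate) can be isolated in a bounded helper. This is only a sketch and not part of the original test: fill_chunks and its parameters are hypothetical names, the caller opens the file (with or without O_DIRECT), and chunk is assumed to be a multiple of the page size.

```c
#define _FILE_OFFSET_BITS 64  /* 64-bit off_t so large files work on 32-bit hosts */

#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: append nchunks chunks of 'a' to fd.
 * Each step grows the file with ftruncate, maps the new chunk,
 * fills it, and unmaps it again, so no mapping is live while
 * the file is being extended.
 * Returns 0 on success, -1 on the first failing call.
 */
int fill_chunks(int fd, size_t chunk, unsigned int nchunks)
{
    off_t offset = 0;
    unsigned int i;

    assert(chunk > 0);

    for (i = 0; i < nchunks; i++, offset += (off_t)chunk) {
        char *mapping;

        if (ftruncate(fd, offset + (off_t)chunk) < 0)
            return -1;

        mapping = mmap(NULL, chunk, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
        if (mapping == MAP_FAILED)
            return -1;

        memset(mapping, 'a', chunk);

        if (munmap(mapping, chunk) < 0)
            return -1;
    }
    return 0;
}
```

Being bounded, it can be run for a fixed number of chunks and the result inspected afterwards, rather than interrupting an endless loop.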