Trying to measure performance with splice/vmsplice ....

From: Rick Sherm
Date: Fri Apr 16 2010 - 13:02:36 EST

Next message: John Villalovos: "[PATCH] Oprofile: Change CPUIDS from decimal to hex, and add somecomments"
Previous message: Lorenzo Castelli: "[PATCH] hid: Add mappings for a few keys found on Logitech MX3200"
Next in thread: Steven J. Magnani: "Re: Trying to measure performance with splice/vmsplice ...."

Hello,

I'm trying to measure the performance gain from using splice. For now I'm copying a 1G file with splice. (In the real scenario, the driver will DMA the data into a buffer that is mmap'd. The app will then write the newly DMA'd data to disk while another thread crunches the same buffer. The buffer is guaranteed not to be modified. To avoid copying, I was thinking of: splice-IN the mmap'd buffer -> pipe, then splice-OUT pipe -> file.)
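The buffer -> pipe -> file path described above could be sketched roughly as follows. This is a minimal illustration, not the code that was measured: `buffer_to_file` is a hypothetical helper name, the output file is assumed to be opened without O_APPEND, and the sketch relies on the stated guarantee that the buffer is not modified while it is in flight.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: push len bytes of buf into out_fd through an
 * already-created pipe, using vmsplice (user pages -> pipe) followed by
 * splice (pipe -> file). Returns 0 on success, -1 on error. */
static int buffer_to_file(int out_fd, int pipe_fd[2], char *buf, size_t len)
{
    while (len > 0) {
        struct iovec iov = { .iov_base = buf, .iov_len = len };

        /* Queue as much of the buffer as the pipe will take. */
        ssize_t in = vmsplice(pipe_fd[1], &iov, 1, 0);
        if (in <= 0)
            return -1;

        /* Drain exactly what was queued before queuing more. */
        ssize_t left = in;
        while (left > 0) {
            ssize_t out = splice(pipe_fd[0], NULL, out_fd, NULL,
                                 (size_t)left, SPLICE_F_MOVE);
            if (out <= 0)
                return -1;
            left -= out;
        }

        buf += in;
        len -= (size_t)in;
    }
    return 0;
}
```

Draining the pipe after every vmsplice call keeps the single-threaded loop from blocking on a full pipe.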

PS - I've inlined some sloppy code that I cooked up.

Case 1) Read from input_file and write to dest_file. (Both are opened with O_DIRECT so that the buffer cache is not involved, but that doesn't work. We can talk about the buffer cache later.)

(csh#)time ./splice_to_splice

0.004u 1.451s 0:02.16 67.1% 0+0k 2097152+2097152io 0pf+0w

#define KILO_BYTE (1024)
#define PIPE_SIZE (64 * KILO_BYTE)
int filedes[2];

pipe(filedes);

fd_from = open(filename_from, (O_RDWR|O_LARGEFILE|O_DIRECT), 0777);
fd_to = open(filename_to, (O_WRONLY|O_CREAT|O_LARGEFILE|O_DIRECT), 0777);

/* 1G file == 2048 * 512K blocks */
to_write = 2048 * 512 * KILO_BYTE;

while (to_write) {
    /* file -> pipe */
    ret = splice(fd_from, &from_offset, filedes[1], NULL, PIPE_SIZE,
                 SPLICE_F_MORE | SPLICE_F_MOVE);
    if (ret < 0) {
        printf("Error: LINE:%d ret:%d\n", __LINE__, ret);
        goto error;
    } else if (ret == 0) {
        break; /* EOF on the source; stop instead of spinning */
    } else {
        /* pipe -> file */
        ret = splice(filedes[0], NULL, fd_to,
                     &to_offset, PIPE_SIZE /* should be ret, but ... */,
                     SPLICE_F_MORE | SPLICE_F_MOVE);
        if (ret < 0) {
            printf("Error: LINE:%d ret:%d\n", __LINE__, ret);
            goto error;
        }
        to_write -= ret;
    }
}
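For comparison, here is a sketch of the same copy loop that passes the byte count actually returned by the first splice to the second one and drains the pipe fully on every pass. It is an illustration under assumptions, not the code that produced the timing above: `splice_copy` and `CHUNK` are hypothetical names, the files are assumed to be opened without O_DIRECT or O_APPEND, and the file positions are used instead of explicit offsets.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK (64 * 1024)

/* Hypothetical helper: copy up to `total` bytes from in_fd to out_fd
 * through a private pipe. Returns the number of bytes copied (which is
 * less than `total` if the source hits EOF first), or -1 on error. */
static long long splice_copy(int in_fd, int out_fd, long long total)
{
    int p[2];
    long long copied = 0;

    if (pipe(p) < 0)
        return -1;

    while (copied < total) {
        long long remaining = total - copied;
        size_t want = remaining < CHUNK ? (size_t)remaining : CHUNK;

        /* file -> pipe */
        ssize_t in = splice(in_fd, NULL, p[1], NULL, want, SPLICE_F_MOVE);
        if (in < 0) {
            copied = -1;
            break;
        }
        if (in == 0)
            break; /* EOF on the source */

        /* pipe -> file: move exactly the bytes that were queued */
        ssize_t left = in;
        while (left > 0) {
            ssize_t out = splice(p[0], NULL, out_fd, NULL,
                                 (size_t)left, SPLICE_F_MOVE);
            if (out <= 0) {
                copied = -1;
                goto done;
            }
            left -= out;
        }
        copied += in;
    }

done:
    close(p[0]);
    close(p[1]);
    return copied;
}
```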

Case 2) Reading and writing directly:

Case 2.1) copy 64K blocks

(csh#)time ./file_to_file 64
0.015u 1.066s 0:04.04 26.4% 0+0k 2097152+2097152io 0pf+0w

#define KILO_BYTE (1024)
#define MEGA_BYTE (1024 * (KILO_BYTE))
#define BUFF_SIZE (64 * MEGA_BYTE)

/* O_DIRECT needs an aligned buffer */
posix_memalign((void**)&buff, 4096, BUFF_SIZE);

fd_from = open(filename_from, (O_RDWR|O_LARGEFILE|O_DIRECT), 0777);
fd_to = open(filename_to, (O_WRONLY|O_CREAT|O_LARGEFILE|O_DIRECT), 0777);

/* 1G file == 2048 * 512K blocks */
to_write = 2048 * 512 * KILO_BYTE;
copy_size = cmd_line_input * KILO_BYTE; /* control from cmd_line */
while (to_write) {
    ret = read(fd_from, buff, copy_size);
    if (ret != copy_size) {
        printf("Error: LINE:%d ret:%d\n", __LINE__, ret);
        goto error;
    } else {
        ret = write(fd_to, buff, copy_size);
        if (ret != copy_size) {
            printf("Error: LINE:%d ret:%d\n", __LINE__, ret);
            goto error;
        }
        to_write -= ret;
    }
}

Case 2.2) copy 512K blocks

(csh#)time ./file_to_file 512
0.004u 0.306s 0:01.86 16.1% 0+0k 2097152+2097152io 0pf+0w

Case 2.3) copy 1M blocks

(csh#)time ./file_to_file 1024
0.000u 0.240s 0:01.88 12.7% 0+0k 2097152+2097152io 0pf+0w

Questions:
Q1) When using splice, why is the CPU consumption greater than with read/write (case 2.1)? What does this mean?

Q2) How do I confirm that memory bandwidth consumption does not spike when using splice in this case? By this I mean the (node) cpu<->mem path. The DMA-in/DMA-out will happen; you can't escape that, and the IOH bus will be utilized. I want to keep the cpu(node)-mem path free (well, minimize unnecessary copies).

Q3) When using splice, the data gets cached even though the destination file is opened in O_DIRECT mode. I verified this with vmstat, before and after the run:

r b swpd    free   buff   cache
1 0    0 9358820 116576 2100904

./splice_to_splice

r b swpd    free   buff   cache
2 0    0 7228908 116576 4198164

I see the same caching issue even if I vmsplice buffers (a simple malloc'd iov) to a pipe and then splice the pipe to a file. Speed is still an issue with vmsplice too.
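vmstat only shows the system-wide cache total. One way to check whether a specific file's pages are sitting in the page cache is to mmap the file and ask mincore() which pages are resident. The following is a minimal sketch; `cached_pages` is a hypothetical helper name, and the result is only a snapshot that the kernel is free to change at any time.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count how many pages of `path` are currently
 * resident in memory. Stores the file's page count in *total_pages.
 * Returns the resident page count, or -1 on error or for an empty file. */
static long cached_pages(const char *path, long *total_pages)
{
    long resident = -1;
    struct stat st;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        size_t len = (size_t)st.st_size;
        size_t pages = (len + page - 1) / page;
        void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
        unsigned char *vec = malloc(pages);

        /* mincore() fills one byte per page; bit 0 set means resident. */
        if (map != MAP_FAILED && vec != NULL && mincore(map, len, vec) == 0) {
            resident = 0;
            for (size_t i = 0; i < pages; i++)
                resident += vec[i] & 1;
            *total_pages = (long)pages;
        }

        free(vec);
        if (map != MAP_FAILED)
            munmap(map, len);
    }

    close(fd);
    return resident;
}
```

Calling it on the destination file right after a run, and comparing the result against the file's total page count, gives a per-file view of what the vmstat numbers suggest.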

Q4) Also, with splice you can only transfer 64K worth of data (PIPE_BUFFERS*PAGE_SIZE) at a time, correct? With stock read/write, I can go up to a 1MB buffer; beyond that I don't see any gain. Even so, the reduction in system/CPU time is significant.
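To put numbers on the per-call limit: at 64K per splice, a 1G copy takes 16384 splice pairs, against 1024 read/write pairs with a 1M buffer. On kernels from 2.6.35 onward (newer than the one this post was written against) the pipe capacity is adjustable per pipe through fcntl() with F_SETPIPE_SZ, and readable with F_GETPIPE_SZ. A minimal sketch, where `grow_pipe` is a hypothetical helper name and the resize is treated as best effort:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to resize the pipe to at least
 * `want` bytes, then report the capacity actually in effect.
 * Returns the capacity in bytes, or -1 if it cannot be queried. */
static int grow_pipe(int pipe_fd, int want)
{
    /* Best effort: the request can be refused, for example when `want`
     * exceeds /proc/sys/fs/pipe-max-size for an unprivileged process. */
    (void)fcntl(pipe_fd, F_SETPIPE_SZ, want);

    return fcntl(pipe_fd, F_GETPIPE_SZ);
}
```

The value returned can then be used as the per-call length for splice instead of a hard-coded 64K.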

I would appreciate any pointers.

thanks
Rick

