[PATCH v6 0/8] Additional kmsg devices

From: Kazimierz Krosman
Date: Wed Feb 24 2016 - 06:53:36 EST


Dear All,

This is the sixth iteration of Marcin Niesluchowski's series of patches
extending kmsg interface with ability to dynamically create (and destroy)
kmsg-like devices which can be used by userspace for logging,

Changed from v5:

* ioctl that adds kmsg device with minor number higher than 255 returns
with error. This is because of registering kmsg as character device with
register_chrdev which creates 256 minor numbers for a given major.

* minor fix for selftest- changed size of message to write.
Previous value causes exceeding of LOG_LINE_MAX limits.

* dropped RFC tag in subject.

* rebased patches v5 on Linux v4.5-rc5.

Changes from v4:

* selftests are rearranged to use kselftest.h API

* a disputed patch "add predefined _PID, _TID, _COMM keywords to kmsg*
log dict" is removed - no chance it will be ever accepted. It is not
critical for this patchset as a whole.

Changes from v3:

* fixes problems spotted by kbuild test robot

* fiesed problem with inproper use of copy_from_user()

Changes from v2:

* Extracted kmsg related functions from printk.c to a new file, kmsg.c

Changes from v1:

* all occurences of '#ifdef CONFIG_PRINTK' removed from drivers/char/mem.c

* printk related code moved to kernel/printk/printk.c

* use of VMCOREINFO_STRUCT_SIZE

* selftests for kmsg added (shape of testing infrastructure based on
kdbus selftests)

We would like to share our opinion about this changes. We see following advantages
of the proposed extensions to kmsg interface:

1. kmsg device does not require maintenance by reader process side.
Multiple writers can write to a device and new records overwrite logs saved earlier.
When system crashes logs can be restored with pstore mechanism.

2. Using kmsg can cause lower CPU utilisation in the real-word use case than
userspace logging mechanisms.
We created 2 tests: (1) 100 writer processes write to created kmsg buffer and
(2) 100 writers write to socket (stream)- there is one reader to protect
socket buffer against overflow. Tests show that cpu utilisation in case of first
test is about 2.3 times lower (39.1%) than it is in second case (87.7%) (measured
with top program; tests code is attached below). Tested on Odroid XU4.

Possible future enhancements:
1. It is possible to extend kmsg interface with ioctl that is blockingi read process
until the cyclic buffer is almost full (for example 3/4 of max size). This
modification can enable efficient read of records and prevents against cyclic buffer
overwriting.

2. Maybe we should consider to set some global and local limit for creating
additional buffer. In current version of patches it is possible to create buffers
as long as minor is below 256 and there is a free memory (ioctl fail if kmalloc fail).
It means that it is possible to allocate whole kernel memory for kmsg devices.

Best regards,
Kazimierz Krosman

Marcin Niesluchowski (6):
printk: add one function for storing log in proper format
kmsg: introduce additional kmsg devices support
kmsg: add additional buffers support to memory class
kmsg: add function for adding and deleting additional buffers
kmsg: add ioctl for adding and deleting kmsg* devices
kmsg: add ioctl for kmsg* devices operating on buffers

Paul Osmialowski (2):
printk: extract kmsg-related routines from printk.c to kmsg.c
kmsg: selftests

Documentation/ioctl/ioctl-number.txt | 1 +
drivers/char/mem.c | 27 +-
fs/proc/kmsg.c | 4 +-
include/linux/printk.h | 48 +
include/uapi/linux/Kbuild | 1 +
include/uapi/linux/kmsg_ioctl.h | 45 +
kernel/printk/Makefile | 1 +
kernel/printk/kmsg.c | 1022 ++++++++++++++++
kernel/printk/printk.c | 1255 +++++---------------
kernel/printk/printk.h | 260 ++++
samples/kmsg/kmsg-api.h | 44 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/kmsg/.gitignore | 1 +
tools/testing/selftests/kmsg/Makefile | 30 +
tools/testing/selftests/kmsg/kmsg-test.c | 344 ++++++
tools/testing/selftests/kmsg/kmsg-test.h | 28 +
tools/testing/selftests/kmsg/test-buffer-add-del.c | 78 ++
.../kmsg/test-buffer-add-write-read-del.c | 163 +++
.../kmsg/test-buffer-buf-multithreaded-torture.c | 201 ++++
.../selftests/kmsg/test-buffer-buf-torture.c | 141 +++
20 files changed, 2725 insertions(+), 970 deletions(-)
create mode 100644 include/uapi/linux/kmsg_ioctl.h
create mode 100644 kernel/printk/kmsg.c
create mode 100644 kernel/printk/printk.h
create mode 100644 samples/kmsg/kmsg-api.h
create mode 100644 tools/testing/selftests/kmsg/.gitignore
create mode 100644 tools/testing/selftests/kmsg/Makefile
create mode 100644 tools/testing/selftests/kmsg/kmsg-test.c
create mode 100644 tools/testing/selftests/kmsg/kmsg-test.h
create mode 100644 tools/testing/selftests/kmsg/test-buffer-add-del.c
create mode 100644 tools/testing/selftests/kmsg/test-buffer-add-write-read-del.c
create mode 100644 tools/testing/selftests/kmsg/test-buffer-buf-multithreaded-torture.c
create mode 100644 tools/testing/selftests/kmsg/test-buffer-buf-torture.c

--
1.9.1

Attached Test programs:
(1) kmsg_test.c
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <sys/time.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdint.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/kmsg_ioctl.h>


enum { ARGS_NUM_MAX = 4};
int MSG_COUNT = 1024*102;
int MSG_SIZE = 256;
int KMSG_SIZE = 1024;
int WRITERS_NUM = 100;

int* args[ARGS_NUM_MAX] = {&KMSG_SIZE, &WRITERS_NUM, &MSG_SIZE, &MSG_COUNT};

int main_minor = 0;
int main_fd = 0;

static int print_help(char* name)
{
printf( "Usage: %s [KMSG_SIZE [WRITERS_NUM [MSG_SIZE [MSG_COUNT]]]]\n"
"Example call: %s 1024 10 256 1048576\n"
"\nOption description:\n"
"\tKMSG_SIZE: size of buffer created with ioctl on /dev/kmsg\n"
"\tWRITER_NUM: number of process that writes to created buffer\n"
"\tMSG_SIZE: size of message\n"
"\tMSG_COUNT: number of write() calls done by each writer\n"
"\n\n", name, name
);
}

static int kmsg_cmd_buffer_add(char* path)
{
struct kmsg_cmd_buffer_add cmd;
int ret;
int fd = open("/dev/kmsg", O_RDWR);

if (fd < 0)
{
fprintf(stderr, "Cannot open /dev/kmsg: fd=%d, errno=%d\n", fd, errno);
exit(EXIT_FAILURE);
}
cmd.size = KMSG_SIZE;
cmd.mode = 0662;

ret = ioctl(fd, KMSG_CMD_BUFFER_ADD, &cmd);
if (ret < 0)
{
fprintf(stderr, "ioctl failed (size=%lld, mode=0%o) = return=%d, errno=%d\n", cmd.size, cmd.mode, ret, errno);
exit(EXIT_FAILURE);
}

main_minor = cmd.minor;
snprintf(path, 80, "/dev/kmsg%d", main_minor);
printf("Created device: %s\n", path);
main_fd = fd;
return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
}

static void kmsg_cmd_buffer_del()
{
int ret;
ret = ioctl(main_fd, KMSG_CMD_BUFFER_DEL, &main_minor);
printf("Deleted /dev/kmsg%d %d %d\n", main_minor, ret, errno);
close(main_fd);
}

static void fail()
{
kmsg_cmd_buffer_del();
exit(EXIT_FAILURE);
}

int main(int argc , char *argv[])
{
char path[80];
char buf[MSG_SIZE];
size_t size;
struct timeval start, end;
int i, fd, forked, pid;
forked = 0;

if (argc > 1)
{
int i = 0;
while ( i < ARGS_NUM_MAX && i+1 < argc)
*(args[i++]) = atoi(argv[i]);
} else
print_help(argv[0]);

printf("Test parameters: KMSG_SIZE=%d, WRITERS_NUM=%d, MSG_SIZE=%d, MSG_COUNT=%d\n", KMSG_SIZE, WRITERS_NUM, MSG_SIZE, MSG_COUNT);

kmsg_cmd_buffer_add(path);

if(argc == (ARGS_NUM_MAX+2))
{
fd = open(path, O_WRONLY);
if (fd < 0)
{
fprintf(stderr, "open failed: %s (fd=%d, errno=%d)\n", path, fd, errno);
fail();
}

return 0;
}

for (i = 0;i<WRITERS_NUM;i++)
{
if(!fork())
{
forked =1;
break;
}
}

if (!forked)
{
if (-1 == gettimeofday(&start, 0)) {
fprintf(stderr, "gettimeofday failed\n");
fail();
}
printf("%ld.%06ld\n", start.tv_sec, start.tv_usec);
} else {
fd = open(path, O_WRONLY);
if (fd < 0)
{
fprintf(stderr, "open failed: %s (fd=%d, errno=%d)\n", path, fd, errno);
exit(EXIT_FAILURE);
}

for(i=0; i<MSG_COUNT; i++)
if (write(fd, buf, MSG_SIZE) < 0) {
fprintf(stderr, "Write failed. %s\n", strerror(errno));
exit(EXIT_FAILURE);
}
return 0;
}

while (pid = waitpid(-1, NULL, 0))
if (errno == ECHILD)
break;

if (-1 == gettimeofday(&end, 0)) {
fprintf(stderr, "gettimeofday failed\n");
fail();
}

printf("%ld.%06ld\n", end.tv_sec, end.tv_usec);

printf("Time: %ld us\n", (end.tv_sec-start.tv_sec)*1000000 + end.tv_usec - start.tv_usec);
kmsg_cmd_buffer_del();
return 0;
}
//-------------------------------------------------------------------------------------------
(2) stream_test.c
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <sys/time.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/epoll.h>

#define SOCKET_PATH "/var/run/test_socket"
enum {MSG_MAX_SIZE = 8192};
int WRT_NUM = 100; //WRITERS_NUMBER
enum { ARGS_NUM_MAX = 3};
int MSG_COUNT = 1024*102;
int MSG_SIZE = 256;

char buf[MSG_MAX_SIZE];
int sd;

static int print_help(char* name)
{
printf( "Usage: %s [WRITERS_NUM [MSG_SIZE [MSG_COUNT]]]\n"
"Example call: %s 10 256 1048576\n"
"\nOption description:\n"
"\tWRITERS_NUM: number of process that writes to created buffer\n"
"\tMSG_SIZE: size of message\n"
"\tMSG_COUNT: number of write() calls done by each writer\n"
"\n\n", name, name
);
}

void prepare_server()
{
struct sockaddr_un server_addr;
sd = socket(AF_UNIX, SOCK_STREAM, 0);
if (sd == -1) {
fprintf(stderr, "Could not create socket");
exit(EXIT_FAILURE);
}

memset(&server_addr, 0, sizeof(server_addr));
server_addr.sun_family = AF_UNIX;
strcpy(server_addr.sun_path, SOCKET_PATH);
unlink(server_addr.sun_path);

if (bind(sd, (struct sockaddr*)&server_addr, sizeof(server_addr)) == -1) {
fprintf(stderr, "Bind failed");
exit(EXIT_FAILURE);
}

if (listen(sd, 5) == -1) {
fprintf(stderr, "Listen failed");
exit(EXIT_FAILURE);
}

}


void do_server()
{
struct sockaddr_un client_addr;
long long int readed_size = 0;
long long int to_read = (MSG_COUNT*MSG_SIZE*(long long int)WRT_NUM);
int i = 0, conn_sock, epollfd, nfds, flags;
unsigned int client_len = sizeof(client_addr);
ssize_t size_one;
struct epoll_event ev, events[WRT_NUM+1];

epollfd = epoll_create1(0);
if (epollfd == -1) {
perror("epoll_create1");
exit(EXIT_FAILURE);
}

ev.events = EPOLLIN;
ev.data.fd = sd;
if (epoll_ctl(epollfd, EPOLL_CTL_ADD, sd, &ev) == -1) {
perror("epoll_ctl: listen_sock");
exit(EXIT_FAILURE);
}

for (;;) {
nfds = epoll_wait(epollfd, events, WRT_NUM+1, -1);
if (nfds == -1) {
perror("epoll_wait");
exit(EXIT_FAILURE);
}

for (i=0; i < nfds; ++i) {
if (events[i].data.fd == sd) {
conn_sock = accept(sd,
(struct sockaddr *) &client_addr, &client_len);
if (conn_sock == -1) {
perror("accept");
exit(EXIT_FAILURE);
}
ev.events = EPOLLIN;
ev.data.fd = conn_sock;
flags = fcntl(conn_sock, F_GETFL, 0);
fcntl(conn_sock, F_SETFL, flags | O_NONBLOCK);
if (epoll_ctl(epollfd, EPOLL_CTL_ADD, conn_sock, &ev) == -1) {
perror("epoll_ctl: conn_sock");
exit(EXIT_FAILURE);
}
} else {
size_one = read(events[i].data.fd, buf, MSG_MAX_SIZE);
if (size_one == 0 && events[i].events & EPOLLHUP)
close(events[i].data.fd);
readed_size += size_one;
}
}
if (readed_size >= to_read)
return;
}
}

int main(int argc , char *argv[])
{
int sd;
struct sockaddr_un client_addr;
size_t size;
ssize_t size_one;
struct timeval start;
struct timeval end;
int pid, i;
int forked=0;

int* args[ARGS_NUM_MAX] = {&WRT_NUM, &MSG_SIZE, &MSG_COUNT};

if (argc > 1)
{
int i = 0;
while ( i < ARGS_NUM_MAX && i+1 < argc)
*(args[i++]) = atoi(argv[i]);
} else
print_help(argv[0]);

printf("Test parameters: WRITERS_NUM=%d, MSG_SIZE=%d, MSG_COUNT=%d\n", WRT_NUM, MSG_SIZE, MSG_COUNT);
prepare_server();

for (i=0;i<WRT_NUM;i++)
if (!fork())
{
forked=1;
break;
}

if (!forked)
{
if (-1 == gettimeofday(&start, 0)) {
fprintf(stderr, "gettimeofday failed");
exit(EXIT_FAILURE);
}
printf("%ld.%06ld\n", start.tv_sec, start.tv_usec);
} else
{
sd = socket(AF_UNIX, SOCK_STREAM, 0);
if (sd == -1) {
fprintf(stderr, "Could not create socket");
exit(EXIT_FAILURE);
}

memset(&client_addr, 0, sizeof(client_addr));
client_addr.sun_family = AF_UNIX;
strcpy(client_addr.sun_path, SOCKET_PATH);

if (connect(sd, (struct sockaddr*)&client_addr, sizeof(client_addr)) == -1) {
fprintf(stderr, "Connect failed");
exit(EXIT_FAILURE);
}

for(i=0; i<MSG_COUNT; i++) {
if (write(sd, buf, MSG_SIZE) < 0) {
fprintf(stderr, "Write failed. %s", strerror(errno));
exit(EXIT_FAILURE);
}
}
return 0;
}
do_server();

while (pid = waitpid(-1, NULL, 0))
if (errno == ECHILD)
break;

if (-1 == gettimeofday(&end, 0)) {
fprintf(stderr, "gettimeofday failed");
exit(EXIT_FAILURE);
}

printf("%ld.%06ld\n", end.tv_sec, end.tv_usec);
printf("Time: %ld us\n", (end.tv_sec-start.tv_sec)*1000000 + end.tv_usec - start.tv_usec);

return 0;
}