Re: 2.2.6_andrea2.bz2

Andrea Arcangeli (andrea@e-mind.com)
Sun, 25 Apr 1999 21:31:30 +0200 (CEST)


On Sun, 25 Apr 1999, Chuck Lever wrote:

>On Sun, 25 Apr 1999, Andrea Arcangeli wrote:
>> o smp lock profiling. It's also a config option (so no overhead if you
>> are not interested in seeing how well your CPUs scale).
>
>is there a nice way to find out *which* spinlocks are popular? i already

Just press SYSRQ+P and disassemble the .lock section around the EIP it
reports. A more automated approach would be to write a spinlock profiler
where every spinlock increments a per-lock counter on each pass through
the slow path.
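
A minimal userland sketch of that counting scheme (this is not the IKD
code; the lock type and all names here are invented, and a real in-kernel
profiler would keep the counters in a map indexed by lock address):

#include <stdio.h>
#include <pthread.h>

struct prof_spinlock {
	volatile int locked;		/* 0 = free, 1 = held */
	unsigned long contended;	/* bumped in the slow path only */
};

static struct prof_spinlock lock = { 0, 0 };
static unsigned long counter;

static void prof_spin_lock(struct prof_spinlock *l)
{
	/* fast path: uncontended test-and-set, no accounting at all */
	while (__sync_lock_test_and_set(&l->locked, 1)) {
		/* slow path: the lock was busy, bump its counter */
		__sync_fetch_and_add(&l->contended, 1);
		while (l->locked)
			;	/* spin on a plain read until it looks free */
	}
}

static void prof_spin_unlock(struct prof_spinlock *l)
{
	__sync_lock_release(&l->locked);
}

static void *worker(void *arg)
{
	int i;

	(void)arg;
	for (i = 0; i < 1000000; i++) {
		prof_spin_lock(&lock);
		counter++;
		prof_spin_unlock(&lock);
	}
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, worker, NULL);
	pthread_create(&t2, NULL, worker, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	printf("counter %lu, contended acquisitions %lu\n",
	       counter, lock.contended);
	return 0;
}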

But before we start dropping spinlocks I think we should try to achieve
the maximum single-threaded performance, as I am trying to do (so UP
machines will get faster too, and then SMP will be faster as well). To
give an example: it would make no sense at all to drop the spinlock in
shrink_mmap without first changing shrink_mmap to work on an LRU-cache
basis (see the toy sketch below).
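
For what I mean by an LRU-cache basis, here is a toy model (all names are
invented, nothing from the real patch): reclaim pops pages from the cold
end of a global LRU list, so it only visits candidate pages instead of
clock-scanning the whole page map.

#include <stdio.h>
#include <stdlib.h>

struct lru_page {
	int id;
	struct lru_page *prev, *next;
};

static struct lru_page *lru_head;	/* cold end: reclaim from here */
static struct lru_page *lru_tail;	/* hot end: recently used pages */

static void lru_add_tail(struct lru_page *p)
{
	p->next = NULL;
	p->prev = lru_tail;
	if (lru_tail)
		lru_tail->next = p;
	else
		lru_head = p;
	lru_tail = p;
}

/* reclaim up to nr pages from the cold end in O(nr), no full scan */
static int lru_reclaim(int nr)
{
	int freed = 0;

	while (nr-- && lru_head) {
		struct lru_page *victim = lru_head;

		lru_head = victim->next;
		if (lru_head)
			lru_head->prev = NULL;
		else
			lru_tail = NULL;
		printf("reclaiming page %d\n", victim->id);
		free(victim);
		freed++;
	}
	return freed;
}

int main(void)
{
	int i;

	for (i = 0; i < 8; i++) {
		struct lru_page *p = calloc(1, sizeof(*p));
		p->id = i;
		lru_add_tail(p);
	}
	/* only the 3 coldest pages are visited, not all 8 */
	printf("freed %d pages\n", lru_reclaim(3));
	return 0;
}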

I also think that improving the code by changing algorithms where possible
may mitigate the need to drop the big kernel lock in some places. I have
to say that on my 2-way SMP box the locking time is not too bad, but on a
4-way SMP it will surely be worse (and by far more than only a factor of
two).

Here is what my SMP lock profiler reports (the marked column is the extra
per-CPU field the patch adds to /proc/stat):

andrea@laser:/tmp/linux-ikd$ cat /proc/stat | grep cpu
cpu 234496 0 57773 2390129
cpu0 120751 0 30132 1190316 5335
                            ^^^^
cpu1 113745 0 27641 1199813 5142
                            ^^^^
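
If you want to watch that field over time, a throwaway parser like the one
below would do. My assumption (from the output above) is that the patch
simply appends the lock ticks as a fifth per-CPU field after
user/nice/system/idle; the percentage is only a rough illustration, since
I don't know whether the lock ticks are already included in the other
fields.

#include <stdio.h>

int main(void)
{
	char line[256];
	FILE *f = fopen("/proc/stat", "r");

	if (!f) {
		perror("fopen");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		char cpu[16];
		unsigned long user, nice, sys, idle, lck;

		/* only the per-CPU lines (cpu0, cpu1, ...) carry 5 fields */
		if (sscanf(line, "cpu%15s %lu %lu %lu %lu %lu",
			   cpu, &user, &nice, &sys, &idle, &lck) == 6) {
			unsigned long total = user + nice + sys + idle;
			printf("cpu%s: %.2f%% lock ticks\n", cpu,
			       total ? 100.0 * lck / total : 0.0);
		}
	}
	fclose(f);
	return 0;
}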

>> o rb-tree in the page cache (not per-inode rb but only one whole rb).
>
>how about dcache too? :)

I was researching exactly that right now ;).
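
Just to illustrate the "one whole tree" idea from the list above: a single
lookup structure for the entire page cache, keyed by the (inode, offset)
pair, instead of one tree per inode. A toy stand-in follows, using a plain
unbalanced binary search tree to stay short where the real thing is an
rb-tree; all names are made up.

#include <stdio.h>
#include <stdlib.h>

struct cache_node {
	unsigned long inode;	/* inode number, primary key */
	unsigned long offset;	/* page offset, secondary key */
	struct cache_node *left, *right;
};

/* compare (inode, offset) pairs lexicographically */
static int cache_cmp(unsigned long ino, unsigned long off,
		     const struct cache_node *n)
{
	if (ino != n->inode)
		return ino < n->inode ? -1 : 1;
	if (off != n->offset)
		return off < n->offset ? -1 : 1;
	return 0;
}

static struct cache_node *cache_insert(struct cache_node *root,
				       unsigned long ino, unsigned long off)
{
	if (!root) {
		root = calloc(1, sizeof(*root));
		root->inode = ino;
		root->offset = off;
	} else {
		int c = cache_cmp(ino, off, root);
		if (c < 0)
			root->left = cache_insert(root->left, ino, off);
		else if (c > 0)
			root->right = cache_insert(root->right, ino, off);
	}
	return root;
}

static struct cache_node *cache_lookup(struct cache_node *root,
				       unsigned long ino, unsigned long off)
{
	while (root) {
		int c = cache_cmp(ino, off, root);
		if (!c)
			return root;
		root = c < 0 ? root->left : root->right;
	}
	return NULL;
}

int main(void)
{
	struct cache_node *root = NULL;

	root = cache_insert(root, 42, 0);	/* page 0 of inode 42 */
	root = cache_insert(root, 42, 4096);
	root = cache_insert(root, 7, 0);	/* other inode, same tree */

	printf("hit: %d\n", cache_lookup(root, 42, 4096) != NULL);
	printf("miss: %d\n", cache_lookup(root, 7, 4096) != NULL);
	return 0;
}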

>> o update_shared_mappings (will greatly improve performance when many
>> tasks write to the same shared memory).
>
>do you have performance numbers on this?

The performance improvement can be huge.

I hacked up this proggy to let you test it. I think it will generate
exactly twice the I/O on clean 2.2.6 compared to 2.2.6_andrea2: 10 MB of
I/O on 2.2.6 and 5 MB of I/O on 2.2.6_andrea2.

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>

/* file size, should be half of the size of the physical memory */
#define FILESIZE (5 * 1024 * 1024)

int main(void)
{
	char *ptr;
	int fd, i;
	char c = 'A';
	pid_t pid;

	if ((fd = open("foo", O_RDWR | O_CREAT | O_EXCL, 0666)) == -1) {
		perror("open");
		exit(1);
	}
	lseek(fd, FILESIZE - 1, SEEK_SET);
	/* write one byte to extend the file */
	write(fd, &fd, 1);
	ptr = mmap(0, FILESIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (ptr == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}

	if ((pid = fork())) { /* parent */
		/* dirty every page in the mapping */
		for (i = 0; i < FILESIZE; i += 4096)
			ptr[i] = c;
		sleep(20);
		wait(NULL);
		msync(ptr, FILESIZE, MS_SYNC);
	} else { /* child */
		/* dirty the same pages through the shared mapping */
		for (i = 0; i < FILESIZE; i += 4096)
			ptr[i] = c;
		sleep(10);
		msync(ptr, FILESIZE, MS_SYNC);
		exit(0);
	}
	return 0;
}
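
For completeness, the way I'd run it (the shmtest file name is just my
choice): gcc -o shmtest shmtest.c and then ./shmtest from a directory
without a stale "foo" (the open uses O_EXCL), while watching the block-out
column of vmstat 1 to see the amount of write-out. Remember to bump
FILESIZE to half the size of your RAM, as the comment says, before
comparing the two kernels.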

The reason this code of mine is not in the kernel is not that it's buggy,
but simply that there are plans for 2.3.x (no way for 2.2.x) to allow the
page cache to be dirty, i.e. to cache writes as well as reads there.
That's not an obvious approach though, and I am not sure how much dirty
pages in the page cache will be a win for the normal case of non-shared
mappings. In particular I am not sure it will be worth the overhead of
handling a second layer of flushing, from dirty pages down to the buffer
cache, on top of what we do now between buffers and disk via bdflush and
update. Right now we cache writes only in the buffer cache, and it's
already very tricky to handle the flushing of dirty pages correctly and
efficiently (and it was broken, in my opinion, btw). I am not against
research, obviously; it's just that to my eyes it's not an obvious
improvement to the standard path (while my update_shared_mappings is, and
won't impact the fast path at all ;).

Andrea Arcangeli
