2.0 chokes on clone() w/ heavy concurrent m{re,un}map()

Wolfram Gloger (Wolfram.Gloger@dent.med.uni-muenchen.de)
Tue, 11 Jun 1996 00:20:19 +0200


Hi,

I am experimenting with a lock-free multi-threading malloc
implementation (each thread gets its own heap -- therefore no locking
is necessary). The heaps and large malloc()ed chunks are allocated
with mmap(), extended with mremap() if necessary and possible, and
deallocated with munmap(). I have written a test program that does
nothing in each thread but malloc() and free() in random succession.
(It's a bit like crashme but only tests for memory-management bugs.)
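
The mmap()/mremap()/munmap() life cycle described above can be sketched roughly as follows. This is only an illustration written against a current Linux/glibc, not code from the thr-malloc package: `struct heap` and the `heap_*` functions are hypothetical names, and passing 0 as the mremap() flags (extend in place only, never move) is an assumption about what "if necessary and possible" means.

```c
#define _GNU_SOURCE              /* for mremap() on Linux */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical per-thread heap descriptor; not taken from thr-malloc. */
struct heap {
    void  *base;                 /* start of the mapping */
    size_t size;                 /* current length in bytes */
};

/* Map a fresh private anonymous region for one thread. 0 on success. */
static int heap_create(struct heap *h, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    h->base = p;
    h->size = size;
    return 0;
}

/*
 * Try to extend the heap in place. With flags == 0 the kernel may not
 * move the mapping, so this fails (returns -1) when the address range
 * directly behind the heap is already in use. The heap is left
 * untouched in that case.
 */
static int heap_grow(struct heap *h, size_t new_size)
{
    assert(new_size >= h->size);
    void *p = mremap(h->base, h->size, new_size, 0);
    if (p == MAP_FAILED)
        return -1;
    h->size = new_size;          /* p == h->base, the mapping did not move */
    return 0;
}

/* Give the whole region back to the kernel. 0 on success. */
static int heap_destroy(struct heap *h)
{
    int rc = munmap(h->base, h->size);
    h->base = NULL;
    h->size = 0;
    return rc;
}
```

Since each thread only ever touches its own `struct heap`, none of these functions takes a lock; a caller for which heap_grow() fails would have to fall back to mapping a new region.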

Note that I'm not doing `evil' things like munmap()ing and mremap()ing
a _single_ region from two threads (which were created with CLONE_VM
of course). Each thread allocates its own memory areas with mmap(),
does stuff on those areas, and releases them with munmap() when it
dies.
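
For readers who want a feel for this access pattern without the full package, here is a minimal sketch. It is not the `thread-test' program: `worker`, `run_threads`, `struct job` and all the constants are made up for illustration, and it uses the glibc clone() wrapper with CLONE_VM | SIGCHLD on a current Linux. Each thread maps, touches and unmaps only regions it created itself.

```c
#define _GNU_SOURCE              /* for clone() and CLONE_VM */
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE  (256 * 1024) /* per-thread stack size, arbitrary */
#define MAX_THREADS 64           /* arbitrary upper bound */
#define CHUNK       4096         /* allocation granularity, arbitrary */

struct job { unsigned seed; int rounds; };   /* hypothetical */

/* Thread body: map, touch and unmap private regions of random size. */
static int worker(void *arg)
{
    struct job *j = arg;
    unsigned seed = j->seed;

    for (int i = 0; i < j->rounds; i++) {
        seed = seed * 1103515245u + 12345u;  /* small private PRNG */
        size_t len = (size_t)((seed >> 16) % 16 + 1) * CHUNK;
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return 1;
        p[0] = 1;                /* fault in the first and last page */
        p[len - 1] = 1;
        if (munmap(p, len) != 0)
            return 1;
    }
    return 0;
}

/* Start nthreads CLONE_VM children and return the number of failures. */
static int run_threads(int nthreads, int rounds)
{
    static struct job jobs[MAX_THREADS];
    void *stacks[MAX_THREADS];
    pid_t pids[MAX_THREADS];
    int failures = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;

    for (int i = 0; i < nthreads; i++) {
        pids[i] = -1;
        stacks[i] = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (stacks[i] == MAP_FAILED) {
            failures++;
            continue;
        }
        jobs[i].seed = (unsigned)i + 1;
        jobs[i].rounds = rounds;
        /* The stack grows downwards, so pass the top of the region. */
        pids[i] = clone(worker, (char *)stacks[i] + STACK_SIZE,
                        CLONE_VM | SIGCHLD, &jobs[i]);
        if (pids[i] == -1)
            failures++;
    }

    for (int i = 0; i < nthreads; i++) {
        int status;
        if (pids[i] != -1 &&
            (waitpid(pids[i], &status, 0) != pids[i] ||
             !WIFEXITED(status) || WEXITSTATUS(status) != 0))
            failures++;
        if (stacks[i] != MAP_FAILED)
            munmap(stacks[i], STACK_SIZE);
    }
    return failures;
}
```

A call such as `run_threads(20, 20000)` would start 20 such threads. Whether that is enough to trigger the kernel messages below is untested; the real test program goes through malloc() and free() and also exercises mremap().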

The package including the testing app (try `make linux' to compile) is
available from

ftp://md.dent.med.uni-muenchen.de/pub/wmglo/thr-malloc.tar.gz

Now the problem: with `thread-test' I can quite easily reproduce
`put_page: page already exists' messages from the 2.0 kernel (they
don't seem to have any bad effects). In addition, the following Oops
has happened once so far, while running 20 concurrent threads (not
that I expect to produce this kind of load in practice every day). If
you want to reproduce it, the command line was

% thread-test 100 20 20000 133000

general protection: 0000
CPU: 0
EIP: 0010:[<001191a9>]
EFLAGS: 00010206
eax: 0029b040 ebx: 0029b000 ecx: 0029b042 edx: 0029b042
esi: 01052788 edi: 4019fc00 ebp: 00437018 esp: 005adf58
ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Process thread-test (pid: 730, process nr: 32, stackpage=005ad000)
Stack: 00437018 411e2000 005adfbc 00000006 0079f7d8 00000000 4019fc00 0029b000
0029b042 0029b040 0079fa98 0010febb 00777018 00437018 411e2000 00000002
0010fd7c 00020400 000003df 402dfb5c 41161000 00000000 0010a7fb 005adfbc
Call Trace: [<0010febb>] [<0010fd7c>] [<0010a7fb>]
Code: ff 47 3c 8b 44 24 30 ff 80 74 01 00 00 85 db 75 22 50 e8 7c

ksymoops output:

>>EIP: 1191a9 <do_no_page+275/3cc>
Trace: 10febb <do_page_fault+13f/29c>
Trace: 10febb <do_page_fault+13f/29c>
Trace: 10a7fb <error_code+4b/60>

Code: 1191a9 <do_no_page+275/3cc> incl 0x3c(%edi)
Code: 1191ac <do_no_page+278/3cc> movl 0x30(%esp,1),%eax
Code: 1191b0 <do_no_page+27c/3cc> incl 0x174(%eax)
Code: 1191b6 <do_no_page+282/3cc> testl %ebx,%ebx
Code: 1191b8 <do_no_page+284/3cc> jne 1191dc <do_no_page+2a8/3cc>
Code: 1191ba <do_no_page+286/3cc> pushl %eax
Code: 1191bb <do_no_page+287/3cc> call 90900093 <_EIP+90900093>
Code: 1191c0 <do_no_page+28c/3cc> nop

Regards,
Wolfram.