Re: test13-pre6 (Fork Bug with Athlons? Temporary Fix)

From: Byron Stanoszek (gandalf@winds.org)
Date: Fri Dec 29 2000 - 22:08:49 EST


On Fri, 29 Dec 2000, Linus Torvalds wrote:

>
> Ok, there's a test13-pre6 out there now, which does a partial sync with
> Alan, in addition to hopefully fixing the innd shared mapping writeback
> problem for good. Thanks to Marcelo Tosatti and others..

I've been noticing a problem with memory context switching conflicting with
fork() on my Athlon. The problem began with the test13-pre2 patch, and because
nobody else has seen it (or at least reported it) since then, I felt I should
look into it a little further.

I narrowed the problem down to a subset of patches from the MM set in
test13-pre2. Reversing the attached 'context.patch' fixes the problem (only for
i386), but I'm not yet sure why. test13-pre2 and up work without any problems
on Intel CPUs (Pentium 180 and P3 800 tested).

Anyway, I can't find what the patch really changes apart from the obvious
'void *segment' turning into a typedef'd struct. The only thing I can think of
is that the compiler treats it differently, but I think I can safely rule that
out: I tried both 2.91.66 and 2.95.2, with flags for both P5 and K7
(-march=i586, and -march=i686 -malign-functions=4), and the problem still shows
up on the Athlon. Maybe there's something I've overlooked in the attached patch.
An extra pair of eyes would be appreciated. :)

Here are the symptoms I've observed. The parent seems to die as soon as a forked
child exits, which suggests to me that a new LDT isn't being initialized
correctly:

root:~> ps -aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 1.1 0.4 1228 532 ? S 21:42 0:05 init [3]
root 2 0.0 0.0 0 0 ? SW 21:42 0:00 [keventd]
root 3 0.0 0.0 0 0 ? SW 21:42 0:00 [kswapd]
root 4 0.0 0.0 0 0 ? SW 21:42 0:00 [kreclaimd]
root 5 0.0 0.0 0 0 ? SW 21:42 0:00 [bdflush]
root 6 0.0 0.0 0 0 ? SW 21:42 0:00 [kupdate]
root 289 0.0 0.4 1284 604 ? S 21:42 0:00 syslogd -m 0
root 299 0.0 0.8 1912 1104 ? S 21:42 0:00 klogd
root 351 0.0 1.2 9292 1576 ? S 21:42 0:00 named
root 361 0.0 0.0 0 0 ? Z 21:42 0:00 [named <defunct>]
root 363 0.0 1.2 9292 1576 ? S 21:42 0:00 named
root 364 0.0 1.2 9292 1576 ? S 21:42 0:00 named
root 365 0.0 0.7 2064 936 ? S 21:42 0:00 /usr/sbin/sshd
..etc
(Note PID 361)

root:~> strace nslookup sunsite.unc.edu
 :
 :
rt_sigaction(SIGINT, {0x4003ce78, ~[], 0x4000000}, NULL, 8) = 0
rt_sigaction(SIGTERM, {0x4003ce78, ~[], 0x4000000}, NULL, 8) = 0
rt_sigaction(SIGPIPE, {SIG_IGN}, NULL, 8) = 0
rt_sigaction(SIGHUP, {SIG_DFL}, NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [HUP INT TERM], NULL, 8) = 0
getpid() = 2615
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
close(3) = 0
socket(PF_INET6, SOCK_STREAM, 0) = -1 ENOSYS (Function not implemented)
socket(PF_INET6, SOCK_STREAM, 0) = -1 ENOSYS (Function not implemented)
socket(PF_INET6, SOCK_STREAM, 0) = -1 EAFNOSUPPORT (Address family not supported by protocol)
--- SIGSEGV (Segmentation fault) ---
+++ killed by SIGSEGV +++

---Example parent/child process:

root:~> tar -xzvvf ../pkgs/zgv-5.2.tar.gz
 :
 :
-rw------- rus/users 1356 2000-06-01 11:46:57 zgv-5.2/INSTALL
-rw------- rus/users 17976 1994-08-23 16:09:05 zgv-5.2/COPYING
-rw------- rus/users 1077 1998-08-26 09:24:31 zgv-5.2/README.fonts
-rw------- rus/users 120 2000-04-22 22:46:49 zgv-5.2/AUTHORS
-rw------- rus/users 3714 2000-01-23 16:29:40 zgv-5.2/SECURITY
Segmentation fault (core dumped)

root:~> strace tar -xzvvf ../pkgs/zgv-5.2.tar.gz
 :
 :
open("zgv-5.2/COPYING", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0600) = 4
write(4, "\t\t GNU GENERAL PUBLIC LICENSE"..., 9728) = 9728
read(3, "ccept this License. Therefore, "..., 10240) = 10240
write(4, "ccept this License. Therefore, "..., 8248) = 8248
close(4) = 0
utime("zgv-5.2/COPYING", [2000/12/29-20:21:16, 1994/08/23-16:09:05]) = 0
chown32("zgv-5.2/COPYING", 500, 100) = 0
write(1, "-rw------- rus/users 1077 1"..., 72-rw------- rus/users 1077 1998-08-26 09:24:31 zgv-5.2/README.fonts
) = 72
open("zgv-5.2/README.fonts", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0600) = 4
write(4, "The copyright for *.bdf (taken f"..., 1024) = 1024
read(3, "\"as\nis\" without express or impli"..., 10240) = 8192
--- SIGCHLD (Child exited) ---
--- SIGSEGV (Segmentation fault) ---
+++ killed by SIGSEGV +++

Ideas, anyone?

 -Byron

-- 
Byron Stanoszek                         Ph: (330) 644-3059
Systems Programmer                      Fax: (330) 644-8110
Commercial Timesharing Inc.             Email: bstanoszek@comtime.com

