IBM DB2 failures in 2.4.0-test*

From: Urban Widmark (urban@svenskatest.se)
Date: Mon Jun 05 2000 - 15:15:29 EST


Hello

I have a problem with IBM DB2 not working on the few 2.4.0-test1-ac?
kernels I have tested (ac2 and ac7). To be precise, restoring a backup
fails; other stuff seems to work. Everything is fine in 2.2. I'm not
sure where it stopped working or why. It is of course perfectly possible
that this is a db2 bug.

I'm looking for:
+ shm changes, what versions had major changes, ... I'd like to try and
  find the version where it stops working. (just guessing, but db2 likes
  ipc a lot)
+ good ideas on what to search for.
+ someone to say stop if ipc/whatever is supposed to be binary
  incompatible with 2.2.
+ someone to say stop if this is an obvious userspace bug.
+ fix'it patches :)

Here is most of what I know now:

strace -o /tmp/db2-restore.trace -ff db2 restore database A0002A from `pwd` into temp
Process 5272 attached
Process 5271 suspended
Process 5273 attached
Process 5271 resumed
Process 5272 detached

... and there it appears to hang.

The end of /tmp/db2-restore.trace.5273
semop(0x5000a, 0x1, 0, 0xbfffb868) = 0
semop(0x5000a, 0x1, 0, 0xbfffb760) = 0
semop(0x5000a, 0x1, 0, 0xbfffec8c) = 0
semop(0x5000a, 0x1, 0, 0xbfffeb84
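
For reading traces like the one above, a small parser can help. This is only a sketch: it assumes the four raw values strace prints follow the argument order of the i386 ipc() multiplexer (semid, nsops, an unused word, pointer to the sembuf array), which is an interpretation and not something the trace states. The name parse_semop is made up for this example. A line with no return value, like the last one above, is reported as a call that had not returned when the trace ended.

```python
import re

# Assumed argument layout (from the i386 ipc() multiplexer, not from strace):
#   semop(semid, nsops, unused, sops_pointer) [= return value]
_NUM = r"(0x[0-9a-f]+|\d+)"
SEMOP_RE = re.compile(
    r"semop\(" + _NUM + r",\s*" + _NUM + r",\s*" + _NUM + r",\s*" + _NUM
    + r"(?:\)\s*=\s*(-?\d+))?"
)

def parse_semop(line):
    """Decode one raw strace semop line; return None if it is not a semop."""
    m = SEMOP_RE.search(line)
    if m is None:
        return None
    # int(x, 0) accepts both the 0x... and the plain decimal forms.
    semid, nsops, _unused, sops = (int(x, 0) for x in m.group(1, 2, 3, 4))
    ret = m.group(5)
    return {
        "semid": semid,
        "nsops": nsops,
        "sops": sops,
        # None means the call had not returned when the trace ended.
        "returned": None if ret is None else int(ret),
    }
```

Feeding it every line of a per-process trace file and looking for entries whose "returned" is None is one way to spot the call a process is blocked in.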

5271 is looping on this:
--- SIGALRM (Alarm clock) ---
sigreturn() = ? (mask now ~[INT KILL ALRM STOP])
rt_sigprocmask(SIG_SETMASK, [RT_0], NULL, 8) = 0
alarm(0) = 0
rt_sigaction(SIGALRM, {SIG_IGN}, {0x40974460, [], 0x4000000}, 8) = 0
kill(5273, SIG_0) = 0
rt_sigprocmask(SIG_SETMASK, ~[INT ALRM RT_1], [RT_0], 8) = 0
rt_sigaction(SIGALRM, {0x40974460, [], 0x4000000}, {SIG_IGN}, 8) = 0
alarm(31) = 0
read(6, 0xbfffec00, 4) = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) ---
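
The loop above looks like a watchdog: arm a timer, block in read() on a pipe, and when SIGALRM interrupts the read, probe the child with signal 0 before trying again. A minimal sketch of that pattern follows. It is a guess at the intent and not db2's actual code; the names Timeout and wait_for_child are made up, and setitimer() is swapped in for alarm(31) so that short intervals are possible.

```python
import os
import signal

class Timeout(Exception):
    """Raised from the SIGALRM handler to break out of a blocking read."""

def _on_alarm(signum, frame):
    raise Timeout()

def wait_for_child(read_fd, child_pid, interval=31.0, max_rounds=3):
    """Return up to 4 bytes from read_fd, or None after max_rounds timeouts.

    Raises ProcessLookupError if the child no longer exists.
    Must be called from the main thread (signal handlers require it).
    """
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    try:
        for _ in range(max_rounds):
            signal.setitimer(signal.ITIMER_REAL, interval)  # like alarm(31)
            try:
                return os.read(read_fd, 4)                  # like read(6, buf, 4)
            except Timeout:
                os.kill(child_pid, 0)                       # like kill(5273, SIG_0)
            finally:
                signal.setitimer(signal.ITIMER_REAL, 0)     # like alarm(0)
        return None
    finally:
        signal.signal(signal.SIGALRM, old_handler)
```

With a child that never writes to the pipe and never exits, as 5273 does while it sits in semop, the probe keeps succeeding and the parent keeps re-arming the timer, which matches the repeating trace.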

% ps -l 5273
  F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
030 S 503 5273 1 0 60 0 - 2853 semop pts/0 0:00 /opt/db2hom
% ps -l 5271
  F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
030 S 503 5271 5269 0 60 0 - 2738 pipe_w pts/0 0:00 db2
        (ps -l n shows that to be pipe_wait)

sys_semop appears to be looping in its for (;;) { loop. I assume it is
the 'schedule();' call that I keep seeing with ps. Here is the gdb vmlinux
output:

0xc01652a3 <sys_semop+847>: movb $0x1,0xc0280070
0xc01652aa <sys_semop+854>: call 0xc0117c78 <schedule>
0xc01652af <sys_semop+859>: mov %ebp,%eax

Upgrading db2 to the latest "fixpak" gives a similar result. Here the
parent detects that something is wrong, probably because the child does:
semop(0x1c8010, 0x1, 0, 0xbfffb888) = 0
semop(0x1c8010, 0x1, 0, 0xbfffb780) = 0
semop(0x1c8010, 0x1, 0, 0xbfffba9c) = -1 EINVAL (Invalid argument)

but it still fails ... it even fails to die properly :(

/Urban
