Re: Is there something wrong here?

Simon Kirby (sim@netnation.com)
Sat, 16 Jan 1999 03:24:46 -0800 (PST)


On Sat, 16 Jan 1999, Simon Kirby wrote:

> Voila! I shall now try with your one-liner (back in a reboot...)

Well, the good news is that all my files still seem to be intact. ;)
Unfortunately, the floppy still seems to block. I got some more info,
though.

The copy still stalled, just as it did before I changed the if condition.
While it was blocked, I quickly ran "ps auxwl" and "ps auxwln" to see
where it was stuck:

[sroot@red:/root]# cat foo1
0 0 214 101 4 0 8 8 wait_on_buf D 1 0:00 sync
0 0 217 101 5 0 768 284 wait_on_buf D 1 0:00 cp -v etc/lilo.conf /dev/null
[sroot@red:/root]# cat foo2
0 0 214 101 2 0 8 8 c0123ac9 D 0401 0:00 sync
0 0 217 101 3 0 768 284 c0123ac9 D 0401 0:00 cp -v etc/lilo.conf /dev/nul

[sroot@red:/root]# grep -3 ^c0123a /System.map
c0123878 T get_empty_filp
c0123964 T init_private_file
c01239c4 T fput
c0123a0c T put_filp
c0123a40 T __wait_on_buffer
c0123b04 t sync_buffers
c0123cd0 T sync_dev
c0123cf8 T fsync_dev

Hmm. *blink*
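Reading the map by hand: System.map is sorted by address, so a raw WCHAN value belongs to the last symbol at or below it. c0123ac9 lies between __wait_on_buffer (c0123a40) and sync_buffers (c0123b04), which puts both sync and cp inside __wait_on_buffer, the same spot the symbolic output reported as wait_on_buf. Below is a minimal Python sketch of that lookup. It assumes the three-column "address type name" System.map layout shown above, and load_system_map and resolve_wchan are hypothetical helper names. Only the excerpt from the grep is used as input.

```python
import bisect

def load_system_map(lines):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        addr_hex, _sym_type, name = parts[:3]
        entries.append((int(addr_hex, 16), name))
    entries.sort()
    return entries

def resolve_wchan(entries, wchan_hex):
    """Map a raw WCHAN address to (symbol, offset), or None if below all symbols."""
    addr = int(wchan_hex, 16)
    addrs = [a for a, _ in entries]
    # bisect_right finds the first symbol strictly above addr; step back one
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = entries[i]
    return name, addr - base

# The System.map excerpt from the grep above
EXCERPT = """\
c0123878 T get_empty_filp
c0123964 T init_private_file
c01239c4 T fput
c0123a0c T put_filp
c0123a40 T __wait_on_buffer
c0123b04 t sync_buffers
c0123cd0 T sync_dev
c0123cf8 T fsync_dev
"""

entries = load_system_map(EXCERPT.splitlines())
name, offset = resolve_wchan(entries, "c0123ac9")
print(f"{name}+0x{offset:x}")  # __wait_on_buffer+0x89
```

With only an excerpt loaded, an address past the last entry (fsync_dev here) would be wrongly attributed to that last symbol, so a real lookup should load the whole System.map.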

Next I went to the mail server and ran "head -c128m /dev/zero >
/var/spool/foo ; sync", which reproduced the problem beautifully. On
other consoles I ran "vmstat 1" and a process that logged "ps axl" every
0.5 seconds.

Check out the pretty vmstat output. It looks like almost every new login blocked:

  procs            memory            swap     io      system      cpu
  r  b  w  swpd  free   buff  cache  si  so  bi   bo   in   cs  us  sy  id
  0  5  1   432  1724 141692 254000   0   0  24  931  386  428   8  17  75
  0  1  1   432  1728 146796 249248   0   0   5  569  360  324  16  17  68
  1  2  1   432  1656 155716 239320   0   0   5  942  340  229   8  19  73
  0  1  0   432  1560 158748 236076   0   0  10  558  383  274  12  16  72
  2  2  1   432  1664 171360 221920   0   0   7  950  338  267   6  29  65
  0  1  2   432  1472 175944 216864   0   0  11  855  446  283  25  24  51
  2  0  0   432  2968 176284 216756   0   0   0  827  474  559  12  17  71
  0  1  0   432  2408 176288 216756   0   0   0 1836  451  675  13  12  75
  3  1  0   432  2452 176288 216760   0   0   1 2953  385  510  16  15  69
  0  5  0   432  2160 176288 216756   0   0   0 2088  385  540  14  11  75
  0  6  0   432  1984 176172 216744   0   0   0 3027  362  504  10   9  81
  1  7  0   432  1796 176060 216736   0   0   0 2741  383  546  24  12  64
  0  7  0   432  2128 175952 216716   0   0   0 2507  367  496  11   6  83
  3  8  0   432  1784 175836 216704   0   0   0 2842  362  504  14  12  73
  2  9  0   432  1452 175320 216580   0   0   0 1523  375  487  23  10  67
  1 10  1   432  1324 174864 216280   0   0   0 1840  473  755  16  10  74
  0 11  0   432  1492 174820 216208   0   0   1 3179  466  708  13   8  79
  2 11  0   432  1796 174768 216148   0   0   0 3942  368  465  16  13  70
  0 12  0   432  2564 174768 216144   0   0   0 4094  310  392  12  14  74
  3 12  0   432  2416 174772 216144   0   0   1 3366  333  454  12   7  81
  2 14  0   432  2004 174772 216144   0   0   0 1998  353  483  16  11  73
  0 16  0   432  1912 174776 216144   0   0   0 3220  371  507  19  11  70
  1 16  0   432  1880 174664 216128   0   0   0 2799  420  815  15   8  78
  1 18  0   432  1764 174564 216100   0   0   0 4474  290  360  14   9  77
  1 18  0   432  1736 174484 216056   0   0   2 3455  313  415  18  13  69
  0 19  0   432  1992 174400 216016   0   0   0 3774  308  423  16   8  76
  0 20  0   432  1744 174280 216008   0   0   0 3621  319  426  12  10  79
  2 22  0   432  1152 174072 215964   0   0   0 5327  301  401  18  17  65
  1 24  0   432  1596 173484 215784   0   0   0 2872  322  448  20   9  71
  1 25  0   432  1368 173392 215752   0   0   0 4803  271  328  24  10  67
  0 25  0   432  1608 173288 215732   0   0   0 3698  293  406  14   9  77
  0 26  0   432  1480 173204 215688   0   0   0 5685  296  386  20  12  69
  2 25  0   432  1584 173116 215652   0   0   0 5342  251  283  16  11  73
  2 27  0   432  1784 172584 215416   0   0   0 5393  312  450  21  18  61
 11 15  0   432  1568 172564 215356   0   0  41 3020  332  273  21  16  63
  7  0  0   432  5008 172276 215064   0   0  18   70  591  726  20  39  41
  0  0  0   432  7092 172284 215080   0   0   1    0  348  389  15  10  76
  0  0  0   432  7632 172284 215068   0   0   1    0  221  180  28   9  63
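The column to watch is "b", the count of processes in uninterruptible sleep: it climbs to 27 while "bo" stays in the thousands, then drops to 0 once the writeback finishes. Below is a small Python sketch for pulling that column out of a saved "vmstat 1" log. It assumes the 16-column layout above (r b w swpd free buff cache si so bi bo in cs us sy id). parse_vmstat is a hypothetical name, and the sample rows are copied from the table.

```python
# Column names of the vmstat layout shown above
FIELDS = ["r", "b", "w", "swpd", "free", "buff", "cache",
          "si", "so", "bi", "bo", "in", "cs", "us", "sy", "id"]

def parse_vmstat(lines):
    """Turn vmstat data lines into dicts, skipping the two header lines."""
    rows = []
    for line in lines:
        parts = line.split()
        # Header lines have the wrong field count or start with a letter
        if len(parts) != len(FIELDS) or not parts[0].isdigit():
            continue
        rows.append(dict(zip(FIELDS, map(int, parts))))
    return rows

VMSTAT_SAMPLE = """\
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 5 1 432 1724 141692 254000 0 0 24 931 386 428 8 17 75
0 12 0 432 2564 174768 216144 0 0 0 4094 310 392 12 14 74
2 27 0 432 1784 172584 215416 0 0 0 5393 312 450 21 18 61
0 0 0 432 7632 172284 215068 0 0 1 0 221 180 28 9 63
"""

rows = parse_vmstat(VMSTAT_SAMPLE.splitlines())
blocked = [row["b"] for row in rows]
worst = max(rows, key=lambda row: row["b"])
print("b column:", blocked)
print("peak:", worst["b"], "blocked, bo =", worst["bo"])
```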

A "ps axl" dump from near the end showed:

40 0 34 1 0 0 848 312 wait_on_buf D ? 0:12 /usr/sbin
40 0 7812 1 19 19 2368 1912 wait_on_buf D N ? 0:00 sendmail:
40 0 7815 1 19 19 2372 1936 wait_on_buf D N ? 0:00 sendmail:
100 16757 7728 71 16 16 892 568 wait_on_buf D N ? 0:00 popper -s
100 10143 7746 71 16 16 888 568 down_failed D N ? 0:00 popper -s
100 18172 7747 71 16 16 888 568 down_failed D N ? 0:00 popper -s
100 10360 7752 71 16 16 888 568 down_failed D N ? 0:00 popper -s
100 16643 7756 71 16 16 888 568 down_failed D N ? 0:00 popper -s
100 12728 7758 71 16 16 888 568 down_failed D N ? 0:00 popper -s
100 13289 7779 71 16 16 888 568 down_failed D N ? 0:00 popper -s
100 11919 7783 71 16 16 888 568 down_failed D N ? 0:00 popper -s
100 10590 7790 71 16 16 888 568 down_failed D N ? 0:00 popper -s
100 13255 7855 71 16 16 888 568 down_failed D N ? 0:00 popper -s
100 16773 7882 71 16 16 888 568 down_failed D N ? 0:00 popper -s
100 15233 7891 71 16 16 888 568 down_failed D N ? 0:00 popper -s
100 17140 7913 71 16 16 888 568 down_failed D N ? 0:00 popper -s
100 13603 7922 71 16 16 888 568 down_failed D N ? 0:00 popper -s
100 13604 7926 71 16 16 888 568 down_failed D N ? 0:00 popper -s
0 0 7725 7091 1 0 12 12 get_request D p5 0:00 sync
40 0 7842 7837 19 19 2292 1800 wait_on_buf D N ? 0:00 sendmail:
40 0 7848 7843 19 19 2292 1800 down_failed D N ? 0:00 sendmail:
40 0 7856 7853 19 19 2292 1800 down_failed D N ? 0:00 sendmail:
40 0 7865 7861 19 19 2292 1800 down_failed D N ? 0:00 sendmail:
40 0 7885 7883 19 19 2292 1800 down_failed D N ? 0:00 sendmail:
40 0 7892 7889 19 19 2292 1800 down_failed D N ? 0:00 sendmail:
40 0 7898 7897 19 19 2292 1800 down_failed D N ? 0:00 sendmail:
40 0 7904 7902 19 19 2292 1800 down_failed D N ? 0:00 sendmail:
140 0 4221 26294 19 19 2676 2248 wait_on_buf D N ? 0:00 sendmail:

So it looks to me like it's getting stuck in two places (wait_on_buf and
down_failed?), unless this has something to do with the fact that this
box, unlike my home box, is running SMP.
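The "two places" reading can be checked by tallying the WCHAN column. In the dump above, 21 processes are in down_failed, 6 are in wait_on_buf, and the sync is in get_request. down_failed is most likely the slow path taken when down() finds a semaphore already held, though that is an interpretation, not something the dump shows. Below is a Python sketch of the tally. It assumes WCHAN is the ninth whitespace-separated field, as it is in these ps axl lines. tally_wchan is a hypothetical helper, and the sample is a subset of the dump.

```python
from collections import Counter

def tally_wchan(lines, wchan_field=8):
    """Count processes per WCHAN; wchan_field is the 0-based column index."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) > wchan_field:
            counts[parts[wchan_field]] += 1
    return counts

# A few lines copied from the ps axl dump above
PS_SAMPLE = """\
40 0 7812 1 19 19 2368 1912 wait_on_buf D N ? 0:00 sendmail:
100 10143 7746 71 16 16 888 568 down_failed D N ? 0:00 popper -s
100 18172 7747 71 16 16 888 568 down_failed D N ? 0:00 popper -s
0 0 7725 7091 1 0 12 12 get_request D p5 0:00 sync
40 0 7848 7843 19 19 2292 1800 down_failed D N ? 0:00 sendmail:
"""

for wchan, n in tally_wchan(PS_SAMPLE.splitlines()).most_common():
    print(f"{n:3d} {wchan}")
```

Run against the full "ps axl" log, this would also show whether the down_failed group keeps growing behind the few processes stuck in wait_on_buf.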

Simon-

| Simon Kirby | Systems Administration |
| mailto:sim@netnation.com | NetNation Communications |
| http://www.netnation.com/ | Tech: (604) 684-6892 |

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/