Re: RT : Large transfert with2.6.12rc5+realtime-preempt-2.6.12-rc5-V0.7.47-15

From: Serge Noiraud
Date: Tue Jun 07 2005 - 08:42:11 EST


Le mer 01/06/2005 à 13:10, K.R. Foley a écrit :
> Serge Noiraud wrote:
> > Le mar 31/05/2005 à 18:18, K.R. Foley a écrit :
> >
> >>Serge Noiraud wrote:
> >>
> >>>I had the same problem with rc4+47-07, rc5+47-10,47-13
> >>>I reproduce this problem with a tg3 driver and with e1000 driver.
> >>>So I think it's not a driver problem.
> >>>
> >>>I try to copy an iso image from this machine to another one by scp.
> >>>after 35 to 45MB, the copy become stalled with no more transfert.
> >>>We can ping the target machine, all apparently is OK except the scp
> >>>which finish with timeout.
> >>>With ftp, the stalled state is about 100MB.
> >>>If I reboot with a standard kernel ( without RT ), no problem.
> >>>
> >>>Perhaps there is a progress, in 47-15, the size is now 135-140MB
> >>>
> >>>On this machine, we have an ide disk.
> >>>I have setup : hdparm
> >>>-sh-2.05b# hdparm /dev/hda
> >>>
> >>>/dev/hda:
> >>> multcount = 16 (on)
> >>> IO_support = 3 (32-bit w/sync)
> >>> unmaskirq = 1 (on)
> >>> using_dma = 1 (on)
> >>> keepsettings = 0 (off)
> >>> readonly = 0 (off)
> >>> readahead = 256 (on)
> >>> geometry = 65535/16/63, sectors = 78165360, start = 0
> >>>
> >>
> >>Hi,
> >>
> >>I am not sure what might be causing this problem for you. I just tried
> >>to reproduce this on one of my systems but could not (scsi not ide). The
> >>first time it copied 450MB before the remote system ran out of space.
> >>After cleaning up a bit I got the whole 630MB without a hitch. Do you
> >>have the RT patch on both systems or just on the originating system? In
> >>my case its the latter. There is
> >
> >
> > The scp or ftp start on a RT machine.
> > The destination is an RT or Non RT machine, the problem is the same.
> > It's not a space problem, I have 4GB available on the destination path.
> > I can reproduce the phenomena at each try.
> > If I <CTRL C> the scp when it is stalled then relaunch the scp command, the transfert restart without problem.
> > I'm trying to trace this problem but I don't know how to do this.
> > Has someone one method ?
> >
> >
>
> Do any of your logs provide any clues? Does this happen with just
> 2.6.12-rc5 and no rt-preempt patch? Does ifconfig provide any clues? How
> about netstat -a? What is the state of the connection?

I have no problem with a 2.6.12rc5 with no RT patch.

With the RT pach :
When I get the problem, netstat -a seems ok.
The connection is established.

I progress in my tests.
runlevel = 1

cp of the iso file to another one in the same directory : no problem
I reboot in single then start only lo ( loopback )
cd /tmp
scp F1.iso localhost:/tmp/F2.iso : no problem

In runlevel > 1

dd if=/dev/zero bs=4k | ssh localhost:/dev/null
it is OK after 1 hour.

sh-2.05b# dd if=/dev/zero bs=8k | ssh 192.168.44.112 dd of=/dev/null
root@xxxxxxxxxxxxxx's password:
Read from remote host 192.168.44.112: Connection timed out
3565+0 records in
3564+0 records out
sh-2.05b#cat /proc/latency_trace
preemption latency trace v1.1.4 on 2.6.12-DAV06_dbg
--------------------------------------------------------------------
latency: 45 us, #64/64, CPU#0 | (M:rt VP:0, KP:1, SP:1 HP:1 #P:2)
-----------------
| task: ksoftirqd/0-3 (uid:0 nice:-10 policy:0 rt_prio:0)
-----------------

_------=> CPU#
/ _-----=> irqs-off
| / _----=> need-resched
|| / _---=> hardirq/softirq
||| / _--=> preempt-depth
|||| /
||||| delay
cmd pid ||||| time | caller
\ / ||||| \ | /
dd-6389 0dn.2 2us : _raw_spin_lock (__up_mutex)
dd-6389 0dn.3 2us : _raw_spin_lock (__up_mutex)
dd-6389 0dn.4 3us : mutex_getprio (__up_mutex)
dd-6389 0dn.4 4us : mutex_getprio <dd-6389> (73 73):
dd-6389 0dn.4 4us : _raw_spin_unlock (__up_mutex)
dd-6389 0dn.3 4us : preempt_schedule (__up_mutex)
dd-6389 0dn.3 5us : _raw_spin_unlock (__up_mutex)
dd-6389 0dn.2 5us : preempt_schedule (__up_mutex)
dd-6389 0dn.2 5us : _raw_spin_unlock (__up_mutex)
dd-6389 0dn.1 6us : preempt_schedule (__up_mutex)
dd-6389 0dn.1 7us : smp_reschedule_interrupt (c0141a7a 0 0)
dd-6389 0dn.1 8us < (608)
dd-6389 0dnh1 9us : do_IRQ (c0141a7a 0 0)
dd-6389 0d.h. 10us : _raw_spin_lock (__do_IRQ)
dd-6389 0d.h1 11us : ack_lapic_irq (__do_IRQ)
dd-6389 0d.h1 11us : redirect_hardirq (__do_IRQ)
dd-6389 0d.h1 12us : _raw_spin_unlock (__do_IRQ)
dd-6389 0d.h. 12us : handle_IRQ_event (__do_IRQ)
dd-6389 0d.h. 12us : timer_interrupt (handle_IRQ_event)
dd-6389 0d.h. 13us : _raw_spin_lock (timer_interrupt)
dd-6389 0d.h1 13us : mark_offset_pmtmr (timer_interrupt)
dd-6389 0d.h1 14us+: _raw_spin_lock (mark_offset_pmtmr)
dd-6389 0d.h2 17us : _raw_spin_unlock (mark_offset_pmtmr)
dd-6389 0d.h1 18us+: _raw_spin_lock_irqsave (timer_interrupt)
dd-6389 0d.h2 20us : _raw_spin_unlock_irqrestore
(timer_interrupt)
dd-6389 0d.h1 21us : do_timer (timer_interrupt)
dd-6389 0d.h1 21us : _raw_spin_unlock (timer_interrupt)
dd-6389 0d.h. 22us : _raw_spin_lock (__do_IRQ)
dd-6389 0d.h1 22us : note_interrupt (__do_IRQ)
dd-6389 0d.h1 23us : end_lapic_irq (__do_IRQ)
dd-6389 0d.h1 23us : _raw_spin_unlock (__do_IRQ)
dd-6389 0dnh1 23us : irq_exit (do_IRQ)
dd-6389 0dn.2 24us : do_softirq (irq_exit)
dd-6389 0d.s. 25us : __do_softirq (do_softirq)
dd-6389 0d.s. 25us : wake_up_process (do_softirq)
dd-6389 0d.s. 26us : try_to_wake_up (wake_up_process)
dd-6389 0d.s. 26us : _raw_spin_lock (try_to_wake_up)
dd-6389 0d.s1 27us : _raw_spin_unlock_irqrestore
(try_to_wake_up)
dd-6389 0d.s. 27us : wake_up_process (do_softirq)
dd-6389 0dn.1 28us < (608)
dd-6389 0.n.1 29us : preempt_schedule (rt_up)
dd-6389 0.n.. 29us : preempt_schedule (pipe_writev)
dd-6389 0dn.. 29us : __schedule (preempt_schedule)
dd-6389 0dn.1 30us : sched_clock (__schedule)
dd-6389 0dn.1 31us : _raw_spin_lock_irq (__schedule)
dd-6389 0dn.1 31us : _raw_spin_lock_irqsave (__schedule)
dd-6389 0dn.2 32us : _raw_spin_unlock (__schedule)
dd-6389 0dn.1 32us : preempt_schedule (__schedule)
dd-6389 0dn.1 32us : _raw_spin_lock (__schedule)
dd-6389 0dn.1 33us : preempt_schedule (_raw_spin_lock)
dd-6389 0dn.2 34us : find_next_bit (__schedule)
dd-6389 0d..2 36us+: trace_array (__schedule)
dd-6389 0d..2 38us : trace_array <<...>-3> (69 6e):
dd-6389 0d..2 38us : trace_array <dd-6389> (73 78):
dd-6389 0d..2 39us+: trace_array (__schedule)
<...>-3 0d..2 42us : __switch_to (__schedule)
<...>-3 0d..2 42us : __schedule <dd-6389> (73 69):
<...>-3 0d..2 43us : _raw_spin_unlock (__schedule)
<...>-3 0d..1 43us : trace_stop_sched_switched (__schedule)
<...>-3 0d..1 43us : trace_stop_sched_switched <<...>-3> (69
0):
<...>-3 0d..1 44us : _raw_spin_lock_irqsave
(trace_stop_sched_switched)
<...>-3 0d..1 45us : trace_stop_sched_switched (__schedule)


vim:ft=help

... Week end ...

I progress in my test. I have the same problem with rc5+47-18

I reproduce the problem on only one machine with one ide disk:

dd if=/dev/zero b=4k | ssh 192.168.44.112 dd of=/dev/null

In all cases, this command duration in 18 minutes even when I launch it
several times.

It works fine with localhost.
The target machine can be everything.
I'm trying to test with rc6+47-25

all my infos are in the script command results dd-test.bz2
which include the dd command, lspci -vv, dmesg, config, mii-tool,
ifconfig

Attachment: dd-test.bz2
Description: application/bzip