Re: real-time threaded IO with low latency (audio)

David Olofson (audiality@swipnet.se)
Sun, 25 Jul 1999 20:07:25 +0200


Ove Ewerlid wrote:
>
> David Olofson wrote:
> > Steve Underwood wrote:
> > > Not possible on a dedicated DSP? Rubbish. A large number of DSP apps do just this.
> > > You clock the DSP from the same source as the converters. At the start of
> > > execution you sync up to the flow from the data source. After that you use a
> > > processing loop padded so its execution time is precisely related to the sample
> > > interval of the data flowing through. DSP chips usually have deterministic
> > > processing times - no vagueness with caches, and so on. If the processor is fast
> > > enough to leave a few spare processing cycles, it's a good idea to do a sync check
> > > each time round the main loop. If not, just live with it!
> >
> > Anyway, we're off topic here again... We can't tune Linux to THAT level,
> > after all! ;-)
>
> Have to disagree. If Linux can lock a processor to "anything", then
> Linux has done what it can wrt tuning, and the limits are in the
> hardware.

Not being too serious there, I didn't mention that I had only single-CPU
systems in mind. I wouldn't call a single CPU locked to a single process
one that's running Linux...

> I have very good experiences with DSP algorithms on locked Pentium
> CPUs (L1 cache). In typical hard real-time audio signal processing,
> you usually have tight loops that dominate the time complexity.

> The worst-case _conditionals_ are usually not a problem.

No, not on a system that's _synchronized_ as opposed to a DSP loop
padded with NOPs...
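
(For reference, the synchronized variant boils down to something like
this; a rough C sketch, polling the converter's sample counter instead
of counting NOPs. read_sample_count(), read_adc() and write_dac() are
hypothetical board-specific helpers, not any real driver API:

/* Steve's padded-loop scheme, transplanted to a locked CPU that polls
 * a hardware sample counter rather than relying on cycle counting. */
#include <stdint.h>

extern uint32_t read_sample_count(void); /* hardware sample counter */
extern float    read_adc(void);          /* current input sample */
extern void     write_dac(float s);      /* current output sample */
extern float    process(float in);       /* the actual DSP work */

void dsp_loop(void)
{
	uint32_t last = read_sample_count();

	for (;;) {
		/* Spin until the converters advance one sample --
		 * this is the "padding" and the sync check in one. */
		while (read_sample_count() == last)
			;
		last = read_sample_count();

		/* One-sample in->out latency. */
		write_dac(process(read_adc()));
	}
}

...which is exactly why you want that CPU all to yourself.)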

> This assumes that you can split the algorithms into one hard category
> and one soft, or for that matter one fast and one slow. Here is a
> real-world example: inversion of speaker audio characteristics and
> cancellation of echo effects in the room where you listen to the
> sound. In short, you need a number of rather long FIR filters
> operating at 44.1 kHz, and a more advanced high-level adaptation
> algorithm that computes new FIR filter coefficients a couple of times
> per second (based on feedback from the room). The FIR filters are
> written in assembler, and the high-level algorithm in C/C++, or for
> that matter directly in some interpreted tool for matrix computations
> (such as Matlab).
>
> Linux can deal with this scenario just fine as long as it has CPU
> locking.

I'd say that's more "working around Linux" than "Linux dealing with it",
but so what; it works! :-)
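
(Short of a locked CPU, the standard recipe on a stock kernel is
SCHED_FIFO plus mlockall(); a minimal sketch, error handling trimmed
and the priority value purely illustrative:

/* The usual stock-kernel approximation of a "locked" CPU: make the
 * process SCHED_FIFO and pin its pages in RAM. Needs root. */
#include <sched.h>
#include <sys/mman.h>
#include <stdio.h>

int go_realtime(void)
{
	struct sched_param sp;

	sp.sched_priority = 50;	/* illustrative */
	if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0) {
		perror("sched_setscheduler");
		return -1;
	}
	/* Keep page faults off the audio path. */
	if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0) {
		perror("mlockall");
		return -1;
	}
	return 0;
}

Not the same thing as owning the CPU, of course; the kernel can still
get in the way.)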

> One processor deals with the FIR filter and one processor deals with
> the rest. Using shared memory via mmap() allows very efficient
> interprocess communication.

Much like a DSP card, but with a lot faster "card<->host" communication,
a nicer development environment, a lower price, more power, and no weird
hardware that's hard to get at. :-)
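
In case anyone wants the shape of that channel: a minimal sketch of the
mmap()-based link, with the adaptation side publishing new coefficients
to the FIR side. The struct layout and NTAPS are illustrative, not from
Ove's code:

/* One shared region, mapped before fork(): coefficients written by
 * the adaptation process, read by the FIR process. */
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

#define NTAPS 2048	/* illustrative filter length */

struct shared {
	volatile int generation;  /* bumped after each coefficient update */
	float coef[NTAPS];        /* FIR coefficients, 32-bit floats */
};

struct shared *map_channel(void)
{
	/* MAP_SHARED|MAP_ANONYMOUS: after fork(), both processes see
	 * the same pages, so the per-sample path needs no syscalls. */
	void *p = mmap(NULL, sizeof(struct shared),
	               PROT_READ | PROT_WRITE,
	               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

Call map_channel() before fork(); the parent runs the adaptation
algorithm and bumps generation, and the FIR child swaps in the new
coefficients when it notices the change between blocks.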

> When you work with this type of algorithm, you do not want the OS to
> be in the way. You want the OS to pave the way to the hardware (with
> mmap) and give you a nice development environment.

Yep, but if I can live with the jitter caused by cache misses and the
lower limit on latency implied by context switching, I think using,
say, 80% of BOTH CPUs is pretty interesting too. And that's good enough
for most kinds of music/audio processing, which is what I'm most
interested in. However, locking one CPU for ultra-low-latency
processing, and using 50% of the other one for RTL at higher latency,
could be pretty interesting...

> Of course you can find examples where the PC _hardware_ is not good
> enough and you simply have to resort to DSPs and the high costs you
> usually find there (money/time). If the PC hardware is not good
> enough, there is nothing Linux can do about it.
>
> For comparison, a vanilla PII@400 MHz executes 100 million FIR
> operations per second. This is using 64-bit floating point. You
> usually end up using 32-bit floats to get as much as possible into
> the L1 cache. (That makes 200 million FLOPS.)

That should do for a couple of flangers... ;-)
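
(Back-of-the-envelope: 100 million MACs/s at 44.1 kHz is about
100e6 / 44100 ~= 2270 taps of total FIR length per sample period, so
those "rather long" filters really do fit. The loop itself is as plain
as it gets; a C sketch of the part Ove would hand-code in assembler:

/* The tight loop that dominates: one output sample of an ntaps-tap
 * FIR. 32-bit floats so coefficients and delay line fit in L1. */
static float fir(const float *coef, const float *delay, int ntaps)
{
	float acc = 0.0f;
	int i;

	for (i = 0; i < ntaps; i++)
		acc += coef[i] * delay[i];  /* one MAC per tap */
	return acc;
}

Nothing for the scheduler to trip over, as long as nobody else touches
that cache.)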

> I think that many audio freaks want to work directly on top of the
> bare hardware, and this is something Linux offers today with very
> little modification.

Agreed, and I have to admit that this discussion made me realize that
my old idea of single-sample input->output latency may not be
completely impossible, at least not on SMP systems. By compiling
plug-in "statements" into a tight loop executed on the locked CPU
whenever the user changes the signal routing, this kind of latency may
even be usable in a quite transparent way.
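
Something like this, regenerated per routing change (a toy illustration
only; flanger() and reverb() stand in for whatever units the user has
wired up, and all the functions are hypothetical):

/* The "compiled" per-sample chain: when the user rewires the signal
 * graph, the host regenerates code of this shape and runs it on the
 * locked CPU, giving single-sample in->out latency. */
extern float read_input(void);       /* one sample from the ADC */
extern void  write_output(float s);  /* one sample to the DAC */
extern float flanger(float in);
extern float reverb(float in);

void run_graph(void)
{
	for (;;)
		write_output(reverb(flanger(read_input())));
}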

My highest priority is end users being able to get their audio
recording/editing/processing work done as reliably on a PC as on
dedicated hardware, and less than 1 ms of latency isn't really
necessary. But why stop at that? What's pointless overkill today is an
expected feature tomorrow...


> Ove
>
> --
> Ove Ewerlid
> Ove.Ewerlid@syscon.uu.se or Ove.Ewerlid@signal.uu.se
> Phone: +46 70 666 23 63, Fax: +46 18 503 611, +46 18 555 096

//David
