Re: real-time threaded IO with low latency (audio)

Paul Barton-Davis (pbd@Op.Net)
Sat, 24 Jul 1999 09:11:25 -0400


>> a "real" FX processor can do real-time flanging with a 0.2ms
>> delay. the idea that linux (or beos) for that matter can step into
>> this kind of role is absurd, or close to it.
>
>Absurd? My greatest problem seems to be the PCI DMA burst size of the
>card I'm using now.

At a 48kHz sample rate, a 0.2ms flange delay is slightly under 10
samples, which would mean 4800 fragment-boundary interrupts a second;
even if you round the fragment size up to 16 (just for the power-of-2
fan club), the soundcard is still interrupting 3000 times a second
just to say "I crossed a fragment boundary". I didn't see any posts
to linux-kernel that really answered the question "how many cycles
are burnt handling a timer interrupt", but I have a pretty good
feeling that Linux wasn't meant to run well with a regular 3-5kHz
interrupt rate from something requiring an interrupt handler as
relatively complex as the typical DAC handler.
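
(The general relation is just interrupts/sec = sample_rate /
fragment_size, so every halving of the fragment size doubles the
interrupt load.)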

>Driving the PC speaker at 8 kHz interrupt rate or higher is a no brainer

The PC speaker, as I recall, doesn't talk back to you: it gives you
no indication of fragment boundary crossings. This won't work for
real DACs.

>with RTL. (I've done it at 85 kHz on a dual P-II 233 with periodic
>scheduling, but that's burning lots of CPU power on scheduling, and the
>jitter is [probably] bigger than the average latency.)

Yes, but that's not a realistic example of what an FX unit does.
You've got to copy data from the ADC buffer into host memory and back
out to the DAC buffer again. Believing that you can do this *and* the
FX computation *and* run other things seems wrong to me.
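
For the record, that per-fragment loop looks roughly like this (a
sketch only; audio_fd, FRAGMENT_SAMPLES and process_flange() are
placeholders, and real code would use non-blocking I/O rather than
blocking read()/write() on a full-duplex device):

    #include <unistd.h>

    #define FRAGMENT_SAMPLES 16   /* hypothetical fragment size */

    extern int audio_fd;          /* full-duplex device, already set up */
    extern void process_flange(short *buf, int nsamples);  /* the FX */

    void fx_loop(void)
    {
        short buf[FRAGMENT_SAMPLES];

        for (;;) {
            read(audio_fd, buf, sizeof(buf));       /* ADC -> host memory */
            process_flange(buf, FRAGMENT_SAMPLES);  /* the FX computation */
            write(audio_fd, buf, sizeof(buf));      /* host memory -> DAC */
        }
    }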

I also don't understand the appeal of using RTL for this, other than
"it works" and "it's not Windows or MacOS" ...

>Ok... Say you have the CPU processing data using 80% of the available
>cycles. That is, the minimum time it can take to process one buffer is
>80% of the time it takes to play it.
>
>Now, for some reason, the processing task doesn't get started in time.
>If this delay is within the maximum allowed limits, the engine will
>still be available to deliver the output buffer in time.
>
>However, when the next buffer comes in, it has to *wait* until the
>current (delayed) one is processed, thus causing the next buffer to be
>delayed as well! The engine will catch up after a few buffers, as the
>load is less than 100%.
>
>But what if your process loses the CPU once again before it has caught
>up...?

All understood. But you're missing something here. I mentioned that my
measurements showed that 99% of the write()s to the soundcard complete
within 0.05ms of the specified time. Moreover, if you use ALSA (which
properly allows non-blocking I/O to the ADAC), you'll find that the
actual system call takes only 2-5 microseconds (on a PII-450). That
means that in almost every case (on a quiescent system), your task
*does* get scheduled in time.
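
(If you want to reproduce the system-call measurement, the obvious
sketch is to bracket the write with gettimeofday(); the fd is assumed
to be a soundcard device opened for non-blocking output:)

    #include <sys/time.h>
    #include <unistd.h>

    /* return how many microseconds one non-blocking write() takes */
    long write_usecs(int fd, const void *buf, size_t len)
    {
        struct timeval t0, t1;

        gettimeofday(&t0, 0);
        write(fd, buf, len);        /* should not block on this fd */
        gettimeofday(&t1, 0);

        return (t1.tv_sec - t0.tv_sec) * 1000000L
             + (t1.tv_usec - t0.tv_usec);
    }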

If you now add in SCHED_FIFO for the relevant thread, it is
essentially *guaranteed* to be scheduled to run after the ADAC
interrupts us to say that it just crossed a fragment boundary.
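
(Getting into SCHED_FIFO is a couple of lines, assuming the process
runs as root; the priority value here is arbitrary:)

    #include <sched.h>

    /* put the calling process/thread into the SCHED_FIFO class */
    void go_realtime(void)
    {
        struct sched_param p;

        p.sched_priority = 10;   /* anything > 0; only other
                                    SCHED_FIFO tasks matter */
        sched_setscheduler(0, SCHED_FIFO, &p);
    }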

So, we're almost there ... well, no, because as Benno's benchmark
demonstrates, even though it's scheduled to run, it can't, because the
update thread is busy with the page cache. Sigh.

>(...)
>> >"Average latency" and "maximum latency" are two very different things.
>>
>> no - latency and jitter are two different things. what we have a
>> problem with is jitter, not latency. the difference is subtle but its
>> important because it affects how you think about solving it.
>
>True, but I was actually thinking about interrupt and scheduling

Let me try to be precise in my usage of the terms:

latency: the constant delay between the generation of an audio
         sample and its effect on the analog or digital output
         of an audio device. It is caused almost entirely by the
         fragment size requested by the application, and limited
         only by the hardware's minimum fragment size and the
         maximum CPU throughput.

jitter:  variation in the delay between sample generation and
         sample output from the audio device, caused by the
         host CPU.
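
(To put numbers on the first: a 64-sample fragment at 48kHz means
64/48000, or about 1.3ms, of buffering latency per fragment; a
256-sample fragment means about 5.3ms, no matter how fast the host
is.)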

>latencies here. I'm being unclear again, obviously. Maximum latency =
>average latency + maximum deviation due to jitter. Anyway, the problem
>is that your code doesn't always get control in time, and/or that it can
>sometimes lose control before it's done.

Not if it's SCHED_FIFO. If you wire down its pages (mlockall(2)), and
there are no other SCHED_FIFO tasks, only interrupt servicing can get
in your way. If you don't believe me, try running:

while (1) { }

as a SCHED_FIFO task. Your system will appear to hang, but it's really
just exceedingly busy :)
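
The complete experiment, for the sceptical, is just this (a sketch;
run it as root, and have some way to kill it from outside):

    #include <sched.h>
    #include <sys/mman.h>

    int main(void)
    {
        struct sched_param p;

        /* wire down all current and future pages */
        mlockall(MCL_CURRENT | MCL_FUTURE);

        /* SCHED_FIFO: nothing but interrupts and other (here,
           nonexistent) SCHED_FIFO tasks can take the CPU away */
        p.sched_priority = 10;
        sched_setscheduler(0, SCHED_FIFO, &p);

        while (1) { }

        return 0;
    }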

Now, interrupt servicing is clearly a problem, but what are you
proposing? That we *don't* service an interrupt in a timely fashion?
This is one of the reasons dedicated FX units work so reliably at low
latencies: they have no "other interrupts".

--p

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/