Re: Strange interrupt behaviour

Andi Kleen (ak@muc.de)
12 Jul 1998 05:48:36 +0200


Linus Torvalds <torvalds@transmeta.com> writes:

> On 11 Jul 1998, Andi Kleen wrote:
>
> > alan@lxorguk.ukuu.org.uk (Alan Cox) writes:
> >
> > > 3. Drivers that get repeated interrupts appear to re-enter the handler
> > > uncontrollably blow the stack and crash. I suspect nested interrupt
> > > handling problems may be half the 8K stack issue, and could be tons
> > > of our other remaining bugs.
> >
> > How about using a separate per-CPU 16K stack for interrupts, instead of
> > handling them on the per-process kernel stack? Then we could probably
> > switch back to 4K process kernel stacks too.
>
> No, the interrupt stack issue must be a red herring. We must _never_ get
> nested interrupts, no matter what. Using a separate stack is never an
> acceptable solution, it just means we have other problems.

When Ingo tells us that the current 2.1 tree barely works with about
a 7K stack, this raises the question: where is all this space spent?

Currently the system is just very unreliable, because when memory
is fragmented - which happens quickly with the current buddy allocator -
either all forks will fail because they can't grab a contiguous 8K, or
the system will start to swap madly, throwing pages away at random to
find a contiguous 8K.

Neither is acceptable IMHO, so a solution must be found.

One proposal was to use vmalloc() for the stack, but this was shot
down because lots of drivers seem to break with that: they put DMA
buffers on the stack, which does not work when the buffer happens to
straddle the border between two pages (vmalloc'ed pages are virtually,
but not physically, contiguous). This would probably cause lots of
hard-to-find, difficult-to-reproduce bugs.

The MM system could be fixed to make reliable allocation of contiguous
8K areas possible, but it seems it is too late to do that for 2.2.

Another solution would be to go back to a 4K stack again.

This has the following problems:
- It might not be enough for the current kernel when lots of
  interrupts come in.
- There is not enough space left to put the task_struct at the
  bottom for the nice (esp & ~8191) == current trick.

The first point could be solved by a separate per-CPU IRQ stack. Another
advantage: it is more cache-friendly, because interrupts always work on
the same memory. The disadvantage is that it makes the already-not-so-great
interrupt latency longer (but with luck this is amortized by the cache wins).

The second could be solved by just putting a pointer to current at the
bottom of the stack (as Bernd Schmidt proposed). This adds an additional
cache miss, but that is probably amortized again by using a properly
cache-coloured task pointer instead of the cache-unfriendly fixed address
of current [1].

It also costs 2 bytes per current access (for the extra movl (reg),reg).

Unfortunately the interrupt stack more or less requires the indirect
current scheme, because otherwise current could not be accessed inside
interrupt handlers (the interrupt entry routine would just copy current
to the bottom of the IRQ stack).

If the reason for the excessive stack use could be found and eliminated
then 4K stacks could probably be used without the extra interrupt stack.
Problem is that this is hard to test and could cause lots of subtle bugs.
The indirect-current scheme would still be needed, because 3K
(4K - sizeof(task_struct)) is probably not enough even with all tricks pulled.

Any other ideas?

-Andi

[1] Most cache architectures use the lower bits of the address to
select the cache line. For current these bits are always the same,
because of the required 8K alignment of the task_struct, so there are
lots of cache line collisions.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.altern.org/andrebalsa/doc/lkml-faq.html