RE: [RFC] IRQ handlers run with some high-priority interrupts(not NMI) enabled on some platform

From: Finn Thain
Date: Thu Feb 18 2021 - 00:40:50 EST


On Wed, 17 Feb 2021, Song Bao Hua (Barry Song) wrote:

> > On Sat, 13 Feb 2021, Song Bao Hua (Barry Song) wrote:
> >
> > >
> > > So what is really confusing and a pain to me is that: For years
> > > people like me have been writing device drivers with the idea that
> > > irq handlers run with interrupts disabled after those commits in
> > > genirq. So I don't need to care about if some other IRQs on the same
> > > cpu will jump out to access the data the current IRQ handler is
> > > accessing.
> > >
> > > But it turns out the assumption is not true on some platforms. So
> > > should I start to write device drivers with the new idea that
> > > interrupts can actually arrive while an irq handler is running?
> > >
> > > That's the question which really bothers me.
> > >
> >
> > That scenario seems a little contrived to me (drivers for two or more
> > devices sharing state through their interrupt handlers). Is it real? I
> > suppose every platform has its quirks. The irq lock in
> > sonic_interrupt() is only there because of a platform quirk (the same
> > device can trigger either of two IRQs). Anyway, no-one expects all
> > drivers to work on all platforms; I don't know why it bothers you so
> > much when platforms differ.
>
> Basically, we wrote drivers with the assumption that this driver will be
> cross-platform. (Of course there are some drivers which can only work on
> one platform, for example, if the IP of the device is only used in one
> platform as an internal component of a specific SoC.)
>
> So once a device has two or more interrupts, we need to consider that
> one interrupt might preempt another on the same cpu on m68k, if we also
> want to support this driver on m68k. This usually doesn't matter on
> other platforms.
>

When users show up who desire to run your drivers on their platform, you
can expect them to bring patches and a MAINTAINERS file entry. AFAIK,
Linux development has always worked that way.

Besides, not all m68k platforms implement priority masking. So there's no
problem with portability to m68k per se.

> On the other hand, there are more than 400 irqs_disabled() calls in the
> kernel. I am really not sure whether they are written with the knowledge
> that a true irqs_disabled() actually means some interrupts are off and
> some others are still open on m68k.

Firstly, use of irqs_disabled() is considered an antipattern by some
developers. Please see,
https://lore.kernel.org/linux-scsi/X8pfD5XtLoOygdez@lx-t490/
and
commit e6b6be53ec91 ("Merge branch 'net-in_interrupt-cleanup-and-fixes'")

This means that the differences in semantics between the irqs_disabled()
implementations on various architectures are moot.

Secondly, the existence of irqs_disabled() call sites does not imply a
flaw in your drivers nor in the m68k interrupt scheme. The actual semantic
differences are immaterial at many (all?) of these call sites.
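
To make the semantic difference concrete, here is a rough user-space
sketch (not kernel code) of a CPU with priority masking. It assumes the
68k convention that the interrupt mask lives in bits 8..10 of the status
register, that a request is accepted only when its level exceeds the mask,
and that level 7 cannot be masked. The model_*() and sr_ipl() names are
made up for illustration, and treating any non-zero mask as "disabled" is
a simplifying assumption about how irqs_disabled() behaves here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Interrupt priority mask: bits 8..10 of the 68k status register. */
#define SR_IPL_MASK  0x0700u
#define SR_IPL_SHIFT 8

/* Hypothetical helper: extract the priority mask (0..7) from an SR value. */
static unsigned sr_ipl(uint16_t sr)
{
	return (sr & SR_IPL_MASK) >> SR_IPL_SHIFT;
}

/*
 * Model of irqs_disabled() under priority masking (an assumption of this
 * sketch): any non-zero mask is reported as "disabled", even though
 * higher-priority requests can still get through.
 */
static bool model_irqs_disabled(uint16_t sr)
{
	return sr_ipl(sr) != 0;
}

/* Would the CPU accept a request at 'level' (1..7)? Level 7 is the NMI. */
static bool model_irq_accepted(uint16_t sr, unsigned level)
{
	assert(level >= 1 && level <= 7);
	return level == 7 || level > sr_ipl(sr);
}

/* Model of local_irq_save(): remember the old SR, raise the mask to 7. */
static uint16_t model_local_irq_save(uint16_t *sr)
{
	uint16_t flags = *sr;

	*sr |= SR_IPL_MASK;
	return flags;
}

/* Model of local_irq_restore(): put back the saved SR, mask included. */
static void model_local_irq_restore(uint16_t *sr, uint16_t flags)
{
	*sr = flags;
}
```

In this model, a handler entered at level 3 runs with the mask at 3, so
model_irqs_disabled() is true while a level 4 request is still accepted.
Code that really needs to exclude everything below NMI would bracket the
critical section with the save/restore pair, which raises the mask to 7.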

> Or they are running with the assumption that the true irqs_disabled()
> means IRQ is totally quiet? If the latter is true, those drivers might
> fail to work on m68k as well.
>

Yes, it's possible, and that was my fear too back in 2017 when I raised the
same question with the m68k maintainer. But I never found any code making
that assumption. If you know of such a bug, do tell. So far, your fears
remain unfounded.

> Thanks
> Barry
>