Re: [PATCH RFC v2 02/18] irq/dev-msi: Add support for a new DEV_MSI irq domain

From: Dey, Megha
Date: Fri Aug 07 2020 - 13:54:57 EST


Hi Thomas,

On 8/7/2020 9:47 AM, Thomas Gleixner wrote:
Jason Gunthorpe <jgg@xxxxxxxxxx> writes:
Though it is more of a rationale and a cookbook on how to combine
existing technology pieces. (eg PASID, platform_msi, etc)

The basic approach of SIOV's IMS is that there is no longer a generic
interrupt indirection from numbers to addr/data pairs like
IOAPIC/MSI/MSI-X owned by the common OS code.

Instead the driver itself is responsible to set the addr/data pair
into the device in a device specific way, deal with masking, etc.

This lets the device use an implementation that is not limited by the
harsh MSI-X semantics.

In Linux we already have 'IMS' it is called platform_msi and a few
embedded drivers already work like this. The idea here is to bring it
to PCI.
platform_msi as it exists today is a crutch and in hindsight I should
have paid more attention back then and shot it down before it got
merged.

IMS can be somehow mapped to platform MSI but the proposed approach to
extend platform MSI with the extra bolts for IMS (valid for one
particular incarnation) is just going into the wrong direction.

We've been there and the main reason why hierarchical irq domains exist
is that we needed to make a clear cut between the involved hardware
pieces and their drivers. The pre hierarchy model was a maze of stuff
calling back and forth between layers with lots of duct tape added to
make it "work". This finally fell apart when Intel tried to support
I/O-APIC hotplug. The ARM people had similar issues with all the special
irq related SoC specific IP blocks which are placed between the CPU
level interrupt controller and the device.

The hierarchy strictly separates the per layer resource management and
each layer can work mostly independent of the actual available parent
layer.

Now looking at IMS. It's a subsystem inside a physical device. It has
slot management (where to place the Message) and mask/unmask. Resource
management at that level is what irq domains are for and mask/unmask is
what an irq chip handles.

So the right thing to do is to create shared infrastructure which is
utilized by the device drivers by providing a few bog standard data
structures and the handful of device specific domain and irq functions.

That keeps the functionality common, but avoids that we end up with

- msi_desc becoming a dumping ground for random driver data

- a zoo of platform callbacks

- glued on driver specific resource management

and all the great hacks which it requires to work on hundreds of
different devices which all implement IMS differently.

I'm all for sharing code and making the life of driver writers simple
because that makes my life simple as well, but not by creating a layer
at the wrong level and then hacking it into submission until it finally
collapses.

Designing the infrastructure following the clear layering rules of
hierarchical domains so it works for IMS and also replaces the platform
MSI hack is the only sane way to go forward, not the other way round.
From what I've gathered, I need to:
1. Get rid of the mantra that "IMS" is an extension of platform-msi.
2. Make this new infra devoid of any platform-msi references.
3. Come up with a ground-up approach which adheres to the layering
   constraints of the IRQ subsystem.
4. Have common code (drivers/irqchip maybe??) where we put all the
   generic IMS-specific bits for the IRQ chip and domain, which can be
   used by all device drivers belonging to this "IMS" class.
5. Have the device driver do the rest:
    create the chip/domain (one chip/domain per device?)
    provide device specific callbacks for masking, unmasking and
    writing the message
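
To make the split in items 4 and 5 concrete, here is a rough userspace
sketch. It is not kernel code and does not use the real irq_chip or
irq_domain APIs; every name in it (ims_dev_ops, ims_chip, ims_setup_irq,
demo_dev) is hypothetical and only models the idea that the common code
owns the sequencing while the driver supplies mask, unmask and
write-message callbacks.

```c
/* Userspace toy model only. All names here are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ims_msg {
	uint64_t addr;
	uint32_t data;
};

/* Item 5: callbacks the device driver would provide. */
struct ims_dev_ops {
	void (*mask)(void *dev, unsigned int slot);
	void (*unmask)(void *dev, unsigned int slot);
	void (*write_msg)(void *dev, unsigned int slot,
			  const struct ims_msg *msg);
};

/* Item 4: generic state, one instance per device. */
struct ims_chip {
	const struct ims_dev_ops *ops;
	void *dev;
	unsigned int nr_slots;
};

/* Common code: mask, program the message, unmask. Returns 0 or -1. */
int ims_setup_irq(struct ims_chip *chip, unsigned int slot,
		  const struct ims_msg *msg)
{
	assert(chip != NULL && chip->ops != NULL && msg != NULL);
	if (slot >= chip->nr_slots)
		return -1;
	chip->ops->mask(chip->dev, slot);
	chip->ops->write_msg(chip->dev, slot, msg);
	chip->ops->unmask(chip->dev, slot);
	return 0;
}

/* A made-up device that keeps its message slots in a plain array. */
#define DEMO_SLOTS 4

struct demo_dev {
	struct ims_msg msg[DEMO_SLOTS];
	int masked[DEMO_SLOTS];
};

static void demo_mask(void *dev, unsigned int slot)
{
	((struct demo_dev *)dev)->masked[slot] = 1;
}

static void demo_unmask(void *dev, unsigned int slot)
{
	((struct demo_dev *)dev)->masked[slot] = 0;
}

static void demo_write_msg(void *dev, unsigned int slot,
			   const struct ims_msg *msg)
{
	((struct demo_dev *)dev)->msg[slot] = *msg;
}

const struct ims_dev_ops demo_ops = {
	.mask = demo_mask,
	.unmask = demo_unmask,
	.write_msg = demo_write_msg,
};
```

A driver in this model would fill in a struct ims_chip with its own ops
and device pointer and then call ims_setup_irq() for each slot it wants
to program.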

So from the hierarchical domain standpoint, we will have:
- For the DSA device: vector -> intel-IR -> IDXD
- For Jason's device: root domain -> domain A -> Jason's device's IRQ domain
- For any other Intel IMS device in the future which
    does not require interrupt remapping: vector -> new device IRQ domain
    requires interrupt remapping: vector -> intel-IR -> new device IRQ domain
      (i.e. create a new domain even though IDXD is already present?)
Please let me know if my understanding is correct.
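
Just to illustrate the parent/child chains listed above, a small
userspace sketch follows. Again this is only a toy model under my own
assumptions, not the kernel's irq_domain: toy_domain and
toy_domain_path() are hypothetical names, and each domain only knows its
name and its parent.

```c
/* Userspace toy model only. Names are hypothetical. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct toy_domain {
	const char *name;
	const struct toy_domain *parent;	/* NULL for the root */
};

/*
 * Write the chain from the root down to @d into @buf, separated by
 * " -> ". Returns 0 on success, -1 if @buf is too small.
 */
int toy_domain_path(const struct toy_domain *d, char *buf, size_t len)
{
	size_t used;
	int n;

	assert(buf != NULL);
	if (d == NULL) {
		/* Past the root: start with an empty string. */
		if (len == 0)
			return -1;
		buf[0] = '\0';
		return 0;
	}
	/* Emit all ancestors first so the root comes out leftmost. */
	if (toy_domain_path(d->parent, buf, len) < 0)
		return -1;
	used = strlen(buf);
	n = snprintf(buf + used, len - used, "%s%s",
		     used ? " -> " : "", d->name);
	if (n < 0 || (size_t)n >= len - used)
		return -1;
	return 0;
}
```

With three domains named "vector", "intel-IR" and "IDXD" linked through
their parent pointers, calling toy_domain_path() on the IDXD domain
fills the buffer with the first chain in the list.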

What I still don't fully understand is this: if all the IMS devices need
the same domain ops and chip callbacks, we will be creating multiple
instances of the same IRQ chip and domain, right? Is that ok?
Currently the IRQ domain is created at the IR level so that we can reuse
the same domain, but if it is advisable to have a per-device interrupt
domain, I will shift this to the device driver.

Thanks,

tglx