Re: [patch 23/26] x64, x2apic/intr-remap: MSI and MSI-X support for interrupt remapping infrastructure

From: Eric W. Biederman
Date: Thu Jul 10 2008 - 21:32:19 EST


Suresh Siddha <suresh.b.siddha@xxxxxxxxx> writes:

> MSI and MSI-X support for interrupt remapping infrastructure.
>
> MSI address register will be programmed with interrupt-remapping table
> entry(IRTE) index and the IRTE will contain information about the vector,
> cpu destination, etc.
>
> For MSI-X, all the IRTE's will be consecutively allocated in the table,
> and the address registers will contain the starting index to the block
> and the data register will contain the subindex with in that block.
>
> This also introduces a new irq_chip for cleaner irq migration (in process
> context, as opposed to the current irq migration in the context of an
> interrupt; the interrupt-remapping infrastructure will help us achieve this).
>
> As MSI is edge triggered, irq migration is a simple atomic update(of vector
> and cpu destination) of IRTE and flushing the hardware cache.

A couple of things. I believe MSI edge-triggered irqs can always be migrated
safely outside of irq context.

Flushing the hardware cache sounds like it will flush the irqs towards the cpu.
How do we flush the in-flight irqs that are already on their way to the apic?
Does a register read work?

For MSI irqs going directly to the cpu it should, as long as the cpu and local
apic count as the same device from the perspective of pci ordering rules.
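As background for the remappable MSI format the patch programs below, here is a
small userspace model of how the IRTE index is split across the address
register. The constant values are reproduced from memory of msidef.h and should
be treated as illustrative, not authoritative:

```c
#include <assert.h>
#include <stdint.h>

/* Values as I recall them from arch/x86/include/asm/msidef.h --
 * illustrative only. */
#define MSI_ADDR_BASE_LO        0xfee00000u
#define MSI_ADDR_IR_EXT_INT     (1u << 4)
#define MSI_ADDR_IR_SHV         (1u << 3)  /* sub-handle valid */
#define MSI_ADDR_IR_INDEX1(i)   (((i) & 0x7fffu) << 5)  /* handle[14:0] */
#define MSI_ADDR_IR_INDEX2(i)   (((i) >> 15) << 2)      /* handle[15]   */

/* Build the remappable-format MSI address for IRTE index 'ir_index';
 * the data register then carries only the sub-handle within the block. */
static uint32_t ir_msi_address(uint32_t ir_index)
{
	return MSI_ADDR_BASE_LO | MSI_ADDR_IR_EXT_INT | MSI_ADDR_IR_SHV |
	       MSI_ADDR_IR_INDEX1(ir_index) | MSI_ADDR_IR_INDEX2(ir_index);
}

/* Recover the index from an address, to check the split is lossless. */
static uint32_t ir_msi_index(uint32_t addr)
{
	return ((addr >> 5) & 0x7fffu) | (((addr >> 2) & 1u) << 15);
}
```

The point of the model is just that the 16-bit handle is carried losslessly in
two disjoint address fields, so an MSI-X block's base index plus per-vector
sub-handle identifies a unique IRTE.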


> Index: tree-x86/arch/x86/kernel/io_apic_64.c
> ===================================================================
> --- tree-x86.orig/arch/x86/kernel/io_apic_64.c 2008-07-10 09:52:31.000000000 -0700
> +++ tree-x86/arch/x86/kernel/io_apic_64.c 2008-07-10 09:52:34.000000000 -0700
> @@ -2289,6 +2289,9 @@
>
> dynamic_irq_cleanup(irq);
>
> +#ifdef CONFIG_INTR_REMAP
> + free_irte(irq);
> +#endif
> spin_lock_irqsave(&vector_lock, flags);
> __clear_irq_vector(irq);
> spin_unlock_irqrestore(&vector_lock, flags);
> @@ -2307,11 +2310,42 @@
>
> tmp = TARGET_CPUS;
> err = assign_irq_vector(irq, tmp);
> - if (!err) {
> - cpus_and(tmp, cfg->domain, tmp);
> - dest = cpu_mask_to_apicid(tmp);
> + if (err)
> + return err;
> +
> + cpus_and(tmp, cfg->domain, tmp);
> + dest = cpu_mask_to_apicid(tmp);

Can we simplify this a little? In particular, have a function

struct IOAPIC_ROUTE_entry x86_map_irq(irq, mask);

where x86_map_irq would ultimately figure out the path to the cpu.
In the simple case it would just call assign_irq_vector();
when irqs are remapped it would additionally perform:
map_irq_to_irte_handle();
modify_irte(irq, &irte);

And then have the generic msi code and the ioapic code map from the
struct IOAPIC_ROUTE_entry (or whatever) to the appropriate bits for the
hardware they control.

That should allow us a lot more flexibility going forward, with less code than
is in your patches.
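Roughly what I have in mind, as a sketch. Every name here (x86_map_irq,
struct ioapic_route_entry, the stubs) is hypothetical, standing in for the
real assign_irq_vector()/modify_irte() machinery:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical generic description of an irq's route to the cpu. */
struct ioapic_route_entry {
	unsigned char vector;
	unsigned int  dest;       /* APIC destination id */
	bool          remapped;   /* delivered through an IRTE? */
	int           irte_index; /* valid only when remapped */
};

/* Stubs standing in for assign_irq_vector() / modify_irte(). */
static int stub_assign_irq_vector(unsigned int irq) { return 0x31 + irq; }
static int stub_modify_irte(unsigned int irq)       { return (int)irq * 2; }

/* One place that figures out the path to the cpu: the simple case is
 * just vector assignment; with remapping enabled it also programs the
 * IRTE.  Callers (the generic msi code, the ioapic code) then only
 * translate the returned route into their hardware's register format. */
static struct ioapic_route_entry x86_map_irq(unsigned int irq, bool remap)
{
	struct ioapic_route_entry route = { 0 };

	route.vector = (unsigned char)stub_assign_irq_vector(irq);
	route.dest = 0;            /* destination computation elided */
	route.remapped = remap;
	if (remap)
		route.irte_index = stub_modify_irte(irq);
	return route;
}
```

The msi and ioapic paths then stop caring whether remapping is compiled in;
they just format whatever route they are handed.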

> +#ifdef CONFIG_INTR_REMAP
> + if (irq_remapped(irq)) {
> + struct irte irte;
> + int ir_index;
> + u16 sub_handle;
> +
> + ir_index = map_irq_to_irte_handle(irq, &sub_handle);
> + BUG_ON(ir_index == -1);
> +
> + memset (&irte, 0, sizeof(irte));
> +
> + irte.present = 1;
> + irte.dst_mode = INT_DEST_MODE;
> + irte.trigger_mode = 0; /* edge */
> + irte.dlvry_mode = INT_DELIVERY_MODE;
> + irte.vector = cfg->vector;
> + irte.dest_id = IRTE_DEST(dest);
> +
> + modify_irte(irq, &irte);
>
> msg->address_hi = MSI_ADDR_BASE_HI;
> + msg->data = sub_handle;
> + msg->address_lo = MSI_ADDR_BASE_LO | MSI_ADDR_IR_EXT_INT |
> + MSI_ADDR_IR_SHV |
> + MSI_ADDR_IR_INDEX1(ir_index) |
> + MSI_ADDR_IR_INDEX2(ir_index);
> + } else
> +#endif
> + {
> + msg->address_hi = MSI_ADDR_BASE_HI;
> msg->address_lo =
> MSI_ADDR_BASE_LO |
> ((INT_DEST_MODE == 0) ?
> @@ -2361,6 +2395,55 @@
> write_msi_msg(irq, &msg);
> irq_desc[irq].affinity = mask;
> }
> +
> +#ifdef CONFIG_INTR_REMAP
> +/*
> + * Migrate the MSI irq to another cpumask. This migration is
> + * done in the process context using interrupt-remapping hardware.
> + */
> +static void ir_set_msi_irq_affinity(unsigned int irq, cpumask_t mask)
> +{
> + struct irq_cfg *cfg = irq_cfg + irq;
> + unsigned int dest;
> + cpumask_t tmp, cleanup_mask;
> + struct irte irte;
> +
> + cpus_and(tmp, mask, cpu_online_map);
> + if (cpus_empty(tmp))
> + return;
> +
> + if (get_irte(irq, &irte))
> + return;
> +
> + if (assign_irq_vector(irq, mask))
> + return;
> +
> + cpus_and(tmp, cfg->domain, mask);
> + dest = cpu_mask_to_apicid(tmp);
> +
> + irte.vector = cfg->vector;
> + irte.dest_id = IRTE_DEST(dest);
> +
> + /*
> + * atomically update the IRTE with the new destination and vector.
> + */
> + modify_irte(irq, &irte);
> +
> + /*
> + * After this point, all the interrupts will start arriving
> + * at the new destination. So, time to cleanup the previous
> + * vector allocation.
> + */
> + if (cfg->move_in_progress) {
> + cpus_and(cleanup_mask, cfg->old_domain, cpu_online_map);
> + cfg->move_cleanup_count = cpus_weight(cleanup_mask);
> + send_IPI_mask(cleanup_mask, IRQ_MOVE_CLEANUP_VECTOR);
> + cfg->move_in_progress = 0;
> + }
> +
> + irq_desc[irq].affinity = mask;
> +}
> +#endif
> #endif /* CONFIG_SMP */
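For reference, the cleanup step in the quoted ir_set_msi_irq_affinity() above
reduces to a mask-and-count before the cleanup IPI is sent. A userspace model,
with cpumasks shrunk to 64-bit words (cpus_and() becoming '&', cpus_weight()
becoming a popcount):

```c
#include <assert.h>
#include <stdint.h>

/* Model of computing cfg->move_cleanup_count: how many cpus in the old
 * vector domain are still online and must ack IRQ_MOVE_CLEANUP_VECTOR. */
static unsigned int cleanup_count(uint64_t old_domain, uint64_t online)
{
	uint64_t cleanup_mask = old_domain & online;  /* cpus_and() */
	unsigned int weight = 0;

	while (cleanup_mask) {                        /* cpus_weight() */
		cleanup_mask &= cleanup_mask - 1;
		weight++;
	}
	return weight;
}
```

Offlined cpus drop out of the count, so the cleanup bookkeeping only waits on
cpus that can actually receive the IPI.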
>
> /*
> @@ -2378,26 +2461,157 @@
> .retrigger = ioapic_retrigger_irq,
> };