Re: [RESEND] [PATCH v2] [BUGFIX] x86/x86_64: fix CPU offlining triggered "active" device IRQ interruption

From: Eric W. Biederman
Date: Thu Jun 04 2009 - 19:16:28 EST


Gary Hade <garyhade@xxxxxxxxxx> writes:

> On Wed, Jun 03, 2009 at 02:13:23PM -0700, Eric W. Biederman wrote:
>> Gary Hade <garyhade@xxxxxxxxxx> writes:
>>
>> > Correct, after the fix was applied my testing did _not_ show
>> > the lockups that you are referring to. I wonder if there is a
>> > chance that the root cause of those old failures and the root
>> > cause of the issue that my fix addresses are the same?
>> >
>> > Can you provide the test case that demonstrated the old failure
>> > cases so I can try it on our systems? Also, do you recall what
>> > mainline version demonstrated the old failure?
>>
>> The irq migration had already been moved to interrupt context by the
>> time I started working on it, and I managed to verify that there were
>> indeed problems with moving it out of interrupt context before my code
>> merged.
>>
>> So if you want to reproduce it reduce your irq migration to the essentials.
>> Set IRQ_MOVE_PCNTXT, and always migrate the irqs from process context
>> immediately.
>>
>> Then migrate an irq that fires at a high rate rapidly from one cpu to
>> another.
>>
>> Right now you are insulated from most of the failures because you still
>> don't have IRQ_MOVE_PCNTXT. So you are only really testing your new code
>> in the cpu hotunplug path.
>
> OK, I'm confused.
>
> It sounds like you want me to force IRQ_MOVE_PCNTXT so that I can
> test in a configuration that you say is already broken. Why
> in the heck would this config, where you expect lockups without
> the fix, be a productive environment in which to test the fix?

Because that is what the brokenness of fixup_irqs does. It
effectively forces IRQ_MOVE_PCNTXT on code that has not been written
to deal with it, and the failures are because of that forcing.
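
For concreteness, the recipe above boils down to bouncing a busy
irq's affinity between two cpus as fast as possible. A rough
user-space sketch (the irq number and cpu masks are placeholders, and
it assumes a kernel patched as described so that migration happens
immediately from process context):

#include <stdio.h>

int main(int argc, char **argv)
{
        /* placeholder irq; pick one that fires at a high rate */
        const char *irq = (argc > 1) ? argv[1] : "16";
        char path[64];
        long i;

        snprintf(path, sizeof(path), "/proc/irq/%s/smp_affinity", irq);
        for (i = 0; ; i++) {
                FILE *f = fopen(path, "w");
                if (!f) {
                        perror(path);
                        return 1;
                }
                /* alternate between cpu0 (mask 1) and cpu1 (mask 2) */
                fprintf(f, "%s\n", (i & 1) ? "2" : "1");
                fclose(f);
        }
        return 0;
}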

>> Now that I look at it in more detail, you are doing a double
>> mask_IO_APIC_irq and unmask_IO_APIC_irq on the fast path
>
> Good point. I had based my changes on the previous
> CONFIG_INTR_REMAP IRQ migration delay code that was doing
> the same thing. I had assumed that the CONFIG_INTR_REMAP
> code was correct and that the nesting of
> mask_IO_APIC_irq_desc/unmask_IO_APIC_irq_desc sequences
> was OK. Now that I look at how mask_IO_APIC_irq_desc and
> unmask_IO_APIC_irq_desc are implemented, I see that there
> is e.g. no reference counting to ensure that only the last
> caller of unmask_IO_APIC_irq_desc actually unmasks, so a
> nested unmask shrinks the range of code that is executed
> with the IRQ masked.
>
> Do you see any reason why the IRQ needs to be masked
> downstream of the unmask_IO_APIC_irq_desc call that the
> patch adds?

No. However I do see a reason why the irq needs to be masked
upstream of the mask call your patch adds.
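
To make the nesting hazard Gary describes concrete: without reference
counting, a nested unmask re-enables the irq while the outer path
still believes it is masked. A toy model (plain C with hypothetical
names, not the actual kernel code):

#include <stdio.h>

/* Toy mask/unmask with no reference counting: just a flag. */
static int masked;

static void toy_mask(void)   { masked = 1; }
static void toy_unmask(void) { masked = 0; }

int main(void)
{
        toy_mask();     /* outer path masks the irq */
        toy_mask();     /* nested path masks it again */
        toy_unmask();   /* nested path unmasks... */
        /* ...so the outer critical section now runs unmasked: */
        printf("masked inside outer critical section: %d\n", masked);
        toy_unmask();   /* the outer unmask is now a no-op */
        return 0;
}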

>> and duplicating the pending irq check.
>
> Yes, I was aware of this and had considered removing
> the current check, but I was trying to minimize the changes
> as much as possible, especially those affecting
> the normal IRQ migration path.
>
> Removal of the current check would simplify the code
> and would also eliminate the failed affinity change
> requests from user level that can result from the
> current check.
>
> Do you think it would be a good idea to remove this
> extra check? If so, I would prefer to do it with a
> separate patch so that I can do a good job of testing it.

Well I tell you what.

fixup_irqs already has a bunch of heuristics to make the code
mostly work when migrating irqs from process context.

If you can limit your changes to that path I will be sad, because it
means we have not completely fixed the problem, but at least we will
be working on something that has a shot. It also means you won't
destabilize the existing code by modifying its set_affinity methods.

Something like:

static void fixup_irq_affinity(unsigned int irq, struct irq_desc *desc)
{
        const struct cpumask *affinity;
        int break_affinity = 0;
        int set_affinity = 1;

        /* interrupts are disabled at this point */
        spin_lock(&desc->lock);

        affinity = desc->affinity;
        if (!irq_has_action(irq) ||
            cpumask_equal(affinity, cpu_online_mask)) {
                spin_unlock(&desc->lock);
                return;
        }

        if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
                break_affinity = 1;
                affinity = cpu_all_mask;
        }

        if (desc->status & IRQ_MOVE_PCNTXT) {
                spin_unlock(&desc->lock);
                if (irq_set_affinity(irq, affinity) != 0)
                        set_affinity = 0;
        } else {
                if (desc->chip->mask)
                        desc->chip->mask(irq);

                if (desc->chip->set_affinity) {
                        /* Any additional weird stability hacks can go here */
                        desc->chip->set_affinity(irq, affinity);
                } else
                        set_affinity = 0;

                if (desc->chip->unmask)
                        desc->chip->unmask(irq);

                spin_unlock(&desc->lock);
        }

        if (break_affinity && set_affinity)
                printk("Broke affinity for irq %i\n", irq);
        else if (!set_affinity)
                printk("Cannot set affinity for irq %i\n", irq);

        /* Do the equivalent of sending the cleanup ipi here */
}

/* A cpu has been removed from cpu_online_mask. Reset irq affinities. */
void fixup_irqs(void)
{
        unsigned int irq;
        struct irq_desc *desc;

        for_each_irq_desc(irq, desc) {
                if (!desc)
                        continue;
                /* irq 2 is the cascade irq and must be left alone */
                if (irq == 2)
                        continue;

                fixup_irq_affinity(irq, desc);
        }
}

Eric
