Re: [Regression] "irqdomain: Don't set type when mapping an IRQ" breaks nexus7 gpio buttons

From: John Stultz
Date: Tue Aug 09 2016 - 00:25:31 EST


On Mon, Aug 8, 2016 at 2:31 AM, Jon Hunter <jonathanh@xxxxxxxxxx> wrote:
>
> On 06/08/16 00:45, John Stultz wrote:
>> On Mon, Aug 1, 2016 at 3:26 AM, Jon Hunter <jonathanh@xxxxxxxxxx> wrote:
>>> Hi John,
>>>
>>> On 30/07/16 05:39, John Stultz wrote:
>>>> Hey Jon,
>>>> So after rebasing my nexus7 patch stack onto pre-4.8-rc1 tree, I
>>>> noticed the power/volume buttons stopped working.
>>>>
>>>> I did a manual rebased bisection and chased it down to your commit
>>>> 1e2a7d78499e ("irqdomain: Don't set type when mapping an IRQ").
>>>>
>>>> Reverting that patch makes things work again, so I wanted to see if
>>>> there was any debugging info I could provide to try to help narrow
>>>> down the problem here. (Sorry, I'd tinker myself with it some and try
>>>> to debug the issue, but after burning my friday night on this, I'm
>>>> eager to get away from the keyboard for the weekend).
>>>
>>> Before this commit, bad IRQ type settings in device-tree were not getting
>>> reported and so failures to set the IRQ type were going unnoticed. It's
>>> most likely a bad IRQ type setting somewhere.
>>>
>>> As Thomas mentioned hopefully dmesg will shed a bit more light.
>>>
>>> Otherwise it can be worth looking at the ->irq_set_type() function for
>>> the irqchips in the path of the interrupt requested to see if any are
>>> failing. Looking at the nexus7 (assuming qcom variant), it looks like
>>> there are 3 irqchips in the path (pm8921 --> apq8064-pinctrl --> gic).
>>> The pm8xxx_irq_set_type() could return a failure when setting up the IRQ
>>> type and could be worth checking. It does not look like the set_type for
>>> the apq8064-pinctrl should ever fail (apart from calling BUG() which
>>> would be obvious). The gic can also return a failure for setting the
>>> type, but I did not see anything at first glance that looks incorrect in
>>> the dts.
>>>
>>> If we can narrow down irqchip, then hopefully it will be clearer.
>>
>> The pm8xxx_irq_set_type() doesn't seem to be failing as far as I can see..
>>
>> Looking at the patch that seems to cause the trouble, I narrowed it
>> down to just the following chunk:
>>
>> @@ -614,7 +615,11 @@ unsigned int irq_create_fwspec_mapping(struct irq_fwspec *fwspec)
>>                  * it now and return the interrupt number.
>>                  */
>>                 if (irq_get_trigger_type(virq) == IRQ_TYPE_NONE) {
>> -                       irq_set_irq_type(virq, type);
>> +                       irq_data = irq_get_irq_data(virq);
>> +                       if (!irq_data)
>> +                               return 0;
>> +
>> +                       irqd_set_trigger_type(irq_data, type);
>>                         return virq;
>>                 }
>>
>> If I revert just that, it works again.
>>
>> I was worried we were hitting an early failure from !irq_data, but it
>> seems there's some subtle difference between irqd_set_trigger_type() and
>> irq_set_irq_type() that makes the former break for me.
>
> Thanks this is good info and at the same time odd.
>
> I am guessing that it is failing above because the irq_data is not found
> for the irq?

So actually no. irq_data is found and we do call irqd_set_trigger_type(),
but something still doesn't work.

Interestingly, just adding irq_set_irq_type(virq, type); to the top of
that block (leaving the rest of the code) also works.
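
For anyone following along, here is a rough user-space model of the
difference as I understand it: irqd_set_trigger_type() only records the
trigger type in the irq_data state bits, while irq_set_irq_type() also
goes through the irqchip's ->irq_set_type() callback. This is a sketch,
not kernel code. Every model_* / MODEL_* name below is made up for
illustration, and the real functions do more than this (locking, masking,
return-value handling).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's IRQ_TYPE_* sense bits. */
#define MODEL_TYPE_NONE          0x0u
#define MODEL_TYPE_EDGE_RISING   0x1u
#define MODEL_TYPE_EDGE_FALLING  0x2u
#define MODEL_TYPE_SENSE_MASK    0xfu

/* Hypothetical irqchip: tracks what the "hardware" was programmed with. */
struct model_chip {
	unsigned int hw_type;    /* type actually written to the chip */
	int set_type_calls;      /* how often the chip callback ran */
};

/* Hypothetical per-interrupt data: software state plus owning chip. */
struct model_irq {
	unsigned int state;      /* trigger type recorded in software */
	struct model_chip *chip;
};

/* Models irqd_set_trigger_type(): bookkeeping only, chip untouched. */
static void model_record_type(struct model_irq *irq, unsigned int type)
{
	irq->state &= ~MODEL_TYPE_SENSE_MASK;
	irq->state |= type & MODEL_TYPE_SENSE_MASK;
}

/* Models irq_set_irq_type(): run the chip callback, then record. */
static int model_set_type(struct model_irq *irq, unsigned int type)
{
	assert(irq->chip != NULL);
	irq->chip->hw_type = type & MODEL_TYPE_SENSE_MASK;
	irq->chip->set_type_calls++;
	model_record_type(irq, type);
	return 0;
}

/* Prints the software state next to the modeled hardware state. */
static void model_print(const char *label, const struct model_irq *irq)
{
	printf("%s: state=0x%x hw=0x%x chip callbacks=%d\n", label,
	       irq->state, irq->chip->hw_type, irq->chip->set_type_calls);
}
```

In this model, an interrupt that only went through model_record_type()
has the right type in software but a chip that was never programmed. My
understanding is that after the commit the recorded type is supposed to
reach the irqchip later, when the interrupt is requested, so a driver
path that never triggers that step would see exactly this kind of
mismatch. That part is an assumption on my side, not something I have
confirmed on the nexus7.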

> What is odd is that the above sequence is only executed if an irq
> mapping exists and so really, AFAICT this should not happen. Ie. the irq
> descriptor should have been allocated for the mapping to exist. We
> should probably warn if this happens.
>
> Without reverting the above, can you add a print to show the
> domain->name, hwirq and virq information if !irq_data? That will confirm
> the domain for us.

So I put some printk info in (printing in either case, since I never
see the !irq_data case happen):

[ 1.514217] JDB: virq: 93 hwirq: 74 domain name: msmgpio
[ 1.838342] JDB: virq: 25 hwirq: 6 domain name: msmgpio

Which is odd, looking at:

shell@flo:/ $ cat /proc/interrupts
        CPU0   CPU1   CPU2   CPU3
  16:   1159   1138   1332   1574  GIC-0    18 Edge   gp_timer
  25:      0      0      0      0  msmgpio   6 Edge   ekth3500
 111:      6      0      0      0  GIC-0    51 Edge   qcom_rpm_ack
 112:      0      0      0      0  GIC-0    53 Edge   qcom_rpm_err
 113:      0      0      0      0  GIC-0    54 Edge   qcom_rpm_wakeup
 114:     48      0      0      0  GIC-0   132 Edge   msm_otg, ci_hdrc_msm
 115:    796      0      0      0  GIC-0   130 Level  bam_dma
 116:      0      0      0      0  GIC-0   128 Level  bam_dma
 117:      0      0      0      0  GIC-0   127 Level  bam_dma
 118:   2627      0      0      0  GIC-0   136 Level  mmci-pl18x (cmd)
 119:     54      0      0      0  GIC-0   226 Level  i2c_qup
 120:     21      0      0      0  GIC-0   183 Level  i2c_qup
 122:      0      0      0      0  GIC-0   189 Level  i2c_qup
 123:    202      0      0      0  GIC-0   190 Level  msm_serial0
 124:      0      0      0      0  GIC-0    70 Edge   smsm
 125:      0      0      0      0  GIC-0   121 Edge   smsm
 126:      0      0      0      0  GIC-0   236 Edge   smsm
 127:      0      0      0      0  GIC-0   169 Edge   smsm
 131:      0      0      0      0  pm8xxx  195 Edge   Volume Up
 165:      0      0      0      0  pm8xxx  229 Edge   Volume Down
 184:      0      0      0      0  pm8xxx   39 Edge   pm8xxx_rtc_alarm
 185:      0      0      0      0  pm8xxx   50 Edge   pmic8xxx_pwrkey_release
 186:      0      0      0      0  pm8xxx   51 Edge   pmic8xxx_pwrkey_press
IPI0:      0      1      1      1  CPU wakeup interrupts
IPI1:      0      0      0      0  Timer broadcast interrupts
IPI2:    944    539   1015    529  Rescheduling interrupts
IPI3:      1      4      6      4  Function call interrupts
IPI4:      0      0      0      0  CPU stop interrupts
IPI5:      0      0      0      0  IRQ work interrupts
IPI6:      0      0      0      0  completion interrupts
 Err:      0

Here virq 25 maps to the ekth3500 (touch panel, which is still working
fine), but 93/74 doesn't seem to map to anything, and the problematic
irqs are the volume keys (hwirq 195/229) and power keys (hwirq 50/51).

thanks
-john