Re: [PATCH 1/3] acpi: Rip out EC_FLAGS_QUERY_PENDING (prevent racecondition)

From: Alan Jenkins
Date: Sat Jul 19 2008 - 16:41:31 EST

Alexey Starikovskiy wrote:
> Alan,

Thanks for your robust response. Sorry for not discussing this more in
the first place. I wanted to write the code that fixes my system first,
and post it as soon as possible. I think I missed a flag to say "this
post is not an immediate request for submission".

> I don't think this is very good idea -- now every interrupt will add a
> new entry to workqueue,
> and given the rate of this interrupt, it could be become quite long.

I confess I gave up too quickly on the synchronisation issues and
resorted to brute force. Patch #1 is preventing a narrow race:

- acpi_ec_write_cmd() (QUERY)
- EC writes 0 (no event) to input buffer
- new event arrives, triggering a GPE interrupt which is ignored because
- QUERY_PENDING is cleared (but too late).

Now we ignored an event, which will be delayed until the next GPE interrupt.

The original reason for patch #1 was to stop patch #2 driving me
insane. If you apply the second patch on it's own, you break
EC_FLAGS_QUERY_PENDING. It gets cleared as soon as the _first_ query
command is submitted. It works, but it means the flag no longer has a
clearly defined meaning. I tried to fix that by only clearing it once
the loop has finished - and then realised I was widening a pre-existing
race condition.

As you say, I took the easy way out and pushed it all into the
workqueue. So the workqueue ends up as a buffer of GPE occurrences. I
did look at reducing the memory usage while avoiding race conditions,
but I couldn't find a reasonable solution. But I looked at it again now
and I have a better solution:

/* moved here from acpi_ec_transaction_unlocked() */
clear_bit(EC_FLAGS_QUERY_PENDING, &ec->flags);

while (acpi_ec_query(ec, &value) == 0)
acpi_ec_gpe_run_handler(ec, value);

The PENDING flag then reflects whether or not there's an item on the
workqueue. Does that make sense?

> Your second patch is redundant if you add queue entry for each
> interrupt, and as it does not
> require any investments into memory, I like it more.
It's not quite redundant with the first patch. We still have GPE
polling mode - patch #1 doesn't affect that. In polling mode, it's
essential to query all the pending events - otherwise, if they arrive
more frequently than the polling interval then you will inevitably drop

Patch #2 is also required to fix my buggy hardware. My laptop's EC
buffers multiple events, but clears SCI_EVT after every query. This
caused problems in polling mode; with multiple events between polling
intervals only one gets queried - and after a while the buffer overflows
and it breaks completely.
> Also, it is very cool to rip of things you don't care for, but that
> essentially reverts a fix done for 9998,
> so at least, you should ask these people if you broke their setups.
I assume you're referring to the "would benefit from wider testing"
patch #3. Thanks for identifying the bugzilla entry - I had difficulty
separating the different entries on GPEs. I'm optimistic that we can
fix all these crazy buffering EC's without having to disable GPE interrupts.

The reason I like my proposed fix is that it makes the code simple
enough that I can understand it, without making any assumptions about
how many interrupts arrive per GPE. The EC can be broken in lots of
ways, so long as:

1. We receive interrupts when one or more GPE's are pending.
2. We don't get a constant interrupt storm.

I don't think I missed anything. Is there anything else I should check
before I try to get testing?

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at