Re: [RFC PATCH v11 4/5] PCI / PM: Add support for the PCIe WAKE# signal for OF

From: Rafael J. Wysocki
Date: Thu Dec 28 2017 - 12:30:32 EST


On Thursday, December 28, 2017 5:51:34 PM CET Tony Lindgren wrote:
> * Rafael J. Wysocki <rafael@xxxxxxxxxx> [171228 12:21]:
> > On Thu, Dec 28, 2017 at 5:22 AM, Tony Lindgren <tony@xxxxxxxxxxx> wrote:
> > > * Rafael J. Wysocki <rafael@xxxxxxxxxx> [171228 00:51]:
> > >> On Wed, Dec 27, 2017 at 4:08 PM, Tony Lindgren <tony@xxxxxxxxxxx> wrote:
> > >> > * Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> [171227 01:00]:
> > >> >> On Tuesday, December 26, 2017 2:06:47 AM CET JeffyChen wrote:
> > >> >> > Hi Rafael,
> > >> >> >
> > >> >> > Thanks for your reply :)
> > >> >> >
> > >> >> > On 12/26/2017 08:11 AM, Rafael J. Wysocki wrote:
> > >> >> > >> >+
> > >> >> > >> >+ dn = pci_device_to_OF_node(ppdev);
> > >> >> > >> >+ if (!dn)
> > >> >> > >> >+ return 0;
> > >> >> > >> >+
> > >> >> > >> >+ irq = of_irq_get_byname(dn, "wakeup");
> > >> >> > > Why is this a property of the bridge and not of the device itself?
> > >> >> >
> > >> >> > That is suggested by Brian, because in that way, the wakeup pin would
> > >> >> > not "tied to what exact device is installed (or no device, if it's a slot)."
> > >> >>
> > >> >> But I don't think it works when there are two devices using different WAKE#
> > >> >> interrupt lines under the same bridge. Or how does it work then?
> > >> >
> > >> > It won't work currently for multiple devices but adding more than
> > >> > one wakeriq per device is doable. And I think we will have other
> > >> > cases where multiple wakeirqs are connected to a single device, so
> > >> > that issue should be sorted out sooner or later.
> > >> >
> > >> > And if requesting wakeirq for the PCI WAKE# lines at the PCI
> > >> > controller does the job, then maybe that's all we need to start with.
> > >>
> > >> These are expected to be out-of-band, so not having anything to do
> > >> with the Root Complex.
> > >>
> > >> In-band PME Messages go through the PCIe hierarchy, but that is a
> > >> standard mechanism and it is supported already.
> > >>
> > >> WAKE# are platform-specific, pretty much by definition and I guess
> > >> that on most ARM boards they are just going to be some kind of GPIO
> > >> pins.
> > >
> > > OK. So probably supporting the following two configurations
> > > should be enough then:
> > >
> > > 1. One or more WAKE# lines configured as a wakeirq for the PCI
> > > controller
> > >
> > > When the wakeirq calls pm_wakeup_event() for the PCI controller
> > > device driver, the PCI controller wakes up and can deal with
> > > it's child devices
> >
> > But this shouldn't be necessary at all. Or if it is, I wonder why
> > that's the case.
>
> Well Brian had a concern where we would have to implement PM runtime
> for all device drivers for PCI devices.

Why would we?

> > I'm assuming that we're talking about PCI Express here, which has two
> > wakeup mechanisms defined, one of which is based on using PME Messages
> > (Beacon) and the second one is WAKE#:
> >
> > "The WAKE# mechanism uses sideband signaling to implement wakeup
> > functionality. WAKE# is
> > an âopen drainâ signal asserted by components requesting wakeup and
> > observed by the associated
> > power controller."
> >
> > (from PCIe Base Spec 3.0). [And there's a diagram showing the routing
> > of WAKE# in two cases in Figure 5-4: Conceptual Diagrams Showing Two
> > Example Cases of WAKE# Routing.]
>
> Thanks for the pointer, I had not seen that :) So the use cases
> I was trying to describe above are similar to the wiring in the
> PCIe Base Spec 3.0 "Figure 5-4" , but numbered the other way around.
>
> > Note that WAKE# is defined to be "observed by the associated power
> > controller", so I'm not sure what the PCI controller's role in the
> > handing of it is at all.
>
> To me it seems the "switch" part stays at least partially powered
> and then in-band PME messages are used after host is woken up to
> figure out which WAKE# triggered?

Yes, that's my understanding too.

> > > 2. Optionally a WAKE# line from a PCI device configured as wakeirq
> > > for the PCI device driver
> > >
> > > In this case calling the PM runtime resume in the child
> > > PCI device will also wake up the parent PCI controller,
> > > and then the PCI controller can deal with it's children
> > >
> > > Seems like this series is pretty close to 1 above except
> > > we need to have a list of wakeirqs per device instead of
> > > just one. And option 2 should already work as long as the
> > > PCI device driver parses and configures the wakeirq.
> >
> > Well, this is confusing, because as I said above, option 1 doesn't
> > look relevant even.
>
> So isn't my option 1 above similar to the PCIe spec "Figure 5-4"
> case 2?

No, it isn't, because in that case there is no practical difference
between WAKE# and an in-band PME message sent by the device (Beacon)
from the software perspective.

WAKE# causes the switch to send a PME message upstream and that is
handled by the Root Complex through the standard mechanism already
supported by our existing PME driver (drivers/pci/pcie/pme.c).

> Anyways, let's standardize on the "Figure 5-4" naming
> from now to avoid confusion :)

OK

> > >> > Then in addition to that, we could do the following to allow
> > >> > PCI devices to request the wakeirq from the PCI controller:
> > >> >
> > >> > 1. PCI controller or framework implements a chained irq for
> > >> > the WAKE# lines assuming it can mask/unmask the WAKE# lines
> > >> >
> > >> > 2. PCI devices then can just request the wakeirq from the PCI
> > >> > controller
> > >> >
> > >> > And that's about it. Optionally we could leave out the dependency
> > >> > to having PCI devices implement PM runtime and just resume the
> > >> > parent (PCI controller) if PCI devices has not implemented
> > >> > PM runtime.
> > >>
> > >> So if my understanding is correct, DT should give you the WAKE# IRQ
> > >> for the given endpoint PCI device and you only are expected to request
> > >> it. The rest should just follow from the other pieces of information
> > >> in the DT.
> > >
> > > Yeah and it seems that we should allow configuring both cases
> > > 1 and 2 above.
> > >
> > >> With the quite obvious caveat that the same IRQ may be used as WAKE#
> > >> for multiple endpoint devices (which BTW need not be under the same
> > >> bridge even).
> > >
> > > And with the shared interrupts we can't do the masking/unmasking
> > > automatically..
> >
> > Or we need to use reference counting (so actually the wakeup IRQs are
> > not dedicated).
>
> Yeah. FYI, for the dedicated wakeirq cases I have, we need to keep
> them masked during runtime to avoid tons of interrupts as they
> are often wired to the RX pins.

OK

BTW, enable_irq_wake() should take care of the sharing, shouldn't it?

But the WAKE# thing is not just for waking up the system from sleep states,
it is for runtime PM's wakeup signaling too.

> > >> >> > >> >+ if (irq == -EPROBE_DEFER)
> > >> >> > > Braces here, please.
> > >> >> > ok, will fix in the next version.
> > >> >> >
> > >> >> > >
> > >> >> > >> >+ return irq;
> > >> >> > >> >+ /* Ignore other errors, since a missing wakeup is non-fatal. */
> > >> >> > >> >+ else if (irq < 0) {
> > >> >> > >> >+ dev_info(&pdev->dev, "cannot get wakeup interrupt: %d\n", irq);
> > >> >> > >> >+ return 0;
> > >> >> > >> >+ }
> > >> >> > >> >+
> > >> >> > >> >+ device_init_wakeup(&pdev->dev, true);
> > >> >> > > Why do you call this before dev_pm_set_dedicated_wake_irq()?
> > >> >> >
> > >> >> > hmmm, i thought so too, but it turns out the dedicated wake irq
> > >> >> > framework requires device_init_wakeup(dev, true) before attach the wake irq:
> > >> >> >
> > >> >> > int device_wakeup_attach_irq(struct device *dev,
> > >> >> > struct wake_irq *wakeirq)
> > >> >> > {
> > >> >> > struct wakeup_source *ws;
> > >> >> >
> > >> >> > ws = dev->power.wakeup;
> > >> >> > if (!ws) {
> > >> >> > dev_err(dev, "forgot to call device_init_wakeup?\n");
> > >> >> > return -EINVAL;
> > >> >> >
> > >> >>
> > >> >> Well, that's a framework issue, fair enough.
> > >> >>
> > >> >> That said, what if user space removes the wakeup source from under you
> > >> >> concurrently via sysfs? Tony?
> > >> >
> > >> > Hmm sounds racy, need to take a look.
> > >>
> > >> Not only racy, as I don't see anything to prevent user space from
> > >> making the dev->power.wakeup wakeup source go away via sysfs at any
> > >> time *after* the IRQ has been requested.
> > >
> > > Currently nothing happens with wakeirqs if there's no struct
> > > wakeup_source. On device_wakeup_enable() we call device_wakeup_attach()
> > > that just copies dev->power.wakeirq to ws->wakeirq. And when struct
> > > wake_source is freed the device should be active and wakeirq
> > > disabled. Or are you seeing other issues here?
> >
> > I'm suspicious about one thing, but I need to look deeper into the code. :-)

So we are fine except for the race and we need the wakeirq field in wakeup
sources to automatically arm the wakeup IRQs during suspend.

If I'm not mistaken, we only need something like the patch below (untested).

> OK. My response time will be laggy this week in case you find
> something that needs urgent fixing :)

No worries. :-)

---
drivers/base/power/wakeirq.c | 9 ++++-----
drivers/base/power/wakeup.c | 2 +-
2 files changed, 5 insertions(+), 6 deletions(-)

Index: linux-pm/drivers/base/power/wakeirq.c
===================================================================
--- linux-pm.orig/drivers/base/power/wakeirq.c
+++ linux-pm/drivers/base/power/wakeirq.c
@@ -33,7 +33,6 @@ static int dev_pm_attach_wake_irq(struct
struct wake_irq *wirq)
{
unsigned long flags;
- int err;

if (!dev || !wirq)
return -EINVAL;
@@ -45,12 +44,12 @@ static int dev_pm_attach_wake_irq(struct
return -EEXIST;
}

- err = device_wakeup_attach_irq(dev, wirq);
- if (!err)
- dev->power.wakeirq = wirq;
+ dev->power.wakeirq = wirq;
+ if (dev->power.wakeup)
+ device_wakeup_attach_irq(dev, wirq);

spin_unlock_irqrestore(&dev->power.lock, flags);
- return err;
+ return 0;
}

/**
Index: linux-pm/drivers/base/power/wakeup.c
===================================================================
--- linux-pm.orig/drivers/base/power/wakeup.c
+++ linux-pm/drivers/base/power/wakeup.c
@@ -303,7 +303,7 @@ int device_wakeup_attach_irq(struct devi
}

if (ws->wakeirq)
- return -EEXIST;
+ dev_err(dev, "Leftover wakeup IRQ found, overriding\n");

ws->wakeirq = wakeirq;
return 0;