Re: [Freedreno] drm/msm: 'pp done time out' errors after async commit changes

From: Rob Clark
Date: Wed Nov 06 2019 - 11:59:15 EST


On Wed, Nov 6, 2019 at 8:47 AM Jeffrey Hugo <jeffrey.l.hugo@xxxxxxxxx> wrote:
>
> On Wed, Nov 6, 2019 at 9:30 AM Rob Clark <robdclark@xxxxxxxxx> wrote:
> >
> > On Wed, Nov 6, 2019 at 1:13 AM Brian Masney <masneyb@xxxxxxxxxxxxx> wrote:
> > >
> > > On Tue, Nov 05, 2019 at 08:23:27AM -0800, Rob Clark wrote:
> > > > On Tue, Nov 5, 2019 at 2:08 AM Brian Masney <masneyb@xxxxxxxxxxxxx> wrote:
> > > > > The 'pp done time out' errors go away if I revert the following three
> > > > > commits:
> > > > >
> > > > > cd6d923167b1 ("drm/msm/dpu: async commit support")
> > > > > d934a712c5e6 ("drm/msm: add atomic traces")
> > > > > 2d99ced787e3 ("drm/msm: async commit support")
> > > > >
> > > > > I reverted the first one to fix a compiler error, and the second one so
> > > > > that the last patch can be reverted without any merge conflicts.
> > > > >
> > > > > I see that crtc_flush() calls mdp5_ctl_commit(). I tried to use
> > > > > crtc_flush_all() in mdp5_flush_commit() and the contents of the frame
> > > > > buffer dance around the screen like its out of sync. I renamed
> > > > > crtc_flush_all() to mdp5_crtc_flush_all() and removed the static
> > > > > declaration. Here's the relevant part of what I tried:
> > > > >
> > > > > --- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
> > > > > +++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
> > > > > @@ -171,7 +171,15 @@ static void mdp5_prepare_commit(struct msm_kms *kms, struct drm_atomic_state *st
> > > > >
> > > > > static void mdp5_flush_commit(struct msm_kms *kms, unsigned crtc_mask)
> > > > > {
> > > > > - /* TODO */
> > > > > + struct mdp5_kms *mdp5_kms = to_mdp5_kms(to_mdp_kms(kms));
> > > > > + struct drm_crtc *crtc;
> > > > > +
> > > > > + for_each_crtc_mask(mdp5_kms->dev, crtc, crtc_mask) {
> > > > > + if (!crtc->state->active)
> > > > > + continue;
> > > > > +
> > > > > + mdp5_crtc_flush_all(crtc);
> > > > > + }
> > > > > }
> > > > >
> > > > > Any tips would be appreciated.
> > > >
> > > >
> > > > I think this is along the lines of what we need to enable async commit
> > > > for mdp5 (but also removing the flush from the atomic-commit path)..
> > > > the principle behind the async commit is to do all the atomic state
> > > > commit normally, but defer writing the flush bits. This way, if you
> > > > get another async update before the next vblank, you just apply it
> > > > immediately instead of waiting for vblank.
> > > >
> > > > But I guess you are on a command mode panel, if I remember? Which is
> > > > a case I didn't have a way to test. And I'm not entirely about how
> > > > kms_funcs->vsync_time() should be implemented for cmd mode panels.
> > >
> > > Yes, this is a command-mode panel and there's no hardware frame counter
> > > available. The key to getting the display working on this phone was this
> > > patch:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2bab52af6fe68c43b327a57e5ce5fc10eefdfadf
> > >
> > > > That all said, I think we should first fix what is broken, before
> > > > worrying about extending async commit support to mdp5.. which
> > > > shouldn't hit the async==true path, due to not implementing
> > > > kms_funcs->vsync_time().
> > > >
> > > > What I think is going on is that, in the cmd mode case,
> > > > mdp5_wait_flush() (indirectly) calls mdp5_crtc_wait_for_pp_done(),
> > > > which waits for a pp-done irq regardless of whether there is a flush
> > > > in progress. Since there is no flush pending, the irq never comes.
> > > > But the expectation is that kms_funcs->wait_flush() returns
> > > > immediately if there is nothing to wait for.
> > >
> > > I don't think that's happening in this case. I added some pr_info()
> > > statements to request_pp_done_pending() and mdp5_crtc_pp_done_irq().
> > > Here's the first two sets of messages that appear in dmesg:
> > >
> > > [ 14.018907] msm fd900000.mdss: pp done time out, lm=0
> > > [ 14.018993] request_pp_done_pending: HERE
> > > [ 14.074208] mdp5_crtc_pp_done_irq: HERE
> > > [ 14.074368] Console: switching to colour frame buffer device 135x120
> > > [ 14.138938] msm fd900000.mdss: pp done time out, lm=0
> > > [ 14.139021] request_pp_done_pending: HERE
> > > [ 14.158097] mdp5_crtc_pp_done_irq: HERE
> > >
> > > The messages go on like this with the same pattern.
> > >
> > > I tried two different changes:
> > >
> > > 1) I moved the request_pp_done_pending() and corresponding if statement
> > > from mdp5_crtc_atomic_flush() and into mdp5_crtc_atomic_begin().
> > >
> > > 2) I increased the timeout in wait_for_completion_timeout() by several
> > > increments; all the way to 5 seconds.
> >
> > increasing the timeout won't help, because the pp-done irq has already
> > come at the point where we wait for it..
> >
> > maybe the easy thing is just add mdp5_crtc->needs_pp, set to true
> > before requesting, and false when we get the irq.. and then
> > mdp5_crtc_wait_for_pp_done() just returns if needs_pp==false..
>
> On the otherhand, what about trying to make command mode panels
> resemble video mode panels slightly? Video mode panels have a vsync
> counter in hardware, which is missing from command mode - however it
> seems like the driver/drm framework would prefer such a counter.
> Would it be a reasonable idea to make a software counter, and just
> increment it every time the pp_done irq is triggered?
>
> I'm just thinking that we'll avoid issues long term by trying to make
> the code common, rather than diverging it for the two modes.
>

*possibly*, but I think we want to account somehow periods where
display is not updated.

fwiw, it isn't that uncommon for userspace to use vblanks to "keep
time" (drive animations for desktop switch, window
maximize/unmaximize, etc).. it could be a surprise when "vblank" is
not periodic.

BR,
-R