RE: [PATCH 2/5] hv: add helpers to handle hv_util device state

From: KY Srinivasan
Date: Mon Sep 21 2015 - 13:03:58 EST

> -----Original Message-----
> From: Greg KH [mailto:gregkh@xxxxxxxxxxxxxxxxxxx]
> Sent: Monday, September 21, 2015 9:44 AM
> To: KY Srinivasan <kys@xxxxxxxxxxxxx>
> Cc: Olaf Hering <olaf@xxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx;
> devel@xxxxxxxxxxxxxxxxxxxxxx; apw@xxxxxxxxxxxxx; vkuznets@xxxxxxxxxx;
> jasowang@xxxxxxxxxx
> Subject: Re: [PATCH 2/5] hv: add helpers to handle hv_util device state
>
> On Mon, Sep 21, 2015 at 04:34:56PM +0000, KY Srinivasan wrote:
> >
> >
> > > -----Original Message-----
> > > From: Olaf Hering [mailto:olaf@xxxxxxxxx]
> > > Sent: Monday, September 21, 2015 3:26 AM
> > > To: KY Srinivasan <kys@xxxxxxxxxxxxx>; Greg KH
> > > <gregkh@xxxxxxxxxxxxxxxxxxx>
> > > Cc: linux-kernel@xxxxxxxxxxxxxxx; devel@xxxxxxxxxxxxxxxxxxxxxx;
> > > apw@xxxxxxxxxxxxx; vkuznets@xxxxxxxxxx; jasowang@xxxxxxxxxx
> > > Subject: Re: [PATCH 2/5] hv: add helpers to handle hv_util device state
> > >
> > > On Sun, Sep 20, Greg KH wrote:
> > >
> > > > Just use a lock, that's what it is there for.
> > >
> > > How would that help? It might help because it enforces ordering. But
> > > that requires that all three utils get refactored to deal with the
> > > introduced locking. I will let KY comment on this.
> > >
> > > The issue I see with fcopy is that after or while fcopy_respond_to_host
> > > runs an interrupt triggers which also calls into
> > > hv_fcopy_onchannelcallback. It was most likely caused by a logic change
> > > in "recent" vmbus updates because this did not happen before. At least,
> > > the fcopy hang was not seen earlier. Maybe the bug just did not trigger
> > > up to now for other reasons...
> >
> > All util channels are bound to CPU 0. Just forcing all activity on
> > CPU 0 may be the simplest solution here. Besides, these are not
> > performance critical services anyway.
> >
> > The problem you may have run into could be related to the fact that
> > we could potentially run the polling function on a CPU other than
> > CPU 0.
>
> Again, this sounds like a locking issue, you have multiple
> threads/processes accessing the same data. Even if you bind it all to
> one cpu, this shows a real design problem.
>
> Use a lock to fix this properly. That way, when you stop using only one
> CPU, the code will "just work", and if you are really only on one CPU
> today, there will not be any lock contention.
>
> thanks,
>
> greg k-h

Thanks Greg; will do.

Regards,

K. Y
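
For illustration only, here is a rough userspace sketch of the "use a
lock" suggestion from the thread. It is not the hv_util code and not the
eventual patch: the type, state, and function names are hypothetical, and
pthread_mutex_t stands in for whatever kernel primitive (a spinlock or a
mutex) would actually be appropriate. The idea shown is that checking the
current state and moving to the next one happen under a single lock, so
two contexts (say, a channel callback and a polling function on another
CPU) cannot both act on the same state.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical states; these are NOT the real hv_util state names. */
enum util_state {
    UTIL_READY,
    UTIL_HOSTMSG_RECEIVED,
    UTIL_USERSPACE_REQ
};

/* Hypothetical per-device state, guarded by its own lock. */
struct util_dev {
    pthread_mutex_t lock;   /* protects 'state' */
    enum util_state state;
};

static void util_dev_init(struct util_dev *dev)
{
    pthread_mutex_init(&dev->lock, NULL);
    dev->state = UTIL_READY;
}

/*
 * Move from 'from' to 'to' only if the device is currently in 'from'.
 * The check and the update are done under one lock, so only one caller
 * can win a given transition; the loser sees false and backs off.
 */
static bool util_dev_transition(struct util_dev *dev,
                                enum util_state from,
                                enum util_state to)
{
    bool ok = false;

    pthread_mutex_lock(&dev->lock);
    if (dev->state == from) {
        dev->state = to;
        ok = true;
    }
    pthread_mutex_unlock(&dev->lock);

    return ok;
}

/* Read the current state under the same lock. */
static enum util_state util_dev_get_state(struct util_dev *dev)
{
    enum util_state s;

    pthread_mutex_lock(&dev->lock);
    s = dev->state;
    pthread_mutex_unlock(&dev->lock);

    return s;
}
```

With this shape, a second caller that attempts the same transition after
the first one has succeeded gets false back instead of processing the
message twice. As noted in the thread, if everything really runs on one
CPU the lock is uncontended, and the code keeps working if that
assumption is later dropped.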