Re: Kernel 3.4.X NFS server regression

From: Boaz Harrosh
Date: Mon Jun 11 2012 - 10:45:47 EST


On 06/11/2012 05:11 PM, Jeff Layton wrote:

> On Mon, 11 Jun 2012 17:05:28 +0300
> Boaz Harrosh <bharrosh@xxxxxxxxxxx> wrote:
>
>> On 06/11/2012 04:51 PM, Jeff Layton wrote:
>>
>>>
>>> That was considered here, but the problem with the usermode helper is
>>> that you can't pass anything back to the kernel but a simple status
>>> code (and that's assuming that you wait for it to exit). In the near
>>> future, we'll need to pass back more info to the kernel for this, so
>>> the usermode helper callout wasn't suitable.
>>>
>>
>>
>> I have answered that in my mail. Repeated here again. Well you made
>> a simple mistake. Because it is *easy* to pass back any number and
>> size of information from user-mode.
>>
>> You just setup a sysfs entry points where the answers are written
>> back to. It's an easy trick to setup a thread safe, way with a
>> cookie but 90% of the time you don't have to. Say you set up
>> a structure of per-client (identified uniquely) then user mode
>> answers back per client, concurrency will not do any harm, since
>> you answer to the same question the same answer. ans so on. Each
>> problem it's own.
>>
>> If you want we can talk about this, it would be easy for me to setup
>> a toll free conference number we can all use.
>
> That helpful advice would have been welcome about 3-4 months ago when I
> first proposed this in detail. At that point you're working with
> multiple upcall/downcall mechanisms, which was something I was keen to
> avoid.
>
> I'm not opposed to moving in that direction, but it basically means
> you're going to rip out everything I've got here so far and replace it.
>
> If you're willing to do that work, I'll be happy to work with you on
> it, but I don't have the time or inclination to do that on my own right
> now.
>


No such luck. sorry. I wish I could, but coming from a competing server
company, you can imagine the priority of that ever happening.
(Even though I use the Linux-Server everyday for my development and
am putting lots of efforts into still, mainly in pnfs)

Hopefully re-examining the code, it could all be salvaged just the
same, only lots of code thrown a way.

But mean-while please address my concern below:
Boaz Harrosh wrote:

> One more thing, the most important one. We have already fixed that in the
> past and I was hoping the lesson was learned. Apparently it was not, and
> we are doomed to do this mistake for ever!!
>
> What ever crap fails times out and crashes, in the recovery code, we don't
> give a dam. It should never affect any Server-client communication.
>
> When the grace periods ends the clients gates opens period. *Any* error
> return from state recovery code must be carefully ignored and normal
> operations resumed. At most on error, we move into a mode where any
> recovery request from client is accepted, since we don't have any better
> data to verify it.
>
> Please comb recovery code to make sure any catastrophe is safely ignored.
> We already did that before and it used to work.


We should make sure that any state recovery code does not interfere with
regular operations. and fails gracefully / shuts up.

We used to have that, apparently it re-broke. Clients should always be granted
access, after grace period. And Server should be made sure not to fail in any
situation.

I would look into it but I'm not uptodate anymore, I wish you or Bruce could.

Thanks for your work so far, sorry to be bearer of bad news
Boaz
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/