Re: [PATCH] CIFS: Decrease reconnection delay when switching nics

From: Stefan (metze) Metzmacher
Date: Wed Feb 27 2013 - 19:17:49 EST

On 27.02.2013 23:44, Dave Chiluk wrote:
> On 02/27/2013 04:40 PM, Steve French wrote:
>> On Wed, Feb 27, 2013 at 4:24 PM, Dave Chiluk <dave.chiluk@xxxxxxxxxxxxx> wrote:
>>> On 02/27/2013 10:34 AM, Jeff Layton wrote:
>>>> On Wed, 27 Feb 2013 12:06:14 +0100
>>>> "Stefan (metze) Metzmacher" <metze@xxxxxxxxx> wrote:
>>>>> Hi Dave,
>>>>>> When messages are in the queue awaiting a response, decrease the amount of
>>>>>> time before attempting cifs_reconnect to SMB_MAX_RTT = 10 seconds. The
>>>>>> wait time before attempting to reconnect is currently 2*SMB_ECHO_INTERVAL (120
>>>>>> seconds) since the last response was received. This does not take into account
>>>>>> the fact that messages waiting for a response should be serviced within a
>>>>>> reasonable round trip time.
>>>>> Wouldn't that mean that the client will disconnect a good connection
>>>>> if the server doesn't respond within 10 seconds?
>>>>> Reads and writes can take longer than 10 seconds...
>>>> Where does this magic value of 10s come from? Note that a slow server
>>>> can take *minutes* to respond to writes that are long past the EOF.
>>> It comes from the desire to decrease the reconnection delay to something
>>> better than a random number between 60 and 120 seconds. I am not
>>> committed to this number, and it is open for discussion. Additionally
>>> if you look closely at the logic it's not 10 seconds per request, but
>>> actually when requests have been in flight for more than 10 seconds make
>>> sure we've heard from the server in the last 10 seconds.
>>> Can you explain more fully your use case of writes that are long past
>>> the EOF? Perhaps with a test-case or script that I can test? As far as
>>> I know writes long past EOF will just result in a sparse file, and
>>> return in a reasonable round trip time (that's at least what I'm seeing
>>> with my testing). dd if=/dev/zero of=/mnt/cifs/a bs=1M count=100
>>> seek=100000 starts receiving responses from the server in about 0.05
>>> seconds, with subsequent responses following at roughly 0.002-0.01 second
>>> intervals. This is well within my 10 second value.
>> Note that not all Linux file systems support sparse files and
>> certainly there are cifs servers running on operating systems other
>> than Linux which have popular file systems which don't support sparse
>> files (e.g. FAT32 but there are many others) - in any case, writes
>> after end of file can take a LONG time if sparse files are not
>> supported and I don't know a good way for the client to know that
>> attribute of the server file system ahead of time (although we could
>> attempt to set the sparse flag, servers can and do lie)
> It doesn't matter how long it takes for the entire operation to
> complete, just so long as the server acks something in less than 10
> seconds. Now the question becomes: is there an OS out there that
> doesn't ack the request or doesn't ack the progress regularly?

This kind of ack can only be at the TCP layer, not at the SMB layer.
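
To make the point concrete, here is a minimal sketch in C of the check as
Dave describes it above. It is not the actual cifs code: should_reconnect()
and its parameters (now, last_response, oldest_request) are hypothetical
names, all times are plain seconds, and the constants are just the values
quoted in this thread. It assumes last_response is only updated when an SMB
response is read from the socket, so a bare TCP ACK would not reset it.

```c
#include <stdbool.h>

/* Values as quoted in the thread, in seconds. */
#define SMB_ECHO_INTERVAL 60
#define SMB_MAX_RTT       10

/*
 * Hypothetical helper, not the kernel's implementation.
 *
 * now:            current time
 * last_response:  time the last SMB response of any kind was received
 * oldest_request: time the oldest still-unanswered request was sent,
 *                 or a negative value if nothing is in flight
 */
static bool should_reconnect(long now, long last_response, long oldest_request)
{
    /* Behaviour described as current: 2 * echo interval of silence. */
    if (now - last_response >= 2 * SMB_ECHO_INTERVAL)
        return true;

    /*
     * Proposed rule: a request has been in flight for more than
     * SMB_MAX_RTT and no SMB response has arrived within SMB_MAX_RTT.
     */
    if (oldest_request >= 0 &&
        now - oldest_request > SMB_MAX_RTT &&
        now - last_response > SMB_MAX_RTT)
        return true;

    return false;
}
```

With this shape, a server that spends more than 10 seconds on a single write
without sending any SMB-level response in the meantime would be treated as
unresponsive, even if the TCP connection is healthy.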

