Re: [PATCH 38/40] cifs: use workqueue instead of slow-work

From: Tejun Heo
Date: Tue Jan 19 2010 - 19:10:24 EST


Hello,

On 01/19/2010 09:20 PM, Jeff Layton wrote:
>> @@ -584,13 +583,13 @@ is_valid_oplock_break(struct smb_hdr *buf, struct TCP_Server_Info *srv)
>> pCifsInode->clientCanCacheAll = false;
>> if (pSMB->OplockLevel == 0)
>> pCifsInode->clientCanCacheRead = false;
>> - rc = slow_work_enqueue(&netfile->oplock_break);
>> - if (rc) {
>> - cERROR(1, ("failed to enqueue oplock "
>> - "break: %d\n", rc));
>> - } else {
>> - netfile->oplock_break_cancelled = false;
>> - }
>> +
>> + cifs_oplock_break_get(netfile);
>> + if (!queue_work(system_single_wq,
>> + &netfile->oplock_break))
>> + cifs_oplock_break_put(netfile);
>> + netfile->oplock_break_cancelled = false;
>> +
>> read_unlock(&GlobalSMBSeslock);
>> read_unlock(&cifs_tcp_ses_lock);
>> return true;
>
> This block of code looks problematic. This code is run by the
> cifs_demultiplex_thread (cifsd). We can't do an oplock_break_put in
> this context, since it might trigger a blocking SMB and cause a
> deadlock.

Okay, thanks for pointing it out.

> A while back, I backported this code to earlier kernels and used a
> standard workqueue there. What I did there was to only do the "get" if
> the queue_work succeeded, and then had the queued work take and
> immediately drop the GlobalSMBSeslock first thing. Yes, it's ugly, but
> it prevented the possible deadlock and didn't require adding anything
> like completion vars to the struct.

Hmmm... Why is locking GlobalSMBSeslock necessary?
cifs_oplock_break_get() can never fail and it seems that
is_valid_oplock_break() should be holding valid reference by the time
it enqueues the work, so wouldn't the following be sufficient?

if (queue_work(system_single_wq, &netfile->oplock_break))
cifs_oplock_break_get(netfile);

> Also, this change seems to have changed the logic a bit. The
> oplock_break_cancelled flag is being set to false unconditionally, and
> the printk was dropped. Not a big deal on the last part, but we can't
> really do much with errors in this codepath so it might be helpful to
> have some indication that there are problems here.

The thing is that slow_work_enqueue() can only fail if getting a
reference fails. In cifs' case, it always succeeds so there's no
failure case to handle there.

> Other than the above problems (which are easily fixable), this patch
> seems fine.

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/