Re: Regression, bisected: sqlite locking failure on nfs

From: Trond Myklebust
Date: Mon Nov 01 2010 - 15:24:11 EST


On Mon, 2010-11-01 at 14:30 -0400, Chuck Lever wrote:
> On Nov 1, 2010, at 2:19 PM, Nick Bowler wrote:
>
> > On 2010-11-01 14:07 -0400, Chuck Lever wrote:
> >> On Nov 1, 2010, at 1:58 PM, Nick Bowler wrote:
> >>> After installing 2.6.37-rc1, attempting to use sqlite in any capacity on
> >>> NFS gives a locking error:
> >>>
> >>> % echo 'select * from blah;' | sqlite3 blah.sqlite
> >>> Error: near line 1: database is locked
> >>>
> >>> % echo 'create table blargh(INT);' | sqlite3 blargh.sqlite
> >>> Error: near line 1: database is locked
> >>>
> >>> The result is that a lot of high-profile applications which make use of
> >>> sqlite fail mysteriously. Bisection reveals the following, and
> >>> reverting the implicated commit solves the issue:
> >>
> >> Nick, thanks for the report. Is 2.6.37-rc1 running on your clients or
> >> on your server?
> >
> > Sorry for not being clear: the client is running 2.6.37-rc1. The
> > server is running RHEL 5.5.
> >
> >> Does anything interesting appear in the kernel log when your test case
> >> fails?
> >
> > There are no unusual messages on the client... but I just logged into
> > the server and I see lots of messages of the following form:
> >
> > nfsd: request from insecure port (192.168.8.199:35766)!
> > nfsd: request from insecure port (192.168.8.199:35766)!
> > nfsd: request from insecure port (192.168.8.199:35766)!
> > nfsd: request from insecure port (192.168.8.199:35766)!
> > nfsd: request from insecure port (192.168.8.199:35766)!
> >
> > (192.168.8.199 is the address of the failing client). I can only assume
> > that these are a result of my recent issues, since I don't have access
> > to the system log (with timestamps) on that machine.
>
> That's the problem this patch is supposed to prevent. I'll investigate further.
>

I suspect nlmclnt_lookup_host() is to blame. It appears to be the _only_
thing in the kernel that actually sets this 'srcaddr' field, and it sets
it to

	const struct sockaddr source = {
		.sa_family	= AF_UNSPEC,
	};

You triggered the bug by removing the line

	transport->srcaddr.ss_family = family;

from xs_create_sock().
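
To illustrate why that one assignment could matter, here is a userspace sketch, not the actual sunrpc code. It assumes the bind path chooses the source port through a helper that dispatches on the address family, so an AF_UNSPEC source address never gets a reserved port filled in. That would be consistent with the "insecure port" messages on the server. The sketch_* names and the fix_family flag are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical stand-in for a "set the port on a generic address"
 * helper. It dispatches on the address family, so an AF_UNSPEC
 * address is left untouched. Returns 1 if a port was written.
 */
static int sketch_set_port(struct sockaddr *sap, unsigned short port)
{
	assert(sap != NULL);

	switch (sap->sa_family) {
	case AF_INET:
		((struct sockaddr_in *)sap)->sin_port = htons(port);
		return 1;
	case AF_INET6:
		((struct sockaddr_in6 *)sap)->sin6_port = htons(port);
		return 1;
	default:
		/* AF_UNSPEC: no known port field to fill in. */
		return 0;
	}
}

/*
 * Hypothetical: prepare a source address before binding. The caller
 * hands in AF_UNSPEC, as in the initializer quoted above. When
 * fix_family is nonzero the family is filled in first, mimicking the
 * removed "srcaddr.ss_family = family" assignment.
 */
static int sketch_prepare_source(struct sockaddr_storage *src, int family,
				 int fix_family, unsigned short port)
{
	memset(src, 0, sizeof(*src));
	src->ss_family = AF_UNSPEC;
	if (fix_family)
		src->ss_family = family;

	return sketch_set_port((struct sockaddr *)src, port);
}
```

Calling sketch_prepare_source() with fix_family set to 0 leaves the address as AF_UNSPEC with no port chosen, while passing 1 yields an AF_INET address carrying the requested port.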

Trond