Re: [Linux-cachefs] 3.0.3 64-bit Crash running fscache/cachefilesd

From: Mark Moseley
Date: Mon Sep 26 2011 - 17:02:16 EST


On Mon, Sep 26, 2011 at 4:32 AM, David Howells <dhowells@xxxxxxxxxx> wrote:
> Mark Moseley <moseleymark@xxxxxxxxx> wrote:
>
>> I thought I'd be extra-helpful by getting that trace with a 3.0.4
>> kernel but got a completely different error this time (there was
>> nothing logged above this though). There was a
>> '__fscache_read_or_alloc_pages' crash for the previous boot too,
>> though it went for about 2.5 hours that time (with an empty cache
>> partition though).
>
> I'm fairly certain I know what the cause of this one is: Invalidation upon
> server change isn't handled correctly.  NFS tries to invalidate a file by
> discarding that file's attachment to the cache - without first clearing up the
> operations it has outstanding on the cache for that file.
>
> I'm working on adding formal invalidation at the moment.
>
> The attached patch may get you more precise information.  The first hunk is the
> main catcher.
>
> David
> ---
> diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
> index 9905350..48c63b8 100644
> --- a/fs/fscache/cookie.c
> +++ b/fs/fscache/cookie.c
> @@ -452,6 +452,13 @@ void __fscache_relinquish_cookie(struct fscache_cookie *cookie, int retire)
>
>                _debug("RELEASE OBJ%x", object->debug_id);
>
> +               if (atomic_read(&object->n_reads)) {
> +                       spin_unlock(&cookie->lock);
> +                       printk(KERN_ERR "FS-Cache: Cookie '%s' still has outstanding reads\n",
> +                              cookie->def->name);
> +                       BUG();
> +               }
> +
>                /* detach each cache object from the object cookie */
>                spin_lock(&object->lock);
>                hlist_del_init(&object->cookie_link);
> diff --git a/fs/fscache/page.c b/fs/fscache/page.c
> index b8b62f4..f087051 100644
> --- a/fs/fscache/page.c
> +++ b/fs/fscache/page.c
> @@ -496,6 +496,7 @@ int __fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
>        if (fscache_submit_op(object, &op->op) < 0)
>                goto nobufs_unlock;
>        spin_unlock(&cookie->lock);
> +       ASSERTCMP(object->cookie, ==, cookie);
>
>        fscache_stat(&fscache_n_retrieval_ops);
>
> @@ -513,6 +514,26 @@ int __fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
>                goto error;
>
>        /* ask the cache to honour the operation */
> +       if (!object->cookie) {
> +               const char prefix[] = "fs-";
> +               printk(KERN_ERR "%sobject: OBJ%x\n",
> +                      prefix, object->debug_id);
> +               printk(KERN_ERR "%sobjstate=%s fl=%lx wbusy=%x ev=%lx[%lx]\n",
> +                      prefix, fscache_object_states[object->state],
> +                      object->flags, work_busy(&object->work),
> +                      object->events,
> +                      object->event_mask & FSCACHE_OBJECT_EVENTS_MASK);
> +               printk(KERN_ERR "%sops=%u inp=%u exc=%u\n",
> +                      prefix, object->n_ops, object->n_in_progress,
> +                      object->n_exclusive);
> +               printk(KERN_ERR "%sparent=%p\n",
> +                      prefix, object->parent);
> +               printk(KERN_ERR "%scookie=%p [pr=%p nd=%p fl=%lx]\n",
> +                      prefix, object->cookie,
> +                      cookie->parent, cookie->netfs_data, cookie->flags);
> +       }
> +       ASSERTCMP(object->cookie, ==, cookie);
> +
>        if (test_bit(FSCACHE_COOKIE_NO_DATA_YET, &object->cookie->flags)) {
>                fscache_stat(&fscache_n_cop_allocate_pages);
>                ret = object->cache->ops->allocate_pages(
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

OK, patched and running now. This same box was running 3.0.3 over the
weekend, but it died without leaving a stack trace (and I had set it up
not to start cachefilesd on the next boot). Once I have the trace for
3.0.4, I'll boot back into 3.0.3 and see if I can get that previous
trace again.
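
For anyone following along, here is a rough userspace sketch of the
condition the first hunk is meant to catch: a cookie being detached from
its cache object while reads are still outstanding on that object. This
is only an illustration under my own assumptions. The demo_* names are
hypothetical stand-ins, not the kernel's structures or API, and where
the patch calls BUG() the sketch just refuses the detach and returns
false.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; NOT the real fscache structures. */
struct demo_cookie;

struct demo_object {
	atomic_int n_reads;		/* reads submitted but not yet completed */
	struct demo_cookie *cookie;	/* back-pointer, cleared on detach */
};

struct demo_cookie {
	struct demo_object *object;	/* attached cache object, or NULL */
};

/* Attach an object to a cookie with no reads outstanding. */
static void demo_attach(struct demo_cookie *c, struct demo_object *o)
{
	atomic_init(&o->n_reads, 0);
	o->cookie = c;
	c->object = o;
}

/* Account for a read being submitted against the object. */
static void demo_read_begin(struct demo_object *o)
{
	atomic_fetch_add(&o->n_reads, 1);
}

/* Account for a read completing. */
static void demo_read_end(struct demo_object *o)
{
	atomic_fetch_sub(&o->n_reads, 1);
}

/*
 * Detach the cookie from its object.  Returns false, leaving both
 * pointers untouched, if reads are still outstanding; that is the
 * situation in which the debugging patch prints a message and BUG()s.
 */
static bool demo_relinquish(struct demo_cookie *c)
{
	struct demo_object *o = c->object;

	if (o == NULL)
		return true;	/* nothing attached */
	if (atomic_load(&o->n_reads) != 0)
		return false;	/* outstanding reads: refuse to detach */

	o->cookie = NULL;
	c->object = NULL;
	return true;
}
```

If demo_relinquish() skipped the n_reads check, a read still in flight
would later find o->cookie == NULL, which is the state the second and
third hunks dump and assert on.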