Re: [PATCH] libceph: ceph_pagelist_append might sleep whileatomic

From: Jim Schutt
Date: Tue May 14 2013 - 13:44:50 EST


On 05/14/2013 10:44 AM, Alex Elder wrote:
> On 05/09/2013 09:42 AM, Jim Schutt wrote:
>> Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc while
>> holding a lock, but it's spoiled because ceph_pagelist_addpage() always
>> calls kmap(), which might sleep. Here's the result:
>
> I finally took a close look at this today, Jim. Sorry
> for the delay.
>

No worries - thanks for taking a look.

> The issue is formatting the reconnect message--which will
> hold an arbitrary amount of data and therefore which we'll
> need to do some allocation (and kmap) for--in the face of
> having to hold the flock spinlock while doing so.
>
> And as you found, ceph_pagelist_addpage(), which is called
> by ceph_pagelist_append(), calls kmap() even if it doesn't
> need to allocate anything. This means that despite reserving
> the pages, those pages are in the free list and because they'll
> need to be the subject of kmap() their preallocation doesn't
> help.
>
> Your solution was to pre-allocate a buffer, format the locks
> into that buffer while holding the lock, then append the
> buffer contents to a pagelist after releasing the lock. You
> check for a changing (increasing) lock count while you format
> the locks, which is good.
>
> So... Given that, I think your change looks good. It's a shame
> we can't format directly into the pagelist buffer but this won't
> happen much so it's not a big deal. I have a few small suggestions,
> below.
>
> I do find some byte order bugs though. They aren't your doing,
> but I think they ought to be fixed first, as a separate patch
> that would precede this one. The bug is that the lock counts
> that are put into the buffer (num_fcntl_locks and num_flock_locks)
> are not properly byte-swapped. I'll point it out inline
> in your code, below.
>
> I'll say that what you have is OK. Consider my suggestions, and
> if you choose not to fix the byte order bugs, please let me know.

I'll happily fix up a v2 series with your suggestions addressed.
Thanks for catching those issues. Stay tuned...

Thanks -- Jim


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/