Re: [PATCH -v4 3/4] lib, Make gen_pool memory allocator lockless

From: Huang Ying
Date: Thu Apr 28 2011 - 23:05:12 EST


On 04/29/2011 09:11 AM, Mathieu Desnoyers wrote:
> * Huang Ying (ying.huang@xxxxxxxxx) wrote:
>> Hi, Mathieu,
>>
>> Thanks for your comments.
>>
>> On 04/28/2011 10:37 PM, Mathieu Desnoyers wrote:
>>> * Huang Ying (ying.huang@xxxxxxxxx) wrote:
>> [snip]
>>>>
>>>> +/**
>>>> + * gen_pool_for_each_chunk - iterate over chunks of generic memory pool
>>>> + * @chunk: the struct gen_pool_chunk * to use as a loop cursor
>>>> + * @pool: the generic memory pool
>>>> + *
>>>> + * Not lockless, proper mutual exclusion is needed to use this macro
>>>> + * with other gen_pool function simultaneously.
>>>> + */
>>>> +#define gen_pool_for_each_chunk(chunk, pool) \
>>>> + list_for_each_entry_rcu(chunk, &(pool)->chunks, next_chunk)
>>>
>>> Is it just me or this macro is never used ? Maybe you should consider
>>> removing it.
>>
>> This macro is not used in this patch. But it is used in 4/4 of the
>> patchset to free the backing pages before destroy the pool.
>
> Depending on how frequently you want to use it, you might want to use
> list_for_each_entry_rcu directly rather than a macro wrapper. E.g. for
> 2-3 uses, adding a macro just obfuscates the code IMHO (e.g. you don't
> know it iterates on a RCU list by looking at the caller code).

Yes. gen_pool_for_each_chunk() is not a good wrapper. I just don't want
to expose too much implementation details to users, after all, we are
working on library code. Maybe something like below is better?

void gen_pool_for_each_chunk(struct gen_pool *pool, void (*func)(struct
gen_pool *pool, struct gen_pool_chunk *chunk)) {
rcu_read_lock();
list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk)
func(pool, chunk);
rcu_read_unlock();
}

>>
>> [snip]
>>>> @@ -108,43 +226,50 @@ EXPORT_SYMBOL(gen_pool_destroy);
>>>> * @size: number of bytes to allocate from the pool
>>>> *
>>>> * Allocate the requested number of bytes from the specified pool.
>>>> - * Uses a first-fit algorithm.
>>>> + * Uses a first-fit algorithm. Can not be used in NMI handler on
>>>> + * architectures without NMI-safe cmpxchg implementation.
>>>> */
>>>> unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
>>>> {
>>>> - struct list_head *_chunk;
>>>> struct gen_pool_chunk *chunk;
>>>> - unsigned long addr, flags;
>>>> + unsigned long addr;
>>>> int order = pool->min_alloc_order;
>>>> - int nbits, start_bit, end_bit;
>>>> + int nbits, start_bit = 0, end_bit, remain;
>>>> +
>>>> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>>>> + BUG_ON(in_nmi());
>>>> +#endif
>>>>
>>>> if (size == 0)
>>>> return 0;
>>>>
>>>> nbits = (size + (1UL << order) - 1) >> order;
>>>> -
>>>> - read_lock(&pool->lock);
>>>> - list_for_each(_chunk, &pool->chunks) {
>>>> - chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
>>>> + rcu_read_lock();
>>>> + list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
>>>> + if (size > atomic_read(&chunk->avail))
>>>> + continue;
>>>>
>>>> end_bit = (chunk->end_addr - chunk->start_addr) >> order;
>>>> -
>>>> - spin_lock_irqsave(&chunk->lock, flags);
>>>> - start_bit = bitmap_find_next_zero_area(chunk->bits, end_bit, 0,
>>>> - nbits, 0);
>>>> - if (start_bit >= end_bit) {
>>>> - spin_unlock_irqrestore(&chunk->lock, flags);
>>>> +retry:
>>>> + start_bit = bitmap_find_next_zero_area(chunk->bits, end_bit,
>>>> + start_bit, nbits, 0);
>>>> + if (start_bit >= end_bit)
>>>> continue;
>>>> + remain = bitmap_set_ll(chunk->bits, start_bit, nbits);
>>>> + if (remain) {
>>>> + remain = bitmap_clear_ll(chunk->bits, start_bit,
>>>> + nbits - remain);
>>>> + BUG_ON(remain);
>>>> + goto retry;
>>>> }
>>>>
>>>> addr = chunk->start_addr + ((unsigned long)start_bit << order);
>>>> -
>>>> - bitmap_set(chunk->bits, start_bit, nbits);
>>>> - spin_unlock_irqrestore(&chunk->lock, flags);
>>>> - read_unlock(&pool->lock);
>>>> + size = nbits << order;
>>>> + atomic_sub(size, &chunk->avail);
>>>> + rcu_read_unlock();
>>>
>>> I don't really like seeing a rcu_read_unlock() within a rcu list
>>> iteration (even if it comes right before a "return"). Doing:
>>>
>>> unsigned long addr = 0;
>>>
>>> rcu_read_lock();
>>> list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
>>> if (...)
>>> continue;
>>> ...
>>> addr = ...;
>>> break;
>>> }
>>> rcu_read_unlock();
>>> return addr;
>>>
>>> Would be more symmetric, and would remove one return path, which makes
>>> the code easier to modify in the future.
>>
>> Unlock in loop is common in Linux kernel. Sometimes it makes code
>> cleaner (but not always). Yes, for this case, we can avoid unlock in
>> loop easily. But for the next case it is not so clean.
>
> See comment below,
>
>>
>>>> return addr;
>>>> }
>>>> - read_unlock(&pool->lock);
>>>> + rcu_read_unlock();
>>>> return 0;
>>>> }
>>>> EXPORT_SYMBOL(gen_pool_alloc);
>>>> @@ -155,33 +280,73 @@ EXPORT_SYMBOL(gen_pool_alloc);
>>>> * @addr: starting address of memory to free back to pool
>>>> * @size: size in bytes of memory to free
>>>> *
>>>> - * Free previously allocated special memory back to the specified pool.
>>>> + * Free previously allocated special memory back to the specified
>>>> + * pool. Can not be used in NMI handler on architectures without
>>>> + * NMI-safe cmpxchg implementation.
>>>> */
>>>> void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
>>>> {
>>>> - struct list_head *_chunk;
>>>> struct gen_pool_chunk *chunk;
>>>> - unsigned long flags;
>>>> int order = pool->min_alloc_order;
>>>> - int bit, nbits;
>>>> + int start_bit, nbits, remain;
>>>>
>>>> - nbits = (size + (1UL << order) - 1) >> order;
>>>> -
>>>> - read_lock(&pool->lock);
>>>> - list_for_each(_chunk, &pool->chunks) {
>>>> - chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
>>>> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>>>> + BUG_ON(in_nmi());
>>>> +#endif
>>>>
>>>> + nbits = (size + (1UL << order) - 1) >> order;
>
> you could add:
>
> remain = nbits;
>
>>>> + rcu_read_lock();
>>>> + list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
>>>> if (addr >= chunk->start_addr && addr < chunk->end_addr) {
>>>> BUG_ON(addr + size > chunk->end_addr);
>>>> - spin_lock_irqsave(&chunk->lock, flags);
>>>> - bit = (addr - chunk->start_addr) >> order;
>>>> - while (nbits--)
>>>> - __clear_bit(bit++, chunk->bits);
>>>> - spin_unlock_irqrestore(&chunk->lock, flags);
>>>> - break;
>>>> + start_bit = (addr - chunk->start_addr) >> order;
>
> You could turn this:
>
>>>> + remain = bitmap_clear_ll(chunk->bits, start_bit, nbits);
>>>> + BUG_ON(remain);
>>>> + size = nbits << order;
>>>> + atomic_add(size, &chunk->avail);
>
> into:
>
> remain = bitmap_clear_ll(chunk->bits, start_bit, nbits);
> size = nbits << order;
> atomic_add(size, &chunk->avail);
> break;
>
>
>>>> + rcu_read_unlock();
>>>
>>> Same comment as above apply here.
>>
>> It is harder to remove unlock in loop here. An extra variable should be
>> used to indicate that something is freed from the pool. Do you think it
>> is cleaner to just keep the unlock in loop here?
>>
>> Best Regards,
>> Huang Ying
>>
>>> + return;
>>> }
>>> }
>
> And turn this:
>
>>> - BUG_ON(nbits > 0);
>>> - read_unlock(&pool->lock);
>>> + rcu_read_unlock();
>>> + BUG();
>
> into:
>
> BUG_ON(remain);
> rcu_read_unlock();
>
> Does that look OK to you ? On the plus side, you end up having a single
> BUG_ON() in the function.

I am afraid this make code a little harder to be understood. Why do you
hate unlock in loop so much? It is common in kernel and I think most
kernel developers are familiar with it.

Best Regards,
Huang Ying
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/