Re: [PATCH] bsg: call idr_pre_get() outside the lock.

From: Naohiro Aota
Date: Sat Sep 18 2010 - 23:12:04 EST


Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> writes:

> On Wed, 01 Sep 2010 23:26:01 +0900
> Naohiro Aota <naota@xxxxxxxxx> wrote:
>
>> The idr_pre_get() kernel-doc says "This function should be called
>> prior to locking and calling the idr_get_new* functions.", but the
>> idr_pre_get() calling in bsg_register_queue() is put inside
>> mutex_lock(). Let's fix it.
>>
>
> The idr_pre_get() kerneldoc is wrong. Or at least, misleading.
>
> The way this all works is that we precharge the tree via idr_pre_get()
> and we do this outside locks so we can use GFP_KERNEL. Then we take
> the lock (a spinlock!) and then try to add an element to the tree,
> which will consume objects from the pool which was prefilled by
> idr_pre_get().
>
> There's an obvious race here where someone else can get in and steal
> objects from the prefilled pool. We can fix that with a retry loop:
>
>
> again:
> if (idr_pre_get(..., GFP_KERNEL) == NULL)
> return -ENOMEM; /* We're really out-of-memory */
> spin_lock(lock);
> if (idr_get_new(...) == -EAGAIN) {
> spin_unlock(lock);
> goto again; /* Someone stole our preallocation! */
> }
> ...
>
> this way we avoid the false -ENOMEM which the race would have caused.
> We only declare -ENOMEM when we're REALLY out of memory.
>
>
> But none of this is needed when a sleeping lock is used (as long as the
> sleeping lock isn't taken on the the VM pageout path, of course). In
> that case we can use the sleeping lock to prevent the above race.
>
>> diff --git a/block/bsg.c b/block/bsg.c
>> index 82d5882..5fd8dd1 100644
>> --- a/block/bsg.c
>> +++ b/block/bsg.c
>> @@ -1010,13 +1010,11 @@ int bsg_register_queue(struct request_queue *q, struct device *parent,
>> bcd = &q->bsg_dev;
>> memset(bcd, 0, sizeof(*bcd));
>>
>> - mutex_lock(&bsg_mutex);
>> -
>> ret = idr_pre_get(&bsg_minor_idr, GFP_KERNEL);
>> - if (!ret) {
>> - ret = -ENOMEM;
>> - goto unlock;
>> - }
>> + if (!ret)
>> + return -ENOMEM;
>> +
>> + mutex_lock(&bsg_mutex);
>>
>> ret = idr_get_new(&bsg_minor_idr, bcd, &minor);
>> if (ret < 0)
>
> So the old code was OK.
>
> The new code, however, is not OK because it is vulnerable to the above
> race wherein another CPU or thread comes in and steals all of this
> thread's preallocation.

I see. Thank you for your comment.

> The idr_pre_get() kerneldoc is wrong. Or at least, misleading.

Then we can fix the kerneldoc like this?

----