Re: [PATCH 1/1] block: CFQ refcounting fix

From: Brian King
Date: Wed Aug 31 2005 - 09:00:00 EST


Jens Axboe wrote:
> On Wed, Aug 31 2005, Brian King wrote:
>
>>Jens Axboe wrote:
>>
>>>On Tue, Aug 30 2005, brking@xxxxxxxxxx wrote:
>>>
>>>
>>>>I ran across a memory leak related to the cfq scheduler. The cfq
>>>>init function increments the refcnt of the associated request_queue.
>>>>This refcount gets decremented in cfq's exit function. Since blk_cleanup_queue
>>>>only calls the elevator exit function when its refcnt goes to zero, the
>>>>request_q never gets cleaned up. It didn't look like other io schedulers were
>>>>incrementing this refcnt, so I removed the refcnt increment and it fixed the
>>>>memory leak for me.
>>>>
>>>>To reproduce the problem, simply use cfq and use the scsi_host scan sysfs
>>>>attribute to scan "- - -" repeatedly on a scsi host and watch the memory
>>>>vanish.
>>>
>>>
>>>Yeah, that actually looks like a dangling reference. I assume you tested
>>>this properly?
>>
>>Yes. I applied the patch, booted my system (which was crashing on
>>bootup before due to out of memory errors due to the leak) ran the
>>scan a few times and verified /proc/meminfo didn't continually
>>decrease like without it, and rebooted again. If there is anything
>>else you would like me to do, I would be happy to do so.
>
>
> I think you need to remove the blk_put_queue() in cfq_put_cfqd() as
> well, otherwise I don't see how this can work without looking at freed
> memory. I'll audit the other paths as well.

Good catch. Here is an updated patch.
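
To spell out why the get and the put have to go together, here is a minimal
userspace sketch of the cycle. It is a simplified model, not kernel code:
struct model_queue and the model_* functions are hypothetical stand-ins for
request_queue, the cfq init/exit hooks, and blk_cleanup_queue, and the model
assumes the owner holds exactly one reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for request_queue; not the kernel structure. */
struct model_queue {
	int refcnt;            /* models q->refcnt */
	bool sched_holds_ref;  /* true if scheduler init took its own reference */
	bool freed;            /* set once cleanup actually tears the queue down */
};

static void model_cleanup_queue(struct model_queue *q);

/* Models scheduler init; take_ref models the old atomic_inc(&q->refcnt). */
static void model_sched_init(struct model_queue *q, bool take_ref)
{
	q->sched_holds_ref = take_ref;
	if (take_ref)
		q->refcnt++;
}

/* Models scheduler exit; the put models blk_put_queue() in cfq_put_cfqd(). */
static void model_sched_exit(struct model_queue *q)
{
	if (q->sched_holds_ref) {
		q->sched_holds_ref = false;
		model_cleanup_queue(q);
	}
}

/* Models cleanup: scheduler exit only runs once the count reaches zero. */
static void model_cleanup_queue(struct model_queue *q)
{
	if (--q->refcnt != 0)
		return;
	model_sched_exit(q);
	q->freed = true;
}

/* Owner allocates the queue, attaches a scheduler, then drops its reference. */
static bool model_queue_is_freed(bool sched_takes_ref)
{
	struct model_queue q = { .refcnt = 1, .sched_holds_ref = false, .freed = false };

	model_sched_init(&q, sched_takes_ref);
	model_cleanup_queue(&q);
	return q.freed;
}
```

In this model, model_queue_is_freed(true) reports that the queue is never
freed, because the only code that would drop the scheduler's reference runs
after the count has already reached zero. model_queue_is_freed(false) reports
that it is freed.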


--
Brian King
eServer Storage I/O
IBM Linux Technology Center

I ran across a memory leak related to the cfq scheduler. The cfq
init function increments the refcnt of the associated request_queue.
This refcount gets decremented in cfq's exit function. Since blk_cleanup_queue
only calls the elevator exit function when the queue's refcnt goes to zero, the
request_queue never gets cleaned up. It didn't look like other io schedulers were
incrementing this refcnt, so I removed the refcnt increment, along with the
matching blk_put_queue() in cfq_put_cfqd(), and it fixed the memory leak for me.

To reproduce the problem, simply use cfq and use the scsi_host scan sysfs
attribute to scan "- - -" repeatedly on a scsi host and watch the memory
vanish.

Signed-off-by: Brian King <brking@xxxxxxxxxx>
---

linux-2.6-bjking1/drivers/block/cfq-iosched.c | 3 ---
1 files changed, 3 deletions(-)

diff -puN drivers/block/cfq-iosched.c~cfq_refcnt_fix drivers/block/cfq-iosched.c
--- linux-2.6/drivers/block/cfq-iosched.c~cfq_refcnt_fix 2005-08-30 17:26:55.000000000 -0500
+++ linux-2.6-bjking1/drivers/block/cfq-iosched.c 2005-08-31 08:48:30.000000000 -0500
@@ -2260,8 +2260,6 @@ static void cfq_put_cfqd(struct cfq_data
 	if (!atomic_dec_and_test(&cfqd->ref))
 		return;
 
-	blk_put_queue(q);
-
 	cfq_shutdown_timer_wq(cfqd);
 	q->elevator->elevator_data = NULL;
 
@@ -2318,7 +2316,6 @@ static int cfq_init_queue(request_queue_
 	e->elevator_data = cfqd;
 
 	cfqd->queue = q;
-	atomic_inc(&q->refcnt);
 
 	cfqd->max_queued = q->nr_requests / 4;
 	q->nr_batching = cfq_queued;
_