Re: [PATCH] mm: moving dirty pages balancing to pdfludh entirely

From: Bret Towe
Date: Tue Jul 04 2006 - 18:23:24 EST


On 7/4/06, Nikita Danilov <nikita@xxxxxxxxxxxxx> wrote:
Ananiev, Leonid I writes:
> Nikita Danilov writes:
> > When queue is congested---it is, because meta-data (indirect blocks in
> > ext[23] case) have to be read in synchronously before page can be
> paged
> > out (see comment in mm/vmscan.c:pageout()).
>
> Actually a queue is congested ---it is, because the queue is too long or
> bit BDI_write[read]_congested is set.
>
> > But much more importantly: when direct reclaim skips writing dirty
> pages
> > from tail of the inactive list,
>
> The direct reclaim does not skip any page in pdflush thread because
> may_write_to_queue() returns true
> if (current->flags & PF_SWAPWRITE) and: __pdflush() sets this flag:
> See pfflush.c: __pdflush() first line
> current->flags |= PF_FLUSHER | PF_SWAPWRITE;

Hm.. indeed it is. But this is quite strange. This means, that if some
device queues are congested, pdflush threads will be stuck waiting for
these queues to drain, and as there is only limited number of pdflush
threads in the system, write-out to the non-congested devices will not
progress too.

Doing large amounts of writeback from pdflush threads makes situation
only worse: suppose you have more than MAX_PDFLUSH_THREADS devices on
the system, and large number of writing threads. If some devices become
congested, then *all* pdflush threads may easily stuck waiting on queue
congestion and there will be no IO going on against other devices. This
would be especially bad, if system is a mix of slow and fast devices.

In the original code, threads writing into fast devices are not impacted
by congestion of slow devices.

are you sure about that? cause that sounded alot like an issue
i saw with slow usb devices (mainly a usb hd on a usb 1.1 connection)
the usb device would fill up with write queue and local io to say /dev/hda
would basicly stop and the system would be rather useless till the usb
hd would finish writing out what it was doing
usally would take several hundred megs of data to get it to do it

ive not tried it in ages so maybe its been fixed since ive last tried it
dont recall the kernel version at the time but it wasnt more than a
year ago

You can deal with that particular situation in your patch by checking
return value of

pdflush_operation(background_writeout, 0);

and falling back to synchronous write-back if it fails to find worker
thread.

>
> > Wouldn't this interfere with current->backing_dev_info logic?
> > Maybe pdflush threads should set this field too?
> It is not need to set current->backing_dev_info for pdflush because

Yes, that was silly proposal. I think your patch contains very useful
idea, but it cannot be applied to all file systems. Maybe
wait-for-pdflush can be made optional, depending on the file system
type?

> PF_SWAPWRITE is set for pdflush.
> The proposed patch does not concern of backing_dev_info logic.
>
> Leonid

Nikita.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/