Re: [PATCH] fix task dirty balancing

From: KAMEZAWA Hiroyuki
Date: Sat Jul 05 2008 - 01:58:50 EST


On Wed, 02 Jul 2008 22:27:18 +0200
Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> wrote:

> On Wed, 2008-07-02 at 17:26 +0900, YAMAMOTO Takashi wrote:
> > hi,
> >
> > task_dirty_inc doesn't seem to be called properly for
> > filesystems which don't use set_page_dirty for write(2).
> > eg. ext2 w/o nobh option.
>
> I'm thinking this is an ext2 bug. So I'd rather it'd just call
> set_page_dirty() like a proper filesystem instead of doing things like
> this.
>
> And I certainly don't like exporting task_dirty_inc() - filesystems and
> the like should not have to know about things like that.
>
Hmm, this is a bit complicated for me.

First, there are two __set_page_dirty() functions in the kernel:
- mm/page-writeback.c: __set_page_dirty()
  .... set_page_dirty() calls this.
- fs/buffer.c: __set_page_dirty()
  .... __set_page_dirty_buffers() and mark_buffer_dirty() call this.

Why is per-task dirty accounting done in mm/page-writeback.c::set_page_dirty()?

It seems the other accounting is done in fs/buffer.c: __set_page_dirty().

Is the purpose of task-dirty accounting different from the others?
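
To make the question concrete, here is a rough user-space sketch of the two
paths as I read them. It is not kernel code: every model_* name is
hypothetical, the counters only stand in for tsk->dirties and NR_FILE_DIRTY,
and locking, the radix tree and the bdi statistics are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are kernel symbols. */
struct model_page  { bool dirty; };
struct model_task  { unsigned long dirties; };        /* models tsk->dirties   */
struct model_stats { unsigned long nr_file_dirty; };  /* models NR_FILE_DIRTY  */

/*
 * Models fs/buffer.c:__set_page_dirty(): does the global accounting only.
 * Returns 1 if this call dirtied the page, 0 if it was already dirty.
 */
static int model_buffer_set_page_dirty(struct model_page *page,
                                       struct model_stats *st)
{
    if (page->dirty)
        return 0;
    page->dirty = true;
    st->nr_file_dirty++;
    return 1;
}

/*
 * Models the set_page_dirty() path: the per-task counter is bumped here,
 * one level above the buffer helper, and only for a newly dirtied page.
 */
static int model_set_page_dirty(struct model_page *page,
                                struct model_stats *st,
                                struct model_task *tsk)
{
    int newly_dirtied = model_buffer_set_page_dirty(page, st);

    if (newly_dirtied)
        tsk->dirties++;
    return newly_dirtied;
}

/*
 * Models mark_buffer_dirty() without the proposed patch: it calls the
 * buffer helper directly, so the per-task counter is never touched.
 */
static void model_mark_buffer_dirty(struct model_page *page,
                                    struct model_stats *st)
{
    model_buffer_set_page_dirty(page, st);
}
```

Dirtying one page through each path leaves the global counter at 2 but the
per-task counter at 1, which is the gap the patch is trying to close.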

= fs/buffer.c
static int __set_page_dirty(struct page *page,
		struct address_space *mapping, int warn)
{
	if (unlikely(!mapping))
		return !TestSetPageDirty(page);

	if (TestSetPageDirty(page))
		return 0;

	write_lock_irq(&mapping->tree_lock);
	if (page->mapping) {	/* Race with truncate? */
		WARN_ON_ONCE(warn && !PageUptodate(page));

		if (mapping_cap_account_dirty(mapping)) {
			__inc_zone_page_state(page, NR_FILE_DIRTY);
			__inc_bdi_stat(mapping->backing_dev_info,
					BDI_RECLAIMABLE);
			task_io_account_write(PAGE_CACHE_SIZE);
		}
		radix_tree_tag_set(&mapping->page_tree,
				page_index(page), PAGECACHE_TAG_DIRTY);
	}
	write_unlock_irq(&mapping->tree_lock);
	__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);

	return 1;
}
==
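
The excerpt returns 1 only when this call moved the page from clean to dirty,
so a caller can use the return value to count each page once. A minimal
user-space sketch of that convention, assuming C11 atomic_flag as a stand-in
for the PG_dirty bit (the demo_* names are hypothetical, not kernel API):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; not a kernel structure. */
struct demo_page {
    atomic_flag dirty;   /* models the PG_dirty bit */
    bool has_mapping;    /* models page->mapping != NULL */
};

/*
 * Returns 1 only for the caller that changed the page from clean to dirty.
 * atomic_flag_test_and_set() returns the previous value, which is the same
 * convention the excerpt relies on from TestSetPageDirty().
 */
static int demo_test_set_dirty(struct demo_page *page, unsigned long *nr_dirty)
{
    if (atomic_flag_test_and_set(&page->dirty))
        return 0;        /* already dirty: nothing to account */

    if (page->has_mapping)
        (*nr_dirty)++;   /* account once per clean -> dirty transition */
    return 1;
}
```

Calling it twice on the same page returns 1 and then 0, and the counter goes
up only once.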

And doesn't the task dirty limit have to take care of the following two cases?
- __set_page_dirty_nobuffers(struct page *page) (increments BDI_RECLAIMABLE)
- test_set_page_writeback() (increments BDI_RECLAIMABLE)

Thanks,
-Kame



> Of course I'm utterly ignorant of filesystems, hence let's include more
> clue-full people.
>
> > YAMAMOTO Takashi
> >
> >
> > Signed-off-by: YAMAMOTO Takashi <yamamoto@xxxxxxxxxxxxx>
> > ---
> >
> > commit e68f05bf56d0652c107bba1cff3f8491e41a2117
> > Author: YAMAMOTO Takashi <yamamoto@xxxxxxxxxxxxx>
> > Date: Wed Jul 2 16:17:33 2008 +0900
> >
> > fix dirty balancing for tasks.
> >
> > call task_dirty_inc when dirtying a page with mark_buffer_dirty.
> >
> > diff --git a/fs/buffer.c b/fs/buffer.c
> > index 4788a9e..2f1c7c6 100644
> > --- a/fs/buffer.c
> > +++ b/fs/buffer.c
> > @@ -1219,8 +1219,9 @@ void mark_buffer_dirty(struct buffer_head *bh)
> > return;
> > }
> >
> > - if (!test_set_buffer_dirty(bh))
> > - __set_page_dirty(bh->b_page, page_mapping(bh->b_page), 0);
> > + if (!test_set_buffer_dirty(bh) &&
> > + __set_page_dirty(bh->b_page, page_mapping(bh->b_page), 0))
> > + task_dirty_inc(current);
> > }
> >
> > /*
> > diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> > index bd91987..61d0aec 100644
> > --- a/include/linux/writeback.h
> > +++ b/include/linux/writeback.h
> > @@ -95,6 +95,7 @@ int wakeup_pdflush(long nr_pages);
> > void laptop_io_completion(void);
> > void laptop_sync_completion(void);
> > void throttle_vm_writeout(gfp_t gfp_mask);
> > +void task_dirty_inc(struct task_struct *);
> >
> > /* These are exported to sysctl. */
> > extern int dirty_background_ratio;
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index 29b1d1e..4dc85d0 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -176,10 +176,11 @@ void bdi_writeout_inc(struct backing_dev_info *bdi)
> > }
> > EXPORT_SYMBOL_GPL(bdi_writeout_inc);
> >
> > -static inline void task_dirty_inc(struct task_struct *tsk)
> > +void task_dirty_inc(struct task_struct *tsk)
> > {
> > prop_inc_single(&vm_dirties, &tsk->dirties);
> > }
> > +EXPORT_SYMBOL_GPL(task_dirty_inc);
> >
> > /*
> > * Obtain an accurate fraction of the BDI's portion.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
