Re: [PATCH 06/13] writeback: bdi write bandwidth estimation

From: Peter Zijlstra
Date: Wed Nov 24 2010 - 09:13:03 EST


On Wed, 2010-11-24 at 21:46 +0800, Wu Fengguang wrote:
> On Wed, Nov 24, 2010 at 09:42:09PM +0800, Peter Zijlstra wrote:
> > On Wed, 2010-11-24 at 21:20 +0800, Wu Fengguang wrote:
> > > > (jiffies - bdi->write_bandwidth_update_time < elapsed)
> > >
> > > this will be true if someone else has _done_ overlapped estimation,
> > > otherwise it will equal:
> > >
> > > jiffies - bdi->write_bandwidth_update_time == elapsed
> > >
> > > Sorry the comment needs updating.
> >
> > Right, but its racy as hell..
>
> Yeah, for N concurrent dirtiers, plus the background flusher, only
> one is able to update write_bandwidth[_update_time]..

Wrong, nr_cpus are, they could all observe the old value before seeing
the update of the variable.

Why not something like the below, which keeps the stamps per bdi and
serializes on a lock (trylock, you only need a single updater at any one
time anyway):

probably got the math wrong, but the idea should be clear, you can even
add an explicit bdi_update_bandwidth_stamps() function which resets the
stamps to the current situation in order to skip periods of low
throughput (that would need to do spin_lock).

---
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 4ce34fa..de690c3 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -40,6 +40,7 @@ typedef int (congested_fn)(void *, int);
enum bdi_stat_item {
BDI_RECLAIMABLE,
BDI_WRITEBACK,
+ BDI_WRITTEN,
NR_BDI_STAT_ITEMS
};

@@ -88,6 +89,11 @@ struct backing_dev_info {

struct timer_list laptop_mode_wb_timer;

+ spinlock_t bw_lock;
+ unsigned long bw_time_stamp;
+ unsigned long bw_write_stamp;
+ int write_bandwidth;
+
#ifdef CONFIG_DEBUG_FS
struct dentry *debug_dir;
struct dentry *debug_stats;
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 027100d..a934fe9 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -661,6 +661,11 @@ int bdi_init(struct backing_dev_info *bdi)
bdi->dirty_exceeded = 0;
err = prop_local_init_percpu(&bdi->completions);

+ spin_lock_init(&bdi->bw_lock);
+ bdi->bw_time_stamp = jiffies;
+ bdi->bw_write_stamp = 0;
+ bdi->write_bandwidth = 100 << (20 - PAGE_SHIFT); /* 100 MB/s */
+
if (err) {
err:
while (i--)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index b840afa..f3f5c24 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -219,6 +219,7 @@ int dirty_bytes_handler(struct ctl_table *table, int write,
*/
static inline void __bdi_writeout_inc(struct backing_dev_info *bdi)
{
+ __inc_bdi_state(bdi, BDI_WRITTEN);
__prop_inc_percpu_max(&vm_completions, &bdi->completions,
bdi->max_prop_frac);
}
@@ -238,6 +239,35 @@ void task_dirty_inc(struct task_struct *tsk)
prop_inc_single(&vm_dirties, &tsk->dirties);
}

+void bdi_update_write_bandwidth(struct backing_dev_info *bdi)
+{
+ unsigned long time_now, write_now;
+ long time_delta, write_delta;
+ long bw;
+
+ if (!spin_try_lock(&bdi->bw_lock))
+ return;
+
+ write_now = bdi_stat(bdi, BDI_WRITTEN);
+ time_now = jiffies;
+
+ write_delta = write_now - bdi->bw_write_stamp;
+ time_delta = time_now - bdi->bw_time_stamp;
+
+ /* rate-limit, only update once every 100ms */
+ if (time_delta < HZ/10 || !write_delta)
+ goto unlock;
+
+ bdi->bw_write_stamp = write_now;
+ bdi->bw_time_stamp = time_now;
+
+ bw = write_delta * HZ / time_delta;
+ bdi->write_bandwidth = (bdi->write_bandwidth + bw + 3) / 4;
+
+unlock:
+ spin_unlock(&bdi->bw_lock);
+}
+
/*
* Obtain an accurate fraction of the BDI's portion.
*/

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/