Re: 2.2.8_andrea1.bz2

Zack Weinberg (zack@rabi.columbia.edu)
Thu, 13 May 1999 00:09:03 -0400


Andrea Arcangeli wrote:

> BTW, I seen the buffer.c changes of 2.2.8. You have killed (not fixed ;)
> flushtime. At least you could have removed also flushtime from the struct
> buffer_head to avoid wasting time in useless initializations ;). In 2.2.8
> `update` does only a little sync of ndirty buffers every 5 sec (the 5 sec
> depends by -f option of update). This is _not_ how things should go
> according to me. And syncing back inodes and superblock has to be done
> _only_ for integrity of the filesystem (not for the kernel stability) and
> so it's `update` that has to do that, not bdflush. If nobody will do that
> we'll run faster but if the system will crash with some filesystem mounted
> we'll be in troubles...
>
> I also don't agree with syncing back some dirty buffer every 5 sec via
> bdflush. That's sure _not_ the way to get performances. If the system is
> idle and nobody is going to grow dirty buffers there's no need to flush
> them to disk (unless they are very old, and the only reasons to flush old
> buffers is trying to get filesystem integrity after a crash without losing
> too much caching performances). And _only_ in the case we'll then go low
> on memory it's shrink_mmap that has to flush dirty buffers to disk.

Since that was my patch, allow me to defend it a little...

One of the design goals was to eliminate the need for the user-space
update process. It simply calls bdflush(1, 0) every five seconds,
which (in 2.2.7) would do almost the same thing as what kflushd does
when awakened by wakeup_bdflush. There is no particular reason why
kflushd can't wake up every five seconds itself. Does it make more
sense now?

Killing flushtime was an accident. I erroneously thought the inner
loop in sync_old_buffers() and the inner loop in bdflush() were
identical. [I swear I looked at them side-by-side and didn't see it!]
It is my personal opinion - and at least one other person concurs -
that there should be a hard limit on the length of time before any
dirty buffer is flushed back to disk. flushtime is a rational way to
achieve that. You casually brush off filesystem integrity (also
user data integrity; all writes go through the buffer cache) as
unimportant; that is a very strange priority to me.

I've sent the appended patch to Linus already; it addresses what I
think are the bugs in the 2.2.8 bdflush. I'd be interested to hear
what you think of it.

zw

p.s. I am not on linux-kernel. Please cc: me directly.

--- buffer.c.228 Wed May 12 18:23:34 1999
+++ buffer.c Wed May 12 18:39:03 1999
@@ -1580,7 +1580,7 @@
* with running out of free buffers for loop's "real" device).
*/

-static inline void sync_old_buffers(void)
+static inline void sync_old_buffers(int all)
{
int i;
int ndirty = 0;
@@ -1618,7 +1618,7 @@
bh = lru_list[BUF_DIRTY];
if(bh)
for (i = nr_buffers_type[BUF_DIRTY];
- i-- > 0 && ndirty < bdf_prm.b_un.ndirty;
+ i-- > 0 && (all || ndirty < bdf_prm.b_un.ndirty);
bh = next) {
/* We may have stalled while waiting for
I/O to complete. */
@@ -1644,22 +1644,31 @@
next->b_count++;
bh->b_count++;
ndirty++;
+ /* If not short of memory, flush only older buffers */
+ if (all && time_before(jiffies, bh->b_flushtime))
+ continue;
+#ifdef DEBUG
+ nwritten++;
+#endif
bh->b_flushtime = 0;
if (MAJOR(bh->b_dev) == LOOP_MAJOR) {
ll_rw_block(wrta_cmd,1, &bh);
wrta_cmd = WRITEA;
- if (buffer_dirty(bh))
+ if (buffer_dirty(bh)) {
--ndirty;
+#ifdef DEBUG
+ --nwritten;
+#endif
+ }
}
else
ll_rw_block(WRITE, 1, &bh);
bh->b_count--;
next->b_count--;
}
- /* If we didn't write anything, but there are still
- * dirty buffers, then make the next write to a
- * loop device to be a blocking write.
- * This lets us block--which we _must_ do! */
+ /* If we didn't write anything and there are dirty
+ * buffers, make the next write to a loop device be a
+ * blocking write. This lets us block--which we _must_ do! */
if (ndirty == 0
&& nr_buffers_type[BUF_DIRTY] > 0 && wrta_cmd != WRITE) {
wrta_cmd = WRITE;
@@ -1732,6 +1741,7 @@
{
long remaining = HZ * bdf_prm.b_un.interval;
struct task_struct *tsk = current;
+ int all;

/*
* We have a bare-bones task_struct, and really should fill
@@ -1774,11 +1784,13 @@
sync_supers(0);
sync_inodes(0);
remaining = HZ * bdf_prm.b_un.interval;
- }
+ all = 1;
+ } else
+ all = 0;

/* Keep flushing till there aren't very many dirty buffers */
do {
- sync_old_buffers();
+ sync_old_buffers(all);
} while(nr_buffers_type[BUF_DIRTY] > nr_buffers * bdf_prm.b_un.nfract/100);

wake_up(&bdflush_done);

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/