Re: Badblocks and no free pages...

Doug Ledford (dledford@dialnet.net)
Sun, 4 May 1997 06:01:44 -0500 (CDT)


On 4 May 1997 colin@colin.pgp.com wrote:

> I'm running 2.0.30 on a 32 Meg Triton chipset system with two IDE
> drives. (Which I've recently configured to be on different buses,
> one sharing with a seldom-used CD-ROM drive.)
>
> Anyway, someone, and I forget who, suggested that four concurrent
> badblocks invocations (one for each quarter of the disk) were a great
> way to shake out general I/O flakiness. So I fired it up on my
> secondary 2G disk (/dev/hdc), and then RTFM'd and realized what the "-w"
> flag meant. Oops. Good thing I picked the harmless disk to trash.

That was I who suggested this. In the original suggestion, I did point
out that the script I posted would destroy the data on the partition, so
either use an unused disk, or have backups handy that are known to be
good.
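
I don't have that script in front of me any more, but the gist of it was
something like the sketch below. The device name and size are just
placeholders, and note the comment about the badblocks arguments; as
stated before, -w wipes the disk.

    #!/bin/sh
    # Rough sketch, not the original script.  This WILL destroy
    # everything on $DEV, so point it at a disk you can afford to lose.
    # The last two badblocks arguments are given here as
    # "blocks-count start-block"; newer badblocks versions read them as
    # "last-block first-block", so double-check badblocks(8) on your
    # system before running this.
    DEV=/dev/hdc        # example device
    TOTAL=2048256       # example size of the device in 1k blocks
    Q=`expr $TOTAL / 4`
    badblocks -w $DEV $Q 0 &
    badblocks -w $DEV `expr $Q \* 2` $Q &
    badblocks -w $DEV `expr $Q \* 3` `expr $Q \* 2` &
    badblocks -w $DEV $TOTAL `expr $Q \* 3` &
    wait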

> Anyway, badblocks writes patterns (0xAA, 0x55, 0xFF and 0x00) to the
> disk and then re-reads them, checking for errors. Four running in parallel
> slows down the disk, since it has to seek so much and builds up long I/O
> queues and really gives the I/O subsystem a hell of a workout.

Which was the intent of the operation. It really works better on SCSI
devices with tagged queueing enabled, since they will group several
writes to the same area together, then skip to another area, and so on.
In this way, the number of seeks required is reduced tremendously.

> A few things I noticed. First, that is a *very* good way to elicit
> "Couldn't get a free page....." messages. I bumped /proc/sys/vm/freepages
> from the default 64/96/128 to 128/192/256 and it reduced, but did
> not eliminate these messages.

Yep.
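
For the record, bumping those thresholds is just a write to
/proc/sys/vm/freepages; the numbers below are simply the values you
already used, and what is actually sensible depends on how much RAM the
machine has.

    # as root; the three values are the min / low / high free-page thresholds
    echo "128 192 256" > /proc/sys/vm/freepages
    cat /proc/sys/vm/freepages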

> Second, while writing, the system is *amazingly* sluggish. I watched
> xterm refresh its window at a vertical speed of a few pixels per second.
> The mouse doesn't move. It takes a minute to switch consoles.
> It takes seconds to log in on a text console.

Yep. All four of the programs are trying their best to completely fill
the available buffer memory. In short, during the write phase you are
RAM deficient unless you have gigabytes of RAM in your machine or are
running very small tests.
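
If you want to watch it happen, run something like the following while
the write pass is going; exact output format depends on your procps
version, but you should see the buffer cache swallow nearly all of your
32 megs.

    # report memory and buffer usage every 5 seconds while the test runs
    vmstat 5
    # or, if you prefer free(1):
    while true; do free; sleep 5; done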

> This doesn't happen while reading back to verify, only while writing.
> Does anyone have any idea what's going on? Is the system clogged with
> write-behind buffers and thrashing, or something?

Yep.

> Can I report the following as bugs:
> - The fact that badblocks doesn't help prevent accidents is a bit unfortunate.
> - The fact that it generates "Couldn't get a free page" seems bad.
> In particular, why should this happen during writing? What needs to
> do an atomic page allocation?

It's all because we are filling up all available RAM with write-behind
buffers. Whenever the kernel can't get a free page, it simply waits for
some to become available. Not a bug really; it just shows us that the
program is writing as fast as it can.
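
If you're curious how often it happens, the messages end up in the
kernel log, so after a run something like this will count them
(assuming the usual klogd/syslog setup):

    # count the allocation-failure messages from the last run
    dmesg | grep -c "Couldn't get a free page"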

> - The unusable sluggishness of the machine is a bug.

It was never intended to be something that you would run during normal
usage. It's a shake-down, tear-the-drives-and-controllers-apart type of
test that should be run when you are aware of what these kinds of tests
do to machine performance, and when you are prepared to wait for it to
finish before actually trying to do anything :)

*****************************************************************************
* Doug Ledford * Unix, Novell, Dos, Windows 3.x, *
* dledford@dialnet.net 873-DIAL * WfW, Windows 95 & NT Technician *
* PPP access $14.95/month *****************************************
* Springfield, MO and surrounding * Usenet news, e-mail and shell account.*
* communities. Sign-up online at * Web page creation and hosting, other *
* 873-9000 V.34 * services available, call for info. *
*****************************************************************************