Re: dedicated logging devices

From: Xuan Baldauf (technik--reiserfs@exmail.de)
Date: Fri Jun 09 2000 - 13:31:39 EST


Chris Meadors wrote:

> Yeah, at least the RAID equipment I use basically has some sort of
> standard RAM (SIMM/DIMM) that it uses for caching. There is an option to
> connect a battery to the controller. If the power fails while there is
> unwritten data in the cache it can be flushed to disk when the power
> returns.
>
> Using EEPROM memory as a logging device wouldn't be a good idea. These
> devices tend to have a limited number of rewrites and will begin to
> introduce bit errors as they wear out.
>
> Question: On a journaling file system do you want unwritten journal data
> written to disk after power is restored?

It is about reordering: if you have nvram, your underlying device (hd, raid, etc.) is
allowed to re-order the write requests as it chooses (and therefore gain
performance), even if the OS wanted it otherwise. It's an atomicity problem. AFAIK, on
journaling filesystems, you first write your metadata change to the log and then
write it to where it belongs. If you have a crash in between, the transaction
information exists or does not exist, but if it exists, it exists as a whole
(including all metadata changes). So if all changes are stored in the log, you can
replay the changes, and because the transaction consisted of all changes, or none,
you always have a consistent fs.

(Chris, please correct me if I'm wrong)

Now imagine this case: you do a rename, which traditionally may look like the following:

(1) remove the file from the old directory
(2) add the file to the new directory

If the crash is between (1) and (2), your file is lost. In a journaling fs, the
picture is:

(1) write the transaction information
(2) remove the file from the old directory
(3) add the file to the new directory

So if the crash happens after (1), the action can be replayed, because it is stored
in the journal. If it happens before (1), nothing is replayed, and your file is still
in the old directory.
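
The journaled rename above can be modelled in a few lines. This is only a sketch
under my own assumptions: the in-memory "directories", the journal record layout
and the names replay() and crash_after() are hypothetical and are not reiserfs
code. It stops the rename after a given number of writes and then runs recovery.

```python
# Hypothetical in-memory model of a journaled rename; not reiserfs code.

def replay(journal, dirs):
    """Apply every fully logged transaction to the directory state."""
    for txn in journal:
        if not txn.get("committed"):
            continue  # incomplete log record: ignore it entirely
        for op, directory, name in txn["ops"]:
            if op == "remove":
                dirs[directory].discard(name)  # idempotent, safe to repeat
            elif op == "add":
                dirs[directory].add(name)      # idempotent, safe to repeat
    return dirs


def crash_after(step):
    """Rename 'f' from 'old' to 'new', crash after `step` writes, then recover."""
    dirs = {"old": {"f"}, "new": set()}
    journal = []
    writes = [
        # (1) write the transaction information
        lambda: journal.append({
            "ops": [("remove", "old", "f"), ("add", "new", "f")],
            "committed": True,
        }),
        # (2) remove the file from the old directory
        lambda: dirs["old"].discard("f"),
        # (3) add the file to the new directory
        lambda: dirs["new"].add("f"),
    ]
    for write in writes[:step]:
        write()
    return replay(journal, dirs)


for step in range(4):
    print(step, crash_after(step))
```

In this model the file is in exactly one directory after recovery for every
crash point: in the old one if (1) never made it to the log, in the new one
otherwise.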

But due to speed constraints, your drive or raid may want to reorder to:

(2) remove the file from the old directory
(1) write the transaction information
(3) add the file to the new directory

If the power now fails between (2) and (1), all the efforts of journaling are lost:
you have a corrupted fs and, because it's a journaling filesystem, fsck won't even be
run. That is why (1) (the log data) has to be written. It does not matter whether it
is written before or after the crash, but it has to be written if (2) or (3) have been
written.
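
To make the difference between the two write orders concrete, here is a small
sketch that tries every possible crash point for a given order and checks
whether the file survives. The model is an assumption of mine (one file 'f',
two directories, a single logged flag), and state_after_crash() and
file_survives() are hypothetical names, not part of any real filesystem.

```python
def state_after_crash(order, crash_point):
    """Apply the first `crash_point` writes of `order`, then run recovery."""
    old, new, logged = {"f"}, set(), False
    for write in order[:crash_point]:
        if write == "log":
            logged = True          # (1) transaction information reached the disk
        elif write == "remove":
            old.discard("f")       # (2) file removed from the old directory
        elif write == "add":
            new.add("f")           # (3) file added to the new directory
    if logged:
        # Recovery: replay the logged rename as a whole.
        old.discard("f")
        new.add("f")
    return old, new


def file_survives(order):
    """True if 'f' ends up in exactly one directory for every crash point."""
    return all(
        len(old | new) == 1 and not (old & new)
        for old, new in (
            state_after_crash(order, n) for n in range(len(order) + 1)
        )
    )


in_order = ["log", "remove", "add"]
reordered = ["remove", "log", "add"]

print(file_survives(in_order))
print(file_survives(reordered))
```

With the log written first, every crash point recovers to a consistent state.
With the reordered sequence, a crash after the first write leaves the file in
neither directory, and nothing in the log says so.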

That's why your drive may not reorder (option 1) or it must replay the transaction
information (option 2). Otherwise you have a corrupted filesystem, because you may
have interrupted an atomic operation, which is against the definition of an atomic
operation. ;o)

Note: journaling is only done for metadata (in reiserfs); the real data (files) is
still subject to the consequences of the crash. (The files may be too short, or, if
the crash hits in the middle of a HD write, some data may be wrong.)

>
>
> I guess if the OS thinks it has hit the physical disk then it would be
> fine to commit it. So what if you didn't have a battery backed cache, and
> data the OS thought was on disk didn't make it?
>
> Should a journaled FS always have uncorrupt data in the files no matter
> what state the various caches are in?
>
> (I'm just trying to get my mind wrapped around this stuff, it is kinda new
> to me.)
>
> On Fri, 9 Jun 2000, Andreas Dilger wrote:
>
> > Xuan, you write:
> > > Is it reasonable to use the 50 bytes of /dev/nvram for logging or is it just
> > > too small?
> >
> > It is definitely too small to write any transactions there, although it
> > may be possible to use it for a bitmap of some sort (400 bits). However,
> > my understanding would be that the CMOS NVRAM would be much too slow to
> > use reasonably, and it only has a limited number of writes, so using it
> > for part of a filesystem will surely mean death for it. Correct me if
> > I'm wrong for modern motherboard NVRAM.
> >
> > The NVRAM that is being referred to is usually battery-backed RAM, so
> > it is very fast, can handle lots of write cycles, and has a fairly long
> > lifetime when the power is disconnected (uses rechargeable batteries).
> >
> > Cheers, Andreas
> >
>
> --
> ...and sometimes, late at night I get these twitches. Like dead people get.
> (Or, as I prefer to call them, perfect computer users)
> --The BOFH

Xuân. :o)




This archive was generated by hypermail 2b29 : Thu Jun 15 2000 - 21:00:20 EST