Write-back from inside FS - need suggestions

From: Artem Bityutskiy
Date: Fri Sep 28 2007 - 05:17:25 EST


Hi,

we are writing anew flash FS (UBIFS) and need some advise/suggestion.
Brief FS info and the code are available at
http://www.linux-mtd.infradead.org/doc/ubifs.html.

At any point of time we may have a plenty of cached stuff which have to
be written back later to the flash media: dirty pages an dirty inodes.
This is what we call "liability" - current set of dirty pages and
inodes UBIFS must be able to write back on demand.

The problem is that we cannot do accurate flash space accounting due
to several reasons:
1. Wastage - some smal random amount of flash space at ends or
eraseblocks cannot be used.
2. Compression - we do not know how well will the pages be compressed,
so we do not know how much flash space will they consume.

So, if our current liability is X, we do not know exactly how much
flash space (Y) it will take. All we can do is to introduce some
pessimistic, worst-case function Y = F(X). This pessimistic function
assumes that pages won't be compressible, and it assumes worst-case
wastage. In real life it is hardly going to happen, but possible.
The functiion is really bad and may lead to huge over-estimations
like 40%.

So, if we are, say, in ->prepare_write(), we have to decide whether
there is enough flash space to write-back this page later. We do not
want to fail with -ENOSPC when,say, pdflush writes the page back. So
we use our pessimistic function F(X) to decide whether we have enough
space or not. If there is a plenty of flash space, the F(X) says "yes",
and just we proceed. The question is what do we do if F(X) says "no"?

If we just return -ENOSPC, the flash space utilization becomes too
poor, just because F() is really rough. We do have space in most
real-life cases. All we have to do in this case is to lessen our
liability. IOW, we have to flush few dirty inodes/pages, then we'd
be able to proceed.

So my question is: how can we flush _few_ oldest dirty pages/inodes
while we are inside UBIFS (e.g., in ->prepare_write(), ->mkdir(),
->link(), etc)?

I failed to find VFS calls which would do this. Stuff like
sync_sb_inodes() is not exactly what we need. Should we implement
a similar function? Since we have to call it from inside UBIFS, which
means we are holding i_mutex and the inode is locked, the function
has to be smart enough not to wait on this inode, but wait on other
inodes if needed.

A solution like kicking pdflush to do the job and wait on a waitqueue
would probably also work, but I'd prefer to do this from the context
of current task.

Should we have our own list of inodes and call write_inode_now() for
dirty ones? But I'd prefer to let VFS pick oldest victims.

So I'm asking for ideas which would work and be acceptable by the
community later.

Thanks!

--
Best Regards,
Artem Bityutskiy (ÐÑÑÑÐ ÐÐÑÑÑÐÐÐ)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/