Re: get_fs_excl/put_fs_excl/has_fs_excl

From: Jamie Lokier
Date: Mon Apr 27 2009 - 10:51:25 EST

Next message: Davide Libenzi: "Re: Re-implement MCE log ring buffer as per-CPU ring buffer"
Previous message: Jan Kara: "[PATCH 6/8] vfs: Rename fsync_super() to sync_filesystem() (version 4)"
In reply to: Theodore Tso: "Re: get_fs_excl/put_fs_excl/has_fs_excl"
Next in thread: Theodore Tso: "Re: get_fs_excl/put_fs_excl/has_fs_excl"

Theodore Tso wrote:
> *) Do we only care about processes whose I/O priority is below the
> default? (i.e., either in the idle class, or in a low-priority
> best-effort class) What if the concern is a real-time process
> which is being blocked by a default I/O priority process taking its
> time while holding some fs-wide resource?
>
> If the answer to the previous question is no, it becomes more
> reasonable to consider bumping the submission priority of the process
> in question to the highest priority "best-effort" level. After
> all, if this truly is a "filesystem-wide" resource, then no one is
> going to make forward progress relating to this block device unless
> and until the filesystem-wide lock is resolved. Also, if we don't
> allow this situation to return to userspace, presumably the
> kernel-code involved will only be writing to the block-device in
> question. (This might not be entirely true in the case of the
> sendfile(2) syscall, but currently we can only read from
> filesystems with sendfile, and so presumably a filesystem would
> never call get_fs_excl while servicing a sendfile request.)
>
> *) Is implementing the bulk of this in the cfq scheduler really the
> best place to do this? To explore something completely different,
> what if the filesystem simply explicitly set I/O priority levels in
> its block I/O submissions, and provided optional callback functions
> which could be used by the page writeback routines to determine the
> appropriate I/O priority level that should be used given a
> particular filesystem and inode number? (That actually could be
> used to provide another cool function --- we could expose to
> userspace the concept that a particular inode should always have its
> I/O go out with a higher priority, perhaps via a chattr flag.)
>
> Basically, the argument here is that we already have the
> appropriate mechanism for ordering I/O requests, which is the I/O
> priority mechanism, and the policy really needs to be set by the
> filesystem --- and it might be far more than just "do we have a
> filesystem-wide exclusive lock" or not.

Personally, I'm interested in the following:

- A process with RT I/O priority and RT CPU priority is reading
a series of files from disk. It should be very reliable at this.

- Other normal I/O priority and normal CPU priority processes are
reading and writing the disk.

I would like the first process to have a guaranteed minimum I/O
performance: it should continuously make progress, even when it needs
to read some file metadata which overlaps a page affected by the other
processes. I don't mind all the interference from disk head seeks and
so on, but I would like the I/O that the first process depends on to
have RT I/O priority - including when it's waiting on I/O initiated by
another process and the normal I/O priority queue is full.
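
For concreteness, here is a rough sketch of how the first process might
request RT CPU and RT I/O priority for itself. It is only an illustration:
glibc provides no wrapper for ioprio_set, so the constants below are copied
by hand from the kernel's ioprio definitions, and the helper names
(ioprio_value, ioprio_class_of, make_self_realtime) are made up for this
example. Both calls normally need elevated privileges.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hand-copied from the kernel's ioprio definitions (no glibc wrapper). */
enum { IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE };
enum { IOPRIO_WHO_PROCESS = 1 };
#define IOPRIO_CLASS_SHIFT 13

/* Hypothetical helper: pack a class and a level (0 = highest, 7 = lowest). */
int ioprio_value(int io_class, int level)
{
    return (io_class << IOPRIO_CLASS_SHIFT) | level;
}

/* Hypothetical helper: extract the class from a packed value. */
int ioprio_class_of(int value)
{
    return value >> IOPRIO_CLASS_SHIFT;
}

/*
 * Hypothetical helper: give the calling process SCHED_FIFO CPU priority
 * and RT-class I/O priority.  Returns 0 on success, -1 with errno set
 * (typically EPERM when run without the required privileges).
 */
int make_self_realtime(int cpu_prio, int io_level)
{
    struct sched_param sp = { .sched_priority = cpu_prio };

    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
        return -1;
    if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
                ioprio_value(IOPRIO_CLASS_RT, io_level)) == -1)
        return -1;
    return 0;
}
```

Note that this only tags I/O submitted by the process itself; it does
nothing for I/O that another process has already queued at normal
priority, which is exactly the gap described above.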

So, I'm not exactly sure, but I think what I need for that is:

- I/O priority boosting (re-queuing in the elevator) to fix the
inversion when waiting on I/O which was previously queued with
normal I/O priority, and

- Task priority boosting when waiting on a filesystem resource
which is held by a normal priority task.

(I'm not sure if generic task priority boosting is already addressed to some
extent in the RT-PREEMPT Linux tree.)
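
To make the two kinds of boosting above a bit more concrete, here is a toy
userspace model. It is not kernel code and none of the names are real
kernel APIs; the structures and functions are invented for illustration,
and priorities are plain integers where a smaller number means more
urgent.

```c
#include <stddef.h>

/* Toy model only: smaller prio value = more urgent (e.g. 0 = RT, 1 = normal). */
struct io_request  { int prio; };                /* priority it is queued at */
struct task        { int base_prio; int prio; }; /* own and effective prio   */
struct fs_resource { struct task *holder; };     /* NULL when nobody holds it */

/*
 * I/O priority boosting: a task starts waiting on a request that someone
 * else queued.  If the waiter is more urgent, raise the request to the
 * waiter's priority.  Returns 1 if the request was boosted (a real
 * elevator would have to re-queue it), 0 otherwise.
 */
int boost_request(struct io_request *rq, const struct task *waiter)
{
    if (waiter->prio < rq->prio) {
        rq->prio = waiter->prio;
        return 1;
    }
    return 0;
}

/*
 * Task priority boosting: a task blocks on a filesystem resource.  If the
 * holder is less urgent than the waiter, lend it the waiter's priority.
 */
void boost_holder(struct fs_resource *res, const struct task *waiter)
{
    struct task *holder = res->holder;

    if (holder != NULL && waiter->prio < holder->prio)
        holder->prio = waiter->prio;
}

/* On release, the holder drops back to its own base priority. */
void release_resource(struct fs_resource *res)
{
    if (res->holder != NULL) {
        res->holder->prio = res->holder->base_prio;
        res->holder = NULL;
    }
}
```

A real implementation would also have to deal with chains of waiters and
with I/O the boosted holder submits while it holds the resource; the model
ignores both.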

-- Jamie
