Re: UTF-8 and case-insensitivity

From: Andy Lutomirski
Date: Tue Feb 17 2004 - 20:12:29 EST


Linus Torvalds wrote:
int magic_open(
	/* Input arguments */
	const char *pathname,
	unsigned long flags,
	mode_t mode,

	/* Output arguments */
	int *fd,
	struct stat *st,
	int *successful_path_length);

ie the system call would:

- look up as far into the pathname (using _exact_ lookup) as possible
- return the error code of the last failure
- the "flags" could be extended so that you can specify that you mustn't traverse ".." or symlinks (ie those would count as failures)

but also:

- fill in the "struct stat" information for the last _successful_ pathname component.
- fill in the "fd" with a fd of the last _successful_ pathname component.
- tell how much of the pathname it could traverse.
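
For illustration, the lookup half of this can be approximated in userspace on a system that provides openat(). This is only a sketch under stated assumptions, not the proposed syscall: magic_open_emul is a hypothetical name, the "flags" and "mode" arguments are left out, every component is opened O_RDONLY (so read permission is needed), and symlinks and ".." are followed as usual.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical userspace approximation of the proposed magic_open().
 * Returns 0 if the whole path resolved, otherwise the negated errno of
 * the first component that failed. On return, *fd, *st and
 * *successful_path_length describe the last component that was opened.
 */
int magic_open_emul(const char *pathname, int *fd, struct stat *st,
                    int *successful_path_length)
{
    assert(pathname && fd && st && successful_path_length);

    int absolute = (pathname[0] == '/');
    int cur = open(absolute ? "/" : ".", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (cur < 0)
        return -errno;

    size_t pos = 0;
    size_t good_len = absolute ? 1 : 0;   /* prefix resolved so far */
    int err = 0;

    for (;;) {
        while (pathname[pos] == '/')       /* skip separators */
            pos++;
        if (pathname[pos] == '\0')
            break;                          /* whole path resolved */

        size_t len = strcspn(pathname + pos, "/");
        char comp[NAME_MAX + 1];
        if (len > NAME_MAX) {
            err = ENAMETOOLONG;
            break;
        }
        memcpy(comp, pathname + pos, len);
        comp[len] = '\0';

        /* Exact lookup of one component, relative to the previous one. */
        int next = openat(cur, comp, O_RDONLY | O_CLOEXEC);
        if (next < 0) {
            err = errno;
            break;
        }
        close(cur);
        cur = next;
        pos += len;
        good_len = pos;
    }

    if (fstat(cur, st) < 0) {
        err = errno;
        close(cur);
        return -err;
    }
    *fd = cur;
    *successful_path_length = (int)good_len;
    return -err;
}
```

A caller that gets back -ENOENT can then retry only the failing component (for example with a case-insensitive directory scan) relative to the returned fd, instead of starting again from the root.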

Aside from just case-insensitivity, I imagine this could give lots of other benefits:

- file servers that don't want to follow symlinks can do it quickly.
- Apache could serve things like http://www.foo.com/a/b/c/d.php/e/f/g a lot faster.
- a flag to avoid traversing mountpoints could help someone
- a flag for root to see _through_ mountpoints would make it possible to clean up initramfs and such that got mounted over, or to do other useful and currently impossible tasks. (e.g. I could see what's under my devfs mount...)

It would be nice to see this added even if it's not the perfect solution for Samba :)

BTW, here's a thought for solving Samba's negative lookup problem:

int ugly_stat(char *pattern, struct stat *st, char *match_out)

Pattern would be some description of what the filename should look like. Something like:

- pattern is an array of slash-delimited groups of characters separated by nulls and terminated by two nulls. For example, ugly_stat("F/f\0O/o\0O/o\0\0", ...) finds a file called foo, case-insensitively in English, while ugly_stat("F\0i\0l\0e\0" "11/22/33\0\0", ...) finds "File" followed by either 11, 22, or 33. (The second literal is split in two because C would otherwise read "\011" as an octal escape.)
- the dcache problem is easy: don't use it. All Andrew wants (I think) is proof that no such file exists, or the file's name if one does. Samba can cache the result itself; I don't think the kernel should involve itself in trying to cache this.
- ugly_stat does not traverse directories -- that's why the slash trick is safe.
- st gets the stat data, and match_out gets the filename if any
- if there are multiple matches, one is arbitrarily selected.

If the file-system doesn't have specific support for this, then either VFS or the caller could emulate it (probably VFS -- it would avoid lots of syscalls).
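
As a rough idea of what caller-side emulation might look like, here is a sketch of the pattern matcher plus a plain directory scan. Everything in it is an assumption made for illustration: the names ugly_match and ugly_stat_emul, the extra dirpath and match_out_size parameters (the proposed ugly_stat has neither), the choice to ignore empty alternatives, and the negated-errno return convention.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Return 1 if name can be split into consecutive pieces such that piece i
 * equals one alternative of group i. Groups are NUL-separated and an empty
 * group ends the pattern. Assumption: empty alternatives are ignored.
 */
int ugly_match(const char *group, const char *name)
{
    assert(group && name);
    if (*group == '\0')                 /* end of pattern */
        return *name == '\0';

    size_t glen = strlen(group);
    const char *next_group = group + glen + 1;

    for (const char *alt = group; alt < group + glen; ) {
        size_t alen = strcspn(alt, "/");
        /* Try this alternative, then match the rest recursively. */
        if (alen > 0 && strncmp(name, alt, alen) == 0 &&
            ugly_match(next_group, name + alen))
            return 1;
        alt += alen + 1;
    }
    return 0;
}

/*
 * Hypothetical emulation: scan one directory (no traversal) and stat the
 * first entry that matches. Returns 0 on a match, -ENOENT if nothing
 * matched, or another negated errno on failure.
 */
int ugly_stat_emul(const char *dirpath, const char *pattern,
                   struct stat *st, char *match_out, size_t match_out_size)
{
    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -errno;

    int rc = -ENOENT;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        if (!ugly_match(pattern, de->d_name))
            continue;
        if (strlen(de->d_name) >= match_out_size) {
            rc = -ENAMETOOLONG;
        } else if (fstatat(dirfd(dir), de->d_name, st,
                           AT_SYMLINK_NOFOLLOW) < 0) {
            rc = -errno;
        } else {
            strcpy(match_out, de->d_name);
            rc = 0;
        }
        break;                          /* first match wins */
    }
    closedir(dir);
    return rc;
}
```

The scan takes whichever matching entry readdir() returns first, which corresponds to "one is arbitrarily selected" above. A -ENOENT result is the negative answer Samba is after, at the cost of reading the whole directory once.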

Would ugly_stat + magic_open be sufficient?

--Andy