Re: [PATCH] x86_64: fix delayed signals

From: Linus Torvalds
Date: Sat Jul 12 2008 - 16:48:52 EST




On Sat, 12 Jul 2008, Török Edwin wrote:
>
> A bit off-topic, but something I noticed during the tests:
> In my original test I have rm-ed the files right after launching dd in
> the background, yet it still continued to write to the disk.
> I can understand that if the file is opened O_RDWR, you might seek back
> and read what you wrote, so Linux needs to actually do the write,
> but why does it insist on writing to the disk, on a file opened with
> O_WRONLY, after the file itself got unlinked?

Linux itself doesn't insist on writing to disk. In fact, at least with
traditional UNIX filesystems (e.g. minix, ext2) the pending writes to the
deleted file would simply be thrown away.

But some filesystems can't just invalidate dirty buffers (some won't do it
for meta-data, others won't do it for _any_ data). So again, this
behaviour depends on the filesystem. And sadly, the more "advanced" the
filesystem, the worse it usually behaves here.
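
A minimal user-space sketch of the situation being discussed, assuming a
POSIX system: a file opened O_WRONLY keeps accepting writes after its name
has been unlinked, because the open descriptor still pins the inode. The
helper name write_after_unlink and the file name are made up for
illustration. Whether the dirty data ever reaches the disk is up to the
filesystem, and this sketch does not show that part.

```python
import os
import tempfile


def write_after_unlink(directory, payload=b"x" * 4096):
    """Open a file write-only, unlink it, then write through the fd.

    Returns (link_count, size) as reported by fstat() after the write.
    """
    path = os.path.join(directory, "unlinked-demo.bin")  # hypothetical name
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.unlink(path)        # the name is gone, the inode is not
        os.write(fd, payload)  # still succeeds: the fd keeps the inode alive
        st = os.fstat(fd)
        return st.st_nlink, st.st_size
    finally:
        os.close(fd)           # last reference dropped, inode can be freed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(write_after_unlink(d))
```

On most local filesystems the reported link count is 0 while the size keeps
growing, which is the state dd was in after the rm.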


> I have my filesystems mounted as noatime already.
> But yes, I am using different filesystems, the x86-64 box has reiserfs,
> and the x86-32 box has xfs.
>
> > You can try to limit the amount of dirty data in flight by tweaking
> > /proc/sys/vm/dirty*ratio
>
> I have these in my /etc/rc.local:
> echo 5 > /proc/sys/vm/dirty_background_ratio
> echo 10 >/proc/sys/vm/dirty_ratio

That matches the modern defaults. You can try playing with them if you
want to. And yes, it's worth testing nr_requests too.
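
Before changing anything it helps to see what the values currently are. A
small sketch, assuming the usual /proc/sys/vm and /sys/block/<dev>/queue
layout; the function names and the "sda" device name are illustrative
assumptions, and missing or unreadable files are reported as None rather
than treated as errors.

```python
import os


def read_int_file(path):
    """Return the integer stored in a one-line sysctl/sysfs file, or None."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def read_writeback_tunables(proc_vm="/proc/sys/vm"):
    """Read the two dirty-ratio knobs mentioned above."""
    names = ("dirty_background_ratio", "dirty_ratio")
    return {name: read_int_file(os.path.join(proc_vm, name)) for name in names}


def read_nr_requests(device, sys_block="/sys/block"):
    """Read the block-layer request queue depth for one device."""
    return read_int_file(os.path.join(sys_block, device, "queue", "nr_requests"))


if __name__ == "__main__":
    print(read_writeback_tunables())
    print(read_nr_requests("sda"))  # "sda" is only an example device name
```

The directory arguments exist so the readers can be pointed at a copy of
the files instead of the live system.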

> > Ok, that is definitely not related to signals at all. You're simply stuck
> > waiting for IO - or perhaps some fundamental filesystem semaphore which is
> > held while some IO needs to be flushed.
>
> AFAICT reiserfs still uses the BKL, could that explain why one I/O
> delays another?

The BKL should be ok in this respect - it gets automatically dropped when
doing synchronous waiting (this is something that will possibly go away as
we try to convince people to get rid of the BKL, but it certainly hasn't
happened yet).

So it actually gets worse with other locks - semaphores or mutexes - that
stay held over IO. And reiserfs has a journal lock (and a "commit" lock),
but I don't know how they are held and whether this could be part of the
issue.

> > This is also why your trace on just 'kill_pgrp' and 'detach_pid' is not
> > interesting. It's _normal_ to have a delay between them. It can happen
> > because the process blocks (or catches) signals, but it will also happen
> > if some system call waits for disk.
>
> Is there a way to trace what happens between those 2 functions?

You could try to trace not just those functions, but scheduling events
too. Or yes, do something special-cased.
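
As a rough user-space stand-in (not kernel tracing, and not a substitute
for it), one can at least measure the gap from the outside: time from
sending the signal until the child has been reaped. This is only a sketch;
the function name and the 0.2 second settle delay are arbitrary
assumptions, and the number it reports includes scheduling and wait()
overhead on top of whatever the kernel does between kill_pgrp and
detach_pid.

```python
import signal
import subprocess
import sys
import time


def signal_to_exit_latency(sig=signal.SIGTERM, settle=0.2):
    """Signal a sleeping child and time how long it takes to be reaped.

    Returns (seconds, returncode). A negative returncode means the child
    was killed by that signal number.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    time.sleep(settle)          # let the child get going before signalling
    start = time.monotonic()
    child.send_signal(sig)      # user-space side of the kill
    child.wait()                # returns once the child has exited
    return time.monotonic() - start, child.returncode


if __name__ == "__main__":
    latency, rc = signal_to_exit_latency()
    print(f"signal-to-exit latency: {latency * 1000:.2f} ms (rc={rc})")
```

Running this in a loop while the dd load is active, and again on an idle
box, would show whether the delay tracks the IO load. The child here does
no IO of its own, so it is only a baseline for the real test case.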

Trying to figure out latencies in the block trace is likely also going to
be interesting (although you won't see any signal issues there - but any
long read latencies will automatically tend to imply latency issues not
just for signals, but for pretty much any operations).

Linus