Re: [PATCH v2] Pre-emption control for userspace

From: Khalid Aziz
Date: Tue Mar 25 2014 - 15:16:45 EST

Next message: Tejun Heo: "Re: [PATCH v4 0/4] ata: ahci_platform related cleanups"
Previous message: Wilfried Klaebe: "Re: [PATCH] staging: rtl8188eu: remove spaces, correct counts to unbreak P2P ioctls"
In reply to: Eric W. Biederman: "Re: [PATCH v2] Pre-emption control for userspace"
Next in thread: Eric W. Biederman: "Re: [PATCH v2] Pre-emption control for userspace"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 03/25/2014 12:59 PM, ebiederm@xxxxxxxxxxxx wrote:

Khalid Aziz <khalid.aziz@xxxxxxxxxx> writes:

This patch adds a way for a thread to request additional timeslice from
the scheduler if it is about to be preempted, so it could complete any
critical task it is in the middle of. .......

Let me see if I understand the problem. Your simulated application has
a ridiculous number of threads (1000) all contending for a single lock
with fairly long lock hold times between 600 and 20000 clocks assuming
no cache line misses. So 1000 threads contending for about 10usec or
1/100 of a tick when HZ=1000. Giving you something like 1 chance in
100 of being preempted while holding the lock. With 1000 threads
those sound like pretty bad odds.

This problem does not happen because threads are holding the lock for too long, rather it happens when a thread does a large number of things in its loop and one small part of it requires it to hold a lock. So it holds the lock for a very short time but what can happen is thread is executing in non-critical section of its loop, it finally gets to the critical section just as its timeslice is about to end, it grabs the lock and is pre-empted right away. Now we start building a convoy of threads that want the same lock. This problem can be avoided if the locking thread could be given additional time to complete its critical section, release the lock and yield the processor if it indeed was granted amnesty by the scheduler.

Either your test program is a serious exageration of what your userspace
is doing or this looks like an application design problem.

If you are referring to the test I included in the patch, that test is designed to exaggerate this problem just so it could test the functionality. It is nothing more than a test for this functionality so we can test for any regressions in future.

I am sorry no number of kernel patches can fix a stupid userspace
application, and what is worse it looks like this approach will make
the situation worse for applications that aren't stupid. Because they
will now suffer from much less predictability in how long they have to
wait for the cpu.

New code in scheduler kicks in only for the threads that have explicitly asked kernel to use this functionality by sending it the address of a shared memory location. There is a very quick bail out at the top of new resched routine to check for this and not affect every thread.

Maybe if this was limited to a cooperating set of userspace
tasks/threads this might not be too bad. As this exists I have users
who would hunt me down with malicious intent if this code ever showed up
on our servers, because it would make life for every other application
on the server worse.

Yes, it is indeed limited to a cooperating set of userspace tasks/threads. Tasks/threads will explicitly choose to use this feature. It is a no-op for every one else.

The only two sane versions of this I can see are (a) having the
scheduler write the predicted next preemption time into the vdso page so
your thread can yield preemptively before taking the lock if it doesn't
look like it has enough time,

This was discussed as an option when I posted first version of this patch and dismissed due to lack of reliability of such a data.

or (b) limiting this to just a small
cooperating set of threads in a single cgroup.

and that is almost what this patch does. It is not limited to a cgroup, rather to the tasks/threads that ask to use this feature.

As you have not limited the effects of this patch and as this will make
latencies worse for every other program on a system I think this is a
horrible approach. This really is not something you can do unless all
of the threads that could be affected are in the same code base, which
is definitely not the case here.

So for the general horrible idea.
Nacked-With-Extreme-Prejudice-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>

Cooperative multitasking sucked in Windows 3.1 and it would be much
worse now. Please stop the crazy. Linux is challenging enough to
comprehend as it is, and I can't possibly see this patch makes anything
more predictable.

Eric

I completely agree with you on cooperative multitasking. I am definitely not trying to do anything like that.

Thanks,
Khalid

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Tejun Heo: "Re: [PATCH v4 0/4] ata: ahci_platform related cleanups"
Previous message: Wilfried Klaebe: "Re: [PATCH] staging: rtl8188eu: remove spaces, correct counts to unbreak P2P ioctls"
In reply to: Eric W. Biederman: "Re: [PATCH v2] Pre-emption control for userspace"
Next in thread: Eric W. Biederman: "Re: [PATCH v2] Pre-emption control for userspace"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]