RE: Bug 71331 - mlock yields processor to lower priority process

From: jimmie.davis
Date: Thu Mar 27 2014 - 00:21:15 EST




-----Original Message-----
From: Andy Lutomirski [mailto:luto@xxxxxxxxxxxxxx]
Sent: Wednesday, March 26, 2014 7:40 PM
To: Davis, Bud @ SSG - Link; umgwanakikbuti@xxxxxxxxx
Cc: oneukum@xxxxxxx; artem_fetishev@xxxxxxxx; peterz@xxxxxxxxxxxxx; kosaki.motohiro@xxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
Subject: Re: Bug 71331 - mlock yields processor to lower priority process

On 03/21/2014 07:50 AM, jimmie.davis@xxxxxxxxxx wrote:
>
> ________________________________________
> From: Mike Galbraith [umgwanakikbuti@xxxxxxxxx]
> Sent: Friday, March 21, 2014 9:41 AM
> To: Davis, Bud @ SSG - Link
> Cc: oneukum@xxxxxxx; artem_fetishev@xxxxxxxx; peterz@xxxxxxxxxxxxx; kosaki.motohiro@xxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: RE: Bug 71331 - mlock yields processor to lower priority process
>
> On Fri, 2014-03-21 at 14:01 +0000, jimmie.davis@xxxxxxxxxx wrote:
>
>> If you call mlock () from a SCHED_FIFO task, you expect it to return
>> when done. You don't expect it to block, and your task to be
>> pre-empted.
>
> Say some of your pages are sitting in an nfs swapfile orbiting Neptune,
> how do they get home, and what should we do meanwhile?
>
> -Mike
>
> Two options.
>
> #1. Return with a status value of EAGAIN.
>
> or
>
> #2. Don't return until you can do it.
>
> If SCHED_FIFO is used, and mlock() is called, the intention of the user is very clear. Run this task until
> it is completed or it blocks (and until a bit ago, mlock() did not block).
>
> SCHED_FIFO users don't care about fairness. They want the system to do what it is told.

I use mlock in real-time processes, but I do it in a separate thread.

Seriously, though, what do you expect the kernel to do? When you call
mlock on a page that isn't present, the kernel will *read* that page.
mlock will, therefore, block until the IO finishes.
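
A minimal sketch of what that means in practice, assuming a hypothetical
large file "data.bin" that is not yet in the page cache; the mlock() call
itself has to fault the pages in, so it can sleep on the I/O:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
        int fd = open("data.bin", O_RDONLY);    /* hypothetical file */
        struct stat st;
        void *p;

        if (fd < 0 || fstat(fd, &st) < 0) {
                perror("open/fstat");
                return 1;
        }

        p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        /*
         * Nothing is resident yet; mlock() must read every page in,
         * so this single call can sleep until the I/O completes.
         */
        if (mlock(p, st.st_size) < 0)
                perror("mlock");

        munlock(p, st.st_size);
        munmap(p, st.st_size);
        close(fd);
        return 0;
}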

Some time around 3.9, the behavior changed a little bit: IIRC mlock used
to hold mmap_sem while sleeping. Or maybe just mlockall() with MCL_FUTURE did
that. In any case, the mlock code is less lock-happy than it was. Is
it possible that you have two threads, and the non-mlock-calling thread
got blocked behind mlock, so it looked better?

--Andy

===================================================================================================================


Andy,

The example code submitted to bugzilla (chase back along the thread a bit; there is a reference) shows the problem.

Two threads, TaskA (high priority) and TaskB (low priority), assigned to the same processor explicitly for the guarantee that only one of them can execute at a time. TaskA becomes eligible to run. As part of its processing (which normally ends with a call to sem_wait()), it calls mlock(). TaskA then blocks, and TaskB begins running. But wait, the system is designed so that TaskA will run until it is done (thus SCHED_FIFO and a priority higher than TaskB's). TaskA, the higher priority task, is suspended and TaskB starts running. And in the code that led me on this endeavor :) {consisting of a lot of Ada threads}, the result was a segfault due to data left half-processed by TaskA.
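
For concreteness, a hypothetical reduction of that scenario (not the actual
bugzilla reproducer; names, priorities and the buffer size are illustrative):
both threads pinned to CPU 0 under SCHED_FIFO, TaskA at the higher priority.
If mlock() sleeps or yields inside TaskA, TaskB gets the CPU even though
TaskA has not finished:

/* Build with -lpthread; needs root (or CAP_SYS_NICE) for SCHED_FIFO. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define REGION_SIZE (64 * 1024 * 1024)          /* illustrative working set */

static volatile int task_a_done;

static void *task_a(void *arg)                  /* high priority */
{
        void *buf = malloc(REGION_SIZE);

        if (buf)
                mlock(buf, REGION_SIZE);        /* may sleep/yield faulting pages in */
        /* ... process the data ... */
        task_a_done = 1;
        return NULL;
}

static void *task_b(void *arg)                  /* low priority */
{
        if (!task_a_done)
                fprintf(stderr, "TaskB ran before TaskA was done\n");
        return NULL;
}

static void start(pthread_t *t, void *(*fn)(void *), int prio)
{
        pthread_attr_t attr;
        struct sched_param sp = { .sched_priority = prio };
        cpu_set_t cpus;

        CPU_ZERO(&cpus);
        CPU_SET(0, &cpus);                      /* both threads on CPU 0 */

        pthread_attr_init(&attr);
        pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
        pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
        pthread_attr_setschedparam(&attr, &sp);
        pthread_attr_setaffinity_np(&attr, sizeof(cpus), &cpus);
        pthread_create(t, &attr, fn, NULL);     /* error checks omitted */
}

int main(void)
{
        pthread_t a, b;

        start(&a, task_a, 50);                  /* TaskA: higher priority */
        start(&b, task_b, 10);                  /* TaskB: lower priority */
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
}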

This is what I call 'blocking': the thread is no longer running and the scheduler puts someone else on the processor. I don't mean 'takes a long time to return'. Taking a long time is fine; the system design relies on priority-based scheduling and CPU affinity to ensure ordered access to application data.

mlock() now blocks. I don't care how long mlock() takes; what I care about is the lower priority process pre-empting me. Only a limited number of syscalls block; those that do are documented and usually offer a way to choose between blocking and non-blocking behavior.

Can I change the system to deal with mlock() being a blocking syscall? Yes, but this is a situation where working code that meets the API has stopped working.
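
For what it is worth, the obvious adaptation (sketched below as an assumption
about the application, not its actual code) is to lock everything up front,
before any thread goes SCHED_FIFO, so mlock() never has to be called from the
real-time path:

#include <sched.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
        struct sched_param sp = { .sched_priority = 50 };   /* illustrative */

        /*
         * Lock current and future mappings while still SCHED_OTHER, before
         * any real-time thread exists; the potentially-sleeping work happens
         * here instead of inside a SCHED_FIFO task.
         */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0) {
                perror("mlockall");
                return 1;
        }

        if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0)
                perror("sched_setscheduler");

        /* ... create the SCHED_FIFO worker threads here ... */
        return 0;
}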

Thanks for looking at it.

Regards,
Bud Davis





