Re: [uClinux-dev] [PATCH 5/7] NOMMU: Avoiding duplicate icache flushes of shared maps

From: David Howells
Date: Tue Dec 15 2009 - 05:53:28 EST


Jamie Lokier <jamie@xxxxxxxxxxxxx> wrote:

> This looks like it won't work in the following sequence:
>
> process A maps MAP_SHARED, PROT_READ|PROT_EXEC (flushes icache)
> process B maps MAP_SHARED, PROT_READ|PROT_WRITE
> and proceeds to modify the data
> process C maps MAP_SHARED, PROT_READ|PROT_EXEC (no icache flush)

Assuming all the above refer to the same piece of RAM, there's no reason that
process A will continue to function correctly executing from the first
mapping if process B writes to that RAM through the second mapping.

There's also no point doing an icache flush unless you first flush the dcache
back to the RAM - and we don't know when to do that because the O/S does not know
whether the RAM has been changed. So we'd have to do an unconditional dcache
flush too for the entire RAM segment.

I'd prefer to leave this to the writers. If they're mad enough to write
shared code that undergoes runtime modification, and then want to run it on
NOMMU...

So my question back to you is: would it work anyway?

Note that some arches have a specific cache flushing system call. Perhaps
this should be extended to all.
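
As a rough sketch of what "leaving it to the writers" could look like in
userspace: the process that stores new instructions makes them visible itself
before anyone executes them. publish_code() is a hypothetical helper, not an
existing API. It relies on the GCC/Clang builtin __builtin___clear_cache(),
which on some arches expands to the arch's cache flushing call and on others
(x86, for instance) to nothing. Whether that is sufficient for other processes
or other CPUs on a given NOMMU platform is an assumption that would need
checking per arch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy new instructions into a writable mapping, then
 * ask the toolchain to make them visible to instruction fetch.
 * Returns the number of bytes published.
 */
size_t publish_code(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);

    /* Store the new instructions through the writable mapping. */
    memcpy(dst, src, len);

    /*
     * GCC/Clang builtin: write back the dcache and invalidate the icache
     * for [dst, dst + len) where the architecture requires it.
     */
    __builtin___clear_cache((char *)dst, (char *)dst + len);

    return len;
}
```

The point is only that the writer knows which range it touched, whereas the
kernel on NOMMU does not.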

> What about icache flushes in these cases:
>
> When using mprotect() PROT_READ|PROT_WRITE -> PROT_READ|PROT_EXEC,
> e.g. as an FDPIC implementation may do when updating PLT entries.

There is no mprotect() on NOMMU, at least not at the moment. It may be
reasonable to add support for someone turning on/off the PROT_EXEC and
PROT_WRITE bits, and make it flush dcache to RAM when WRITE is turned off, and
flush the icache when EXEC is turned on, in that order.
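
The ordering above can be sketched as a small decision function. This is only
an illustration of the rule, not code from the patch: plan_prot_change() and
the cache_op names are hypothetical, and a real implementation would call the
arch's cache maintenance routines instead of returning a plan.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>   /* PROT_READ, PROT_WRITE, PROT_EXEC */

/* Hypothetical labels for the two cache operations discussed above. */
enum cache_op {
    OP_DCACHE_WRITEBACK,    /* flush dirty data cache lines to RAM */
    OP_ICACHE_INVALIDATE    /* discard stale instruction cache lines */
};

/*
 * Fill ops[] with the operations needed for a protection change, in the
 * order they must run, and return how many there are (0 to 2).
 */
int plan_prot_change(int old_prot, int new_prot, enum cache_op ops[2])
{
    int n = 0;

    assert(ops != NULL);

    /* WRITE turned off: push pending stores out to RAM first. */
    if ((old_prot & PROT_WRITE) && !(new_prot & PROT_WRITE))
        ops[n++] = OP_DCACHE_WRITEBACK;

    /* EXEC turned on: only then drop stale instructions. */
    if (!(old_prot & PROT_EXEC) && (new_prot & PROT_EXEC))
        ops[n++] = OP_ICACHE_INVALIDATE;

    return n;
}
```

For the PROT_READ|PROT_WRITE -> PROT_READ|PROT_EXEC transition quoted above,
this yields the dcache writeback followed by the icache invalidate.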

However, as Mike said, we don't do this in FDPIC. The code sections are
immutable blobs, and are mapped MAP_PRIVATE, PROT_READ|PROT_EXEC from the
start. That way, mmap() will share them for us and even do XIP without special
support in userspace. FDPIC uses a non-executable GOT in the data area, and
loads the function pointer and new GOT pointer out of it before making a call.
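
To illustrate why no code gets patched, here is a simplified model of such a
call in plain C. The struct layout and names are hypothetical, and the GOT
pointer is passed as an ordinary argument, whereas a real FDPIC ABI hands it
over in a dedicated register.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of a function descriptor: two words that live in the
 * writable, non-executable data area.
 */
struct func_desc {
    int (*entry)(const void *got, int arg); /* address in the immutable text */
    const void *got;                        /* GOT of the callee's module */
};

/* Stand-in for a function in shared text; it finds its data via the GOT. */
static int add_module_base(const void *got, int arg)
{
    const int *module_data = got;

    return module_data[0] + arg;
}

/* Load both words from the data area, then call; the text is never written. */
int call_via_descriptor(const struct func_desc *fd, int arg)
{
    assert(fd != NULL && fd->entry != NULL);
    return fd->entry(fd->got, arg);
}
```

Two processes sharing the same text would each have their own descriptor
pointing at the same entry address but at different GOTs, so only data ever
changes.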

> And when calling msync(), like this:
>
> process A maps MAP_SHARED, PROT_READ|PROT_EXEC (flushes icache)
> process B maps MAP_SHARED, PROT_READ|PROT_WRITE
> and proceeds to modify the data
> process A calls msync()
> and proceeds to execute the modified contents

Similarly, we don't provide msync(). On NOMMU, memory mappings cannot be
shared from disks that aren't based on direct-access (quasi-)memory (e.g. ramfs,
MTD).

We could, perhaps, partially implement msync() to flush the appropriate caches.
We might even be able to add extra flags to msync() so that it can flush just
the CPU caches - that would obviate the need for separate syscalls for this
purpose.

> Do you think the mprotect() and msync() calls should flush icache in
> those cases?

I don't see that msync() should flush the icache at all. Its purpose is to
flush data to the backing store.

Also, don't forget that under NOMMU conditions, you have no idea if the data
has been modified.

> But in the first example above, I don't see how process C could be
> expected to know it must flush icache, and process B could just be an
> "optimised with writable mmap" file copy, so it shouldn't have
> responsibility for icache either.

It's manually executing off of a MAP_SHARED region, a region that others have
open for write. It has to look after its own semantics. This applies to
process A too.

> I've seen arguments for it, and arguments that the executing process can
> be expected to explicitly flush icache itself in those cases because
> it knows what it is doing. (Personally I lean towards the kernel
> should be doing it. IRIX interestingly offers both alternatives, with
> a PROT_EXEC_NOFLUSH).

I disagree, at least in the case of MAP_SHARED regions. You need to manage
your own coherency. Again, see process A vs process B.

> Or is icache fully flushed on every context switch on all nommu
> architectures anyway, and defined to do so?

That would be a sure performance killer, and, in any case, wouldn't help on an
SMP system.

David