Re: [PATCH 2/9] mm: arm ready for split ptlock

From: Hugh Dickins
Date: Tue Oct 25 2005 - 01:32:52 EST


On Mon, 24 Oct 2005, Nicolas Pitre wrote:
> On Sat, 22 Oct 2005, Russell King wrote:
> > On Sat, Oct 22, 2005 at 05:22:20PM +0100, Hugh Dickins wrote:
> > > Signal handling's preserve and restore of iwmmxt context currently
> > > involves reading and writing that context to and from user space, while
> > > holding page_table_lock to secure the user page(s) against kswapd. If
> > > we split the lock, then the structure might span two pages, secured by
> > > different locks. That would be manageable; but it seems simpler just
> > > to read into and write from a kernel stack buffer, copying that out and
> > > in without locking (the structure is 160 bytes in size, and here we're
> > > near the top of the kernel stack). Or would the overhead be noticeable?
> >
> > Please contact Nicolas Pitre about that - that was my suggestion,
> > but ISTR apparantly the overhead is too high.
>
> Going through a kernel buffer will simply double the overhead. Let's
> suppose it should not be a big enough issue to stop the patch from being
> merged though (and it looks cleaner that way). However I'd like for the
> WARN_ON((unsigned long)frame & 7) to remain as both the kernel and user
> buffers should be 64-bit aligned.

Okay, thanks. I can submit a patch to restore the WARN_ON later
(not today). Though that seems very odd to me, can you explain? I can
understand that the original kernel context needs to be 64-bit aligned,
and perhaps the iwmmxt_task_copy copy of it (I explicitly align that
buffer). But I can't see why the saved copy in userspace would need
to be 64-bit aligned, if it's just __copy_to_user'ed and __copy_from_
user'ed. Or is it also accessed in some other, direct way?

As to the overhead, let's see if it's serious or not in practice:
let me know if you find it to be a significant slowdown - thanks.

> > > arm_syscall's cmpxchg emulation use pte_offset_map_lock, instead of
> > > pte_offset_map and mm-wide page_table_lock; and strictly, it should now
> > > also take mmap_sem before descending to pmd, to guard against another
> > > thread munmapping, and the page table pulled out beneath this thread.
> >
> > Now that I look at it, it's probably buggy - if the page isn't already
> > dirty, it will modify without the COW action. Again, please contact
> > Nicolas about this.
>
> I don't see how standard COW could not happen. The only difference with
> a true write fault as if we used put_user() is that we bypassed the data
> abort vector and the code to get the FAR value. Or am I missing
> something?

It's certainly not buggy in the way that I thought (and I believe rmk
was thinking): you are checking pte_write, correctly within the lock,
so COW shouldn't come into it at all - it'll only work if the page is
already writable by the user.

But then I'm puzzled by your reply, saying you don't see how standard
COW could not happen.

Plus it seems a serious limitation: mightn't this be an area of executable
text that it has to write into, but is most likely readonly? Or an area
of data made readonly by fork? And is the alignment assured, that the
long will fit in one page only?

The better way to do it, I think, would be to use ptrace's
access_process_vm (it is our own mm, but that's okay).

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/