Re: [PATCH 1/1] mm: msync: fix issues of sys_msync on tmpfs

From: Zumeng Chen
Date: Wed Feb 01 2012 - 03:53:35 EST

ä 2012å01æ22æ 04:53, Hugh Dickins åé:
On Fri, 20 Jan 2012, Zumeng Chen wrote:

This patch fixes two issues as follows:
Two issues? You write of a cache aliasing issue, I don't see a second.
Hi Hugh,

I'm very sorry for delay reply due to the Spring Festival holiday.
Yes, I missed one due to just a cosmetic one :-)
For some filesystem with fsync == noop_fsync, there is not so much thing
to do, so sys_msync just passes by for all arches but some CPUs.

For some CPUs with cache aliases(dmesg|grep alias), it maybe has an issue,
which reported by msync test suites in ltp-full when the memory of memset
used by msync01 runs into cache alias randomly.

Consider the following scenario used by msync01 in ltp-full:
fildes = open(TEMPFILE, O_RDWR | O_CREAT, 0666))< 0);
.../* initialization fildes by write(fildes); */
addr = mmap(0, page_sz, PROT_READ | PROT_WRITE, MAP_FILE | MAP_SHARED,
fildes, 0);
/* set buf with memset */
memset(addr + OFFSET_1, 1, BUF_SIZE);

/* msync the addr before using, or MS_SYNC*/
msync(addr, page_sz, MS_ASYNC)
Actually, that MS_ASYNC msync() does nothing at all but validate its
arguments these days, even on filesystems with an effective fsync().
We write out dirty pages to disk in good time, msync() or not.
You are absolutely right here, although MS_SYNC is still
failure in ltp's msync01.

/* Tries to read fildes */
lseek(fildes, (off_t) OFFSET_1, SEEK_SET) != (off_t) OFFSET_1) {
nread = read(fildes, read_buf, sizeof(read_buf));
My grasp of cache aliasing issues is very poor, please excuse me if I'm
wrong; but I don't think a patch to msync() should be necessary at all
(and ltp's msync01 test is testing nothing more than args to msync()).
Agreed, you are right. It is not a good way to fix this issue
in mm level.
In the case of tmpfs, that read() system call above should route through
to mm/shmem.c do_shmem_file_read(), which contains these same lines as
the more common mm/filemap.c do_generic_file_read():

/* If users can be writing to this page using arbitrary
* virtual addresses, take care about potential aliasing
* before reading the page on the kernel side.
if (mapping_writably_mapped(mapping))
Yes, this is good start here, thanks Hugh.
The mapping_writably_mapped() test ought to be succeeding in this case
(please check with printk's, perhaps some change has messed that up),
so flush_dcache_page(page) should be called, and any aliasing issues
resolved before the data is copied back to userspace.
I guess we have to flush from INDEX_BASE to waysize instead
of only current page passed from msync.
I assume your problem is on MIPS, since you copied Ralf and linux-mips:
if the MIPS flush_dcache_page() is not working right, then you'll need
to follow up with them.
Since some MIPS CPUs have cache alias, so we have to
flush_dcache_range instead of flush_dcache_page to avoid
cache alias. I'll send to Ralf for more knowledges.



/* Then test the result */
if (read_buf[count] != 1) {

The test result is random too for CPUs with cache alias. So in this
situation, we have to flush the related vma to make sure the read is

Signed-off-by: Zumeng Chen<zumeng.chen@xxxxxxxxxxxxx>
mm/msync.c | 30 ++++++++++++++++++++++++++++++
1 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/mm/msync.c b/mm/msync.c
index 632df45..0021a7e 100644
--- a/mm/msync.c
+++ b/mm/msync.c
@@ -13,6 +13,14 @@
+/* Cache aliases should be taken into accounts when msync. */
+#ifdef cpu_has_dc_aliases
+#define CPU_HAS_CACHE_ALIAS cpu_has_dc_aliases

* MS_SYNC syncs the entire file - including mappings.
@@ -78,6 +86,28 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags)
file = vma->vm_file;
start = vma->vm_end;
+ /*
+ * For some filesystems with fsync == noop_fsync, msync just
+ * passes by but some CPUs.
+ * For CPUs with cache alias, msync has to flush the related
+ * vma explicitly to make sure data coherency between memory
+ * and cache, which includes MS_SYNC or MS_ASYNC. That is to
+ * say, cache aliases should not be an async factor, so does
+ * msync on other arches without cache aliases.
+ */
+ if (file&& file->f_op&& file->f_op->fsync == noop_fsync) {
+ flush_cache_range(vma, vma->vm_start,
+ vma->vm_end);
+ if (start>= end) {
+ error = 0;
+ goto out_unlock;
+ }
+ vma = find_vma(mm, start);
+ continue;
+ }
if ((flags& MS_SYNC)&& file&&
(vma->vm_flags& VM_SHARED)) {

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at