Re: [RFC][PATCH] vmalloc: simplify vread()/vwrite()

From: Wu Fengguang
Date: Wed Jan 06 2010 - 22:21:42 EST


On Thu, Jan 07, 2010 at 10:57:36AM +0800, KAMEZAWA Hiroyuki wrote:
> On Thu, 7 Jan 2010 10:50:54 +0800
> Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote:
>
> > > > The changes are:
> > > > - remove the vmlist walk and rely solely on vmalloc_to_page()
> > > > - replace the VM_IOREMAP check with (page && page_is_ram(pfn))
> > > >
> > > > The VM_IOREMAP check is introduced in commit d0107eb07320b for per-cpu
> > > > alloc. Kame, would you double check if this change is OK for that
> > > > purpose?
> > > >
> > > I think VM_IOREMAP is for avoiding access to device configuration area and
> > > unexpected breakage in device. Then, VM_IOREMAP are should be skipped by
> > > the caller. (My patch _just_ moves the avoidance of callers to vread()/vwrite())
> >
> > "device configuration area" is not RAM, so testing of RAM would be
> > able to skip them?
> >
> Sorry, that's an area what I'm not sure.

I believe the fundamental correctness requirement would be: to avoid
accessing _PAGE_CACHE_UC/WC pages by building _PAGE_CACHE_WB mapping
to them(which kmap_atomic do), whether it be physical RAM or device
address area.

For example, /dev/mem allows access to
- device areas, with proper _PAGE_CACHE_* via ioremap_cache()
- _low_ physical RAM, which it directly reuse the 1:1 kernel mapping

> But, page_is_ram() implementation other than x86 seems not very safe...
> (And it seems that it's not defiend in some archs.)

Ah didn't know archs other than x86. And hotplug won't update e820 as
well..

> > >
> > > > The page_is_ram() check is necessary because kmap_atomic() is not
> > > > designed to work with non-RAM pages.
> > > >
> > > I think page_is_ram() is not a complete method...on x86, it just check
> > > e820's memory range. checking VM_IOREMAP is better, I think.
> >
> > (double check) Not complete or not safe?
> >
> I think not-safe because e820 doesn't seem to be updated.
>
> > EFI seems to not update e820 table by default. Ying, do you know why?

Ying just confirmed that the kernel didn't update e820 with EFI memmap
because the bootloader will fake one e820 table based on EFI memmap,
in order to run legacy/unmodified kernel.

>
> I hope all this kinds can be fixed by kernel/resource.c in generic way....
> Now, each archs have its own.

Agreed.

> > > > Even for a RAM page, we don't own the page, and cannot assume it's a
> > > > _PAGE_CACHE_WB page. So I wonder whether it's necessary to do another
> > > > patch to call reserve_memtype() before kmap_atomic() to ensure cache
> > > > consistency?
> > > >
> > > > TODO: update comments accordingly
> > > >
> > >
> > > BTW, f->f_pos problem on 64bit machine still exists and this patch is still
> > > hard to test. I stopped that because anyone doesn't show any interests.
> >
> > I'm using your patch :)
> >
> > I feel most inconfident on this patch, so submitted it for RFC first.
> > I'll then submit a full patch series including your f_pos fix.
> >
> Thank you, it's helpful.

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/