Re: [RFC] atomic highmem kmap page pinning

From: Minchan Kim
Date: Thu Mar 05 2009 - 17:23:56 EST


On Thu, Mar 5, 2009 at 1:57 PM, Nicolas Pitre <nico@xxxxxxx> wrote:
> On Thu, 5 Mar 2009, Minchan Kim wrote:
>
>> On Wed, 04 Mar 2009 21:37:43 -0500 (EST)
>> Nicolas Pitre <nico@xxxxxxx> wrote:
>>
>> > My assertion is that the cost is negligible.  This is why I'm asking you
>> > why you think this is a big cost.
>>
>> Of course, I am not sure whether it's a big cost or not.
>> But these functions are already used in many filesystems and drivers,
>> so whether the cost matters depends on the workload.
>>
>> However, this patch is only needed for VIVT, non-coherent caches.
>> Is that right?
>>
>> If so, it will add unnecessary overhead on other architectures
>> which don't have this problem.
>>
>> I think that's not desirable, even if the cost is small.
>> It would be better if we had another method that avoids the unnecessary overhead.
>> Unfortunately, I don't have any way to solve this right now.
>
> OK.  What about this patch then:

It looks good to me except one thing below.
Reviewed-by: MinChan Kim <minchan.kim@xxxxxxxxx>

> From c4db60c3a2395476331b62e08cf1f64fc9af8d54 Mon Sep 17 00:00:00 2001
> From: Nicolas Pitre <nico@xxxxxxx>
> Date: Wed, 4 Mar 2009 22:49:41 -0500
> Subject: [PATCH] atomic highmem kmap page pinning
>
> Most ARM machines have a non IO coherent cache, meaning that the
> dma_map_*() set of functions must clean and/or invalidate the affected
> memory manually before DMA occurs.  And because the majority of those
> machines have a VIVT cache, the cache maintenance operations must be
> performed using virtual addresses.
>
> When a highmem page is kunmap'd, its mapping (and cache) remains in place
> in case it is kmap'd again. However if dma_map_page() is then called with
> such a page, some cache maintenance on the remaining mapping must be
> performed. In that case, page_address(page) is non null and we can use
> that to synchronize the cache.
>
> It is unlikely but still possible for kmap() to race and recycle the
> virtual address obtained above, and use it for another page before some
> on-going cache invalidation loop in dma_map_page() is done. In that case,
> the new mapping could end up with dirty cache lines for another page,
> and the unsuspecting cache invalidation loop in dma_map_page() might
> simply discard those dirty cache lines resulting in data loss.
>
> For example, let's consider this sequence of events:
>
>         - dma_map_page(..., DMA_FROM_DEVICE) is called on a highmem page.
>
>         -->     - vaddr = page_address(page) is non null. In this case
>                   it is likely that the page has valid cache lines
>                   associated with vaddr. Remember that the cache is VIVT.
>
>                 -->     for (i = vaddr; i < vaddr + PAGE_SIZE; i += 32)
>                                 invalidate_cache_line(i);
>
>         *** preemption occurs in the middle of the loop above ***
>
>         - kmap_high() is called for a different page.
>
>         -->     - last_pkmap_nr wraps to zero and flush_all_zero_pkmaps()
>                   is called.  The pkmap_count value for the page passed
>                   to dma_map_page() above happens to be 1, so the page
>                   is unmapped.  But prior to that, flush_cache_kmaps()
>                   cleared the cache for it.  So far so good.
>
>                 - A fresh pkmap entry is assigned for this kmap request.
>                   Murphy's law says this pkmap entry will eventually
>                   happen to use the same vaddr as the one which used to
>                   belong to the other page being processed by
>                   dma_map_page() in the preempted thread above.
>
>         - The kmap_high() caller starts dirtying the cache using the
>           just assigned virtual mapping for its page.
>
>         *** the first thread is rescheduled ***
>
>                         - The for(...) loop is resumed, but now cached
>                           data belonging to a different physical page is
>                           being discarded!
>
> And this is not only a preemption issue as ARM can be SMP as well,
> making the above scenario just as likely. Hence the need for some kind
> of pkmap page pinning which can be used in any context, primarily for
> the benefit of dma_map_page() on ARM.
>
> This provides the necessary interface to cope with the above issue if
> ARCH_NEEDS_KMAP_HIGH_GET is defined, otherwise the resulting code is
> unchanged.
>
> Signed-off-by: Nicolas Pitre <nico@xxxxxxxxxxx>
>
> diff --git a/mm/highmem.c b/mm/highmem.c
> index b36b83b..cc61399 100644
> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -67,6 +67,25 @@ pte_t * pkmap_page_table;
>
>  static DECLARE_WAIT_QUEUE_HEAD(pkmap_map_wait);
>
> +/*
> + * Most architectures have no use for kmap_high_get(), so let's abstract
> + * the disabling of IRQ out of the locking in that case to save on a
> + * potential useless overhead.
> + */
> +#ifdef ARCH_NEEDS_KMAP_HIGH_GET
> +#define spin_lock_kmap()             spin_lock_irq(&kmap_lock)
> +#define spin_unlock_kmap()           spin_unlock_irq(&kmap_lock)
> +#define spin_lock_kmap_any(flags)    spin_lock_irqsave(&kmap_lock, flags)
> +#define spin_unlock_kmap_any(flags)  spin_unlock_irqrestore(&kmap_lock, flags)
> +#else
> +#define spin_lock_kmap()             spin_lock(&kmap_lock)
> +#define spin_unlock_kmap()           spin_unlock(&kmap_lock)
> +#define spin_lock_kmap_any(flags)    \
> +       do { spin_lock(&kmap_lock); (void)(flags); } while (0)
> +#define spin_unlock_kmap_any(flags)  \
> +       do { spin_unlock(&kmap_lock); (void)(flags); } while (0)
> +#endif
> +
>  static void flush_all_zero_pkmaps(void)
>  {
>         int i;
> @@ -113,9 +132,9 @@ static void flush_all_zero_pkmaps(void)
>   */
>  void kmap_flush_unused(void)
>  {
> -       spin_lock(&kmap_lock);
> +       spin_lock_kmap();
>         flush_all_zero_pkmaps();
> -       spin_unlock(&kmap_lock);
> +       spin_unlock_kmap();
>  }
>
>  static inline unsigned long map_new_virtual(struct page *page)
> @@ -145,10 +164,10 @@ start:
>
>                        __set_current_state(TASK_UNINTERRUPTIBLE);
>                        add_wait_queue(&pkmap_map_wait, &wait);
> -                      spin_unlock(&kmap_lock);
> +                      spin_unlock_kmap();
>                        schedule();
>                        remove_wait_queue(&pkmap_map_wait, &wait);
> -                      spin_lock(&kmap_lock);
> +                      spin_lock_kmap();
>
>                        /* Somebody else might have mapped it while we slept */
>                        if (page_address(page))
> @@ -184,29 +203,59 @@ void *kmap_high(struct page *page)
>          * For highmem pages, we can't trust "virtual" until
>          * after we have the lock.
>          */
> -       spin_lock(&kmap_lock);
> +       spin_lock_kmap();
>         vaddr = (unsigned long)page_address(page);
>         if (!vaddr)
>                 vaddr = map_new_virtual(page);
>         pkmap_count[PKMAP_NR(vaddr)]++;
>         BUG_ON(pkmap_count[PKMAP_NR(vaddr)] < 2);
> -       spin_unlock(&kmap_lock);
> +       spin_unlock_kmap();
>         return (void*) vaddr;
>  }
>
>  EXPORT_SYMBOL(kmap_high);
>
> +#ifdef ARCH_NEEDS_KMAP_HIGH_GET
> +/**
> + * kmap_high_get - pin a highmem page into memory
> + * @page: &struct page to pin
> + *
> + * Returns the page's current virtual memory address, or NULL if no mapping
> + * exists.  When and only when a non null address is returned then a
> + * matching call to kunmap_high() is necessary.
> + *
> + * This can be called from any context.
> + */
> +void *kmap_high_get(struct page *page)
> +{
> +       unsigned long vaddr, flags;
> +
> +       spin_lock_kmap_any(flags);
> +       vaddr = (unsigned long)page_address(page);
> +       if (vaddr) {
> +               BUG_ON(pkmap_count[PKMAP_NR(vaddr)] < 1);
> +               pkmap_count[PKMAP_NR(vaddr)]++;
> +       }
> +       spin_unlock_kmap_any(flags);
> +       return (void*) vaddr;
> +}
> +#endif

Let's also add an empty function for architectures that don't define ARCH_NEEDS_KMAP_HIGH_GET, as sketched below.
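
Just a rough sketch of what I mean, assuming the declaration would live next
to the other kmap prototypes (e.g. in include/linux/highmem.h; the exact
location is of course up to you):

	#ifdef ARCH_NEEDS_KMAP_HIGH_GET
	extern void *kmap_high_get(struct page *page);
	#else
	static inline void *kmap_high_get(struct page *page)
	{
		/* no pinning needed or possible; NULL simply means "not mapped" */
		return NULL;
	}
	#endif

That way callers can use kmap_high_get() unconditionally and just test the
return value.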

> +
>  /**
>   * kunmap_high - map a highmem page into memory
>   * @page: &struct page to unmap
> + *
> + * If ARCH_NEEDS_KMAP_HIGH_GET is not defined then this may be called
> + * only from user context.
>   */
>  void kunmap_high(struct page *page)
>  {
>         unsigned long vaddr;
>         unsigned long nr;
> +       unsigned long flags;
>         int need_wakeup;
>
> -       spin_lock(&kmap_lock);
> +       spin_lock_kmap_any(flags);
>         vaddr = (unsigned long)page_address(page);
>         BUG_ON(!vaddr);
>         nr = PKMAP_NR(vaddr);
> @@ -232,7 +281,7 @@ void kunmap_high(struct page *page)
>                  */
>                 need_wakeup = waitqueue_active(&pkmap_map_wait);
>         }
> -       spin_unlock(&kmap_lock);
> +       spin_unlock_kmap_any(flags);
>
>         /* do wake-up, if needed, race-free outside of the spin lock */
>         if (need_wakeup)
>
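
By the way, just to check that I understand the intended use on the ARM side:
I imagine the highmem case of dma_map_page() would end up doing something
along these lines.  This is purely illustrative -- the helper name and the
dma_cache_maint() call are placeholders from my reading, not your actual
dma-mapping code:

	/* hypothetical helper for the highmem case in dma_map_page() */
	static void dma_sync_highmem_page(struct page *page, size_t size,
					  enum dma_data_direction dir)
	{
		/* pin the pkmap entry (if any) so it can't be recycled */
		void *vaddr = kmap_high_get(page);

		if (vaddr) {
			/*
			 * Safe now: flush_all_zero_pkmaps() only tears down
			 * entries whose pkmap_count is 1, and ours is >= 2.
			 */
			dma_cache_maint(vaddr, size, dir);
			kunmap_high(page);	/* drop the pin */
		}
		/* else no kernel mapping exists, hence no VIVT lines to maintain */
	}

Is that roughly what you have in mind?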



--
Kind regards,
Minchan Kim