Re: [PATCH 01/10] mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN

From: Hyeonggon Yoo
Date: Wed Apr 06 2022 - 11:09:31 EST


On Wed, Apr 06, 2022 at 09:29:19AM +0200, Arnd Bergmann wrote:
> On Wed, Apr 6, 2022 at 1:59 AM Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx> wrote:
> >
> > On Tue, Apr 05, 2022 at 02:57:49PM +0100, Catalin Marinas wrote:
> > > In preparation for supporting a dynamic kmalloc() minimum alignment,
> > > allow architectures to define ARCH_KMALLOC_MINALIGN independently of
> > > ARCH_DMA_MINALIGN. In addition, always define ARCH_DMA_MINALIGN even if
> > > an architecture does not override it.
> > >
> >
> > [ +Cc slab maintainer/reviewers ]
> >
> > I get why you want to set the minimum alignment of kmalloc() dynamically.
> > That's because the cache line size can differ between machines and we
> > cannot know it statically, right?
> >
> > But I don't get why you are trying to decouple ARCH_KMALLOC_MINALIGN
> > from ARCH_DMA_MINALIGN. A kmalloc'ed buffer is always supposed to be DMA-safe.
> >
> > I'm afraid this series may break some archs/drivers.
> >
> > In Documentation/dma-api-howto.rst:
> > > 2) ARCH_DMA_MINALIGN
> > >
> > > Architectures must ensure that kmalloc'ed buffer is
> > > DMA-safe. Drivers and subsystems depend on it. If an architecture
> > > isn't fully DMA-coherent (i.e. hardware doesn't ensure that data in
> > > the CPU cache is identical to data in main memory),
> > > ARCH_DMA_MINALIGN must be set so that the memory allocator
> > > makes sure that kmalloc'ed buffer doesn't share a cache line with
> > > the others. See arch/arm/include/asm/cache.h as an example.
> > >
> > > Note that ARCH_DMA_MINALIGN is about DMA memory alignment
> > > constraints. You don't need to worry about the architecture data
> > > alignment constraints (e.g. the alignment constraints about 64-bit
> > > objects).
> >
> > If I'm missing something, please let me know :)
>
> It helps in two ways:
>
> - you can start with a relatively large hardcoded ARCH_DMA_MINALIGN
> of 128 or 256 bytes, depending on what the largest possible line size
> is for any machine you want to support, and then drop that down to
> 32 or 64 bytes based on runtime detection. This should always be safe,
> and it means a very sizable chunk of wasted memory can be recovered.
>

I agree with this part.

> - On systems that are fully cache coherent, there is no need to align
> kmalloc() allocations for DMA safety at all; on these, we can drop the
> size even below the cache line. This does not apply to most of the
> cheaper embedded or mobile SoCs, but it helps a lot on the machines
> you'd find in a data center.

Now I get the point. Thank you for the explanation!
I'm going to review this series soon.

>
> Arnd

--
Thanks,
Hyeonggon