Re: [dm-crypt] dm-crypt: Performance Regression 2.6.37 -> 2.6.38-rc8

From: Mario 'BitKoenig' Holbe
Date: Tue Mar 08 2011 - 14:24:28 EST


On Tue, Mar 08, 2011 at 06:35:01PM +0100, Milan Broz wrote:
> On 03/08/2011 05:45 PM, Mario 'BitKoenig' Holbe wrote:
> > dm-crypt in 2.6.38 changed to per-CPU workqueues to increase its
> > performance by parallelizing encryption to multiple CPUs.
> > This modification seems to cause (massive) performance drops for
> > multiple parallel dm-crypt instances...
> Well, it depends. I never suggested this kind of workaround because
> you basically hardcoded (in device stacking) how many parallel instances
> (==cpu cores ideally) of dmcrypt can run effectively.

Yes. But it was the best one could get :)

> With current design the IO is encrypted by the cpu which submitted it.
...
> If you use one dmcrypt instance over RAID0, you will now probably get
> much better throughput. (Even with one process generating IOs
> the bios are, surprisingly, submitted on different cpus. But this time
> it runs really in parallel.)

Mh, not really. I just tested this with freshly booted kernels in
emergency mode, with udev started only to create the device nodes:

# cryptsetup -c aes-xts-plain -s 256 -h sha256 -d /dev/urandom create foo1 /dev/sdc
...
# cryptsetup -c aes-xts-plain -s 256 -h sha256 -d /dev/urandom create foo4 /dev/sdf
# mdadm -B -l raid0 -n 4 -c 256 /dev/md/foo /dev/mapper/foo[1-4]
# dd if=/dev/md/foo of=/dev/null bs=1M count=20k

2.6.37: 291MB/s 2.6.38: 139MB/s

# mdadm -B -l raid0 -n 4 -c 256 /dev/md/foo /dev/sd[c-f]
# cryptsetup -c aes-xts-plain -s 256 -h sha256 -d /dev/urandom create foo /dev/md/foo
# dd if=/dev/mapper/foo of=/dev/null bs=1M count=20k

2.6.37: 126MB/s 2.6.38: 138MB/s

So... performance drops on .37 (as expected) and nothing changes on .38
(contrary to expectation).

Those results, btw., differ dramatically when using tmpfs-backed
loop-devices instead of hard disks:

raid0 over crypted loops:
2.6.37: 285MB/s 2.6.38: 324MB/s
crypted raid0 over loops:
2.6.37: 119MB/s 2.6.38: 225MB/s

Here the results do indeed change, even if not in the way one would
expect.
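
To put the four measurements in relative terms, here is a minimal sketch
of a helper that prints the change from .37 to .38 in percent. The name
pct_change is made up for illustration; the inputs are just the MB/s
figures reported above.

```shell
# pct_change OLD NEW: print the relative change from OLD to NEW in percent.
# Hypothetical helper, not part of any tool mentioned in this thread.
pct_change() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%+.1f\n", (new - old) / old * 100 }'
}

pct_change 291 139   # raid0 over crypted disks
pct_change 126 138   # crypted raid0 over disks
pct_change 285 324   # raid0 over crypted loops
pct_change 119 225   # crypted raid0 over loops
```

On the disk-backed setups this gives roughly -52% and +10%; on the
tmpfs-backed loops roughly +14% and +89%.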

All of these setups only read from the devices and hence can be tested
on any available block device. Setting the devices read-only beforehand
is probably a good idea, to guard against mistakes made when short on
sleep or whatever.

> Maybe we can find some compromise but I basically prefer current design,
> which provides much better behaviour for most configurations.

Hmmm...


regards
Mario
--
File names are infinite in length where infinity is set to 255 characters.
-- Peter Collinson, "The Unix File System"
