Possible RAID6 regression with ASYNC_TX_DMA enabled in 4.1

From: Maxime Ripard
Date: Thu May 07 2015 - 09:00:10 EST


Hi,

I'm currently trying to add support for the PQ operations on the
Marvell XOR engine in dmaengine, so that async_tx can offload these
operations to it.

I'm testing these patches with a RAID6 array with 4 disks.

However, since commit 59fc630b8b5f ("RAID5: batch adjacent full
stripe write"), every write to that array fails with the following
stack trace:

http://code.bulix.org/eh8iew-88342?raw

It seems to be generated by that warning here:

http://lxr.free-electrons.com/source/crypto/async_tx/async_tx.c#L173

And indeed, if we dump the status of depend_tx here, it's already been
acked.
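
To make the failing condition concrete, here is a rough user-space
model of that kind of sanity check. It is only a sketch: the struct,
the flag and the function names are made up for illustration and are
not the kernel's definitions, and the two 1:1 dependency conditions
are an assumption on my side about what else such a check would
reject.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for the descriptor's "acked" flag. */
#define MODEL_TX_ACK 0x1u

/* Hypothetical, stripped-down descriptor; not the kernel structure. */
struct model_tx {
	unsigned int flags;
	struct model_tx *next;   /* operation chained after this one */
	struct model_tx *parent; /* operation this one depends on */
};

static bool model_tx_test_ack(const struct model_tx *tx)
{
	return (tx->flags & MODEL_TX_ACK) != 0;
}

/*
 * Returns true if tx may be chained after depend_tx.
 *
 * An already-acked dependency is rejected: once a descriptor has been
 * acked, the driver is free to recycle it, so we can no longer be sure
 * it still refers to the operation we meant to depend on.
 */
static bool model_dependency_ok(const struct model_tx *tx,
				const struct model_tx *depend_tx)
{
	assert(tx != NULL);

	if (depend_tx == NULL)
		return true;		/* nothing to depend on */
	if (model_tx_test_ack(depend_tx))
		return false;		/* the case seen in this report */
	if (depend_tx->next != NULL)
		return false;		/* assumed: one child per parent */
	if (tx->parent != NULL)
		return false;		/* assumed: one parent per child */
	return true;
}
```

In this model, the situation I'm seeing corresponds to
model_dependency_ok() returning false because depend_tx already
carries the ack flag when the next operation is submitted.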

That doesn't happen if ASYNC_TX_DMA is disabled, i.e. when the
software implementation is used instead of our XOR engine. It also
doesn't happen on any commit prior to the one mentioned above, with
the exact same changes applied. These changes are meant to be
contributed, so I can definitely push them somewhere if needed.

I don't really know where to look, though. The change that is
causing this is probably the one in ops_run_reconstruct6, but I'm
not sure that reverting only that part would work with regard to the
rest of the patch.

Maxime

--
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
