Re: 4.7.0-rc7 ext4 error in dx_probe

From: Darrick J. Wong
Date: Mon Aug 08 2016 - 12:56:56 EST


On Mon, Aug 08, 2016 at 12:08:18PM -0400, Theodore Ts'o wrote:
> On Sun, Aug 07, 2016 at 11:28:10PM -0700, Darrick J. Wong wrote:
> >
> > I have one lingering concern -- is it a bug that two processes could be
> > computing the checksum of a buffer simultaneously? I would have thought ext4
> > would serialize that kind of buffer_head access...
>
> Do we know how this is happening? We've always depended on the VFS to
> provide this exclusion. The only way we should be modifying the
> buffer_head at the same time is if two CPUs are trying to modify the
> directory at the same time, and that should _never_ be happening, even
> with the new directory parallelism code, unless the file system has
> given permission and intends to do its own fine-grained locking.

It's a combination of two things, I think. The first is that the
checksum calculation routine (temporarily) sets the checksum field to
zero during the computation, which of course is a no-no: any other
reader looking at the same buffer during that window sees a zero
checksum. The patch fixes that problem and should go in.
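To make the first problem concrete, here is a minimal C sketch contrasting the two patterns. The block layout (`struct demo_block`), the helper names, and the bitwise CRC32C routine are hypothetical stand-ins chosen for illustration; this is not the actual ext4 code or the text of the patch. The point is only that the fixed variant never writes to the buffer: it checksums the bytes before the field, four zero bytes in place of the field, and then the bytes after it, which yields the same value as zeroing the field in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block layout for illustration; NOT ext4's on-disk format. */
struct demo_block {
    uint8_t  data[56];
    uint32_t csum;   /* checksum field sits inside the checksummed region */
    uint32_t tail;
};
_Static_assert(sizeof(struct demo_block) == 64, "unexpected padding");

/* Bitwise CRC32C (Castagnoli) update with no final XOR, so calls chain. */
static uint32_t crc32c_raw(uint32_t crc, const uint8_t *p, size_t n)
{
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Buggy pattern: writes to the shared buffer while "only reading" it. */
static uint32_t csum_buggy(struct demo_block *b)
{
    uint32_t saved = b->csum;
    uint32_t c;

    b->csum = 0;    /* a concurrent reader can observe this zero */
    c = crc32c_raw(~0u, (const uint8_t *)b, sizeof *b);
    b->csum = saved;
    return c;
}

/* Fixed pattern: feed zero bytes for the field; the buffer stays const. */
static uint32_t csum_fixed(const struct demo_block *b)
{
    static const uint32_t zero = 0;
    const uint8_t *p = (const uint8_t *)b;
    size_t off = offsetof(struct demo_block, csum);
    size_t rest = off + sizeof zero;
    uint32_t c;

    assert(rest <= sizeof *b);
    c = crc32c_raw(~0u, p, off);                            /* before field */
    c = crc32c_raw(c, (const uint8_t *)&zero, sizeof zero); /* stand-in 0s  */
    c = crc32c_raw(c, p + rest, sizeof *b - rest);          /* after field  */
    return c;
}

static int verify_fixed(const struct demo_block *b)
{
    return csum_fixed(b) == b->csum;
}

/* Fill a block with arbitrary test contents and a valid checksum. */
static void demo_init(struct demo_block *b)
{
    memset(b, 0, sizeof *b);
    for (size_t i = 0; i < sizeof b->data; i++)
        b->data[i] = (uint8_t)i;
    b->tail = 0xdeadbeefu;
    b->csum = csum_fixed(b);
}
```

Run single-threaded, both variants return the same value, because they feed the CRC the identical byte sequence; the difference is that `csum_buggy` briefly leaves a zero in memory that any concurrent reader of the same buffer can see.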

The second problem is that we can now have multiple lookups at the same
time, which means that more than one CPU can be calling into dx_probe
on the same directory blocks at once. There isn't any locking on the
buffer heads between readers, so we can end up with ext4_read_dirblock
racing with itself to verify the block. It's perhaps a little
inefficient for multiple threads to be checksumming the same block, but
it only turns deadly if you combine it with the first problem.
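To see why the combination is deadly, the sketch below replays one possible interleaving of two readers that both use the zero-and-restore pattern on a shared block. It is deliberately single-threaded so the outcome is deterministic, the struct and function names are hypothetical, and it does not claim to be the exact sequence behind the dx_probe error. Reader B saves A's temporary zero as the "expected" checksum, so its verification fails; when B then restores that zero, it overwrites the good value A just put back, leaving a bad checksum in the in-memory buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block layout for illustration; NOT ext4's on-disk format. */
struct demo_block {
    uint8_t  data[56];
    uint32_t csum;
    uint32_t tail;
};

/* Bitwise CRC32C (Castagnoli) update with no final XOR. */
static uint32_t crc32c_raw(uint32_t crc, const uint8_t *p, size_t n)
{
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Checksum of the block exactly as it currently sits in memory. */
static uint32_t crc_as_is(const struct demo_block *b)
{
    return crc32c_raw(~0u, (const uint8_t *)b, sizeof *b);
}

struct race_result {
    int a_ok;             /* did reader A's verification pass? */
    int b_ok;             /* did reader B's verification pass? */
    uint32_t csum_after;  /* checksum field left in the shared buffer */
};

/*
 * Replays ONE possible interleaving of two readers that both use the
 * "save, zero, checksum, restore" pattern on the same shared block.
 */
static struct race_result replay_interleaving(struct demo_block *shared)
{
    struct race_result r;
    uint32_t a_saved, b_saved, a_calc, b_calc;

    a_saved = shared->csum;      /* A: save the good checksum          */
    shared->csum = 0;            /* A: zero the field                  */

    b_saved = shared->csum;      /* B: "saves" A's temporary zero      */
    shared->csum = 0;            /* B: zero the field                  */

    a_calc = crc_as_is(shared);  /* A: checksum with the field == 0    */
    shared->csum = a_saved;      /* A: restore the good value          */
    r.a_ok = (a_calc == a_saved);

    b_calc = crc_as_is(shared);  /* B: field is nonzero again by now   */
    shared->csum = b_saved;      /* B: "restores" zero over good value */
    r.b_ok = (b_calc == b_saved);

    r.csum_after = shared->csum;
    return r;
}

/* Fill a block with test contents; csum is 0 after memset, so the
 * checksum computed here is the "field zeroed" value. */
static void demo_init(struct demo_block *b)
{
    memset(b, 0, sizeof *b);
    for (size_t i = 0; i < sizeof b->data; i++)
        b->data[i] = (uint8_t)i;
    b->tail = 0xdeadbeefu;
    b->csum = crc_as_is(b);
}
```

A non-mutating verifier has no such window: readers only load bytes, so duplicate checksumming wastes some CPU but cannot change what another reader sees.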

--D

>
> - Ted