Re: [PATCH v2] dax: fix NULL pointer in __dax_pmd_fault()

From: Dan Williams
Date: Tue Sep 22 2015 - 23:01:05 EST


On Tue, Sep 22, 2015 at 4:30 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Tue, Sep 22, 2015 at 02:25:19PM -0700, Dan Williams wrote:
>> On Tue, Sep 22, 2015 at 2:13 PM, Andrew Morton
>> <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>> > On Tue, 22 Sep 2015 13:36:22 -0600 Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx> wrote:
>> >
>> >> The following commit:
>> >>
>> >> commit 46c043ede471 ("mm: take i_mmap_lock in unmap_mapping_range() for
>> >> DAX")
>> >>
>> >> moved some code in __dax_pmd_fault() that was responsible for zeroing
>> >> newly allocated PMD pages. The new location didn't properly set up
>> >> 'kaddr', though, so when run this code resulted in a NULL pointer BUG.
>> >>
>> >> Fix this by getting the correct 'kaddr' via bdev_direct_access().
>> >
>> > Why the heck didn't gcc warn?
>> >
>> > I had a fiddle:
>> >
>> > --- a/fs/dax.c~a
>> > +++ a/fs/dax.c
>> > @@ -529,15 +529,18 @@ int __dax_pmd_fault(struct vm_area_struc
>> > unsigned long pmd_addr = address & PMD_MASK;
>> > bool write = flags & FAULT_FLAG_WRITE;
>> > long length;
>> > - void __pmem *kaddr;
>> > + void *kaddr;
>> > pgoff_t size, pgoff;
>> > sector_t block, sector;
>> > unsigned long pfn;
>> > int result = 0;
>> >
>> > +// printk("%p\n", kaddr);
>> > +
>> > /* Fall back to PTEs if we're going to COW */
>> > if (write && !(vma->vm_flags & VM_SHARED))
>> > return VM_FAULT_FALLBACK;
>> > + printk("%p\n", kaddr);
>> > /* If the PMD would extend outside the VMA */
>> > if (pmd_addr < vma->vm_start)
>> > return VM_FAULT_FALLBACK;
>> >
>> > gcc warns about the first printk, but not about the second. So that
>> > "if (...) return ..." seems to have defeated gcc uninitialized-var
>> > detection. wtf?
>> >
>> >> --- a/fs/dax.c
>> >> +++ b/fs/dax.c
>> >> @@ -569,8 +569,20 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
>> >> if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE)
>> >> goto fallback;
>> >>
>> >> + sector = bh.b_blocknr << (blkbits - 9);
>> >> +
>> >> if (buffer_unwritten(&bh) || buffer_new(&bh)) {
>> >> int i;
>> >> +
>> >> + length = bdev_direct_access(bh.b_bdev, sector, &kaddr, &pfn,
>> >> + bh.b_size);
>> >> + if (length < 0) {
>> >> + result = VM_FAULT_SIGBUS;
>> >> + goto out;
>> >> + }
>> >> + if ((length < PMD_SIZE) || (pfn & PG_PMD_COLOUR))
>> >> + goto fallback;
>> >> +
>> >> for (i = 0; i < PTRS_PER_PMD; i++)
>> >> clear_pmem(kaddr + i * PAGE_SIZE, PAGE_SIZE);
>> >> wmb_pmem();
>> >
>> > hm, that's a lot of copy-n-paste. Do we really need to run
>> > bdev_direct_access() twice? Will `kaddr' and `pfn' change?
>> >
>>
>> They shouldn't change, but I'm working on a fix for handling the race
>> of unbinding the pmem device while that kaddr is in use (unbind
>> invalidates kaddr).
>
> Exactly what does "unbinding the pmem device" mean,

echo namespace0.0 > /sys/bus/nd/drivers/nd_pmem/unbind

> and why can
> (parts of) the pmem device "go away" when there are active
> references to it?

Normally we have outstanding i/o requests to hold off
blk_cleanup_queue(), but in the dax case we don't have any mechanism
(yet) to flag the queue as busy. I have some patches to add a
percpu_refcount for this purpose.

>
>> The proposal is a dax_map_bh()/dax_unmap_bh()
>> interface to temporarily pin the mapping around each usage.
>
> Which mapping? The bufferhead maps file offset to filesystem block
> addresses, so I'm not sure what problem you are actually refering
> to here...

The kaddr is coming from the devm_memremap() in the pmem driver that
gets unmapped after the device is released by the driver.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/